The term “Modern Data Stack” is gathering steam and being used in a few different contexts. However, when people talk about it, they generally refer to the Modern Data Stack for Analytics that comprises tools that enable the following:
- Data Ingestion: extracting data from databases and third-party tools using an ELT solution
- Data Warehousing: storing this data in a cloud data warehouse
- Data Transformation: building data models and preparing the data for analysis
- Business Intelligence: building reports to analyze the data in a BI tool
However, this stack only partially serves the needs of Product and Growth teams as it ignores the following:
- Collecting event data from primary or first-party data sources (website and apps)
- Storing this data in a cloud data warehouse
- Analyzing this data to understand user behavior and identify points of friction
- Activating the data in downstream tools to build personalized customer experiences
Therefore, as an extension of the modern data stack for analytics, a Modern Data Stack for Growth includes tools that enable teams to do the above.
The goal of this guide is to offer a thorough breakdown of the data stack that empowers go-to-market teams to focus on their core KPIs and not worry about missing data, bad data, or simply a lack of data.
Let’s jump in!
Collecting Event Data
Data collection tools, also referred to as tracking tools, fall under two main categories — Customer Data Platform or CDP and Customer Data Infrastructure or CDI.
There is some confusion between CDI and CDP primarily because a CDP is essentially a CDI that also has identity resolution and audience-building capabilities.
Additionally, CDPs and some CDI solutions offer the ability to ingest data from third-party tools into a data warehouse. However, this is not their core offering, and ELT tools like Fivetran, Stitch, Xplenty, Airbyte, and Meltano are purpose-built for this.
Customer Data Platform
While CDPs offer data collection capabilities, a CDP’s core capability is identity resolution which makes it possible to create user segments or audiences by combining customer data from multiple sources and syncing those segments to external tools.
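To make identity resolution and audience building a little more concrete, here is a minimal sketch in Python. It is not how any particular CDP implements it; the `IdentityGraph` class, the identifier formats (`anon:`, `email:`, `crm:`), and the sample records are all hypothetical. The idea is simply that identifiers known to belong to the same person get linked, activity from every source is grouped under the resolved profile, and a segment is then a filter over those profiles.

```python
class IdentityGraph:
    """Toy identity resolution: links identifiers that belong to one person."""

    def __init__(self):
        self._parent = {}  # identifier -> parent identifier (union-find forest)

    def _find(self, identifier):
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            # Path halving keeps lookups fast as the graph grows.
            self._parent[identifier] = self._parent[self._parent[identifier]]
            identifier = self._parent[identifier]
        return identifier

    def link(self, a, b):
        """Record that two identifiers refer to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        return self._find(a) == self._find(b)

    def profile_id(self, identifier):
        """Return the representative identifier of the resolved profile."""
        return self._find(identifier)


graph = IdentityGraph()
graph.link("anon:abc", "email:ada@example.com")  # sign-up ties a cookie to an email
graph.link("email:ada@example.com", "crm:42")    # the CRM record shares that email

# (identifier, source, activity) records arriving from different tools.
records = [
    ("anon:abc", "web", "Viewed Pricing"),
    ("crm:42", "crm", "Trial Started"),
    ("anon:xyz", "web", "Viewed Pricing"),
]

# Group activity from every source under the resolved profile.
profiles = {}
for identifier, _source, activity in records:
    profiles.setdefault(graph.profile_id(identifier), set()).add(activity)

# Audience: people who viewed pricing AND started a trial.
wanted = {"Viewed Pricing", "Trial Started"}
audience = [pid for pid, activities in profiles.items() if wanted <= activities]
```

Without the links, the web visit and the CRM trial would look like two different people and nobody would qualify for the segment.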
The most popular industry-agnostic CDPs are as follows:
Segment offers a range of products and makes it easy to understand the differences between CDI and CDP. Connections, Segment’s core product, is essentially a CDI solution whereas Personas is Segment’s CDP offering that is sold as an add-on to their core product.
Segment Connections has long been the go-to tracking solution for companies of all sizes, so even if you don’t need CDP capabilities, Segment Connections can take care of your tracking needs.
Segment also offers a data governance tool called Protocols which is also sold as an add-on.
mParticle is one of the most popular horizontal CDPs on the market with its core offering being a CDI solution to collect data and sync it to third-party tools and data warehouses. Its CDP capabilities — audience building and identity resolution — are available as add-ons along with data governance tools.
mParticle primarily caters to B2C brands and is used by some of the largest tech companies today.
Customer Data Infrastructure
The core premise of a CDI is to track customer data from first-party data sources — your website and apps — and sync the data to a data warehouse as well as to third-party tools.
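As a rough illustration of that premise, the sketch below collects an event once and forwards the same raw event to every configured destination. The `Tracker` class, its method names, and the two in-memory destinations are hypothetical stand-ins, not the API of Segment, RudderStack, or any other vendor.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Event:
    """A single first-party event, e.g. an action taken in your web app."""
    user_id: str
    name: str
    properties: dict
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class Tracker:
    """Hypothetical CDI client: collect an event once, forward it everywhere."""

    def __init__(self):
        self._destinations = {}  # destination name -> function that sends an Event

    def add_destination(self, name: str, send: Callable[[Event], None]) -> None:
        self._destinations[name] = send

    def track(self, user_id: str, name: str, properties: Optional[dict] = None) -> Event:
        event = Event(user_id=user_id, name=name, properties=properties or {})
        for send in self._destinations.values():
            send(event)  # the same raw event goes to every destination
        return event


# In-memory stand-ins for a warehouse table and a third-party tool.
warehouse_rows = []
email_tool_rows = []

tracker = Tracker()
tracker.add_destination("warehouse", warehouse_rows.append)
tracker.add_destination("email_tool", email_tool_rows.append)

tracker.track("user_1", "Signed Up", {"plan": "free"})
tracker.track("user_1", "Project Created", {"template": "blank"})
```

The point of the design is that the product code calls `track` once per event, and adding or removing a destination never requires touching the tracking calls themselves.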
Let’s look at popular CDI solutions worth exploring:
Avo enables you to create a smart tracking plan using an easy-to-use interface and then generates code for your developers to implement tracking based on the events and properties defined. This gives product managers more control over the taxonomy and eliminates the scope for errors during implementation.
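The idea of a tracking plan as the single source of truth can be sketched in a few lines of Python. This is not Avo's generated code or its API: the `TRACKING_PLAN` structure, the event names, and the `validate_event` helper are assumptions made for illustration, and the check here happens at runtime rather than through code generation.

```python
# Hypothetical tracking plan: event name -> required properties and their types.
TRACKING_PLAN = {
    "Signed Up": {"plan": str},
    "Project Created": {"template": str, "collaborators": int},
}


def validate_event(name, properties):
    """Return a list of problems; an empty list means the event matches the plan."""
    if name not in TRACKING_PLAN:
        return [f"unknown event: {name!r}"]
    problems = []
    expected = TRACKING_PLAN[name]
    for prop, prop_type in expected.items():
        if prop not in properties:
            problems.append(f"missing property: {prop!r}")
        elif not isinstance(properties[prop], prop_type):
            problems.append(
                f"wrong type for {prop!r}: expected {prop_type.__name__}"
            )
    for prop in properties:
        if prop not in expected:
            problems.append(f"unplanned property: {prop!r}")
    return problems


ok = validate_event("Signed Up", {"plan": "free"})
typo = validate_event("signed_up", {"plan": "free"})
bad_type = validate_event(
    "Project Created", {"template": "blank", "collaborators": "3"}
)
```

Here `ok` is empty, while `typo` and `bad_type` each report one problem: exactly the kind of naming and typing mistakes a tracking plan is meant to catch before they reach downstream tools.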
Iteratively is an alternative to Avo that takes a similar approach to tracking. Both offer limited integrations but integrate with Segment as a destination, allowing you to send data onward to all the destinations supported by Segment.
Iteratively and Avo can replace Segment Connections but due to their limited destinations library, it makes sense to use either of them in conjunction with Connections. Avo and Iteratively also support custom destinations, allowing you to integrate with internal tools as well.
As mentioned above, Connections is Segment’s CDI offering and is available as a standalone product, allowing you to use it even if you don’t need a full-blown CDP (Segment Personas).
As mentioned above, you can use mParticle’s CDI offering even if you don’t need CDP capabilities.
RudderStack offers a range of capabilities, but its core product works much like Segment Connections. In addition to its cloud offering, RudderStack is available as an open-source product, allowing you to deploy it on your own servers.
MetaRouter is a relatively new solution that is completely focused on server-side tracking of data. It only offers private cloud instances and on-premises installations, both of which are ideal for larger companies with stricter norms for data privacy, security, and compliance.
Snowplow calls itself a behavioral data collection platform that is open source and also offers a cloud version. Snowplow’s approach is different in that it only syncs data to data warehouses and doesn’t support any other cloud destinations.
Also, implementing Snowplow requires expertise in its proprietary technology and is therefore only suitable if your company has a dedicated data team.
Freshpaint offers a hybrid solution wherein besides tracking data via code (which is always recommended), you can set up auto-tracking that gathers data without code (also known as implicit tracking). The hybrid approach brings more flexibility to teams with limited engineering resources.
Data Collection is the first step towards adopting a modern data stack for growth and getting it right is critical as everything else hereon hinges on having access to clean, accurate customer data.
And with so many tools and technologies available to collect data, it is reasonable to spend some time, or even get expert guidance, to figure out the most appropriate data collection approach for your needs and resources and to choose the right tools.
Warehousing Event Data
A Data Warehouse is essentially a database designed for the purpose of analytics and is used to store data from all possible data sources — first-party apps, third-party tools, as well as production databases.
Irrespective of the technology you choose to collect event data, it is essential to store a copy of your data in a data warehouse such as Snowflake, Google BigQuery, or AWS Redshift — the three frontrunners of cloud data warehousing.
Besides owning your data, having access to raw data in a warehouse allows you to do all kinds of things with it — from transforming and combining event data with data from other sources to enriching the data before analyzing it or activating it. 👍
And when you’re ready to build a data team and hire data scientists to extract the juice from years’ worth of data and build predictive models to reduce churn or build a recommendation engine, you will be glad that your data is housed in your warehouse rather than being locked inside a third-party vendor’s warehouse.
Storing raw data in a data warehouse should be non-negotiable, especially when data warehouses have become so affordable and can be spun up in a matter of hours without deep technical know-how.
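Combining event data with data from other sources is mostly a matter of SQL once everything lives in one place. The sketch below uses Python's built-in `sqlite3` as a stand-in for a cloud data warehouse; the table names, columns, and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake, BigQuery, or Redshift.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id TEXT, name TEXT, occurred_at TEXT);
    CREATE TABLE crm_accounts (user_id TEXT, plan TEXT);

    INSERT INTO events VALUES
        ('u1', 'Project Created', '2021-06-01'),
        ('u1', 'Project Created', '2021-06-03'),
        ('u2', 'Project Created', '2021-06-02');
    INSERT INTO crm_accounts VALUES ('u1', 'paid'), ('u2', 'free');
    """
)

# Join raw event data (first-party) with CRM data (third-party) by user.
rows = conn.execute(
    """
    SELECT c.plan, COUNT(*) AS projects_created
    FROM events AS e
    JOIN crm_accounts AS c ON c.user_id = e.user_id
    WHERE e.name = 'Project Created'
    GROUP BY c.plan
    ORDER BY c.plan
    """
).fetchall()
```

A question like "do paid accounts create more projects than free ones?" cannot be answered from event data or CRM data alone, which is why having both in the warehouse matters.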
Once data has been collected and stored, it is time to analyze it and activate it and if executed well, these two stages of the data pipeline can make a big difference in your growth trajectory.
Analyzing Event Data
A Product Analytics (PA) tool is a purpose-built analysis tool meant to analyze event data and understand how users interact with your product.
Customer Analytics and Behavioral Analytics are other terms used to describe Product Analytics tools, which makes sense because these tools allow you to analyze user behavior by visualizing how users move through the customer journey.
It is evident but it’s worth reiterating that the data powering Product Analytics tools comes in the form of event data from your website and apps. You can also send event data from external apps but you need to be very intentional about it as this can lead to a PA tool being bloated with too many events which can result in data quality issues.
Product Analytics tools primarily cater to Product and Growth teams where:
- Product teams are able to improve product development by prioritizing features that are used and killing the ones that are not
- And Growth teams are able to improve onboarding, activation, and retention by visualizing the user journey and identifying points of friction
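Identifying points of friction usually starts with a funnel. The sketch below shows the basic calculation a Product Analytics tool performs behind its funnel charts, assuming a hypothetical three-step onboarding funnel and a small list of sample events that are already in chronological order.

```python
FUNNEL = ["Signed Up", "Project Created", "Invited Teammate"]

# (user_id, event_name) pairs, assumed to be in chronological order.
events = [
    ("u1", "Signed Up"), ("u1", "Project Created"), ("u1", "Invited Teammate"),
    ("u2", "Signed Up"), ("u2", "Project Created"),
    ("u3", "Signed Up"),
    ("u4", "Signed Up"),
]


def funnel_counts(events, steps):
    """Count users who reached each step, requiring the steps to occur in order."""
    progress = {}  # user_id -> number of funnel steps completed so far
    for user_id, name in events:
        completed = progress.get(user_id, 0)
        if completed < len(steps) and name == steps[completed]:
            progress[user_id] = completed + 1
    return [
        sum(1 for completed in progress.values() if completed > i)
        for i in range(len(steps))
    ]


counts = funnel_counts(events, FUNNEL)
# Users lost between each pair of consecutive steps.
drop_off = [counts[i] - counts[i + 1] for i in range(len(counts) - 1)]
```

With this sample data, four users sign up, two create a project, and one invites a teammate, so the largest drop-off sits between signing up and creating a project. That is the step a growth team would investigate first.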
Amplitude, Heap, and Mixpanel are leaders amongst PA tools whereas Indicative, PostHog, and Rakam are gaining steam. They all serve the same purpose and all of them integrate with popular tracking solutions (like Segment and RudderStack).
PostHog is an open-source alternative to Heap as they both offer codeless event tracking, also known as implicit tracking or autocapture.
Indicative and Rakam, on the other hand, are warehouse-centric with support for popular data warehouses (like Snowflake, Google BigQuery, and AWS Redshift) as well as Snowplow as data sources.
With the rapid adoption of the data warehouse, more and more companies are experiencing the benefits of making the data warehouse the source of data for Product Analytics tools.
If you’re already storing data in a data warehouse (and you should start if you aren’t), then it makes sense to sync clean, modelled data from the warehouse to your Product Analytics tool.
Another noteworthy benefit of this approach is that companies using a BI tool like Looker or Mode alongside a Product Analytics tool will not experience any data inconsistency between the two since the same data from the warehouse will be synced to both tools. 👍
Activating Event Data
Data Activation is the process of personalizing the customer experience using accurate data in the tools used to acquire, engage, and support customers.
In other words, Data Activation takes place when customer interactions are powered by data, resulting in customer experiences that are timely and relevant.
Analyzing data and gathering insights is only good if you can take action on those insights by activating the data in the tools that you use to engage with prospects and customers. By activating data efficiently, you can go beyond looking at dashboards and can utilize data in a meaningful manner.
Providing superior customer experiences relies on every interaction being powered by data, from outbound emails and support conversations to social media ads and in-app experiences. 👍
However, if accurate data is not made available in these tools, there’s really not much you can do besides building linear experiences where every customer experiences the same messages, emails, and ads, irrespective of the actions they’ve taken inside your product or the interactions they’ve had with your brand. 👎
To make data actionable, customer data needs to be made available in downstream tools used for sales, marketing, advertising, and support where data is eventually activated. And there are a few different tools and technologies that you can adopt to do this:
- CDI tools to sync raw data to downstream tools
- CDPs to combine data from multiple sources to build user segments and sync them to downstream tools
- Reverse ETL (also referred to as Operational Analytics) to sync modelled data from a data warehouse to downstream tools
- Destination Integrations offered by Product Analytics tools to sync user cohorts to downstream tools
All of the above are means to the same end — enabling all your teams to build personalized, data-led experiences in the tools they use to engage with prospects and customers.
CDI solutions that support third-party tools as destinations can be used to sync raw data to those tools, enabling you to activate the data in those downstream tools. It’s important to keep in mind that you can only sync raw data and depending on the CDI solution, there might be little to no transformation and modeling capabilities.
CDPs go a step further, allowing you to combine data from multiple sources (assuming that you are ingesting data from sources other than your website and apps), build audiences or user segments, and sync those audiences to downstream tools supported by the CDP.
It’s worth mentioning that while CDPs have largely been focused on collecting and moving data, some CDPs also offer capabilities to activate data and orchestrate campaigns across multiple channels.
Exponea is one such CDP that has inbuilt functionality to build personalized, data-led experiences. And after its acquisition by Twilio, Segment is also moving in this direction and already integrates deeply with Twilio for SMS and Twilio-owned SendGrid for emails.
The rapid adoption of cloud data warehouses has given rise to Reverse ETL — a new paradigm in data integration that enables activating data that is already stored in the data warehouse by syncing it to third-party tools. This process and the underlying technology that makes it possible is also referred to as Operational Analytics.
Companies with dedicated data teams are investing heavily in consolidating customer data from various sources in the data warehouse. Reverse ETL tools like Census, Grouparoo, Hightouch, and Omnata enable data teams to build data models on top of the data stored in the warehouse (or bring their existing models by integrating tools like dbt) and sync those models to downstream tools.
While some Reverse ETL tools offer a visual interface to query data and build audiences, the primary method to do so is via SQL.
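Here is a minimal sketch of what "a model defined in SQL, synced to a downstream tool" might look like. It is not the interface of Census, Hightouch, or any other Reverse ETL product: SQLite stands in for the warehouse, the tables and the `MODEL_SQL` query are invented, and `build_sync_payloads` only prepares JSON bodies that a downstream tool's API might accept rather than calling a real API.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the cloud data warehouse
conn.row_factory = sqlite3.Row      # lets each row be converted to a dict
conn.executescript(
    """
    CREATE TABLE users (user_id TEXT, email TEXT, plan TEXT);
    CREATE TABLE events (user_id TEXT, name TEXT);
    INSERT INTO users VALUES
        ('u1', 'ada@example.com', 'free'),
        ('u2', 'bob@example.com', 'free'),
        ('u3', 'cy@example.com', 'paid');
    INSERT INTO events VALUES
        ('u1', 'Project Created'), ('u1', 'Project Created'),
        ('u1', 'Project Created'), ('u2', 'Project Created'),
        ('u3', 'Project Created');
    """
)

# The "model": free-plan users active enough to be worth an upgrade campaign.
MODEL_SQL = """
    SELECT u.email, COUNT(*) AS projects_created
    FROM users AS u
    JOIN events AS e ON e.user_id = u.user_id
    WHERE u.plan = 'free' AND e.name = 'Project Created'
    GROUP BY u.email
    HAVING COUNT(*) >= 3
    ORDER BY u.email
"""


def build_sync_payloads(connection, sql):
    """Turn each model row into a JSON body for a (hypothetical) downstream API."""
    return [json.dumps(dict(row), sort_keys=True) for row in connection.execute(sql)]


payloads = build_sync_payloads(conn, MODEL_SQL)
```

Only one user qualifies in this sample, so a single payload is produced. The downstream tool receives a ready-made, modelled attribute (`projects_created`) instead of a stream of raw events it would have to aggregate itself.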
Companies that already have a data warehouse are discovering the benefits of Reverse ETL by syncing modelled data from the warehouse to downstream tools instead of syncing raw data using CDI or CDP solutions which offer very limited data transformation capabilities. 👍
However, implementing and maintaining a Reverse ETL tool requires an in-house data team or at least a dedicated data engineer — this is often not feasible for early-stage startups or even mid-size companies. 👎
Destination Integrations by Product Analytics
Popular Product Analytics tools like Amplitude, Heap, and Mixpanel support a range of tools as data destinations, allowing you to build and sync user cohorts to downstream tools where data is activated. These integrations are evolving constantly with different tools offering varying capabilities.
[Figure: All the different ways of moving data for analysis and activation purposes]
It’s interesting to note that Heap and Amplitude have spun out their integration capabilities into separate products — Heap Act and Amplitude Recommend. These are essentially native integrations with third-party tools where data is eventually activated. And in fact, there are some benefits of this approach that are worth mentioning:
- Instead of building cohorts in a Product Analytics tool to analyze the data and then building the same audiences in the CDP/CDI/Reverse ETL tool, it can all be done in one integrated system
- Two-way integrations with engagement tools allow you to analyze your campaign metrics directly inside the Product Analytics tool, enabling you to measure the true impact of your engagement campaigns
Companies with limited data engineering resources can definitely derive a lot of value from such integrated solutions by enabling their Product and Growth teams to move fast without relying on data teams. 👍
Moreover, larger companies can also benefit from this approach due to the limited availability of data engineering resources and a massive backlog of data requests from various teams.
Maintaining a Modern Data Stack for Growth is not necessarily the responsibility of a growth team but knowing how data is collected and stored makes it a lot easier for growth professionals to derive insights from the data and act upon those insights effectively.
Moreover, while growth and product people are the ones who use the bulk of the tools in this stack, the benefits of a modern data stack can be experienced across an organization, from demand generation and sales development to support and customer success.
Giving teams access to the tools that enable them to be more productive also allows them to upskill themselves and do more meaningful work — the two most important components that help attract and retain the best talent. 👍
The original guide was updated after the mParticle team clarified that they don’t only offer a bundled solution and that one can opt for just their CDI solution too. My goal is to provide accurate and unbiased content and I sincerely regret the oversight.