What is Customer Data?

This is part 1 of the 5-part series on Customer Data. Here are parts 2, 3, 4, and 5.


A typical conversation about data invariably brings up the privacy practices of big tech. Too much data is being gathered by FAGMA — Facebook, Apple, Google, Microsoft and Amazon — and the growing concerns over the opaqueness of their data policies have given birth to stringent privacy laws such as the EU’s GDPR and California’s CCPA.

These laws are making companies more accountable, forcing them to take a hard look at their data collection practices. As a result, a snowball effect is taking place: companies are embracing transparency and creativity while trying to stay compliant, and awareness about data is increasing among individuals.

Data is still at the heart of business strategy, enabling personalization and automation at scale.

The data in question is what is referred to as customer data, which can be broken into two main types:

  • User data: It provides context on the user and their traits, and is referred to as entity data
  • Interaction data: It provides context on how the user interacts with a product and is referred to as event data

There is, of course, a lot more to customer data, especially third-party customer data gathered when users interact with your brand outside of your core product experience — via external media (search, social, etc.), communication channels (email, SMS, chat, etc.), or even offline.

However, this guide focuses on first-party customer data that comprises entity data and event data.

Entity Data

Entity data includes personally identifiable information (PII) such as name, email, and phone number, as well as other details such as age, country, and preferences.

It is often referred to as user data since a user is the main entity or object. It comprises user properties or user attributes, each of which stores information or traits about a user. 

Entity data is stored in tables wherein columns represent user properties like name and email, while each row represents an entity or user. One of the properties acts as an identifier and has to contain a unique value for each user.  

In the table above, email can act as an identifier by ensuring no two users have the same email. However, it is a better practice to assign a unique ID to each user since an email address can change but the user_id remains fixed. 
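As a rough illustration of the point above, here is a minimal Python sketch of an entity table. The users, property names, and helper functions are all hypothetical and only meant to show why a fixed user_id is a safer identifier than an email address that can change.

```python
# Hypothetical entity table: each dict is a row (one user),
# and each key is a user property (a column).
users = [
    {"user_id": "u_001", "name": "Ada", "email": "ada@example.com", "country": "UK"},
    {"user_id": "u_002", "name": "Lin", "email": "lin@example.com", "country": "SG"},
]


def is_unique_identifier(rows, prop):
    """Return True if every row has a distinct value for the given property."""
    values = [row[prop] for row in rows]
    return len(values) == len(set(values))


def update_email(rows, user_id, new_email):
    """Change a user's email; the row is looked up by the fixed user_id."""
    for row in rows:
        if row["user_id"] == user_id:
            row["email"] = new_email
            return row
    raise KeyError(f"unknown user_id: {user_id}")


# The email changes, but the identifier used to find the user does not.
updated = update_email(users, "u_001", "ada.l@example.com")
```

Both email and user_id pass the uniqueness check here, but only user_id stays stable when the user edits their profile.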

Accounts or Groups as Entities

It is good to keep in mind that besides referring to users, entities can also refer to accounts. In most B2B products, accounts are known as organizations or workspaces.

From a hierarchical point of view, accounts are groups that comprise users. Thus, the data about an account or group comprises group properties. Defining group properties is essential to store information about an account such as the subscription type or the number of users.

For instance, if accounts are known as organizations, the associated properties are referred to as organization properties.

It is not uncommon to gather data about both users and groups at the same time. This is particularly true for B2B SaaS products where a user is part of an account or organization with multiple users.
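The relationship between users and accounts can be sketched as follows. This is only one possible way to model it, and the class names, IDs, and property names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    org_id: str
    # Group properties: information about the account as a whole.
    properties: dict = field(default_factory=dict)


@dataclass
class User:
    user_id: str
    org_id: str  # links the user to the account they belong to
    # User properties: information about the individual user.
    properties: dict = field(default_factory=dict)


def count_users(org, users):
    """Count the users that belong to the given organization."""
    return sum(1 for user in users if user.org_id == org.org_id)


acme = Organization("org_1", {"subscription_type": "pro"})
members = [
    User("u_001", "org_1", {"name": "Ada"}),
    User("u_002", "org_1", {"name": "Lin"}),
]

# "number_of_users" describes the organization, not any single user.
acme.properties["number_of_users"] = count_users(acme, members)
```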

Gathering Entity Data

Entity data, wherein a user is an entity or object, is shared by users directly or indirectly. 

Users share data directly when they fill out a registration form or a survey. When interacting with conversational interfaces like chatbots and voice bots, users share data via text and voice messages respectively.

On the other hand, users share data indirectly when they use a product. For instance, by listening to music on Spotify, a user shares data about their music preferences such as genres or artists they like. Similarly, when a user creates reports in a tool like Mixpanel, they share data about the type of reports they find most useful.

Since Mixpanel is a B2B SaaS tool where multiple users are part of an organization, the number of reports created under an organization is data associated with an organization and not a particular user. 

Hence, in this case, an organization is also an entity, number of reports is an organization property, and the value of this property changes when a user creates or deletes a report.

It’s important not to confuse entity data that changes as a result of product usage (such as the number of reports) with event data that is generated when a user interacts with a product.
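One way to picture the difference is the sketch below, where a single user action produces both kinds of data. The event names, IDs, and the record_report_change helper are hypothetical; the point is only that entity data holds the current state while event data keeps a history of what happened.

```python
from datetime import datetime, timezone

# Entity data: the current state of each organization.
org_properties = {"org_1": {"number_of_reports": 0}}

# Event data: an append-only history of user interactions.
event_log = []


def record_report_change(org_id, user_id, action):
    """Log the interaction as an event and update the organization property."""
    delta = {"Report Created": 1, "Report Deleted": -1}[action]
    event_log.append(
        {
            "event": action,
            "user_id": user_id,
            "org_id": org_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    )
    org_properties[org_id]["number_of_reports"] += delta


record_report_change("org_1", "u_001", "Report Created")
record_report_change("org_1", "u_002", "Report Created")
record_report_change("org_1", "u_001", "Report Deleted")
```

After these three calls the organization property holds a single value (one report), whereas the event log holds three separate records, one per interaction.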

Event Data

An event refers to a unique action performed by a user while interacting with a product, and the data generated in the process is called event data or interaction data. 

Clicks and hovers on the web, taps and swipes on mobile, and text or voice commands on chat and voice interfaces — all such interactions are actions performed by a user or events that take place inside an app. 

Event data enables you to understand user behaviour and is therefore sometimes referred to as behavioural data. Additionally, it enables behaviour-based contextual messaging (in-app or email) via campaigns triggered by those events.

Event data comprises three key elements: 

  • The action or the event that took place
  • The timestamp or the precise date and time when the event took place
  • The state or all other properties associated with the event (known as event properties)

Add to Cart, Buy Now, and Complete Payment are all actions or events. The exact moment when an event takes place is recorded as a timestamp.

The properties that provide more context about the event Add to Cart could be user_id, product_id, price, and quantity — all of which provide information related to the event or the state of the event. 
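The three elements of the Add to Cart example can be sketched as a small data structure. The Event class and track function below are hypothetical and do not represent any particular analytics SDK; the property values are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    name: str            # the action that took place
    timestamp: datetime  # when the event took place
    properties: dict = field(default_factory=dict)  # the state (event properties)


def track(name, **properties):
    """Build an event record stamped with the current UTC time."""
    return Event(name=name, timestamp=datetime.now(timezone.utc), properties=properties)


event = track(
    "Add to Cart",
    user_id="u_001",
    product_id="p_42",
    price=19.99,
    quantity=2,
)
```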

Gathering Event Data

Gathering event data requires you to decide which events to track, and then get your engineering team to implement event tracking using one of the following: 

There are other event-streaming technologies, such as Apache Kafka (developed by LinkedIn and open-sourced) and Pub/Sub by Google Cloud, that are suitable for companies generating a massive amount of data that needs to be tracked and processed in real time.

What to Track vs How to Track

While it is good to know about the event-tracking process, as a data-led professional, you should focus on what to track rather than how to do it.

Why so? 

You might argue that defining what data is to be tracked and the tracking process itself are equally important and I don’t disagree. However, these two activities should ideally be owned by different folks and maybe even different teams. 

To Engineer or Not to Engineer

Typically, a data engineer takes care of implementing the tracking and collaborates with the data team (if there is one) to decide which tools and technologies to use. The company stage, the scope of work and rework, available resources, priorities, and several other factors influence this decision.

Most companies leave the entire tracking process to engineering, keeping marketing and product folks out of the loop.  Doing so invariably results in data redundancy and inconsistency.

Deciding what to track is simply not an engineer’s job and expecting the engineering team to know how other teams wish to utilize the data is just naive. 

The Rise of the Tracking Manager

Ideally, ownership of what to track should lie with an individual who acts as an intermediary between engineering and all other teams that utilize this data, such as marketing, product, and customer success.

For lack of a better name, this person is referred to as the Tracking Manager and is responsible for the end-to-end process of gathering requirements from different teams, defining the events (and their associated properties), and getting them implemented by the data engineering team.

As a data-led professional, your job is to make the tracking manager’s life easier by clearly defining the events you wish to track and describing how you plan to utilize them. Doing so ensures that you get what you need where you need it.
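To make this concrete, a clearly defined event might look like the entry below. This is a minimal sketch, not a standard format: the TRACKING_PLAN structure, the stated purpose, and the validate helper are all assumptions for illustration, reusing the Add to Cart properties mentioned earlier.

```python
# Hypothetical tracking plan: each event lists why it is tracked
# and which properties (with their expected types) it must carry.
TRACKING_PLAN = {
    "Add to Cart": {
        "purpose": "Measure cart additions and trigger cart reminder campaigns",
        "properties": {
            "user_id": str,
            "product_id": str,
            "price": float,
            "quantity": int,
        },
    },
}


def validate(event_name, properties, plan=TRACKING_PLAN):
    """Return a list of problems; an empty list means the event matches the plan."""
    if event_name not in plan:
        return [f"event not in tracking plan: {event_name}"]
    expected = plan[event_name]["properties"]
    problems = [f"missing property: {key}" for key in expected if key not in properties]
    problems += [f"unexpected property: {key}" for key in properties if key not in expected]
    problems += [
        f"wrong type for {key}: expected {type_.__name__}"
        for key, type_ in expected.items()
        if key in properties and not isinstance(properties[key], type_)
    ]
    return problems


ok = validate(
    "Add to Cart",
    {"user_id": "u_001", "product_id": "p_42", "price": 19.99, "quantity": 2},
)
bad = validate("Add to Cart", {"user_id": "u_001", "price": "19.99", "qty": 2})
```

Here ok is an empty list, while bad reports the missing product_id and quantity, the unexpected qty, and the price sent as a string. Writing the purpose next to each event is one way of describing how you plan to utilize it.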

Next Steps

Now that you know what Customer Data is and what your role is in the process of gathering Customer Data, the next step is to be able to answer the following questions:

Lucky for you, each question above links to a guide in the 5-part series on Customer Data. 

Once you have answers to the above, you will understand how to create a data tracking plan and will be able to do the following with confidence:

  • Lead the implementation of event-driven analytics and engagement tools
  • Gather clean and consistent customer data and overcome challenges that crop up along the way
  • Ask the right questions of your data in order to better understand user behaviour
  • Identify opportunities to collect and act upon data to elevate the customer experience
  • Build better products, provide better experiences, and have better conversations

Lastly, it is incredibly useful to have a good understanding of the various data types before you begin working on your tracking plan. Whenever that is, this guide on data types will be helpful.

For now, move on to part 2 to understand why event data is gathered and the purposes it serves.
