Customer Data comprises Event Data and Entity Data — you already know that if you have gone through the previous parts of this series.
In this guide, you will learn about the components of event data, the preferred naming convention to define events, the categories of entity data, and the two main types of entities.
Since you have almost certainly bought something online, let’s start with an ecommerce example.
When interacting with an ecommerce website or app, you typically buy a product by adding it to your cart, proceeding to checkout, and completing the payment. From the product’s perspective, these are events that took place when you went through the process of buying an item.
Needless to say, there could be several other events that take place during the buying process such as:
- A product is viewed
- The cart is viewed
- A product is removed from the cart
- A coupon is applied
- An address is chosen
- A payment method is chosen
- An order is completed
And so on.
Events that come to mind immediately are Add to Cart, Proceed to Checkout, and Make Payment. However, in order to understand user behavior, you would also need to track other important events like the ones mentioned above.
Deciding which events to track and naming the events using a proper naming convention are the first two steps in the process of gathering event data.
What are the next two steps?
Glad you asked!
Each event is accompanied by event properties (or event attributes) that provide more context about an event. Deciding which properties to associate with each event and naming those properties are the next two steps in the process of gathering event data.
What’s In a Name?
When it comes to data, everything! A proper naming convention is what makes good data stand out from bad data, making it easy for all stakeholders to understand what they are looking at. Failing to maintain a standardized taxonomy is one of the main reasons data sets end up skewed or bloated with redundant entries.
Also, a good naming convention comes with strict casing guidelines. When working with customer data, not maintaining uniform casing when naming events and event properties is one of the biggest mistakes you can make — one that can have long-term ramifications.
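Add to Cart, added_to_cart, productAdded, add to cart, Added to cart, and Product Added are all different ways to name the same event. Because event names are typically matched as exact strings, mixing these variants usually results in the same action being recorded as several separate events.
There are no set rules and no single right way when it comes to naming events and properties. However, there are best practices laid out by companies leading the customer data space.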
Segment is one such company and it developed the object-action framework — a naming convention that has become the industry standard. Take a look at this short guide explaining the framework and covering the best practices of naming conventions.
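To make the casing point concrete, here is a minimal Python sketch that checks names against one possible convention: Proper Case for event names and snake_case for property names. The regular expressions and function names are assumptions made for illustration, not part of Segment's framework, and the check covers casing only; it cannot tell whether a name actually follows the object-action order.

```python
import re

# Assumed convention: event names in Proper Case ("Product Added"),
# property names in snake_case ("product_id").
EVENT_NAME = re.compile(r"[A-Z][a-z0-9]*( [A-Z][a-z0-9]*)+")
PROPERTY_NAME = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_event_name(name: str) -> bool:
    """Return True if every word in the name starts with a capital letter."""
    return EVENT_NAME.fullmatch(name) is not None


def is_valid_property_name(name: str) -> bool:
    """Return True if the name is lowercase words joined by underscores."""
    return PROPERTY_NAME.fullmatch(name) is not None


# The six spellings of the same event mentioned above.
candidates = [
    "Add to Cart",
    "added_to_cart",
    "productAdded",
    "add to cart",
    "Added to cart",
    "Product Added",
]
valid = [name for name in candidates if is_valid_event_name(name)]
print(valid)
```

A check like this could run in code review or in a test suite, so that a stray variant is caught before it reaches your data destination.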
Components of Event Data
Entity data (such as user_id) provides more information about who performs an event. In the absence of a unique identifier like user_id, you can only have anonymous event data as there will be no way to know who performs an event.
However, when an event takes place, there are many other pieces of related information that can be gathered.
For instance, when a product is purchased, besides knowing who made the purchase, at the very least, you also need to know what product was purchased at what price, and when.
Those additional pieces of information are gathered in the form of event properties.
In the guide to Customer Data, it was mentioned that event data comprises three key elements:
- The action or the event that took place
- The timestamp or the precise date and time when the event took place
- The state or all other properties associated with the event (known as event properties)
Let’s look at the event Product Added, which is the name for Add to Cart under the object-action framework, written in Proper Case. Assume that it was performed by a user on Jan 1, 2020, at 10 AM UTC. The data gathered when the event took place includes the following:
- The action: Product Added
- The timestamp: 1577872800 (Unix timestamp for Jan 1, 2020, 10 AM UTC)
- The state: 0123 (user_id), ABZ (product_id), 7.99 (price), and 2 (quantity)
As per this example, the properties associated with the event Product Added are user_id, product_id, price, and quantity, each of which provides more information about the event. And to know when the event took place, the timestamp is associated with it.
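The same example can be sketched as a Python dictionary. The top-level keys (event, timestamp, properties) are an assumed layout chosen for illustration; the tool you send data to will define its own payload shape. Note that user_id is kept as a string so the leading zero in 0123 is preserved.

```python
from datetime import datetime, timezone

# Jan 1, 2020, 10 AM UTC, as in the example above.
occurred = datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc)

event = {
    "event": "Product Added",                # the action
    "timestamp": int(occurred.timestamp()),  # Unix timestamp in seconds
    "properties": {                          # the state
        "user_id": "0123",
        "product_id": "ABZ",
        "price": 7.99,
        "quantity": 2,
    },
}

print(event["timestamp"])  # 1577872800
```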
It is also useful to specify a name for the timestamp, which is essentially an event property. Doing so is not mandatory, since it is standard practice to send the timestamp under the generic name timestamp with every event when sending data to third-party tools.
However, specifying a distinct name for the property that stores the timestamp can be helpful in the long run when you need to work with historical event data.
The recommended taxonomy for timestamps is the event name followed by “at” — product_added_at for the event Product Added.
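This rule is mechanical enough to automate. The helper below is a small sketch (the function name is made up for illustration) that derives the timestamp property name from a Proper Case event name.

```python
def timestamp_property(event_name: str) -> str:
    """Turn an event name like "Product Added" into "product_added_at"."""
    words = event_name.lower().split()
    return "_".join(words) + "_at"


print(timestamp_property("Product Added"))  # product_added_at
print(timestamp_property("Signed Up"))      # signed_up_at
```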
You might have already noticed that snake_case is used to define event properties, which makes it easy to distinguish event names from event properties. That said, remember that there is no single right way, and you should choose whatever works best for you and your team.
Here’s a final look at the properties associated with the event Product Added and the data types of each of those properties:
Specifying the data type for each property ensures consistency of data and makes the instrumentation process easier.
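As a rough illustration of how declared data types help, here is a sketch that checks a Product Added payload against a type map. The specific types below are assumptions (for example, identifiers as strings and the timestamp as an integer number of seconds); use whatever types your own tracking plan specifies.

```python
# Assumed data types for the Product Added properties.
PRODUCT_ADDED_SCHEMA = {
    "user_id": str,
    "product_id": str,
    "price": float,
    "quantity": int,
    "product_added_at": int,  # Unix timestamp in seconds
}


def type_errors(properties: dict, schema: dict) -> list:
    """Return a list of human-readable problems; empty if the payload fits."""
    errors = []
    for name, expected in schema.items():
        if name not in properties:
            errors.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected):
            actual = type(properties[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


payload = {
    "user_id": "0123",
    "product_id": "ABZ",
    "price": 7.99,
    "quantity": 2,
    "product_added_at": 1577872800,
}
print(type_errors(payload, PRODUCT_ADDED_SCHEMA))  # []
```

A payload that sends quantity as the string "2" instead of the number 2 would be flagged here rather than silently ending up in your analytics tool.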
Side note: It’s good to keep in mind that user_id is a user property (entity data) that identifies who performed an event and is therefore passed as an event property. Similarly, there can be other entity data that you might want to pass with certain events, which is covered under Entity Data below.
It should now be clear that gathering event data comprises the following steps:
- Deciding which events to track
- Naming those events using a proper naming convention
- Deciding which properties to associate with each event
- Naming those properties using a proper naming convention
The next and the last part of this series covers the process of deciding which events to track and what data to gather.
However, you should already have a good idea of what to expect when looking at event data (whether in a tracking plan before instrumentation or inside a data destination such as your product analytics tool). The process of creating a tracking plan is covered in extreme detail in the guide titled “What the hell is a Tracking Plan and How Can I Create One?”
Some Common Events and Their Properties
Before moving on, take a look at a few common events and properties that are tracked by most tech products.
Entity Data
It’s time to take an in-depth look at different entities and their properties.
If you haven’t already, go through this guide to understand how entity data relates to event data.
In the first part of this series, it was mentioned that data shared by users is entity data. While that is true, not all entity data is shared by users themselves — entity data can also be generated.
Entity data essentially comprises properties associated with the entity. If User is the entity, all information about a user is gathered in the form of user properties.
To identify users, a user_id is first generated for every user by default. As you already know, user_id is also associated with events and acts as the identifier for every event.
For the time being, forget about events and think about the different pieces of information that relate exclusively to users and tell you about their traits.
Categories of Entity Data
Entity Data can be categorized into the following buckets (also mentioned in part 3 of this series):
- Personally identifiable information such as name, email, and phone
- Demographics such as age, gender, and location
- Persona such as industry, job role, and goal
- Preferences such as brands, genres, and product categories
- Product-specific data such as products purchased, apps used, time spent, and subscription type.
The various pieces of data under each bucket fall under user properties. In other words, user properties store various details and traits about users, enabling you to identify them and know more about them.
While most of the info comes from the user directly, product-specific data about a user is automatically generated over time as a result of product usage.
But isn’t event data also generated due to product usage?
It sure is, and user data is often gathered when an event takes place. Let’s take a look at the Signed Up event and its properties:
As you can see, all the properties associated with this event give you details about users — details that are either shared by users themselves (first_name, last_name, email, phone, country) or details that are generated automatically (signed_up_at, user_id).
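Here is a small sketch of that split in Python. The property names come from the Signed Up event described above; the values and the payload layout are made up for illustration.

```python
# Hypothetical Signed Up payload; all values are invented.
signed_up = {
    "event": "Signed Up",
    "properties": {
        "user_id": "0123",
        "signed_up_at": 1577872800,
        "first_name": "Jane",
        "last_name": "Doe",
        "email": "jane@example.com",
        "phone": "+1-555-0100",
        "country": "US",
    },
}

# Properties that are generated automatically rather than shared by the user.
GENERATED = {"user_id", "signed_up_at"}

properties = signed_up["properties"]
generated = {k: v for k, v in properties.items() if k in GENERATED}
shared_by_user = {k: v for k, v in properties.items() if k not in GENERATED}

print(sorted(generated))       # ['signed_up_at', 'user_id']
print(sorted(shared_by_user))  # ['country', 'email', 'first_name', 'last_name', 'phone']
```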
It is helpful to keep in mind the following:
- Some events like Signed Up or Email Verified are performed only once by every user and the various pieces of data gathered from such events translate into user properties.
- Most user properties barring timestamps and identifiers are subject to change. A user can change their name, email, phone, location, industry, job role, etc. But the time of signing up (signed_up_at) or the unique identifier (user_id) cannot be changed by the user.
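One way to respect the second point in code is to treat the identifier and the sign-up timestamp as read-only when a user profile is updated. The sketch below is one possible approach, with a hypothetical update_user helper and invented profile values.

```python
# Properties the user cannot change, per the note above.
IMMUTABLE = frozenset({"user_id", "signed_up_at"})


def update_user(profile: dict, changes: dict) -> dict:
    """Return a new profile with the changes applied, rejecting immutable keys."""
    blocked = IMMUTABLE.intersection(changes)
    if blocked:
        raise ValueError(f"cannot change: {sorted(blocked)}")
    return {**profile, **changes}


profile = {
    "user_id": "0123",
    "signed_up_at": 1577872800,
    "email": "jane@example.com",
}

# Changing the email is allowed; user_id and signed_up_at stay as they were.
updated = update_user(profile, {"email": "jane.doe@example.com"})
print(updated["email"])  # jane.doe@example.com
```

Calling update_user(profile, {"user_id": "9999"}) raises a ValueError instead of overwriting the identifier.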
User Properties vs Organization Properties
Product-specific data, which is generated as a result of product usage, is tied to a user or an organization (or group), making them the two main types of entities about which data is gathered.
There might be other entities such as team or project with certain pieces of data tied to them, as is the case with many B2B SaaS apps. However, they are also group entities, and the process of gathering data for them is the same as for an organization.
The time spent on an app, products purchased, songs played, or videos watched are all updated as usage increases. These properties are associated with the user, and are therefore stored as user properties.
Let’s take a look at some common user properties relevant to B2B SaaS products:
When a user is part of an organization, as in the case of B2B SaaS products, many important pieces of information are tied to the organization and not the user.
Some common organization properties (also referred to as group properties) are as follows:
It is important to keep in mind that the organization_id also acts as an identifier and can be associated with events to know under which organization a user performed an event.
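As a quick illustration, once organization_id travels with each event, activity can be rolled up per organization as well as per user. The event names and identifiers below are hypothetical.

```python
from collections import Counter

# Hypothetical events, each carrying both a user_id and an organization_id.
events = [
    {"event": "Project Created", "user_id": "0123", "organization_id": "org_1"},
    {"event": "Project Created", "user_id": "0456", "organization_id": "org_1"},
    {"event": "Report Exported", "user_id": "0789", "organization_id": "org_2"},
]

# Count events per organization and per user from the same stream.
events_per_org = Counter(e["organization_id"] for e in events)
events_per_user = Counter(e["user_id"] for e in events)

print(events_per_org["org_1"])   # 2
print(events_per_user["0123"])   # 1
```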
Keeping the following statements in mind can help differentiate between user properties and organization or group properties:
- Every piece of information that helps define user cohorts — where they come from, who they are, what their objective is, or what they do inside a product — is stored as a user property.
- Every piece of information that helps segment accounts or organizations — the account type, the revenue it generates, the products or features it uses, the resources it consumes, or the number of users who are part of it — is stored as an organization property (or group property).
Once you can differentiate between the above, it becomes easier to bring new entities (such as teams or projects) into the mix.
You should now have a clear understanding of how to define events and their associated properties, as well as specify the properties of each entity (user and organization).
Next, proceed to Part 5 of this series, which lays out a framework to help you decide which events to track and what data to gather, and touches upon the various approaches you can take when specifying events.