A data type is an attribute associated with a piece of data that tells a computer system how to interpret its value. Understanding data types helps ensure that data is collected in the preferred format and that the value of each property is as expected.
Keep in mind that data types are not to be confused with the two types of data that together make up customer data: entity data and event data.
A good understanding of data types is required to properly define event properties and entity properties. A well-defined tracking plan must contain the data type of every property to ensure data accuracy and prevent data loss.
Before jumping into the importance of data types, let’s take a look at some of the common data types.
Common Data Types
Integer (int)
An integer is the most common numeric data type, used to store numbers without a fractional component (-707, 0, 707).
Floating Point (float)
A float is also a numeric data type, used to store numbers that may have a fractional component, as monetary values do (707.07, 0.7, 707.00).
Note that number is often used as a data type that includes both the int and float types.
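As a minimal sketch of the difference, the Python snippet below stores one value of each numeric type and checks whether a value would qualify as a generic number. The property names and the is_number helper are hypothetical and used only for illustration.

```python
# Illustrative values only; the property names are hypothetical.
items_in_cart = 3        # int: no fractional component
order_total = 707.07     # float: may have a fractional component

print(type(items_in_cart).__name__)   # int
print(type(order_total).__name__)     # float


def is_number(value):
    """Return True if value is an int or a float (hypothetical helper)."""
    # In Python, bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(is_number(items_in_cart))   # True
print(is_number("707"))           # False: digits inside a string are text
```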
Character (char)
A character stores a single letter, digit, punctuation mark, symbol, or blank space.
String (str or text)
A string is a sequence of characters and the most commonly used data type for storing text. A string can also include digits and symbols; however, its content is always treated as text.
A phone number is usually stored as a string (+1-999-666-3333) but can also be stored as an integer (9996663333), at the cost of losing the plus sign, the dashes, and any leading zeros.
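The short Python sketch below illustrates why digits inside a string behave as text and what is lost when a phone number is converted to an integer. The variable names and the sample number with leading zeros are made up for the example.

```python
phone_as_string = "+1-999-666-3333"
phone_as_integer = 9996663333

# The string keeps formatting such as the "+" and the dashes.
print(phone_as_string)        # +1-999-666-3333

# Digits stored as text are still text: "+" joins strings instead of adding.
print("707" + "1")            # 7071
print(707 + 1)                # 708

# Converting digit-only text to an integer drops leading zeros.
leading_zero = "0044123456"   # made-up sample value
print(int(leading_zero))      # 44123456
```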
Boolean (bool)
A boolean represents the values true and false. When working with the boolean data type, keep in mind that a boolean value is sometimes also represented as 0 (for false) and 1 (for true).
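Because the same boolean can arrive as true/false or as 1/0, it can help to normalize incoming values before storing them. The to_bool helper below is a hypothetical sketch of one way to do that in Python, not a standard function.

```python
def to_bool(value):
    """Normalize common boolean encodings to True or False (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    # Anything else is rejected rather than guessed at.
    raise ValueError(f"not a boolean value: {value!r}")


print(to_bool(1))         # True
print(to_bool(0))         # False
print(to_bool("TRUE"))    # True
```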
Enumerated type (enum)
An enumerated type contains a small set of predefined unique values (also known as elements or enumerators) that can be compared and assigned to a variable of that enumerated type.
The values of an enumerated type can be text-based or numerical. In fact, the boolean data type is a pre-defined enumeration of the values true and false.
For example, if rock and jazz are the enumerators, an enumerated type variable genre can be assigned either of the two values, but not both.
Suppose a music app asks you to fill in your preferences and choose one of the two genres from a dropdown menu: the variable genre will store either rock or jazz.
With an enumerated type, values can be stored and retrieved as numeric indices (0, 1, 2) or as strings.
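The genre example can be sketched with Python's standard enum module. The Genre class and the parse_genre helper are assumptions made for illustration; the point is that a variable of this type holds exactly one of the predefined values, which can be looked up by index or by name.

```python
from enum import Enum


class Genre(Enum):
    ROCK = 0
    JAZZ = 1


genre = Genre.JAZZ
print(genre.name)      # JAZZ
print(genre.value)     # 1

# Retrieve an enumerator by its numeric index or by its name.
print(Genre(0))        # Genre.ROCK
print(Genre["ROCK"])   # Genre.ROCK


def parse_genre(raw):
    """Return the matching Genre, or None if raw is not a predefined choice."""
    try:
        return Genre[raw.upper()]
    except KeyError:
        return None


print(parse_genre("rock"))    # Genre.ROCK
print(parse_genre("blues"))   # None: not one of the enumerators
```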
Array
Also known as a list, an array is a data type that stores a number of elements in a specific order, typically all of the same type.
Since an array stores multiple elements or values, the structure of data stored by an array is referred to as an array data structure.
Each element of an array can be retrieved using an integer index (0, 1, 2, …), and the total number of elements in an array is its length.
For example, an array variable genre can store one or more of the elements rock, jazz, and blues. The indices of the three values are 0 (rock), 1 (jazz), and 2 (blues), and the length of the array is 3 (since it contains three elements).
Continuing with the music app example, if you are asked to choose one or more of the three genres and you happen to like all three (cheers to that), the variable genre will store all three elements (rock, jazz, blues).
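In Python, the array from this example can be sketched as a list. The snippet shows index-based retrieval and the length of the array, using the same three genres.

```python
genre = ["rock", "jazz", "blues"]

# Elements are retrieved by integer index, starting at 0.
print(genre[0])      # rock
print(genre[1])      # jazz
print(genre[2])      # blues

# The length is the total number of elements.
print(len(genre))    # 3

# Pair each index with its element.
for index, name in enumerate(genre):
    print(index, name)
```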
Date
A date is typically stored in the YYYY-MM-DD format (ISO 8601 syntax).
Time
A time is stored in the hh:mm:ss format. Besides the time of day, it can also store elapsed time or the interval between two events, which can exceed 24 hours. For example, the time elapsed since an event took place could be more than 72 hours (72:00:59).
Datetime
A datetime stores a date and a time together in the YYYY-MM-DD hh:mm:ss format.
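The three formats can be sketched with Python's standard datetime module. The sample values and the format_elapsed helper are assumptions for illustration; the helper exists because an elapsed time of more than 24 hours is a duration rather than a time of day.

```python
from datetime import date, datetime, time, timedelta

d = date(2021, 3, 14)
t = time(9, 26, 53)
dt = datetime.combine(d, t)

print(d.isoformat())            # 2021-03-14
print(t.isoformat())            # 09:26:53
print(dt.isoformat(sep=" "))    # 2021-03-14 09:26:53


def format_elapsed(delta):
    """Format a duration as hh:mm:ss, allowing more than 24 hours."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


elapsed = timedelta(hours=72, seconds=59)
print(format_elapsed(elapsed))  # 72:00:59
```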
Timestamp
Typically represented in Unix time, a timestamp is the number of seconds that have elapsed since midnight UTC (00:00:00) on 1 January 1970.
Computer systems typically use it to log the precise date and time of an event, down to the second, in a format that is unaffected by time zones. Unlike a datetime, therefore, a timestamp remains the same irrespective of your geographical location.
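To illustrate the time zone point, the sketch below expresses one instant in two time zones and converts both to Unix time. The instant and the UTC+05:30 offset are arbitrary choices for the example.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch itself corresponds to timestamp 0.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(int(epoch.timestamp()))        # 0

# One instant, written as a datetime in UTC.
event_utc = datetime(2001, 9, 9, 1, 46, 40, tzinfo=timezone.utc)
print(int(event_utc.timestamp()))    # 1000000000

# The same instant as seen on a clock set to UTC+05:30.
offset = timezone(timedelta(hours=5, minutes=30))
event_local = event_utc.astimezone(offset)
print(event_local.isoformat())       # 2001-09-09T07:16:40+05:30

# The datetime text differs, but the timestamp is identical.
print(event_local.timestamp() == event_utc.timestamp())   # True
```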
If you think about it, each one of us carries a timestamp: the date and time of our birth, expressed in Unix time.
Example and Recap
Different programming languages offer various other data types for a variety of purposes; however, the most commonly used data types that you need to know to become data-led have been covered above.
A good way to practice thinking about data types is to look closely at any form or survey you come across.
Looking at a standard registration form, you should keep in mind that each field accepts values of a particular data type.
A text field stores the input as a string while a number field typically accepts an integer.
Names and email addresses are always of the string type, while numbers can be stored as a numeric type or as a string, since a string is a sequence of characters that can include digits.
In single-option and multiple-option fields, where you select from predefined options, the enumerated type and the array come into play, respectively.
In the Facebook sign-up form above, the Birthday field has three sub-fields, each of enumerated type, asking you to choose one option for day, month, and year respectively.
Similarly, the Gender field wants you to choose from the two predefined choices or add a custom one, the input of which is stored as a string.
Sensitive strings such as passwords should always be hashed or encrypted before they are stored.
Now let’s look at the importance of data types.
Importance of Data Types
You might be wondering why it’s important to know about all these data types when you are mainly concerned with understanding how to leverage customer data. There is one main reason: to gather clean and consistent data.
Your knowledge of data types will come in handy at two stages of your data collection efforts: when instrumenting your product and when running surveys.
The process of tracking and sending data to external tools and systems is known as instrumentation, and the very first step in that process is to create a data tracking plan. Everything you need to know about a tracking plan is covered in this guide.
When deciding which events to track and what properties to collect (both event and entity properties), specifying the data type of each property in the tracking plan makes the instrumentation process a lot more efficient and leaves little room for error.
This is particularly helpful for engineers who are tasked with the implementation. By making sure that each property is sent with the correct data type, data inconsistency can be avoided.
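As a rough sketch of how declared data types catch mistakes, the Python snippet below checks an event's properties against a small tracking plan. The plan structure, the song_played event, its property names, and the find_type_errors helper are all hypothetical; real tracking plans and validation tools will differ.

```python
# Hypothetical tracking plan: event name -> property name -> expected type.
TRACKING_PLAN = {
    "song_played": {
        "song_title": str,
        "duration_seconds": int,
        "is_premium_user": bool,
        "genres": list,
    }
}


def find_type_errors(event_name, properties, plan=TRACKING_PLAN):
    """Return a list of messages for missing or wrongly typed properties."""
    errors = []
    for name, expected_type in plan[event_name].items():
        if name not in properties:
            errors.append(f"{name}: missing")
            continue
        value = properties[name]
        # bool is a subclass of int in Python, so reject it for int properties.
        if expected_type is int and isinstance(value, bool):
            matches = False
        else:
            matches = isinstance(value, expected_type)
        if not matches:
            errors.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


good_event = {
    "song_title": "So What",
    "duration_seconds": 562,
    "is_premium_user": True,
    "genres": ["jazz"],
}
bad_event = dict(good_event, duration_seconds="562")  # number sent as text

print(find_type_errors("song_played", good_event))  # []
print(find_type_errors("song_played", bad_event))
# ['duration_seconds: expected int, got str']
```

Running a check like this before events are sent is one way to surface a mismatch, such as a number sent as text, while it is still cheap to fix.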
As a data-led professional, it is likely that you will gather data from your customers via surveys throughout the customer journey — from onboarding to churn.
The questions you ask in a survey could be open-ended (text or number) or come with predefined choices like a drop-down list (enum), checkboxes (array), radio buttons (enum, or boolean for a yes/no question), or even a slider (depends).
For each field in your survey, you need to specify the property name that stores the value (industry_name, job_role, cancellation_reason, is_satisfied, etc.) and its data type (string, number, boolean, etc.).
By doing so, you can maintain data consistency and make analysis easier. It’s good to keep in mind that open-ended questions make for tougher analysis as you cannot aggregate the responses unless you automatically parse each response and extract the text that matches a preset rule.
With predefined choices, analysis is straightforward and is not affected even if you change the choices at a later stage (refer to enum and array data types).
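The difference in analysis effort can be sketched in a few lines of Python. The cancellation_reason choices and the sample responses below are invented for illustration; the point is that predefined values aggregate directly, while free-text answers with the same meaning do not.

```python
from collections import Counter

# Hypothetical predefined choices for a cancellation_reason property.
CANCELLATION_REASONS = ("too_expensive", "missing_features", "switched_tool")

# Invented responses from a drop-down (enum) question.
enum_responses = [
    "too_expensive",
    "missing_features",
    "too_expensive",
    "switched_tool",
]
enum_counts = Counter(r for r in enum_responses if r in CANCELLATION_REASONS)
print(enum_counts["too_expensive"])   # 2

# Invented responses to the same question asked as open-ended text.
text_responses = ["Too expensive", "too pricey!!", "costs too much"]
text_counts = Counter(text_responses)
print(len(text_counts))               # 3 distinct answers, one respondent each
```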
Your knowledge of data types applies beyond data collection and instrumentation; other activities such as data integration and internal application development (using no-code or low-code tools) should also become a lot easier now that you understand the various data types.