What is deterministic data? | AudienceProject Help Center

Deterministic data: Information about people that is known for sure

Deterministic data consists of digital facts about people that we trust to be 100% true. Crucially, these facts never change and the probability that they are true always remains 100%; thus, they provide a solid foundation for a multitude of applications in online marketing. For example, if we know from a reliable source that a person was a 20-year-old female last year, then that will always be true. We can even be clever and deduce that this year the person is a 21-year-old female. Knowing a person’s true age and gender is certainly of high relevance to online marketers. Going beyond basic demographic information, deterministic data can take countless forms, such as a person's interests, friends, geographical whereabouts, etc. In practice, all these facts are linked to something that identifies a person, such as an email address or a cookie ID, and it is these identifiers that have become the real lingua franca of the online marketing industry.
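To make the idea of facts linked to an identifier concrete, here is a minimal Python sketch. The `Profile` class, its fields and the example identifier are hypothetical and do not describe any actual product schema; the age calculation uses the same simplification as the example above and ignores where the person's birthday falls within the year.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Deterministic facts about one person (hypothetical schema)."""
    identifier: str        # e.g. a hashed email address or a cookie ID
    gender: str
    age: int               # age at the time the fact was observed
    observed_year: int     # year in which the age was observed
    interests: set[str] = field(default_factory=set)

    def age_in(self, year: int) -> int:
        # Simplification: add the elapsed years and ignore the birthday date.
        return self.age + (year - self.observed_year)


# Facts are looked up by the identifier they are linked to.
profiles = {
    p.identifier: p
    for p in [Profile("cookie-123", "female", 20, 2023, {"golf"})]
}

print(profiles["cookie-123"].age_in(2024))  # 21
```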

Why is it important to have deterministic data? In a nutshell, deterministic data forms a “ground truth” about users that is both useful on its own and has many important downstream applications in online marketing. On its own, deterministic data can be used to create granular custom segments. For example, we can create a segment of people who we know share an interest in golf, and then target these golf enthusiasts with relevant online campaigns. The more deterministic data we have, the larger the segments we can create.
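A custom segment of this kind is essentially a filter over known facts. The sketch below assumes a hypothetical mapping from each identifier to the set of interests known for that person; `build_segment` is an illustrative name, not part of any product API.

```python
def build_segment(known_interests: dict[str, set[str]], interest: str) -> set[str]:
    """Return the identifiers of everyone known to share the given interest."""
    return {uid for uid, interests in known_interests.items() if interest in interests}


known_interests = {
    "cookie-123": {"golf", "travel"},
    "cookie-456": {"football"},
    "cookie-789": {"golf"},
}

golf_segment = build_segment(known_interests, "golf")
print(sorted(golf_segment))  # ['cookie-123', 'cookie-789']
```

The segment can only grow when more identifiers with known interests are added to `known_interests`, which is the sense in which more deterministic data yields larger segments.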

Another use case for deterministic data is campaign validation. Let’s look at this use case in more detail. After a campaign has run its course, online marketers may ask themselves whether it was successful. Did it reach its intended audience? What was the ratio of hits to misses? How did the campaign perform with respect to the target group on individual publisher websites? All these questions can be answered if we have deterministic data for a sufficiently large subset of the exposed users.
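One way to answer the hit/miss and per-publisher questions is to join the campaign's impression log with a deterministic panel and count, for each publisher, how many validated impressions landed on the target group. The sketch below assumes a hypothetical log of `(publisher, user_id)` pairs and a panel mapping user IDs to known facts; the target definition is made up for illustration. Impressions served to users outside the panel are skipped because there is no ground truth for them, which is why the panel must cover a sufficiently large subset of the exposed users.

```python
from collections import defaultdict


def on_target_rates(impressions, panel, target):
    """Share of validated impressions that hit the target group, per publisher.

    impressions: list of (publisher, user_id) pairs (hypothetical log format)
    panel: dict mapping user_id -> dict of known (deterministic) facts
    target: function taking a facts dict, True if the user is in the target group
    """
    hits = defaultdict(int)
    validated = defaultdict(int)
    for publisher, user_id in impressions:
        facts = panel.get(user_id)
        if facts is None:
            continue  # no ground truth for this user, so it cannot be validated
        validated[publisher] += 1
        if target(facts):
            hits[publisher] += 1
    return {pub: hits[pub] / validated[pub] for pub in validated}


def is_target(facts):
    # Hypothetical target group: women aged 18-34.
    return facts["gender"] == "female" and 18 <= facts["age"] <= 34


panel = {
    "u1": {"gender": "female", "age": 25},
    "u2": {"gender": "male", "age": 40},
    "u3": {"gender": "female", "age": 31},
}
impressions = [("siteA", "u1"), ("siteA", "u2"), ("siteB", "u3"), ("siteB", "u4")]

rates = on_target_rates(impressions, panel, is_target)
print(rates)  # {'siteA': 0.5, 'siteB': 1.0}
```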

Finally, prediction is yet another important use case for deterministic data. Prediction involves making educated guesses about a user property that we do not know a priori. For example, we might try to guess the age, gender or interests of a user in order to create probabilistic segments. Prediction is valuable and necessary, but it is also a source of inaccuracy. The more deterministic data (stuff you know) you have as a training set for your algorithms, the better the combination of accuracy and reach that can theoretically be achieved, and the more impressions you will deliver on target. After you train a probabilistic model, you also need to validate whether it is successful or requires more tweaking. In other words, you can have all the behavioural data the internet has to offer, but without a solid base of deterministic data you are unlikely to deliver precise predictions. Many publishers will nod in disappointment at this, having experienced how their data products or partners were unable to help their business in the way they expected. Without a large volume of deterministic data to validate your model against, you are flying blind. This is why predicting audience segments from behavioural data alone, or from small pools of first-party user data (e.g. 1,000 user surveys), makes it very hard to generate reach without compromising on precision.
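The trade-off between accuracy and reach can be illustrated by scoring users with a probabilistic model and checking the selected users against a holdout of deterministic data. In this minimal sketch the model scores, the holdout and the thresholds are made up for illustration, and `precision_and_reach` is a hypothetical helper; precision is estimated only on selected users whose true segment membership is known.

```python
import math


def precision_and_reach(scores, truth, threshold):
    """Evaluate a probabilistic segment against deterministic ground truth.

    scores: dict of user_id -> model probability that the user is in the segment
    truth: dict of user_id -> bool, known segment membership for a holdout subset
    Returns (precision, reach): precision is the share of selected holdout users
    who truly belong to the segment; reach is the share of all scored users
    selected at this threshold.
    """
    selected = [uid for uid, p in scores.items() if p >= threshold]
    checked = [uid for uid in selected if uid in truth]
    precision = sum(truth[uid] for uid in checked) / len(checked) if checked else math.nan
    reach = len(selected) / len(scores)
    return precision, reach


scores = {"u1": 0.9, "u2": 0.8, "u3": 0.6, "u4": 0.55, "u5": 0.3, "u6": 0.2}
truth = {"u1": True, "u2": True, "u3": False, "u5": False}  # deterministic holdout

for threshold in (0.7, 0.5):
    precision, reach = precision_and_reach(scores, truth, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, reach={reach:.2f}")
```

In this toy data, lowering the threshold from 0.7 to 0.5 doubles reach from 0.33 to 0.67 but drops estimated precision from 1.00 to 0.67. That precision estimate rests on only three holdout users, which illustrates why a small pool of deterministic data gives little confidence in the result.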

You may ask yourself where all this deterministic data comes from. The answer is that it comes from a multitude of sources, including online questionnaires, e-commerce sites, and social media. For example, websites frequently ask their users to fill out questionnaires about their satisfaction level along with demographic information. E-commerce sites collect facts about people over time, such as the items they have bought and their shipping details. Social media platforms encourage people to share facts about themselves, i.e. deterministic data, such as their interests, employment history, and education level. All this data flows into a pipeline of deterministic data that is exchanged between different platforms on the internet, either directly or via services derived from the data. Crucially, we must remain critical of the sources from which deterministic data is gathered, since we promote this data to the level of digital facts about people, with big consequences for targeting, campaign validation, and algorithmic segment creation.

In conclusion, deterministic data forms the valuable “ground truth” about the online population on which all other applications in online marketing are based, unless we are willing to guess at random. While deterministic data offers value on its own, e.g. as the basis for granular custom segments, it also forms the foundation for applications such as campaign validation and probabilistic segments, which potentially offer much bigger reach than deterministic segments. We gather deterministic data from a multitude of reliable online platforms, ranging from e-commerce sites to social media and questionnaires. We help publishers and agencies validate campaigns, create custom segments and predict with precision by providing high-quality deterministic data panels.