What is derived data?

Derived data is new information created by processing and combining existing raw data sets. This process involves cross-referencing different data sets and performing advanced statistical analysis, which reveals insights not immediately obvious from the original data. Derived data is not simply a summary or reformatting of existing data; it provides entirely new insights. For example, by combining demographic information with buying preferences, businesses can derive new data about purchasing behaviors by age, gender, and education level. Derived data can come from observational, experimental, or simulation data, but it generally should not be built from previously derived data. While valuable, it also brings challenges related to accuracy, privacy, and ownership.

Why is derived data valuable? 

If you own a business, the answer matters. Your existing data contains useful information, but you gain further insights when you combine it with other information to create derived data. So how do you create derived data, and how can you use it in your business?

Derived data key takeaways

  • Derived data is new data created by combining and processing existing raw data

  • Derived data can be created from observational, experimental, and simulation data, but generally should not be created from previously derived data

  • Derived data provides new insights not available from existing data – but comes with its own set of accuracy, privacy and ownership issues

What is derived data?

Statista estimates that 79 trillion gigabytes of data were generated in 2021 – and that's just the raw data. Companies and researchers worldwide are deriving even more data from this raw information – what we call derived data. 

Derived data is computed or extrapolated from other existing data. It is typically the result of cross-referencing or otherwise synthesizing different data sets and performing advanced statistical analysis on that combined material. Because of this, the information revealed in derived data isn't readily obvious from the original data. It doesn't exist until it is created.

Figure: Venn diagram illustrating derived data (image source: Optimizely)

As a simple example of derived data, consider two data sets. The first contains basic demographic information about a group of customers. The second contains the buying preferences of those same customers. Combining and cross-referencing the two data sets reveals new insights about buying preferences by age, gender and education level. This more detailed information is derived data; it is not apparent in either of the original data sets.
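
To make the join concrete, here is a minimal Python sketch, assuming two small, invented data sets keyed by a shared customer ID. The field names (`age`, `gender`, `education`), the ten-year age bands, and the product categories are all hypothetical; a real pipeline would more likely use a database join or a data-frame library.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: neither data set alone says what each age group buys.
demographics = {
    "c1": {"age": 24, "gender": "F", "education": "BA"},
    "c2": {"age": 37, "gender": "M", "education": "MS"},
    "c3": {"age": 29, "gender": "M", "education": "BA"},
    "c4": {"age": 52, "gender": "F", "education": "PhD"},
}
purchases = [
    ("c1", "sneakers"), ("c1", "headphones"), ("c2", "coffee maker"),
    ("c3", "headphones"), ("c4", "coffee maker"), ("c4", "books"),
]

def age_band(age: int) -> str:
    """Bucket an age into a ten-year band such as '20s'."""
    return f"{(age // 10) * 10}s"

def preferences_by_age_band(demographics, purchases):
    """Join purchases to demographics on customer ID and count categories per age band."""
    result = defaultdict(Counter)
    for customer_id, category in purchases:
        person = demographics.get(customer_id)
        if person is None:
            continue  # skip purchases that cannot be matched to a customer record
        result[age_band(person["age"])][category] += 1
    return {band: dict(counts) for band, counts in result.items()}

derived = preferences_by_age_band(demographics, purchases)
print(derived)
# {'20s': {'sneakers': 1, 'headphones': 2}, '30s': {'coffee maker': 1}, '50s': {'coffee maker': 1, 'books': 1}}
```

The same grouping could be keyed on `gender` or `education` instead of the age band to produce the other breakdowns mentioned above.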

Data can be derived in several ways, including:

  • Extracting data

  • Restructuring data

  • Augmenting data

  • Inferring new insights

  • Generating models
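
The sketch below gives one possible, simplified reading of each of these methods on a tiny invented sales table. The records, the region names, and the helper `fit_line` are hypothetical, and a plain least-squares line stands in for whatever modeling technique a real project would use.

```python
from collections import defaultdict

# Hypothetical raw records: (month, region, ad_spend, sales), amounts in dollars.
records = [
    (1, "north", 100.0, 1200.0),
    (1, "south", 200.0, 2100.0),
    (2, "north", 150.0, 1700.0),
    (2, "south", 250.0, 2600.0),
    (3, "north", 200.0, 2200.0),
    (3, "south", 300.0, 3100.0),
]

# Extracting: keep only the records relevant to one question (the north region).
north = [r for r in records if r[1] == "north"]

# Restructuring: reorganize the rows into a region -> monthly sales series.
sales_by_region = defaultdict(list)
for month, region, _spend, sales in sorted(records):
    sales_by_region[region].append(sales)

# Augmenting: attach a computed field, sales per advertising dollar, to each record.
augmented = [(m, reg, spend, sales, sales / spend) for m, reg, spend, sales in records]

# Generating a model: ordinary least-squares line, sales = intercept + slope * ad_spend.
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line([r[2] for r in records], [r[3] for r in records])

# Inferring: use the model to estimate sales at a spend level never observed.
estimate_at_400 = intercept + slope * 400.0
print(round(slope, 2), round(intercept, 2), round(estimate_at_400, 2))  # 9.4 270.0 4030.0
```

Only the model and the estimate at an unobserved spend level contain information that none of the original rows states directly, which is the distinction the next paragraph draws.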

Copying, reformatting or repackaging data does not create derived data, and neither does simply summarizing existing data. Derived data contains new information that is not in the original data.

What are the different types of data?

Researchers group data into four basic types: observational, experimental, simulation and derived. The first three types of data are sometimes referred to as direct data, distinctly different from derived data.

  1. Observational data

    Observational data is captured by observing an activity or surveying a person about an activity. For example, counting customer traffic produces observational data.
  2. Experimental data

    Experimental data is collected when a researcher actively intervenes in a given activity and measures the resulting changes. For example, a study that gives an experimental drug to some subjects and a placebo to others produces experimental data.
  3. Simulation data

    Simulation data is generated by mimicking a real-world process using test models. For example, running a computer simulation of the stresses on a new product produces simulation data.
  4. Derived data

    As you've learned, derived data is created by transforming existing data points to create new insights. For example, combining population data with geographic data to create population density data is considered derived data. 

    Derived data can be created from any of the other three types of data, but it should not be derived from other derived data. When creating derived data, researchers follow best practices that call for documenting the input data, how that data is processed, and the accuracy of the derived results.
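
As a sketch of the population density example, the Python below divides hypothetical population counts by hypothetical land areas and keeps a short description of the inputs, the processing, and the accuracy caveats next to the result, in the spirit of the best practices above. The `DerivedDataset` class and its fields are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass

@dataclass
class DerivedDataset:
    """A derived result stored together with a description of how it was made."""
    values: dict         # the derived numbers themselves
    inputs: list         # where each input data set came from
    method: str          # how the inputs were processed
    accuracy_note: str   # known limits on the precision of the result

# Hypothetical inputs: population counts and land area in square kilometers.
population = {"Region A": 120_000, "Region B": 45_000}
area_km2 = {"Region A": 300.0, "Region B": 900.0}

def population_density(population, area_km2):
    """Combine two data sets into people per square kilometer, per region."""
    densities = {
        region: population[region] / area_km2[region]
        for region in population
        if area_km2.get(region, 0) > 0  # skip regions with no usable area figure
    }
    return DerivedDataset(
        values=densities,
        inputs=["population counts (hypothetical)", "land area in km^2 (hypothetical)"],
        method="population / land area for each region present in both inputs",
        accuracy_note="Inherits any error or staleness in both input data sets.",
    )

result = population_density(population, area_km2)
print(result.values)  # {'Region A': 400.0, 'Region B': 50.0}
```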

What issues are associated with derived data?

As useful as derived data is, it comes with its own issues, which stem from how it is created.

Accuracy issues

Derived data is extrapolated from existing data and is therefore often less exact than the raw data, so queries made on derived data may return less exact results than the same queries made on the original data. Accuracy becomes a bigger problem if derived data is then combined with other derived data to create a further level of data. The scenario is similar to making a copy of a copy of a photograph, which seldom retains the integrity of the original. (For this reason, it's prudent to store the original data instead of, or in addition to, the derived data.)
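
A small numeric example shows how a second level of derivation can drift from the truth. Assuming invented order values for two stores, averaging the per-store averages (derived data built on derived data) gives a different answer than averaging the raw orders, because the stores have different numbers of orders.

```python
from statistics import mean

# Hypothetical raw order values for two stores with very different order counts.
raw_orders = {
    "store_a": [100.0, 100.0, 100.0, 100.0],
    "store_b": [10.0],
}

# First-level derived data: average order value per store.
store_averages = {store: mean(values) for store, values in raw_orders.items()}

# Second-level derived data computed only from the first level: an average of averages.
average_of_averages = mean(list(store_averages.values()))

# The same question answered directly from the original raw orders.
true_average = mean([v for values in raw_orders.values() for v in values])

print(average_of_averages, true_average)  # 55.0 82.0
```

Keeping the raw orders, or at least the order counts behind each average, is what makes the correct answer recoverable.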

Privacy issues

Derived data is often produced by analyzing existing data that individuals supplied with their explicit permission, yet those individuals are typically unaware of the new information revealed in the derived data. It remains an open question whether permission to use the base information implies permission to use data that is derived from, but not explicitly contained in, the original data.

Ownership issues

Closely tied to privacy and usage is the question of who owns the derived data. The original data typically comes from an identified source, but combining and transforming that data creates wholly new data sets. Do the original data owners have ownership claims on the derived data, or is the derived data wholly owned by the entity that processed the original data? The law is not clear on this point.

How can you use derived data in your business?

Derived data provides critical insights not readily apparent in the original data. Instead of being limited to the static observations of direct data, derived data goes beyond the raw data to make new connections and suggest new use cases.

Using derived data can give your business a distinct competitive advantage over companies that rely on more traditional data models. Derived data can help your business:

  • Better understand your customers' wants, needs and buying patterns

  • Identify your most valuable customers

  • Create personalized experiences and products for your most valued customers

  • Provide better customer service

  • Improve efficiency and reduce costs by better targeting your efforts
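
As one narrow illustration of identifying valuable customers, the sketch below derives total spend per customer from a hypothetical order log and ranks customers by it. Treating "most valuable" as "highest total spend" is an assumption made for the example; many businesses also weigh order frequency, recency, or margin.

```python
from collections import defaultdict

# Hypothetical order log: (customer_id, order_amount in dollars).
orders = [
    ("c1", 40.0), ("c2", 250.0), ("c1", 60.0),
    ("c3", 20.0), ("c2", 30.0), ("c3", 15.0),
]

def top_customers_by_spend(orders, n):
    """Derive total spend per customer, then return the n highest spenders."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    # Sort by total spend, largest first; sorted() is stable, so ties keep first-seen order.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

print(top_customers_by_spend(orders, 2))  # [('c2', 280.0), ('c1', 100.0)]
```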

In short, when you want to move beyond the raw data you collect, use available analytic techniques to synthesize new, derived data. This derived data gives your company advanced insights about your customers, your market, and your business that are not available from the original data.

Let Optimizely help you reap the benefits of derived data

Optimizely's Digital Experience Platform synthesizes your existing data to create derived data to help drive your business. This creates actionable insights you can use to better define your target audience, provide personalized customer experiences, and fine-tune your ecommerce activities. Partner with Optimizely to get the most out of all your valuable data.