
Data Calibration and Back Casting

February 1, 2021

Data Tracking Best Practices

As Insights professionals, we know that tracking programs come with data nuances, and managing those nuances is critical. Our clients expect trackers to provide trendable insights wave after wave. Executing a successful tracking program requires diligent attention to many controllable factors across the entire project, including panel providers, questionnaire changes (or lack of changes), sample criteria and consistent data collection timelines. Even a slight deviation in any of these areas could affect results and, in turn, the insights derived from the data. However, some external factors cannot be controlled as easily and may require data calibration methods to maintain data consistency. External factors that may require calibration or back casting include:

  • Change in data collection type (e.g. phone to online, online to mobile)
  • Change in sample provider
  • Change in frequency of sample release
  • Change in respondent qualifications
  • Reminders added
  • Change in vendor
  • Influential changes to the survey

We will dive into several best practices to ensure clients’ trackers stay on track and the data is consistent with previous waves.


Parallel Testing

One of the best ways to calibrate data is to run a parallel test. In a parallel test, we collect data using both the previous methodology and the new methodology. This allows us to make direct comparisons and isolate methodological differences to better inform the calibration.

The length of a parallel test depends on the category and nature of the tracker. At a minimum it is one wave, but several factors should be considered when determining the optimal length:

  • Is seasonality a consideration? (e.g., buying more in summer, Q4 enrollment has new members vs. Q1, etc.)
  • Consider the industry and its purchase habits when choosing the applicable timeframe.
    • COVID-19 has had a considerable impact on consumers’ lifestyles and buying habits. These shifts in purchase patterns may also influence the appropriate length of a parallel test.
  • Is the base size from one wave large enough to detect differences, including within all groups? n=500 is the recommended minimum.
  • Any external market factors such as new market entrants, advertising campaigns or product launches?
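To gauge whether one wave’s base is large enough, a rough check is the minimum detectable difference between the two methodologies. The sketch below uses the standard normal-approximation formula for two independent proportions (e.g., T2B%), assuming equal base sizes per methodology, a proportion near 50% (the worst case), a two-sided 95% test and 80% power. These defaults are our assumptions for illustration, not a stated standard.

```python
import math
from statistics import NormalDist

def min_detectable_difference(n_per_group: int, p: float = 0.5,
                              alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest difference between two independent proportions (e.g., T2B%)
    detectable with the given per-methodology base size, using the normal
    approximation for a two-sided test with equal bases."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    se = math.sqrt(2 * p * (1 - p) / n_per_group)  # SE of the difference
    return (z_alpha + z_power) * se

# Minimum detectable difference, in percentage points, for a few base sizes.
for n in (100, 300, 500, 1000):
    print(n, round(100 * min_detectable_difference(n), 1))
```

Under these assumptions, n=500 per methodology detects differences of roughly 9 percentage points; smaller subgroups need larger gaps before a difference can be trusted.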

The findings of the parallel test will guide next steps. Are the differences in demographics or key quota groups that a simple weighting scheme can adjust? Or are they isolated to how similarly sized quota groups answer key metrics, in which case the scales need calibration?



If demographics or other key quota groups are not aligned with previous waves, a weighting scheme can adjust for the differences and bring core metrics in line with trending results. Weighting best practices are discussed in more detail in a previous blog post.
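A minimal sketch of a simple cell (post-stratification) weighting scheme, assuming a single set of hypothetical demographic cells whose target shares come from previous waves. Schemes that balance several margins at once typically use raking instead; this only illustrates the idea of re-aligning a wave’s profile before comparing core metrics.

```python
from collections import Counter

def cell_weights(cells, target_shares):
    """Simple cell (post-stratification) weights.

    cells: one cell label per respondent in the current wave (hypothetical labels).
    target_shares: cell label -> share of that cell in the reference profile
    (e.g., previous waves). Assumes every observed cell appears in target_shares.
    """
    n = len(cells)
    counts = Counter(cells)
    # Weight = target share / observed share, so weighted cell shares match targets.
    return [target_shares[c] / (counts[c] / n) for c in cells]

def weighted_mean(values, weights):
    """Weighted mean of a metric, e.g. a 0/1 top-2-box flag."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Illustrative data: the wave is 60/40 younger/older vs. a 50/50 target profile.
cells = ["18-34"] * 60 + ["35+"] * 40
weights = cell_weights(cells, {"18-34": 0.5, "35+": 0.5})
t2b = [1] * 30 + [0] * 30 + [1] * 30 + [0] * 10  # 50% T2B among 18-34, 75% among 35+
print(round(sum(t2b) / len(t2b), 3), round(weighted_mean(t2b, weights), 3))
```

The unweighted T2B of 60% rises to 62.5% once the older group is weighted back up to its 50% target share.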



Calibration is the exercise of reviewing the differences between the previous data and the current data, then systematically adjusting the previous data to align more closely with the current and likely future waves. Assuming the factors discussed above are in line, and the current wave was executed properly and in the spirit of the methodology going forward, there are several ways to back cast or calibrate the data.

There are two general approaches to calibrating data: at the aggregate level, where the data are examined in total (means, T2B%, etc.), or at the respondent level.


Respondent Level Calibration

Respondent level calibration consists of two recommended approaches: 1) Weighting and 2) Clustering.

  • Weighting

We will not dive deep into weighting best practices, as they were covered in a previous blog post.

  • Clustering and Normalization

When we switch methodologies or change panel providers for data collection, we know that respondents in different panels use scales differently, so the change shifts scores across the board. By normalizing and clustering respondents, we can systematically “back-cast” previous data to estimate what the results would have looked like had the data been collected using a similar panel or methodology.
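As a minimal sketch of the normalization defined in footnote 1, within-respondent z-scores can be computed as below. Using the population standard deviation and returning zeros for straight-liners (whose standard deviation is 0) are our assumptions, not details given in the method description.

```python
from statistics import mean, pstdev

def normalize_respondent(scores):
    """Within-respondent z-scores across one battery of rating questions.

    Subtract the respondent's own mean and divide by their own standard
    deviation (population SD here, an assumption). Relative highs and lows
    survive; overall scale position and spread do not.
    """
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        # Straight-liner: no within-series variation to preserve (assumed handling).
        return [0.0] * len(scores)
    return [(s - mu) / sd for s in scores]

# Two respondents with the same pattern but different scale use:
print(normalize_respondent([6, 8, 10]))  # uses the top of a 10-point scale
print(normalize_respondent([2, 3, 4]))   # uses the bottom, more narrowly
```

Both respondents end up with the same normalized pattern, which is what lets clustering group them by how they rank items rather than by where on the scale they tend to answer.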


The steps needed to perform this “back-cast” are:

  1. For every wave of data, we normalize¹ each respondent’s scores across a series of ratings.
  2. Next, we group respondents into clusters and micro-clusters based on their normalized response patterns².
  3. If there is a parallel test, we use its raw (non-normalized) data to create previous-wave adjustment factors for each group of respondents. Otherwise, raw data from the previous waves are used.
  4. We apply these adjustments to each variable³ for each respondent in the previous data sets, within each micro-cluster.
  5. Simple subtraction is used to determine the adjustment for each comparable variable: the parallel test vs. the current methodology, or each previous wave vs. the current wave.
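The adjustment steps (3 to 5) can be sketched as follows, assuming cluster and micro-cluster labels have already been assigned (steps 1 and 2) and stored under a hypothetical `cluster` key on each respondent record. We read “simple subtraction” as the new-methodology mean minus the old-methodology mean within each micro-cluster, added to each previous-wave respondent’s raw score. The function names and the base-size cutoff of 30 (taken from footnote 3) are illustrative choices, not the exact production procedure.

```python
from collections import defaultdict
from statistics import mean

def cell_scores(rows, variables):
    """Collect raw (non-normalized) scores by (micro-cluster, variable)."""
    cells = defaultdict(list)
    for r in rows:
        for v in variables:
            if r.get(v) is not None:
                cells[(r["cluster"], v)].append(r[v])
    return cells

def adjustment_factors(old_method, new_method, variables, min_base=30):
    """Adjustment = mean under new methodology - mean under old methodology,
    per micro-cluster and variable. old_method is the parallel test's old arm
    (or a previous wave); new_method is the current methodology.
    Cells with a base below min_base on either side are skipped."""
    old_cells = cell_scores(old_method, variables)
    new_cells = cell_scores(new_method, variables)
    factors = {}
    for key, old_scores in old_cells.items():
        new_scores = new_cells.get(key, [])
        if len(old_scores) >= min_base and len(new_scores) >= min_base:
            factors[key] = mean(new_scores) - mean(old_scores)
    return factors

def back_cast(previous_wave, factors, variables):
    """Shift each previous-wave respondent's raw score by its cell's factor."""
    adjusted = []
    for r in previous_wave:
        row = dict(r)
        for v in variables:
            key = (r["cluster"], v)
            if r.get(v) is not None and key in factors:
                row[v] = r[v] + factors[key]
        adjusted.append(row)
    return adjusted

# Illustrative usage with made-up records for one micro-cluster.
old = [{"cluster": "c1", "q1": 6}] * 30
new = [{"cluster": "c1", "q1": 7}] * 30
factors = adjustment_factors(old, new, ["q1"])
print(back_cast([{"cluster": "c1", "q1": 5}], factors, ["q1"]))
```

Keeping the factors keyed by micro-cluster means a respondent is adjusted only by the shift observed among similar respondents, rather than by one across-the-board offset.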


Aggregate Calibration

Aggregate calibration may be necessary if you don’t have access to the previous wave’s respondent-level data, or if you need a simple solution for timing or cost reasons. In an aggregate calibration, questions that use the same scale and are thematically similar are grouped together. Within each group, we find the average difference between scores under the current methodology and the previous methodology run in parallel, then apply that average difference to past data to shift it in line with the current methodology. Grouping helps prevent overfitting the adjustments: averaging across questions minimizes the natural sampling error of any single question and focuses on the methodological differences common to the group.
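A minimal sketch of aggregate calibration, assuming per-question aggregates (means or T2B%) are already available for both methodologies in the parallel test and for each past wave. The group names, question ids and values are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import mean

def aggregate_calibration(question_groups, new_means, old_means, past_waves):
    """Shift past aggregate results by each question group's average difference.

    question_groups: group name -> question ids sharing a scale and theme.
    new_means / old_means: per-question aggregates from the new and previous
        methodology in the parallel test.
    past_waves: wave label -> question id -> aggregate under the old methodology.
    Questions outside every group are left unchanged.
    """
    shift = {}
    for questions in question_groups.values():
        # One shared shift per group rather than a per-question difference,
        # so single-question sampling noise is averaged out.
        group_shift = mean(new_means[q] - old_means[q] for q in questions)
        for q in questions:
            shift[q] = group_shift
    return {
        wave: {q: value + shift.get(q, 0.0) for q, value in results.items()}
        for wave, results in past_waves.items()
    }

groups = {"brand_image": ["q1", "q2", "q3"]}
new = {"q1": 3.9, "q2": 4.1, "q3": 3.7}
old = {"q1": 4.2, "q2": 4.3, "q3": 4.1}
past = {"W1": {"q1": 4.0, "q2": 4.4, "q3": 4.0, "q9": 55.0}}
print(aggregate_calibration(groups, new, old, past))
```

Here the three questions differ by -0.3, -0.2 and -0.4 points, so each is shifted by the group average of -0.3 in past waves, while the ungrouped q9 is left as-is.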

Data consistency in tracking work is essential, and the ability to be agile and adapt to maintain that consistency when external factors change is critical. While calibration and back casting may not be ideal, they are a reality we face. Onboarding new tracking work, shifting sampling methodology or changing the sample criteria will affect data trends, but with these best practices we can assure our clients that their data can still be leveraged as a historical reference point.


Written by Mike Miller, Vice President & Team Lead, Data Science, at Big Village Insights.

¹ Normalization: For each rating question, subtract the respondent’s average score across the series of questions and divide by the respondent’s standard deviation. This preserves the respondent’s relative highs and lows within the series but controls for tendencies to use different parts of the scale.

² Form clusters with similar response patterns across all waves. Further divide clusters into micro-clusters (e.g., by region, brand (client vs. all others), age, etc.).

³ Some variables are not included in the adjustment process due to a large amount of missing data, numerous confounding changes, or very small bases (<30).
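Footnotes 2 and 3 can be illustrated with a short sketch that builds micro-cluster keys and flags cells whose bases fall below 30. It assumes each respondent already carries a response-pattern cluster label (no clustering algorithm is named above, so none is shown) plus hypothetical `region` and `brand` fields.

```python
from collections import Counter

def micro_cluster(resp):
    """Hypothetical micro-cluster key: response-pattern cluster x region x
    client-brand flag (field names are illustrative, not a fixed schema)."""
    brand_flag = "client" if resp["brand"] == "client" else "all other"
    return (resp["cluster"], resp["region"], brand_flag)

def small_base_cells(respondents, variables, min_base=30):
    """List (micro-cluster, variable) cells whose non-missing base is below
    min_base; such cells would be left out of the adjustment."""
    bases = Counter()
    for r in respondents:
        key = micro_cluster(r)
        for v in variables:
            if r.get(v) is not None:
                bases[(key, v)] += 1
    keys = {micro_cluster(r) for r in respondents}
    return sorted((k, v) for k in keys for v in variables if bases[(k, v)] < min_base)

rows = ([{"cluster": 1, "region": "NE", "brand": "client", "q1": 7}] * 40
        + [{"cluster": 1, "region": "W", "brand": "B", "q1": 5}] * 10)
print(small_base_cells(rows, ["q1"]))
```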