A 3-Step Guide to Better Research Data Quality


Good insight relies on good data. Yet we aren’t always convinced we’ve collected good-quality data. Sample quality in particular is an ongoing concern, with nearly half of researchers citing it as a primary frustration in our annual Research Trends study. So, what can be done to ensure the integrity of your data?

Improving data quality can be approached in three ways:

  • Reduce Response Burden
  • Include Data Quality Questions
  • Clean Survey Data

Let’s look at each in turn.

Reduce Response Burden

Response burden within surveys is generally defined and measured as the time it takes a participant to complete a questionnaire. This makes sense: the longer the survey, the greater the burden on the participant. On that basis, burden rises or falls with the number of questions.

This approach is a reasonable rule of thumb and goes a long way toward estimating the burden a particular survey places on participants. However, in our never-ending quest to increase survey participation, minimize dropout rates, and ensure optimal data quality, it is worth considering all the elements that contribute to response burden.

The Theory

In the late 1970s, academic survey researchers began moving away from this time-oriented indicator of response burden (actual burden) and became interested in perceived burden, which takes into account other factors at play while a participant completes a questionnaire.

In a breakthrough 1978 paper published in the Proceedings of the Survey Research Methods Section of the American Statistical Association, Norman Bradburn offered one of the first frameworks of response burden along these lines, outlining four factors that can increase burden within a survey:

  • The length of the interview (or questionnaire length for self-administered surveys)
  • The amount of effort required of the participant
  • The amount of stress on the participant
  • The frequency with which the participant is interviewed

A premise running throughout Bradburn’s paper concerns the participant’s perception of the importance of the research. He concludes that participants “seem to be willing to accept high levels of burden if they are convinced that the data are important.”

Bradburn suggests that these factors affect participants differently, with some individuals perceiving certain aspects as more burdensome than others. Later research examines how elements within the questionnaire itself influence these factors, such as repetitive questioning, long lists, grids, and open-end questions, as well as external factors such as socioeconomic status, interests, motivations, and attitudes.

Putting Theory into Practice

While we cannot control every aspect of response burden, we can address one of its largest contributors: the questionnaire itself. Factors such as poor question construction, a cluttered or complicated layout, and insufficient or overly complicated instructions all increase the participant’s burden and ultimately reduce the quality of their responses.

Making a survey easier and more enjoyable to take can reduce the perceived burden, even if the actual burden (time to complete the survey) stays the same.

How can you do this?

Design short, mobile-first questionnaires. Keeping surveys short can be challenging, but two strategies help:

Where possible, approach your research questions iteratively, building on the body of knowledge about your (or your client’s) brand, product, or universe one short study at a time.

Additionally, wherever possible, make use of existing sample profiling data to avoid asking known information.
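
As a rough illustration of reusing profiling data, the Python sketch below drops any question whose answer is already held in a respondent’s panel profile. The `Question` class, its `profile_field` link, and the profile dictionary are hypothetical stand-ins; real survey platforms typically handle this through their own piping or skip-logic features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    qid: str
    text: str
    # Hypothetical link to a panel profile attribute; None means "always ask".
    profile_field: Optional[str] = None


def questions_to_ask(questionnaire: list, profile: dict) -> list:
    """Keep only questions whose answer is not already known from the profile."""
    return [
        q for q in questionnaire
        if q.profile_field is None or profile.get(q.profile_field) is None
    ]


# Example questionnaire (hypothetical question IDs and wording).
EXAMPLE_QUESTIONNAIRE = [
    Question("Q1", "What is your age?", profile_field="age"),
    Question("Q2", "Which region do you live in?", profile_field="region"),
    Question("Q3", "How satisfied are you with Brand X?"),
]

if __name__ == "__main__":
    remaining = questions_to_ask(EXAMPLE_QUESTIONNAIRE, {"age": 34})
    # Q1 is skipped because age is already in the profile.
    print([q.qid for q in remaining])  # ['Q2', 'Q3']
```

Treating a missing or empty profile value as "unknown" keeps the question in the survey, so stale or incomplete profiles fail safe.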

Include Data Quality Questions

With concern about rogue respondents running high, and a growing risk of survey ‘bots’ contaminating data, adding specific quality questions to the questionnaire has become standard practice.

There are a few ways to do this.

Attention Checks

Attention check questions are an overt way of checking whether participants are paying attention to the questions and instructions. For example, a question asks “What color is the sky?” but its instruction tells participants to select an incorrect response, such as yellow.
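
Scoring an instructed-response item like this is a simple comparison. The Python sketch below assumes each respondent’s answers are stored in a dictionary keyed by question ID; the question ID `ATTN1` and the data layout are hypothetical.

```python
def failed_attention_check(responses: dict, qid: str, instructed_answer: str) -> list:
    """Return IDs of respondents who did not select the instructed answer.

    A missing answer counts as a failure.
    """
    return [
        rid for rid, answers in responses.items()
        if answers.get(qid) != instructed_answer
    ]


# Hypothetical data: "ATTN1" is "What color is the sky?", and the
# instruction asks participants to select "Yellow".
responses = {
    "r1": {"ATTN1": "Yellow"},
    "r2": {"ATTN1": "Blue"},  # answered the question but ignored the instruction
    "r3": {},                 # skipped the question
}

print(failed_attention_check(responses, "ATTN1", "Yellow"))  # ['r2', 'r3']
```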

Red Herrings

A more subtle approach is to add red herrings to a response list. For example, include one or two fake brands or products within the response list.
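
A minimal sketch of scoring red herrings, in Python, counts how many decoy brands each respondent selects. The brand names (including the invented decoys “Zentrova” and “Lumexa”), the data layout, and the `min_hits` cut-off are all hypothetical.

```python
def red_herring_hits(selections: dict, fake_brands: set) -> dict:
    """Count how many fake (decoy) brands each respondent selected."""
    return {rid: len(set(chosen) & fake_brands) for rid, chosen in selections.items()}


def flag_red_herrings(selections: dict, fake_brands: set, min_hits: int = 1) -> list:
    """Return IDs of respondents who selected at least `min_hits` decoys."""
    hits = red_herring_hits(selections, fake_brands)
    return [rid for rid, n in hits.items() if n >= min_hits]


# Hypothetical brand-awareness data; "Zentrova" and "Lumexa" are invented decoys.
FAKE_BRANDS = {"Zentrova", "Lumexa"}
selections = {
    "r1": ["Brand A", "Brand B"],
    "r2": ["Brand A", "Zentrova"],
    "r3": ["Zentrova", "Lumexa", "Brand B"],
}

print(red_herring_hits(selections, FAKE_BRANDS))   # {'r1': 0, 'r2': 1, 'r3': 2}
print(flag_red_herrings(selections, FAKE_BRANDS))  # ['r2', 'r3']
```

One option is to raise `min_hits` to 2, which tolerates a single mis-click or genuine confusion with a similarly named real brand.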

Duplicate Questions

Another option is to ask the same question at different points in the questionnaire, or to ask for the same information in different ways, to check for consistency.
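
A consistency check can be as simple as comparing the two answers. The Python sketch below covers two hypothetical cases: a question repeated verbatim, and age asked directly in one place and as a birth year in another. The survey year is passed in as a parameter (an assumption) rather than read from the clock.

```python
def ages_consistent(stated_age: int, birth_year: int, survey_year: int) -> bool:
    """Check a stated age against a birth year asked elsewhere in the survey.

    Depending on whether the birthday has passed yet this year, the implied
    age is either survey_year - birth_year or one less.
    """
    implied = survey_year - birth_year
    return stated_age in (implied - 1, implied)


def repeat_consistent(first_answer, repeat_answer) -> bool:
    """The simplest case: the same question asked twice should match exactly."""
    return first_answer == repeat_answer


print(ages_consistent(34, 1990, 2024))          # True
print(ages_consistent(33, 1990, 2024))          # True (birthday not yet reached)
print(ages_consistent(45, 1990, 2024))          # False
print(repeat_consistent("Brand A", "Brand B"))  # False
```

Allowing a one-year window avoids penalizing genuine participants whose birthday has not yet come around.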

While these additions may give researchers peace of mind that there are tangible ways to check the data, research-on-research indicates that, overall, they may not reliably detect bogus respondents. They can also alienate genuine participants, who may take offense at having their intent questioned.

A less intrusive and perhaps less obvious tactic is to ask about a series of real-life activities and include a few that have an extremely low incidence.

For example:
In the past six months, have you done the following?

  • Visited a grocery store
  • Vacationed in South Africa
  • Shopped online
  • Attended a ballet performance
  • Dined at a restaurant
  • Become a parent

Some of those options naturally have a higher incidence than others. It is unlikely that one person has visited a far-off country, attended a cultural performance, and become a parent all within the past six months. Bear in mind, however, that it is not impossible!
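
Because the combination, rather than any single item, is what looks suspicious, one approach is to count the low-incidence items each respondent selects and flag only those above a cut-off. In the Python sketch below the answer codes are hypothetical, and the threshold of 3 (all three rare items) is an assumption. Since the combination is not impossible, flagged cases are routed for review rather than dropped outright.

```python
# Hypothetical answer codes for the low-incidence options in the list above.
LOW_INCIDENCE = {"vacation_south_africa", "ballet_performance", "became_parent"}


def low_incidence_count(selected: list) -> int:
    """Number of rare activities a respondent claims to have done."""
    return len(set(selected) & LOW_INCIDENCE)


def flag_for_review(selected: list, threshold: int = 3) -> bool:
    """Flag (not automatically remove) respondents who claim many rare activities."""
    return low_incidence_count(selected) >= threshold


typical = ["grocery_store", "shopped_online", "dined_out"]
unlikely = ["grocery_store", "vacation_south_africa", "ballet_performance", "became_parent"]

print(low_incidence_count(typical), flag_for_review(typical))    # 0 False
print(low_incidence_count(unlikely), flag_for_review(unlikely))  # 3 True
```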

Additional Resources:

For more on quality checks, read this recent research-on-research study conducted by Pew Research Center: Assessing the Risks to Online Polls from Bogus Respondents

Clean Survey Data

Once you have your survey data, you can run various checks to identify inattentive and bogus respondents and remove them from your dataset.

Three common checks look for straightlining, speeders, and gibberish responses.

Straightlining

Straightlining occurs when participants select the same response, or a patterned sequence of responses, across a set of questions.

This is most common in traditional matrix (grid) questions. For example, a question asks participants to rate their feelings about several brands on a scale ranging from very positive to very negative, and a participant selects ‘positive’ for every brand. Alternatively, they may engage in patterned straightlining, alternating between two response options such as very positive and positive.
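
Both patterns are easy to test for, one grid at a time. The Python sketch below assumes each grid answer is stored as a numeric scale code; the coding scheme and the three-item minimum are assumptions. It detects only the two patterns described here, not every conceivable pattern.

```python
from typing import Optional, Sequence


def straightline_type(row: Sequence[int]) -> Optional[str]:
    """Classify one respondent's answers to one grid question.

    Returns "straight" if every item received the same answer, "alternating"
    if the answers flip between exactly two values (A, B, A, B, ...), and
    None otherwise. Rows shorter than 3 items are treated as too short to judge.
    """
    if len(row) < 3:
        return None
    if len(set(row)) == 1:
        return "straight"
    if len(set(row)) == 2 and all(row[i] == row[i % 2] for i in range(len(row))):
        return "alternating"
    return None


# Hypothetical coding: 1 = very negative ... 5 = very positive.
print(straightline_type([4, 4, 4, 4, 4]))  # straight
print(straightline_type([5, 4, 5, 4, 5]))  # alternating
print(straightline_type([5, 4, 2, 3, 4]))  # None
```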

Straightlining most often occurs when fatigue sets in, especially in long surveys with multiple lengthy grids. It is best addressed in questionnaire design by limiting repetitive grids. Beyond that, it is useful to check the collected data for repeat offenders. A word of caution: in many cases, the participant may have answered truthfully even though the response pattern looks like straightlining.

Therefore, it’s essential to consider whether this could be the case for the questions at hand, and to exclude data only when specific criteria are met, for example, when the participant straightlined on three separate questions.
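
A criterion like that can be applied once each grid has been checked. In the Python sketch below, the per-grid flags are assumed to come from whatever straightlining test you use (manual review or an automated check), and `min_grids=3` mirrors the example criterion above; the function name and data layout are hypothetical.

```python
def respondents_to_exclude(grid_flags: dict, min_grids: int = 3) -> list:
    """Return IDs of respondents flagged for straightlining on at least `min_grids` grids.

    `grid_flags` maps each respondent ID to one boolean per grid question,
    True when that grid looked straightlined.
    """
    return sorted(rid for rid, flags in grid_flags.items() if sum(flags) >= min_grids)


grid_flags = {
    "r1": [True, True, True, False],
    "r2": [True, False, True, False],
    "r3": [True, True, True, True, True],
}

print(respondents_to_exclude(grid_flags))  # ['r1', 'r3']
```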

Speeders

Speeders are participants who move through the questionnaire at a rapid pace.

This quick pace suggests they did not give the survey due consideration and that their data is likely to be of poor quality. To identify speeders, first investigate the average length of interview (LOI) and establish a ‘normal’ range. Where to set the minimum acceptable LOI is an editorial judgment. Once it has been set, outliers (those completing the survey faster than the minimum acceptable time) can be excluded from the study.
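
The Python sketch below shows one way to apply such a cut-off. It uses the median LOI rather than the average, because a few respondents who leave a survey open for hours can inflate the mean; that substitution, and the default cut-off of half the median, are assumptions standing in for the editorial judgment described above.

```python
from statistics import median


def find_speeders(loi_seconds: dict, fraction: float = 0.5):
    """Flag respondents whose LOI falls below `fraction` of the median LOI.

    Returns (sorted list of speeder IDs, cut-off in seconds).
    """
    cutoff = fraction * median(loi_seconds.values())
    speeders = sorted(rid for rid, secs in loi_seconds.items() if secs < cutoff)
    return speeders, cutoff


# Hypothetical completion times in seconds.
loi = {"r1": 600, "r2": 540, "r3": 660, "r4": 150, "r5": 620}

speeders, cutoff = find_speeders(loi)
print(cutoff, speeders)  # 300.0 ['r4']
```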

Gibberish Responses

A third data check is to look for gibberish responses to open-end questions.

While this seems straightforward, there are editorial judgments around what is considered a nonsensical answer.

A response such as ‘qwerty’ or ‘aflgahgh’ is clearly not a valid answer to an open-ended question. However, pause before eliminating that participant from the survey. Is it possible that they simply had nothing to say for that question? If they answered other open ends thoughtfully, it is fair to assume so. The same applies to ‘nothing’ or ‘none’ responses.
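
A crude first pass can shortlist candidates before a person applies that judgment. The Python sketch below uses a few hypothetical heuristics (a single word typed entirely from one keyboard row, no vowels, or one repeated character) and recommends exclusion only when the respondent gave no substantive open-end anywhere in the survey. These heuristics will miss some gibberish and can misfire on rare real words, so treat the output as a review list, not a verdict.

```python
import re

KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
NON_ANSWERS = {"nothing", "none", "n/a", "na"}


def looks_like_gibberish(text: str) -> bool:
    """Rough keyboard-mash heuristics; multi-word answers are never flagged."""
    t = text.strip().lower()
    if not re.fullmatch(r"[a-z]+", t):
        return False
    if len(t) >= 3 and len(set(t)) == 1:  # e.g. "aaaa"
        return True
    if len(t) >= 4 and not re.search(r"[aeiouy]", t):  # e.g. "hjkl"
        return True
    # e.g. "qwerty", "aflgahgh": every letter comes from a single keyboard row
    return len(t) >= 6 and any(set(t) <= set(row) for row in KEYBOARD_ROWS)


def is_non_answer(text: str) -> bool:
    """Empty or 'nothing'/'none'-style answers: not gibberish, but not content."""
    return text.strip().lower().rstrip(".!") in NON_ANSWERS | {""}


def recommend_exclusion(open_ends: list) -> bool:
    """Exclude only if something looks like gibberish and no answer is substantive."""
    gibberish = [looks_like_gibberish(a) for a in open_ends]
    substantive = [not g and not is_non_answer(a) for a, g in zip(open_ends, gibberish)]
    return any(gibberish) and not any(substantive)


print(recommend_exclusion(["qwerty", "Checkout was slow on my phone"]))  # False
print(recommend_exclusion(["qwerty", "aflgahgh"]))                       # True
print(recommend_exclusion(["qwerty", "none"]))                           # True
print(recommend_exclusion(["none", "nothing"]))                          # False
```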

Pro Tip:

Imperium Real Answer is integrated within FocusVision Decipher, so you can easily evaluate participants’ open-end responses using their Real Answer Score™.

Concluding Thought

As we’ve seen, you can take several steps to improve survey data quality. We strongly recommend employing all of these steps together to address quality holistically.
