Good insight relies on good data. Yet we aren’t always convinced we’ve collected good-quality data. In particular, sample quality is an ongoing concern, with nearly half of researchers citing it as a primary frustration in our annual Research Trends study. So, what can be done to ensure the integrity of your data?
Improving data quality can be approached in three ways:
- Designing questionnaires that minimize response burden,
- Adding quality-check questions within the survey, and
- Screening the collected data for inattentive and bogus respondents.
Let’s look at each in turn.
Response burden within surveys is generally defined and measured as the time it takes a participant to complete a questionnaire. This makes sense: the longer the survey, the more burden on the participant. On that basis, burden rises or falls with the number of questions.
This approach is a reasonable rule of thumb and certainly goes a long way to estimate the amount of burden placed on participants for a particular survey. However, in our never-ending quest to increase survey participation, minimize dropout rates, and ensure optimal data quality, it is worth considering all the elements that play into response burden.
In the late 1970s, academic survey researchers began moving away from this time-oriented indicator of response burden – actual burden – and became interested in the notion of perceived burden, which considers other factors prevalent while completing a questionnaire.
In a 1978 breakthrough paper published in the Proceedings of the Survey Research Methods Section of The American Statistical Association, Norman Bradburn offers one of the first frameworks of response burden in this manner, outlining four factors that can increase burden within a survey:
The length of the interview (or questionnaire length for self-administered surveys),
The amount of effort required of the participant,
The amount of stress on the participant, and
The frequency with which the participant is interviewed.
A fundamental premise running throughout Bradburn’s paper concerns the participant’s perception of the importance of the research, and he concludes that participants “seem to be willing to accept high levels of burden if they are convinced that the data are important.”
Bradburn suggests that these factors impact participants differently, with some individuals perceiving certain aspects as more burdensome than others. Later research explores how elements within the questionnaire itself influence these factors, such as repetitive questioning, long lists, grids, and open-end questions, as well as external factors, such as socioeconomic status, interests, motivations, and attitudes.
Putting Theory into Practice
While we cannot control all aspects of response burden, we can address one of the largest contributors to it – the questionnaire. Factors such as poor question construction, cluttered/complicated appearance, and insufficient/over-complicated instructions all increase the participant’s burden and ultimately reduce the quality of their responses.
Making the survey easier to take and making it more enjoyable can reduce the perceived burden even if the actual burden (time to take the survey) remains the same.
How to do this?
Design short, mobile-first questionnaires. Short can be challenging:
If possible, approach your research questions iteratively, building upon the body of knowledge about your (or your client’s) brand, product, universe, one short study at a time.
Additionally, wherever possible, make use of existing sample profiling data to avoid asking known information.
For help creating your survey, see The Definitive Guide to Effective Online Surveys and the accompanying 10 Tips to Creating Great Survey Questions infographic.
For research-on-research, read the study Mobilize Me! Design Techniques to Improve Survey Participation and Data Quality.
With concerns about rogue respondents running high, coupled with the increasing possibility of survey ‘bots’ negatively impacting data, adding specific quality questions to the questionnaire is standard practice.
There are a few ways to do this.
Attention check questions are an overt way of checking whether the participant is paying attention to the questions and instructions. For example, “What color is the sky?” The instruction tells participants to select an incorrect response such as yellow.
A more subtle approach is to add red herrings to a response list. For example, include one or two fake brands or products within the response list.
Another option could be to ask the same question at different places in the questionnaire or ask for the same information in different ways to check for consistency.
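As a sketch of how these checks might be scored after fielding, the snippet below flags respondents who fail two or more checks rather than any single one. The column names, the fake brand names, and the two-failure threshold are all illustrative assumptions, not part of any standard.

```python
# Hypothetical scoring of in-survey quality checks per respondent.
# All field names and fake brands ("Brandex", "Qualio") are assumptions.

def count_failed_checks(respondent):
    """Count how many quality checks a respondent failed."""
    failures = 0
    # Attention check: the instruction told respondents to pick "yellow".
    if respondent.get("attention_check") != "yellow":
        failures += 1
    # Red herrings: fake brands planted in the response list.
    fake_brands = {"Brandex", "Qualio"}
    if fake_brands & set(respondent.get("brand_selection", [])):
        failures += 1
    # Consistency: the same age question asked in two places.
    if respondent.get("age_q1") != respondent.get("age_q2"):
        failures += 1
    return failures

respondents = [
    {"attention_check": "yellow", "brand_selection": ["Acme"], "age_q1": 34, "age_q2": 34},
    {"attention_check": "blue", "brand_selection": ["Brandex"], "age_q1": 30, "age_q2": 41},
]

# Flag only respondents who fail two or more checks, to avoid
# penalizing genuine participants for a single slip.
flagged = [r for r in respondents if count_failed_checks(r) >= 2]
```

Requiring multiple failures before flagging is a deliberate design choice: any single check can produce a false positive on an honest participant.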
While these additions may give researchers peace of mind that there are tangible ways to check the data, research-on-research indicates that, overall, they may not be effective at detecting bogus respondents. Furthermore, they can be harmful to genuine participants who take umbrage at the questioning of their intent.
A less intrusive and perhaps more candid tactic is to ask about a series of real-life activities and include a few that have an extremely low incidence.
In the past six months, have you done the following?
- Visited a grocery store
- Vacationed in South Africa
- Shopped online
- Attended a ballet performance
- Dined at a restaurant
- Became a parent
Some of those options naturally have a higher incidence than others. Taken together, it is unlikely that one person has vacationed in a far-off country, attended a cultural performance, and become a parent, all within six months. Bear in mind, however, that it is not impossible!
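One way to operationalize this is to count how many low-incidence activities a respondent claims and flag only those above a threshold. The sketch below is a minimal illustration; the activity labels come from the list above, and the threshold of two is an arbitrary assumption, not a validated cutoff.

```python
# Rare activities drawn from the example list above.
LOW_INCIDENCE = {
    "Vacationed in South Africa",
    "Attended a ballet performance",
    "Became a parent",
}

def suspicious(claimed_activities, threshold=2):
    """Flag a respondent who claims `threshold` or more rare activities.

    A single rare claim is plausible; a cluster of them is the signal.
    """
    rare_claims = set(claimed_activities) & LOW_INCIDENCE
    return len(rare_claims) >= threshold
```

Because any one rare activity can be true, the flag fires only on the combination, mirroring the reasoning above.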
For more on quality checks, read this recent research-on-research study conducted by Pew Research: Assessing the Risks to Online Polls from Bogus Respondents
Once you have your survey data, you can undertake various checks to identify inattentive and bogus respondents to remove them from your dataset.
Three common assessments are straightlining, speeders, and gibberish responses.
Straightlining is when participants select the same or patterned response for a set of questions.
This is most common in traditional matrix grid type questions. For example, the question asks participants to rate their feelings about several brands on a scale ranging from very positive to very negative; they may select ‘positive’ for all the brands. Alternatively, they could engage in patterned straightlining, alternating between two response options: very positive and positive.
Straightlining most often occurs when fatigue sets in, especially in long surveys with multiple lengthy grids. It is best to address this in your questionnaire design by limiting repetitive grids. Beyond this, it is useful to check collected data for repeat offenders. A word of caution: in many cases, it is entirely possible the participant has answered truthfully, yet the response pattern indicates straightlining.
Therefore, it’s essential to consider whether this is the case for the questions at hand and only exclude the data when specific criteria are met. For example, if the participant exhibited straightlining on three separate questions.
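A minimal Python sketch of both checks might look like the following; the helper names are illustrative, and the three-grid exclusion criterion mirrors the example above rather than any industry rule.

```python
def is_straightlined(responses):
    """True when every answer in a grid question is identical."""
    return len(set(responses)) == 1

def is_patterned(responses, period=2):
    """True when answers repeat in a short cycle, e.g. alternating
    'very positive' / 'positive' across a brand grid."""
    if len(responses) <= period:
        return False
    return all(responses[i] == responses[i % period] for i in range(len(responses)))

def exclude_respondent(grids, min_flagged=3):
    """Only exclude when straightlining appears on `min_flagged` or more
    separate grid questions, since a single flat grid may be truthful."""
    flagged = sum(1 for g in grids if is_straightlined(g) or is_patterned(g))
    return flagged >= min_flagged
```

Keeping the exclusion rule at the respondent level, rather than per question, reflects the caution above: one uniform grid can be an honest answer.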
Speeders are participants who move through the questionnaire at a rapid pace.
This quick pace indicates they didn’t give due consideration to the survey-taking process, and their data is likely of poor quality. To identify speeders, you’ll first need to investigate the average length of interview (LOI) and establish a ‘normal’ range; exactly where to draw the line is an editorial judgment. Once that threshold has been determined, outliers – those completing the survey faster than the minimum acceptable time – can be excluded from the study.
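As an illustration, the sketch below flags respondents whose completion time falls below a fraction of the median LOI. The 40% cutoff is a hypothetical editorial choice for the example, not an industry standard.

```python
import statistics

def flag_speeders(lois, fraction=0.4):
    """Return indices of respondents faster than `fraction` of the median LOI.

    `lois` is a list of completion times (e.g. in minutes). The median is
    used as the 'normal' benchmark because it resists outliers; the
    fraction itself is the editorial judgment described above.
    """
    cutoff = fraction * statistics.median(lois)
    return [i for i, t in enumerate(lois) if t < cutoff]
```

Using the median rather than the mean keeps one very slow (or very fast) respondent from distorting the benchmark everyone else is judged against.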
A third data check is to screen open-end questions for gibberish responses.
While this seems straightforward, there are editorial judgments around what is considered a nonsensical answer.
A response such as ‘qwerty’ or ‘aflgahgh’ is clearly not a valid answer to an open-ended question. However, pause before immediately eliminating that participant from the survey. Is it possible that they simply didn’t have anything to say for that question? If they answered other open ends thoughtfully, then it’s fair to assume so. The same applies to ‘nothing’ or ‘none’ responses.
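A rough heuristic classifier along these lines might look like the Python sketch below. The junk-token list and consonant-run rule are illustrative assumptions that would miss many cases; note that it deliberately separates ‘nothing’/‘none’ from gibberish, since those may be genuine non-answers.

```python
import re

# Illustrative keyboard-mash tokens; a real list would be much longer.
JUNK = {"qwerty", "asdf", "asdfgh", "zxcvbn"}
# Possible genuine non-answers -- review, don't auto-delete.
NON_ANSWERS = {"nothing", "none", "n/a", "no"}

def classify_open_end(text):
    """Roughly classify an open-end as 'gibberish', 'non_answer', or 'ok'."""
    t = text.strip().lower()
    if not t or t in JUNK:
        return "gibberish"
    if t in NON_ANSWERS:
        return "non_answer"
    # A run of 5+ consonants suggests keyboard mashing.
    if re.search(r"[bcdfghjklmnpqrstvwxz]{5,}", t):
        return "gibberish"
    return "ok"
```

The three-way classification supports the editorial judgment above: ‘gibberish’ responses are candidates for removal, while ‘non_answer’ responses are held for review against the participant’s other open ends.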
Imperium Real Answer is integrated within FocusVision Decipher so you can easily evaluate participants’ open-end responses with their Real Answer Score™.