A significance test is one of the most widely used statistical procedures in survey research. It is a tool that can help investigate differences between segments within your data and provide a way to focus on data findings that are meaningful.
Perhaps due to its popularity and ease of use (most data reporting packages perform significance tests automatically), the significance test is often used and abused as a test of “truth” to validate data findings. In reality, it does no such thing. While a significance test can be used as a means towards data insight, what it actually uncovers is very narrow in scope.
Example Significance Test
A significance test is commonly used in survey research as part of a crosstab analysis. In a crosstab analysis, you split your data in order to investigate differences between segments. Take the following data table. It shows the age ranges for three different segments: PC, Phone, and Tablet users. You can see from this crosstab analysis that tablet users tend to skew older, while PC and phone users are more likely to fall in the younger age ranges. However, as with any survey, there’s the possibility that the segment differences seen are due to sampling error rather than a true difference within the population.
| Age | PC User (a) | Smartphone User (b) | Tablet User (c) |
|---|---|---|---|
| 18-24 years old | 5% C | 5% C | 2% |
| 25-34 | 45% C | 49% AC | 30% |
| 35-44 | 29% | 33% A | 40% AB |
| 45-54 | 15% B | 11% | 22% AB |
| 55-64 | 5% B | 2% | 5% B |
Sampling error occurs whenever you conduct a survey. That’s because, in every survey, you sample only a subset of the population of interest. Imagine conducting a study to find your NPS score. You will typically survey a subset of your customers as a way to gain an understanding of your entire customer base. The NPS score derived from your survey will be different compared to if you had actually been able to ask all of your customers. This difference is called sampling error. High sampling error means the data from your survey does a poor job reflecting the opinions of your customer base.
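Sampling error for a proportion is often summarized as a margin of error. As a minimal sketch (the 50% promoter rate and the sample size of 400 are hypothetical, and the formula assumes simple random sampling at 95% confidence):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical NPS-style survey: 50% of 400 sampled customers are promoters.
moe = margin_of_error(0.50, 400)
print(f"margin of error: +/-{moe * 100:.1f} points")  # +/-4.9 points
```

Quadrupling the sample size halves the margin of error, which is why larger samples reduce (but never eliminate) sampling error.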
Using a Significance Test
Back to the data table. It shows that tablet users tend to skew older compared to PC and phone users. Are these segment differences due to high sampling error? If so, that will invalidate the findings and you cannot conclude that tablet users skew older in the general population. To check, you can perform a significance test. The percentages in each column are tested against all the other columns. Anytime a red letter appears, that indicates statistical significance at 95% confidence.
For example, in column C, you see the statistic “40%” for 35-44-year-olds. Underneath are the letters “AB.” This says that the 40% is significantly higher than the 29% (column A) and 33% (column B) of the same row. Similarly, in column C the “5%” for 55-64-year-olds is significantly higher than 2% (column B). The significance test reveals there is a high probability (95% confidence) that these differences in the data are not due to sampling error.
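The column-letter comparisons above are typically two-proportion z-tests. A minimal sketch of the column C vs. column A comparison for 35-44-year-olds (the source table does not report column sample sizes, so the 200 respondents per column is an assumption):

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the difference between two independent proportions,
    using the pooled estimate under the null hypothesis of no difference."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Tablet (C) 40% vs. PC (A) 29% for 35-44-year-olds; n=200 per column assumed.
z = two_proportion_z(0.40, 200, 0.29, 200)
print(f"z = {z:.2f}, significant at 95%: {abs(z) > 1.96}")  # z = 2.31, True
```

Since |z| exceeds the 1.96 cutoff for 95% confidence, the table would flag the 40% with the letter “A.”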
Beyond providing statistical confidence for your data, the red letters serve as signposts to draw your focus when sifting through a large deck of data tables. Here, the red letters flag a potential relationship between the age of the person and what kind of device they use. That may have product and marketing implications. In order to reach the typical tablet consumer, you may want to appeal to the middle-aged set. Most phone users, on the other hand, fall in the 25-34 age range.
Meaning and Insight
Whenever you see statistical significance in your data, it may be tempting to automatically conclude that the segment differences observed are “real” or “meaningful.” That may very well be the case, but remember, a significance test reveals no such thing. It only rules out sampling error as a threat to the validity of your findings. What if you had a poorly written questionnaire or made fielding/programming mistakes in your survey? These problems can invalidate your data or even be the reason for the segment differences observed. There are different types of errors in research. Sampling error is just one that you want to try and reduce.
Even with statistically significant results, you must still gauge whether your data is meaningful or practical. If you find females take an average of 90 seconds to fill out a survey while males take 88 seconds, then with a high enough sample size that difference will be statistically significant. But do you care about it? Probably not. In another scenario, you might sample 20 customers in one week and another 20 a week later, and find a 30-percentage-point increase in dissatisfaction. While that week-to-week difference is not statistically significant, it probably shouldn’t be ignored either.
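The second scenario can be sketched with the same kind of two-proportion test used in crosstabs (the specific dissatisfaction figures, 25% rising to 55%, are assumptions chosen to illustrate a 30-point swing):

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: dissatisfaction jumps from 25% (5 of 20) to 55% (11 of 20).
z = two_proportion_z(0.55, 20, 0.25, 20)
print(f"z = {z:.2f}")  # z = 1.94, just under the 1.96 cutoff at 95% confidence
# A 30-point swing that fails the test at n=20 may still warrant attention.
```

With only 20 respondents per week, the test lacks the power to flag even a large shift, which is exactly why a non-significant result shouldn’t automatically be dismissed.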
If all this makes you a little more critical about significance testing, it should. Survey researchers often use significance testing as a shortcut for determining what is “correct” or important. You should not automatically do that. Examine the full evidence. Use multiple data points to support your overall conclusion and drive your data story. Ensure that your survey design and data collection procedures are done properly. Significance testing is just one tool for helping you evaluate your data findings.