3.3 The total survey error framework

Total survey error = representation errors + measurement errors.

Estimates that come from sample surveys are often imperfect. That is, there is usually a difference between the estimate produced by a sample survey (e.g., the estimated average height of students in a school) and the true value in the population (e.g., the actual average height of students in a school). Sometimes these errors are so small that they are unimportant, but sometimes, unfortunately, they can be big and consequential. In an attempt to understand, measure, and reduce errors, researchers gradually created a single, overarching conceptual framework for the errors that can arise in sample surveys: the total survey error framework (Groves and Lyberg 2010). Although the development of this framework began in the 1940s, I think it offers us two helpful ideas for survey research in the digital age.

First, the total survey error framework clarifies that there are two types of errors: bias and variance. Roughly, bias is systematic error and variance is random error. To make this concrete, imagine running 1,000 replications of the same sample survey and then looking at the distribution of the estimates from these 1,000 replications. The bias is the difference between the mean of these replicate estimates and the true value. The variance is the spread of these replicate estimates around their own mean. All else being equal, we would like a procedure with no bias and small variance. Unfortunately, for many real problems, such no-bias, small-variance procedures do not exist, which puts researchers in the difficult position of deciding how to balance the problems introduced by bias and variance. Some researchers instinctively prefer unbiased procedures, but a single-minded focus on bias can be a mistake. If the goal is to produce an estimate that is as close as possible to the truth (i.e., with the smallest possible error), then you might be better off with a procedure that has a small bias and a small variance than with one that is unbiased but has a large variance (figure 3.1). In other words, the total survey error framework shows that when evaluating survey research procedures, you should consider both bias and variance.

Figure 3.1: Bias and variance. Ideally, researchers would have a no-bias, low-variance estimation procedure. In reality, they often have to make decisions that create a trade-off between bias and variance. Although some researchers instinctively prefer unbiased procedures, sometimes a small-bias, small-variance procedure can produce more accurate estimates than an unbiased procedure that has high variance.
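
The thought experiment of 1,000 replications can be sketched as a small simulation. What follows is only an illustration under assumptions that are mine, not part of the framework: heights are drawn from a normal distribution with a hypothetical true mean of 170 cm and standard deviation of 10 cm, the "unbiased, high-variance" procedure averages 5 accurate measurements, and the "small-bias, low-variance" procedure averages 100 measurements that each read 1 cm too high. The sketch summarizes overall error as the mean squared error, which equals the squared bias plus the variance.

```python
import random
import statistics

# Hypothetical population values, chosen only for illustration.
TRUE_MEAN = 170.0  # true average height in cm
TRUE_SD = 10.0     # spread of individual heights in cm


def unbiased_high_variance(rng):
    """Average of 5 accurate measurements: no bias, but noisy."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(5)]
    return statistics.fmean(sample)


def small_bias_low_variance(rng):
    """Average of 100 measurements that each read 1 cm too high."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) + 1.0 for _ in range(100)]
    return statistics.fmean(sample)


def summarize(procedure, replications=1000, seed=0):
    """Replicate a survey procedure and report bias, variance, and MSE."""
    rng = random.Random(seed)
    estimates = [procedure(rng) for _ in range(replications)]
    # Bias: mean of the replicate estimates minus the true value.
    bias = statistics.fmean(estimates) - TRUE_MEAN
    # Variance: spread of the replicate estimates around their own mean.
    variance = statistics.pvariance(estimates)
    # Mean squared error: average squared distance from the true value.
    mse = statistics.fmean([(e - TRUE_MEAN) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


if __name__ == "__main__":
    for procedure in (unbiased_high_variance, small_bias_low_variance):
        print(procedure.__name__, summarize(procedure))
```

Under these assumed numbers, the slightly biased procedure should land closer to the truth on average than the unbiased one, because its much smaller variance more than offsets its bias. Different assumptions could reverse that ranking, which is why both quantities need to be considered.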

The second main insight from the total survey error framework, which will organize much of this chapter, is that there are two sources of errors: problems related to who you talk to (representation) and problems related to what you learn from those conversations (measurement). For example, you might be interested in estimating attitudes about online privacy among adults living in France. Making these estimates requires two different types of inference. First, from the answers that respondents give, you have to infer their attitudes about online privacy (which is a problem of measurement). Second, from the inferred attitudes among respondents, you must infer the attitudes in the population as a whole (which is a problem of representation). Perfect sampling with bad survey questions will produce bad estimates, as will bad sampling with perfect survey questions. In other words, good estimates require sound approaches to measurement and representation. Given that background, I’ll review how survey researchers have thought about representation and measurement in the past. Then, I’ll show how ideas about representation and measurement can guide digital-age survey research.
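
The claim that either source of error alone can spoil an estimate can also be illustrated with a toy simulation. Everything in this sketch is a hypothetical assumption of mine: attitudes are scores drawn from a normal distribution, the "bad question" inflates every answer by 1.5 points, and the "bad sample" is one in which people with above-average concern are three times as likely to respond. None of these numbers describe a real survey.

```python
import random
import statistics


def make_population(rng, size=100_000):
    """Hypothetical true privacy-attitude scores for every adult."""
    return [rng.gauss(5.0, 2.0) for _ in range(size)]


def simple_random_sample(rng, population, n):
    """Sound representation: every person is equally likely to be chosen."""
    return rng.sample(population, n)


def self_selected_sample(rng, population, n):
    """Poor representation: more-concerned people respond more often.

    People are approached at random (with replacement, for simplicity)
    and respond with a probability that depends on their attitude.
    """
    sample = []
    while len(sample) < n:
        person = rng.choice(population)
        p_respond = 0.9 if person > 5.0 else 0.3
        if rng.random() < p_respond:
            sample.append(person)
    return sample


def accurate_question(attitude):
    """Sound measurement: the answer equals the true attitude."""
    return attitude


def leading_question(attitude):
    """Poor measurement: the wording inflates every answer by 1.5."""
    return attitude + 1.5


def survey_error(rng, population, sampler, question, n=1000):
    """Estimated mean attitude minus the true population mean."""
    respondents = sampler(rng, population, n)
    estimate = statistics.fmean([question(a) for a in respondents])
    return estimate - statistics.fmean(population)


if __name__ == "__main__":
    rng = random.Random(42)
    population = make_population(rng)
    designs = [
        ("good sample, good question", simple_random_sample, accurate_question),
        ("good sample, bad question", simple_random_sample, leading_question),
        ("bad sample, good question", self_selected_sample, accurate_question),
    ]
    for label, sampler, question in designs:
        print(label, survey_error(rng, population, sampler, question))
```

In this toy setup, only the design that is sound on both dimensions should produce an error near zero. The other two designs miss the truth for different reasons, one because of measurement and one because of representation.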