3.4.3 Non-probability samples: sample matching

Not all non-probability samples are the same. We can add more control on the front end.

The approach Wang and colleagues used to estimate the outcome of the 2012 US presidential election depended entirely on improvements in data analysis. That is, they collected as many responses as they could and then attempted to re-weight them. A complementary strategy for working with non-probability sampling is to have more control over the data collection process.

The simplest example of a partially controlled non-probability sampling process is quota sampling, a technique that goes back to the early days of survey research. In quota sampling, researchers divide the population into different groups (e.g., young men, young women, etc.) and then set quotas for the number of people to be selected in each group. Respondents are selected in a haphazard manner until the researcher has met the quota in each group. Because of the quotas, the resulting sample looks more like the target population than it otherwise would, but because the probabilities of inclusion are unknown, many researchers are skeptical of quota sampling. In fact, quota sampling was a cause of the “Dewey Defeats Truman” error in the 1948 US presidential election polls. Because it provides some control over the sampling process, however, one can see how quota sampling might have some advantages over completely uncontrolled data collection.
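To make the selection procedure concrete, here is a minimal sketch of quota sampling. The group definitions, quota sizes, and the stream of volunteers are hypothetical illustrations, not real data; the key point is that whoever happens to show up is accepted until each group's quota is filled, so inclusion probabilities remain unknown.

```python
# Hypothetical quotas for each group, e.g., derived from census counts.
quotas = {("18-34", "F"): 25, ("18-34", "M"): 25,
          ("35+", "F"): 25, ("35+", "M"): 25}

def quota_sample(volunteers, quotas):
    """Accept volunteers haphazardly until every group's quota is filled."""
    filled = {group: [] for group in quotas}
    for person in volunteers:  # whoever shows up, in no controlled order
        group = (person["age_group"], person["sex"])
        if group in quotas and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # all quotas met; inclusion probabilities are still unknown
    return [p for members in filled.values() for p in members]
```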

Moving beyond quota sampling, more modern approaches to controlling the non-probability sampling process are now possible. One such approach is called sample matching, and it is used by some commercial online panel providers. In its simplest form, sample matching requires two data sources: 1) a complete register of the population and 2) a large panel of volunteers. Importantly, the volunteers do not need to be a probability sample from any population; to emphasize that there are no requirements for selection into the panel, I’ll call it a dirty panel. Also, both the population register and the dirty panel must include some auxiliary information about each person; in this example, I’ll consider age and sex, but in realistic situations this auxiliary information could be much more detailed. The trick of sample matching is to select people from the dirty panel in a way that produces samples that look like probability samples.

Sample matching begins when a simulated probability sample is taken from the population register; this simulated sample becomes the target sample. Then, based on the auxiliary information, cases in the target sample are matched to people in the dirty panel to form a matched sample. For example, if there is a 25-year-old female in the target sample, then the researcher finds a 25-year-old female from the dirty panel to be in the matched sample. Finally, members of the matched sample are interviewed to produce the final set of respondents.
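Here is a minimal sketch of that three-step procedure, assuming the population register and the dirty panel are both lists of records with "age" and "sex" fields; the function name and the exact-matching rule are simplifications for illustration, not a description of any particular vendor's system.

```python
import random

def sample_matching(register, panel, n):
    # Step 1: draw a simulated probability sample from the complete
    # population register; this is the target sample.
    target_sample = random.sample(register, n)

    # Step 2: match each target case to an unused panel member on the known
    # auxiliary variables (here, exact age and sex).
    available = list(panel)
    matched_sample = []
    for person in target_sample:
        match = next((p for p in available
                      if p["age"] == person["age"] and p["sex"] == person["sex"]),
                     None)
        if match is not None:
            available.remove(match)
            matched_sample.append(match)
        # If no match exists, this target case goes unmatched; real systems
        # use nearest-neighbor matching on many more variables than two.

    # Step 3: members of the matched sample are then interviewed.
    return matched_sample
```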

Even though the matched sample looks like the target sample, it is important to remember that the matched sample is not a probability sample. Matched samples can only match the target sample on the known auxiliary information (e.g., age and sex), but not on unmeasured characteristics. For example, if people on the dirty panel tend to be poorer—after all, one reason to join a survey panel is to earn money—then even if the matched sample looks like the target sample in terms of age and sex it will still have a bias toward poor people. The magic of true probability sampling is to rule out problems on both measured and unmeasured characteristics (a point that is consistent with our discussion of matching for causal inference from observational studies in Chapter 2).
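A small simulation can show why matching on measured characteristics is not enough. All the numbers below are made up: I assume that the chance of joining the panel falls as income rises, and then match on age and sex only. The matched sample mirrors the target on those two variables, yet remains poorer than the population.

```python
import random
from statistics import fmean

random.seed(0)
population = [{"age": random.randint(18, 80),
               "sex": random.choice(["F", "M"]),
               "income": random.lognormvariate(10, 0.5)}
              for _ in range(100_000)]

# Assumed mechanism: low-income people are more likely to join the panel.
panel = [p for p in population
         if random.random() < 0.2 / (1 + p["income"] / 20_000)]

target_sample = random.sample(population, 1_000)
matched_sample = []
for person in target_sample:
    m = next((p for p in panel
              if abs(p["age"] - person["age"]) <= 2 and p["sex"] == person["sex"]),
             None)
    if m is not None:
        matched_sample.append(m)

print("population mean income:    ", round(fmean(p["income"] for p in population)))
print("matched-sample mean income:", round(fmean(p["income"] for p in matched_sample)))
# The matched sample looks like the target on age and sex, but its mean income
# is noticeably lower because income was never used in the match.
```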

In practice, sample matching depends on having a large and diverse panel eager to complete surveys, and thus it is mainly done by companies that can afford to develop and maintain such a panel. Also, there can be problems with matching (sometimes a good match for someone in the target sample does not exist on the panel) and with non-response (sometimes people in the matched sample refuse to participate in the survey). Therefore, researchers doing sample matching typically also perform some kind of post-stratification adjustment when making estimates.
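For readers who have not seen it written down, here is a minimal sketch of a simple post-stratification adjustment, assuming the researcher knows each group's share of the target population; the function and variable names are hypothetical. Each respondent is weighted by the ratio of their group's population share to its share of the respondents, which up-weights under-represented groups and down-weights over-represented ones.

```python
from collections import Counter

def post_stratified_mean(respondents, group_of, outcome_of, population_shares):
    """Weight each respondent by (population share / sample share) of their group."""
    counts = Counter(group_of(r) for r in respondents)
    n = len(respondents)
    weighted_total = 0.0
    for r in respondents:
        g = group_of(r)
        weight = population_shares[g] / (counts[g] / n)
        weighted_total += weight * outcome_of(r)
    return weighted_total / n

# Tiny example: men are under-represented among the respondents, so they get
# a larger weight. Prints 0.75, the post-stratified estimate.
respondents = [{"group": "F", "y": 1}, {"group": "F", "y": 0}, {"group": "M", "y": 1}]
shares = {"F": 0.5, "M": 0.5}  # assumed population shares
print(post_stratified_mean(respondents, lambda r: r["group"],
                           lambda r: r["y"], shares))
```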

It is hard to provide useful theoretical guarantees about sample matching, but in practice it can perform well. For example, Stephen Ansolabehere and Brian Schaffner (2014) compared three parallel surveys of about 1,000 people each, conducted in 2010 using three different sampling and interviewing methods: mail, telephone, and an Internet panel using sample matching and post-stratification adjustment. The estimates from the three approaches were quite similar to estimates from high-quality benchmarks such as the Current Population Survey (CPS) and the National Health Interview Survey (NHIS). More specifically, both the Internet and mail surveys were off by an average of 3 percentage points, and the phone survey was off by 4 percentage points. Errors this large are approximately what one would expect from samples of about 1,000 people. Although none of these modes produced substantially better data than the others, the Internet and phone surveys (which took days or weeks to field) were substantially faster than the mail survey (which took eight months), and the Internet survey, which used sample matching, was cheaper than the other two modes.
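The claim that errors of this size are roughly what samples of about 1,000 people produce follows from a back-of-the-envelope calculation using the simple-random-sampling approximation for a proportion near 50%:

```python
# Approximate 95% margin of error for a proportion near 50% with n = 1,000.
n, p = 1000, 0.5
se = (p * (1 - p) / n) ** 0.5                  # standard error ~ 0.016
print(f"95% margin of error: +/- {1.96 * se * 100:.1f} points")  # about 3.1
```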

In conclusion, many social scientists and statisticians are deeply skeptical of inferences from non-probability samples, in part because they are associated with some embarrassing failures of survey research, such as the Literary Digest poll. In part, I agree with this skepticism: unadjusted non-probability samples are likely to produce bad estimates. However, if researchers can adjust for the biases in the sampling process (e.g., post-stratification) or exert some control over the sampling process (e.g., sample matching), they can produce better estimates, and even estimates of sufficient quality for most purposes. Of course, it would be better to do perfectly executed probability sampling, but that no longer appears to be a realistic option.

Both non-probability samples and probability samples vary in their quality, and currently most estimates from probability samples are likely more trustworthy than estimates from non-probability samples. But, even now, estimates from well-conducted non-probability samples are probably better than estimates from poorly conducted probability samples. Further, non-probability samples are substantially cheaper. Thus, probability versus non-probability sampling appears to offer a cost-quality trade-off (Figure 3.6). Looking forward, I expect that estimates from well-done non-probability samples will become cheaper and better. Further, because of the breakdown of landline telephone surveys and increasing rates of non-response, I expect that probability samples will become more expensive and of lower quality. Because of these long-term trends, I think that non-probability sampling will become increasingly important in the third era of survey research.

Figure 3.6: Probability sampling in practice and non-probability sampling are both large, heterogeneous categories. In general, there is a cost-error trade-off with non-probability sampling being lower cost but higher error. However, well-done non-probability sampling can produce better estimates than poorly-done probability sampling. In the future, I expect that non-probability sampling will get better and cheaper while probability sampling will get worse and more expensive.