2.5 Conclusion

Big data is everywhere, but using it and other forms of observational data for social research is difficult. In my experience there is something like a no-free-lunch property for research: if you don’t put in a lot of work collecting data, then you are probably going to have to put in a lot of work analyzing your data or thinking about what is an interesting question to ask of the data. Based on the ideas in this chapter, I think that there are three main ways that big data sources will be most valuable for social research:

  • empirically adjudicating between competing theoretical predictions. Examples of this kind of work include Farber (2015) (New York taxi drivers) and King, Pan, and Roberts (2013) (censorship in China).
  • improved social measurement for policy through nowcasting. An example of this kind of work is Ginsberg et al. (2009) (Google Flu Trends).
  • estimating causal effects with natural experiments and matching. Examples of this kind of work include Mas and Moretti (2009) (peer effects on productivity) and Einav et al. (2015) (effect of starting price on auctions at eBay).

Many important questions in social research can be expressed as one of these three. However, these approaches generally require researchers to bring a lot to the data. What makes Farber (2015) interesting, for example, is the theoretical motivation for the measurement, and this motivation comes from outside the data. Thus, for those who are good at asking certain types of research questions, big data sources can be very fruitful.

Finally, rather than theory-driven empirical research (which has been the focus of this chapter), we can flip the script and create empirically driven theorizing. That is, through the careful accumulation of empirical facts, patterns, and puzzles, we can build new theories.

This alternative, data-first approach to theory is not new, and it was most forcefully articulated by Glaser and Strauss (1967) with their call for grounded theory. This data-first approach, however, does not imply “the end of theory,” as has been claimed by much of the journalism around research in the digital age (Anderson 2008). Rather, as the data environment changes, we should expect a rebalancing in the relationship between theory and data. In a world where data collection was expensive, it made sense to collect only the data that theories suggested would be the most useful. But in a world where enormous amounts of data are already available for free, it makes sense to also try a data-first approach (Goldberg 2015).

As I have shown in this chapter, researchers can learn a lot by watching people. In the next three chapters, I’ll describe how we can learn more and different things if we tailor our data collection and interact with people more directly by asking them questions (Chapter 3), running experiments (Chapter 4), and even involving them in the research process directly (Chapter 5).