5.3.4 Conclusion

Open calls let many experts and non-experts propose solutions to problems where solutions are easier to check than generate.

In all three open call projects—Netflix Prize, Foldit, Peer-to-Patent—researchers posed questions of a specific form, solicited solutions, and then picked the best solutions. The researchers didn’t even need to know the best expert to ask, and sometimes the good ideas came from unexpected places.

Now I can also highlight two important differences between open call projects and human computation projects. First, in open call projects the researcher specifies a goal (e.g., predicting movie ratings), whereas in human computation the researcher specifies a micro-task (e.g., classifying a galaxy). Second, in open calls the researchers wanted the best contribution—the best algorithm for predicting movie ratings, the lowest-energy configuration of a protein, or the most relevant piece of prior art—not some kind of simple combination of all of the contributions.

Given the general template for open calls and these three examples, what kinds of problems in social research might be suitable for this approach? At this point, I should acknowledge that there have not been many successful examples yet (for reasons that I’ll explain in a moment). In terms of direct analogues, one could imagine a Peer-to-Patent-style project being used by a historical researcher searching for the earliest document to mention a specific person or idea. An open call approach to this kind of problem could be especially valuable when the relevant documents are not collected in a single archive but are widely distributed.

More generally, many governments have problems that might be amenable to open calls because they involve creating predictions that can be used to guide action (Kleinberg et al. 2015). For example, just as Netflix wanted to predict ratings on movies, governments might want to predict outcomes such as which restaurants are most likely to have health code violations in order to allocate inspection resources more efficiently. Motivated by this kind of problem, Glaeser et al. (2016) used an open call to help the City of Boston predict restaurant hygiene and sanitation violations from Yelp reviews combined with historical inspection data. Glaeser and colleagues estimate that the predictive model that won the open call would improve the productivity of restaurant inspectors by about 50%. Businesses also have problems with a similar structure, such as predicting customer churn (Provost and Fawcett 2013).
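To make the structure of such a prediction open call concrete, here is a minimal sketch in Python of this kind of task: train a model on past inspection outcomes and review-derived features, then rank restaurants by predicted risk so that inspectors visit the riskiest ones first. The feature names and the synthetic data are illustrative assumptions; they are not the actual Boston data or the model that won the open call.

```python
# A hedged, minimal sketch of a violation-prediction task (hypothetical
# feature names and synthetic data; not the actual Boston/Yelp data).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
restaurants = pd.DataFrame({
    "past_violations": rng.poisson(1.5, n),              # prior inspection history
    "yelp_mean_rating": rng.uniform(1, 5, n),             # average review score
    "yelp_hygiene_mentions": rng.poisson(0.5, n),         # reviews mentioning dirt, pests, etc.
    "days_since_last_inspection": rng.integers(30, 720, n),
})
# Synthetic outcome loosely tied to the features, purely for illustration.
risk = (0.4 * restaurants["past_violations"]
        + 0.8 * restaurants["yelp_hygiene_mentions"]
        - 0.3 * restaurants["yelp_mean_rating"])
violation = (risk + rng.normal(0, 1, n) > 0.5).astype(int)

# Score a candidate model the way an open call would: on held-out data.
model = GradientBoostingClassifier()
auc = cross_val_score(model, restaurants, violation, cv=5, scoring="roc_auc")
print(f"Cross-validated AUC: {auc.mean():.2f}")

# Send inspectors first to the restaurants with the highest predicted risk.
model.fit(restaurants, violation)
ranked = restaurants.assign(p_violation=model.predict_proba(restaurants)[:, 1])
print(ranked.sort_values("p_violation", ascending=False).head())
```

In a real open call, participants would be free to engineer their own features and models; only the held-out predictive performance needs to be easy to check.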

Finally, in addition to open calls that involve outcomes that have already happened in a particular dataset (e.g., predicting health code violations using data on past health code violations), one could imagine predicting outcomes that have not yet happened for anyone in the dataset. For example, the Fragile Families and Child Wellbeing study has tracked about 5,000 children since birth in 20 different US cities (Reichman et al. 2001). Researchers have collected data about these children, their families, and their broader environment at birth and at ages 1, 3, 5, 9, and 15. Given all the information about these children, how well could researchers predict outcomes such as who will graduate from college? Or, expressed in a way that would be more interesting to many researchers, which data and theories would be most effective in predicting these outcomes?

Since none of these children are currently old enough to go to college, this would be a true forward-looking prediction, and there are many different strategies that researchers might employ. A researcher who believes that neighborhoods are critical in shaping life outcomes might take one approach, while a researcher who focuses on families might do something completely different. Which of these approaches would work better? We don’t know, and in the process of finding out we might learn something important about families, neighborhoods, education, and social inequality. Further, these predictions might be used to guide future data collection. Imagine that there were a small number of college graduates who were not predicted to graduate by any of the models; these people would be ideal candidates for follow-up qualitative interviews and ethnographic observation. Thus, in this kind of open call, the predictions are not the end; rather, they provide a new way to compare, enrich, and combine different theoretical traditions. This kind of open call is not specific to using data from the Fragile Families study to predict who will go to college; it could be used to predict any outcome that will eventually be collected in any longitudinal social dataset.
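To illustrate, here is a minimal sketch of that kind of forward-looking comparison: each feature set stands in for a different theoretical perspective, every model is scored on the same held-out outcome, and cases that no model predicts correctly are flagged for qualitative follow-up. All variable names and the synthetic data are illustrative assumptions; this is not the Fragile Families data.

```python
# A hedged, minimal sketch of a forward-looking prediction comparison
# (hypothetical variable names and synthetic data; not the Fragile Families data).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 2000
children = pd.DataFrame({
    "neighborhood_poverty": rng.uniform(0, 1, n),
    "neighborhood_school_quality": rng.uniform(0, 1, n),
    "family_income": rng.lognormal(10, 1, n),
    "parent_education_years": rng.integers(8, 20, n),
})
# Synthetic outcome standing in for "graduates from college", for illustration only.
score = (0.00005 * children["family_income"]
         + 0.15 * children["parent_education_years"]
         + 2.0 * children["neighborhood_school_quality"]
         - 2.0 * children["neighborhood_poverty"])
graduates = (score + rng.normal(0, 1, n) > score.median()).astype(int)

# Each feature set stands in for a different theoretical perspective.
theories = {
    "neighborhood": ["neighborhood_poverty", "neighborhood_school_quality"],
    "family": ["family_income", "parent_education_years"],
}

train_idx, test_idx = train_test_split(np.arange(n), test_size=0.3, random_state=0)
predictions = pd.DataFrame(index=test_idx)
for name, cols in theories.items():
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(children.iloc[train_idx][cols], graduates.iloc[train_idx])
    predictions[name] = model.predict(children.iloc[test_idx][cols])
    accuracy = (predictions[name].to_numpy() == graduates.iloc[test_idx].to_numpy()).mean()
    print(f"{name}: held-out accuracy = {accuracy:.2f}")

# Graduates whom no model predicted would graduate: candidates for
# follow-up qualitative interviews and ethnographic observation.
actual = graduates.iloc[test_idx].to_numpy()
missed = test_idx[(actual == 1) & (predictions.to_numpy().max(axis=1) == 0)]
print(f"{len(missed)} graduates were missed by every model")
```

The point of the sketch is the structure, not the particular models: because every approach is evaluated against the same outcome, the open call turns a disagreement between theoretical traditions into an empirical comparison.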

As I wrote earlier in this section, there have not been many examples of social researchers using open calls. I think that this is because open calls are not well suited to the way that social scientists typically frame their questions. Returning to the Netflix Prize, social scientists wouldn’t usually ask about predicting tastes; rather, they would ask how and why cultural tastes differ for people from different social classes (Bourdieu 1987). Such “how” and “why” questions do not lead to solutions that are easy to verify, and therefore they seem poorly suited to open calls. Thus, it appears that open calls are more amenable to questions of prediction than questions of explanation; for more on the distinction between prediction and explanation, see Breiman (2001). Recent theorists, however, have called on social scientists to reconsider the dichotomy between explanation and prediction (Watts 2014). As the line between prediction and explanation blurs, I expect that open calls will become increasingly common in the social sciences.