The Netflix Prize used an open call to predict which movies people would like.
The best-known open call project is the Netflix Prize. Netflix is an online movie rental company, and in 2000 it launched Cinematch, a service to recommend movies to customers. For example, Cinematch might notice that you liked Star Wars and The Empire Strikes Back and then recommend that you watch Return of the Jedi. Initially, Cinematch worked poorly, but over the course of many years it continued to improve its ability to predict which movies customers would enjoy. By 2006, however, progress on Cinematch had plateaued. The researchers at Netflix had tried pretty much everything they could think of, yet they suspected that there were other ideas that might help them improve their system. Thus, they came up with what was, at the time, a radical solution: an open call.
Critical to the eventual success of the Netflix Prize was how the open call was designed, and this design has important lessons for how open calls can be used for social research. Netflix did not just put out an unstructured request for ideas, which is what many people imagine when they first consider an open call. Rather, Netflix posed a clear problem with a simple evaluation procedure: they challenged people to use a set of 100 million movie ratings to predict 3 million held-out ratings (ratings that users had made but that Netflix did not release). The first person to create an algorithm that predicted the 3 million held-out ratings 10% better than Cinematch would win a million dollars. This clear and easy-to-apply evaluation procedure (comparing predicted ratings with held-out ratings) meant that the Netflix Prize was framed in such a way that solutions were easier to check than to generate; it turned the challenge of improving Cinematch into a problem suitable for an open call.
In October of 2006, Netflix released a dataset containing 100 million movie ratings from about 500,000 customers (we will consider the privacy implications of this data release in chapter 6). The Netflix data can be conceptualized as a huge matrix that is approximately 500,000 customers by 20,000 movies. Within this matrix, there were about 100 million ratings on a scale from one to five stars (table 5.2); because only about 100 million of the roughly 10 billion possible customer-movie pairs were rated, the matrix was about 99% empty. The challenge was to use the observed data in the matrix to predict the 3 million held-out ratings.
|                  | Movie 1 | Movie 2 | Movie 3 | … | Movie 20,000 |
|------------------|---------|---------|---------|---|--------------|
| Customer 1       | 2       | 5       | ?       | … |              |
| Customer 2       |         | ?       | 3       | … | 2            |
| ⋮                | ⋮       | ⋮       | ⋮       |   | ⋮            |
| Customer 500,000 | ?       |         | 1       | … | 4            |

Table 5.2: Schematic of the data from the Netflix Prize. The ratings shown here are illustrative; blank cells are unrated movies, and "?" marks held-out ratings to be predicted.
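Because the matrix is so sparse, data like this is typically stored as (customer, movie, rating) triples rather than as a dense array. A minimal sketch in Python (all IDs and ratings below are illustrative, not actual Netflix data):

```python
# Sketch: storing sparse rating data as (customer, movie, rating) triples.
# All IDs and star values are made up for illustration.
ratings = [
    (1, 2, 4),  # customer 1 gave movie 2 four stars
    (1, 3, 5),
    (2, 1, 2),
    # ... the real dataset has ~100 million such triples
]

# Index by customer for quick lookup of each customer's ratings.
by_customer = {}
for customer, movie, stars in ratings:
    by_customer.setdefault(customer, {})[movie] = stars

print(by_customer[1])  # → {2: 4, 3: 5}
```

This triple format is also how Netflix distributed the data to competitors: as lists of observed ratings, not as the full matrix.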
Researchers and hackers around the world were drawn to the challenge, and by 2008 more than 30,000 people were working on it (Thompson 2008). Over the course of the contest, Netflix received more than 40,000 proposed solutions from more than 5,000 teams (Netflix 2009). Obviously, Netflix could not read and understand all these proposed solutions. The whole thing ran smoothly, however, because the solutions were easy to check: Netflix could simply have a computer compare the predicted ratings with the held-out ratings using a prespecified metric (the particular metric they used was the root mean squared error, the square root of the average squared difference between predicted and actual ratings). It was this ability to quickly evaluate solutions that enabled Netflix to accept solutions from everyone, which turned out to be important because good ideas came from some surprising places. In fact, the winning solution was submitted by a team started by three researchers who had no prior experience building movie recommendation systems (Bell, Koren, and Volinsky 2010).
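The prespecified metric takes only a few lines to compute, which is what made automated evaluation possible. A minimal sketch (the ratings below are made up, not contest data):

```python
import math

def rmse(predicted, held_out):
    """Root mean squared error between predicted and actual ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, held_out)]
    return math.sqrt(sum(errors) / len(errors))

# Illustrative held-out ratings and a team's predictions for them.
held_out = [4, 3, 5, 1]
predicted = [3.8, 3.4, 4.6, 1.9]
print(round(rmse(predicted, held_out), 3))  # → 0.541
```

Lower is better: a team won the prize by producing predictions whose RMSE on the held-out ratings was at least 10% lower than Cinematch's.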
One beautiful aspect of the Netflix Prize is that it enabled all the proposed solutions to be evaluated fairly. That is, when people uploaded their predicted ratings, they did not need to upload their academic credentials, their age, race, gender, sexual orientation, or anything else about themselves. The predicted ratings of a famous professor from Stanford were treated exactly the same as those from a teenager in her bedroom. Unfortunately, this is not true in most social research. That is, for most social research, evaluation is very time-consuming and partially subjective. So, most research ideas are never seriously evaluated, and when ideas are evaluated, it is hard to detach those evaluations from the creator of the ideas. Open call projects, on the other hand, have easy and fair evaluation, so they can discover ideas that would otherwise be missed.
For example, at one point during the Netflix Prize, someone with the screen name Simon Funk posted on his blog a proposed solution based on a singular value decomposition, an approach from linear algebra that had not been used previously by other participants. Funk’s blog post was simultaneously technical and weirdly informal. Was this blog post describing a good solution or was it a waste of time? Outside of an open call project, the solution might never have received serious evaluation. After all, Simon Funk was not a professor at MIT; he was a software developer who, at the time, was backpacking around New Zealand (Piatetsky 2007). If he had emailed this idea to an engineer at Netflix, it almost certainly would not have been read.
Fortunately, because the evaluation criteria were clear and easy to apply, his predicted ratings were evaluated, and it was instantly clear that his approach was very powerful: he rocketed to fourth place in the competition, a tremendous result given that other teams had already been working for months on the problem. In the end, parts of his approach were used by virtually all serious competitors (Bell, Koren, and Volinsky 2010).
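The core of the approach Funk described, often called "Funk SVD", is to learn a low-rank factorization of the rating matrix by stochastic gradient descent on the observed entries only, so the 99% of missing cells never enter the computation. Here is a minimal sketch of that idea; the data, hyperparameters, and variable names are illustrative, not Funk's actual settings:

```python
import random

random.seed(0)
K = 2       # number of latent dimensions per customer and movie
LR = 0.01   # learning rate
REG = 0.02  # regularization strength (keeps factors small)

# Illustrative (customer, movie, rating) triples, not real data.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2)]
n_users, n_movies = 3, 3

# Initialize latent factor vectors with small random values.
U = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(n_users)]
M = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(n_movies)]

def predict(u, m):
    # A rating is modeled as the dot product of the two factor vectors.
    return sum(U[u][k] * M[m][k] for k in range(K))

# Stochastic gradient descent over the observed entries only.
for epoch in range(500):
    for u, m, r in ratings:
        err = r - predict(u, m)
        for k in range(K):
            uk, mk = U[u][k], M[m][k]
            U[u][k] += LR * (err * mk - REG * uk)
            M[m][k] += LR * (err * uk - REG * mk)

# The learned factors give a prediction for any held-out cell,
# e.g. customer 2's unobserved rating of movie 0.
print(round(predict(2, 0), 2))
```

The appeal of this approach for the contest is that the learned factor vectors generalize: once fit to the observed ratings, they produce a predicted rating for every empty cell in the matrix, including the 3 million held-out ones.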
The fact that Simon Funk chose to write a blog post explaining his approach, rather than trying to keep it secret, also illustrates that many participants in the Netflix Prize were not exclusively motivated by the million-dollar prize. Rather, many participants also seemed to enjoy the intellectual challenge and the community that developed around the problem (Thompson 2008), feelings that I expect many researchers can understand.
The Netflix Prize is a classic example of an open call. Netflix posed a question with a specific goal (predicting movie ratings) and solicited solutions from many people. Netflix was able to evaluate all these solutions because they were easier to check than to create, and ultimately Netflix picked the best solution. Next, I’ll show you how this same approach can be used in biology and law, and without a million-dollar prize.