5.2 Human computation

Human computation projects take a big problem, break it into simple pieces, send them to many workers, and then aggregate the results.

Human computation projects combine the efforts of many people working on simple microtasks in order to solve problems that are impossibly big for one person. You might have a research problem suitable for human computation if you’ve ever thought: “I could solve this problem if I had a thousand research assistants.”

The prototypical example of a human computation project is Galaxy Zoo. In this project, more than one hundred thousand volunteers classified images of about a million galaxies with accuracy similar to that of earlier, substantially smaller efforts by professional astronomers. The increased scale that mass collaboration provided led to new discoveries about how galaxies form, and it turned up an entirely new class of galaxies called “Green Peas.”

Although Galaxy Zoo might seem far from social research, there are actually many situations where social researchers want to code, classify, or label images or texts. In some cases, computers can do this analysis, but certain forms of analysis remain hard for computers yet easy for people. It is these easy-for-people yet hard-for-computers microtasks that we can turn over to human computation projects.

Not only is the microtask in Galaxy Zoo quite general, but the structure of the project is general as well. Galaxy Zoo, and other human computation projects, typically use a split-apply-combine strategy (Wickham 2011), and once you understand this strategy you’ll be able to use it to solve lots of problems. First, a big problem is split into lots of little problem chunks. Then, human work is applied to each little problem chunk, independently of the other chunks. Finally, the results of this work are combined to produce a consensus solution. Given that background, let’s see how the split-apply-combine strategy was used in Galaxy Zoo.
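To make the three steps concrete, here is a minimal Python sketch of the general split-apply-combine pattern. It is an illustration under stated assumptions, not a description of how Galaxy Zoo was implemented: the function names, the round-robin assignment, the simulated workers, and the use of a simple majority vote as the consensus rule are all hypothetical choices made for this example.

```python
from collections import Counter

def split(items, n_workers, votes_per_item):
    """Split: turn the big problem into (worker, item) microtasks.

    Each item goes to `votes_per_item` different workers, assigned
    round-robin (an assumption made for this sketch).
    """
    return [((i + k) % n_workers, item)
            for i, item in enumerate(items)
            for k in range(votes_per_item)]

def apply_labels(microtasks, label_fn):
    """Apply: label each microtask independently of all the others."""
    return [(item, label_fn(worker, item)) for worker, item in microtasks]

def combine(labeled):
    """Combine: reduce the redundant labels to one consensus per item.

    A simple majority vote is assumed here; real projects may weight
    workers or use more elaborate aggregation.
    """
    votes = {}
    for item, label in labeled:
        votes.setdefault(item, Counter())[label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}

def simulated_worker(worker, item):
    """Hypothetical stand-in for a human judgment.

    The toy microtask is deciding whether a number is even or odd.
    Worker 2 is careless and always answers "even".
    """
    if worker == 2:
        return "even"
    return "even" if item % 2 == 0 else "odd"

items = list(range(10))
microtasks = split(items, n_workers=5, votes_per_item=3)
labeled = apply_labels(microtasks, simulated_worker)
consensus = combine(labeled)
print(consensus)
```

Because every item is labeled by three different workers and at most one of them is the careless one, the majority vote recovers the correct label for every item. This redundancy is the reason for collecting several independent judgments per chunk before combining them.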