5.2 Human computation

Human computation projects take a big problem, break it into simple pieces, send the pieces to many workers, and then aggregate the results.

Human computation projects combine the efforts of many people working on simple micro-tasks in order to solve problems that are impossibly big for one person. You might have a research problem suitable for human computation if you’ve ever thought: I could solve this problem if I had a thousand research assistants.

The prototypical example of a human computation project is Galaxy Zoo, which I’ll describe in detail below. In this project, more than 100,000 volunteers classified images of about 1,000,000 galaxies with similar accuracy to earlier (and substantially smaller) efforts by professional astronomers. The increased scale provided by mass collaboration led to new discoveries about how galaxies form, and it turned up an entirely new class of galaxies called “Green Peas.”

Although Galaxy Zoo might seem far from social research, there are actually many situations where social researchers want to code, classify, or label images or texts. In some cases, this analysis can be done by computers, but there are still certain forms of analysis that are hard for computers but easy for people. It is these easy-for-people yet hard-for-computers micro-tasks that we can turn over to human computation projects.

Not only is the micro-task in Galaxy Zoo quite general, the structure of the project is general as well. Galaxy Zoo, and other human computation projects, use a split-apply-combine strategy (Wickham 2011), and once you understand this strategy you’ll be able to use it to solve lots of problems. First, a big problem is split into lots of little problem chunks. Then, human work is applied to each little problem chunk, independent of the other chunks. Finally, the results of this work are combined to produce a consensus solution. Given that background, let’s see how the split-apply-combine strategy was used in Galaxy Zoo.
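
Before turning to the details of Galaxy Zoo, it may help to see the three steps of split-apply-combine in miniature. The Python sketch below is only an illustration and is not code from any real project. The function names, worker names, image identifiers, and labels are all made up. Majority vote is just one simple way to combine answers.

```python
from collections import Counter, defaultdict


def split_tasks(items, workers, redundancy):
    """Split: pair each item with `redundancy` distinct workers (round-robin)."""
    if redundancy > len(workers):
        raise ValueError("redundancy cannot exceed the number of workers")
    return [
        (workers[(i + k) % len(workers)], item)
        for i, item in enumerate(items)
        for k in range(redundancy)
    ]


def apply_workers(tasks, answer_fn):
    """Apply: collect one answer per (worker, item) pair, independently."""
    return [(item, answer_fn(worker, item)) for worker, item in tasks]


def combine_votes(answers):
    """Combine: majority vote per item (ties go to the label seen first)."""
    votes = defaultdict(Counter)
    for item, label in answers:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Made-up toy responses standing in for real human judgments.
toy_responses = {
    ("ann", "img1"): "spiral",
    ("bob", "img1"): "spiral",
    ("cat", "img1"): "elliptical",
    ("ann", "img2"): "elliptical",
    ("bob", "img2"): "elliptical",
    ("cat", "img2"): "elliptical",
    ("ann", "img3"): "spiral",
    ("bob", "img3"): "elliptical",
    ("cat", "img3"): "spiral",
}

tasks = split_tasks(["img1", "img2", "img3"], ["ann", "bob", "cat"], redundancy=3)
answers = apply_workers(tasks, lambda worker, item: toy_responses[(worker, item)])
consensus = combine_votes(answers)
print(consensus)
```

With these toy responses, the consensus label is "spiral" for img1 and img3 and "elliptical" for img2, even though the hypothetical workers disagree on two of the three images. Asking several people about the same item is what makes a consensus possible. A real project would likely need something more careful than a simple majority vote.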