5.2.2 Crowd-coding of political manifestos

Coding political manifestos, a task typically done by experts, can be performed by a human computation project, resulting in greater reproducibility and flexibility.

Similar to Galaxy Zoo, there are many situations where social researchers want to code, classify, or label an image or piece of text. An example of this kind of research is the coding of political manifestos. During elections, political parties produce manifestos describing their policy positions and guiding philosophies. For example, here’s a piece of the manifesto of the Labour Party in the United Kingdom from 2010:

“Millions of people working in our public services embody the best values of Britain, helping empower people to make the most of their own lives while protecting them from the risks they should not have to bear on their own. Just as we need to be bolder about the role of government in making markets work fairly, we also need to be bold reformers of government.”

These manifestos contain valuable data for political scientists, particularly those studying elections and the dynamics of policy debates. In order to systematically extract information from these manifestos, researchers created The Manifesto Project, which collected 4,000 manifestos from nearly 1,000 parties in 50 countries and then organized political scientists to systematically code them. Each sentence in each manifesto was coded by an expert using a 56-category scheme. The result of this collaborative effort is a massive dataset summarizing the information embedded in these manifestos, and this dataset has been used in more than 200 scientific papers.

Kenneth Benoit and colleagues (2016) decided to take the manifesto coding task that had previously been performed by experts and turn it into a human computation project. As a result, they created a coding process that is more reproducible and more flexible, not to mention cheaper and faster.

Working with 18 manifestos generated during six recent elections in the United Kingdom, Benoit and colleagues used the split-apply-combine strategy with workers from a microtask labor market (Amazon Mechanical Turk and CrowdFlower are examples of microtask labor markets; for more on such markets, see Chapter 4). The researchers took each manifesto and split it into sentences. Next, a person applied the coding scheme to each sentence. In particular, readers were asked to classify each sentence as referring to economic policy (left or right), to social policy (liberal or conservative), or to neither (figure 5.5). Each sentence was coded by about five different people. Finally, these ratings were combined using a statistical model that accounted for both individual-rater effects and difficulty-of-sentence effects. In all, Benoit and colleagues collected 200,000 ratings from about 1,500 people.
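To make that workflow concrete, here is a minimal sketch of the split-apply-combine pattern applied to sentence coding. The sentence splitter, the callable raters, and the majority-vote combination are simplifying assumptions made for illustration; in the actual project the "apply" step was done by crowd workers, and Benoit and colleagues combined the ratings with a statistical model rather than a simple vote.

```python
# Illustrative sketch of split-apply-combine for crowd-coding sentences.
# The splitter, raters, and majority vote are assumptions, not the authors' code.
from collections import Counter
import re

def split_into_sentences(manifesto_text):
    """Split: break a manifesto into sentences with a naive rule-based splitter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", manifesto_text) if s.strip()]

def apply_coding(sentence, rater):
    """Apply: one rater assigns a label to one sentence.
    In the real project this step was performed by crowd workers, not code."""
    return rater(sentence)  # e.g., "economic", "social", or "neither"

def combine_labels(labels):
    """Combine: reduce the ~5 labels per sentence to a single consensus label."""
    return Counter(labels).most_common(1)[0][0]

def crowd_code(manifesto_text, raters):
    """Run the full split-apply-combine pipeline for one manifesto."""
    coded = []
    for sentence in split_into_sentences(manifesto_text):
        labels = [apply_coding(sentence, r) for r in raters]
        coded.append((sentence, combine_labels(labels)))
    return coded
```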

Figure 5.5: Coding scheme from Benoit et al. (2016). Readers were asked to classify each sentence as referring to economic policy (left or right), to social policy (liberal or conservative), or to neither. Adapted from Benoit et al. (2016), figure 1.

In order to assess the quality of the crowd coding, Benoit and colleagues also had about 10 experts—professors and graduate students in political science—rate the same manifestos using a similar procedure. Although the ratings from members of the crowd were more variable than the ratings from the experts, the consensus crowd rating had remarkable agreement with the consensus expert rating (figure 5.6). This comparison shows that, as with Galaxy Zoo, human computation projects can produce high-quality results.
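As a rough illustration of this kind of check, the sketch below collapses each group's ratings into consensus scores and correlates them across manifestos. The function names and the use of a Pearson correlation are assumptions for illustration, not the exact analysis in Benoit et al. (2016).

```python
# Hypothetical sketch of comparing crowd and expert consensus scores.
from statistics import mean, stdev

def consensus(ratings_by_manifesto):
    """Collapse each manifesto's list of numeric ratings to its mean."""
    return {m: mean(r) for m, r in ratings_by_manifesto.items()}

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

def crowd_expert_agreement(crowd_ratings, expert_ratings):
    """Correlate crowd and expert consensus scores on the same manifestos."""
    crowd, expert = consensus(crowd_ratings), consensus(expert_ratings)
    manifestos = sorted(set(crowd) & set(expert))
    return pearson([crowd[m] for m in manifestos],
                   [expert[m] for m in manifestos])
```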

Figure 5.6: Expert estimates (x-axis) and crowd estimates (y-axis) were in remarkable agreement when coding 18 party manifestos from the United Kingdom (Benoit et al. 2016). The manifestos coded were from three political parties (Conservative, Labour, and Liberal Democrats) and six elections (1987, 1992, 1997, 2001, 2005, and 2010). Adapted from Benoit et al. (2016), figure 3.

Building on this result, Benoit and colleagues used their crowd-coding system to do research that was impossible with the expert-run coding system used by the Manifesto Project. For example, the Manifesto Project did not code the manifestos on the topic of immigration because that was not a salient topic when the coding scheme was developed in the mid-1980s. And, at this point, it is logistically infeasible for the Manifesto Project to go back and recode their manifestos to capture this information. Therefore, it would appear that researchers interested in studying the politics of immigration are out of luck. However, Benoit and colleagues were able to use their human computation system to do this coding—customized to their research question—quickly and easily.

In order to study immigration policy, they coded the manifestos for eight parties in the 2010 general election in the United Kingdom. Each sentence in each manifesto was coded as to whether it related to immigration, and if so, whether it was pro-immigration, neutral, or anti-immigration. Within five hours of launching their project, the results were in. They had collected more than 22,000 responses at a total cost of $360. Further, the estimates from the crowd showed remarkable agreement with an earlier survey of experts. Then, as a final test, two months later, the researchers reproduced their crowd-coding. Within a few hours, they had created a new crowd-coded dataset that closely matched their original crowd-coded dataset. In other words, human computation enabled them to generate coding of political texts that agreed with expert evaluations and was reproducible. Further, because the human computation was quick and cheap, it was easy for them to customize their data collection to their specific research question about immigration.