5.5.5 Be ethical

The exhortation to be ethical applies to all the research described in this book. In addition to the more general issues of ethics—discussed in chapter 6—some specific ethical issues arise in the case of mass collaboration projects, and since mass collaboration is so new to social research, these problems might not be fully apparent at first.

In all mass collaboration projects, issues of compensation and credit are complex. For example, some people consider it unethical that thousands of people worked for years on the Netflix Prize and ultimately received no compensation. Similarly, some people consider it unethical to pay workers on microtask labor markets extremely small amounts of money. In addition to these issues of compensation, there are related issues of credit. Should all participants in a mass collaboration be authors of the eventual scientific papers? Different projects take different approaches. Some projects give authorship credit to all members of the mass collaboration; for example, the final author of the first Foldit paper was “Foldit players” (Cooper et al. 2010). In the Galaxy Zoo family of projects, extremely active and important contributors are sometimes invited to be coauthors on papers. For example, Ivan Terentev and Tim Matorny, two Radio Galaxy Zoo participants, were coauthors on one of the papers that arose from that project (Banfield et al. 2016; Galaxy Zoo 2016). Sometimes projects merely acknowledge contributions without offering coauthorship. Decisions about coauthorship will obviously vary from case to case.

Open calls and distributed data collection can also raise complex questions about consent and privacy. For example, Netflix released customers’ movie ratings to everyone. Although movie ratings might not appear sensitive, they can reveal information about customers’ political preferences or sexual orientation, information that customers did not agree to make public. Netflix attempted to anonymize the data so that the ratings could not be linked to any specific individual, but just weeks after the release of the Netflix data, it was partially re-identified by Arvind Narayanan and Vitaly Shmatikov (2008) (see chapter 6). Further, in distributed data collection, researchers could collect data about people without their consent. For example, in the Malawi Journals Project, conversations about a sensitive topic (AIDS) were transcribed without the consent of the participants. None of these ethical problems is insurmountable, but they should be considered in the design phase of a project. Remember, your “crowd” is made up of people.