4.5.1.1 Use existing environments

You can run experiments inside existing environments, often without any coding or partnership.

Logistically, the easiest way to do digital experiments is to overlay your experiment on top of an existing environment, enabling you to run a digital field experiment. These experiments can be run at a reasonably large scale and don’t require partnership with a company or extensive software development.

For example, Jennifer Doleac and Luke Stein (2013) took advantage of an online marketplace (e.g., Craigslist) to run an experiment that measured racial discrimination. Doleac and Stein advertised thousands of iPods, and by systematically varying the characteristics of the seller, they were able to study the effect of race on economic transactions. Further, they used the scale of their experiment to estimate when the effect was bigger (heterogeneity of treatment effects) and to offer some ideas about why the effect might occur (mechanisms).

Prior to Doleac and Stein's study, there were two main approaches to experimentally measuring discrimination. In correspondence studies, researchers create resumes for fictional people of different races and use these resumes to, for example, apply for different jobs. Bertrand and Mullainathan's (2004) paper with the memorable title "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination" is a wonderful illustration of a correspondence study. Correspondence studies have a relatively low cost per observation, which enables a single researcher to collect thousands of observations in a typical study. But correspondence studies of racial discrimination have been questioned because names potentially signal many things in addition to the race of the applicant. That is, names such as Greg, Emily, Lakisha, and Jamal may signal social class in addition to race. Thus, any difference in treatment between the resumes of Gregs and Jamals might be due to more than the presumed race of the applicants. Audit studies, on the other hand, involve hiring actors of different races to apply in person for jobs. Although audit studies provide a clear signal of applicant race, they are extremely expensive per observation, which means that they typically include only hundreds of observations.

In their digital field experiment, Doleac and Stein were able to create an attractive hybrid. They collected data at a relatively low cost per observation, which yielded thousands of observations (as in a correspondence study), and they signaled race using photographs, which provided a clear, unconfounded signal of race (as in an audit study). Thus, online environments sometimes enable researchers to create new treatments with properties that would be hard to construct otherwise.

The iPod advertisements of Doleac and Stein varied along three main dimensions. First, they varied the characteristics of the seller, which were signaled by the hand photographed holding the iPod [white, black, white with tattoo] (Figure 4.12). Second, they varied the asking price [$90, $110, $130]. Third, they varied the quality of the ad text [high-quality and low-quality (e.g., cApitalization errors and spelin errors)]. Thus, the authors had a 3 × 3 × 2 design, which was deployed across more than 300 local markets ranging from towns (e.g., Kokomo, IN and North Platte, NE) to mega-cities (e.g., New York and Los Angeles).
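To make the structure of this design concrete, here is a minimal sketch in Python of how the 3 × 3 × 2 factorial conditions could be enumerated and assigned. The factor levels come from the description above, but the market list and the assignment rule are hypothetical and purely illustrative; they are not Doleac and Stein's actual procedure.

```python
# Illustrative sketch of the 3 x 3 x 2 factorial design described above.
# Factor levels are taken from the text; the market list and assignment
# logic are hypothetical, for exposition only.
import itertools
import random

seller_signals = ["white hand", "black hand", "white hand with tattoo"]
asking_prices = [90, 110, 130]
ad_quality = ["high-quality text", "low-quality text"]

# All 3 x 3 x 2 = 18 treatment conditions.
conditions = list(itertools.product(seller_signals, asking_prices, ad_quality))
assert len(conditions) == 18

# Hypothetical assignment: each local market is randomly assigned one condition.
markets = ["Kokomo, IN", "North Platte, NE", "New York, NY", "Los Angeles, CA"]
assignments = {market: random.choice(conditions) for market in markets}
for market, condition in assignments.items():
    print(market, condition)
```

Enumerating the full cross of factor levels like this makes it easy to see how many distinct advertisements the design requires and to keep the randomization balanced across markets.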

Figure 4.12: Hands used in the experiment of Doleac and Stein (2013). iPods were sold by sellers with different characteristics to measure discrimination in an online marketplace.

Averaged across all conditions, the outcomes were better for the white seller than for the black seller, with the tattooed seller having intermediate results. For example, white sellers received more offers and had higher final sale prices. Beyond these average effects, Doleac and Stein estimated the heterogeneity of effects. For example, one prediction from earlier theory is that discrimination would be smaller in markets that are more competitive. Using the number of offers received as a proxy for market competition, the authors found that black sellers did indeed receive worse offers in markets with a low degree of competition. Further, by comparing outcomes for the ads with high-quality and low-quality text, Doleac and Stein found that ad quality did not affect the disadvantage faced by black and tattooed sellers. Finally, taking advantage of the fact that advertisements were placed in more than 300 markets, the authors found that black sellers were more disadvantaged in cities with high crime rates and high residential segregation. None of these results gives us a precise understanding of exactly why black sellers had worse outcomes, but, when combined with the results of other studies, they can begin to inform theories about the causes of racial discrimination in different types of economic transactions.
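This kind of heterogeneity analysis can be thought of as estimating an interaction between the treatment (seller race) and a market-level moderator (competition). The sketch below illustrates that idea with simulated data and an ordinary least squares regression; it is not Doleac and Stein's actual specification, and all variable names and numbers are invented for illustration.

```python
# Illustrative sketch of a heterogeneity-of-treatment-effects analysis:
# regress an outcome (e.g., best offer received) on the seller-race treatment,
# a market-level moderator (competition), and their interaction.
# All data here are simulated; this is not Doleac and Stein's actual model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "black_seller": rng.integers(0, 2, n),   # 1 = black-hand photo
    "competition": rng.integers(0, 2, n),    # 1 = high-competition market
})
# Simulated outcome: a racial gap that shrinks in high-competition markets.
df["best_offer"] = (
    100 - 10 * df["black_seller"]
    + 6 * df["black_seller"] * df["competition"]
    + rng.normal(0, 5, n)
)

model = smf.ols("best_offer ~ black_seller * competition", data=df).fit()
print(model.summary().tables[1])
```

In a specification like this, the coefficient on the interaction term captures how the racial gap in outcomes changes with the level of market competition.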

Another example that shows the ability of researchers to conduct digital field experiments in existing systems is the research by Arnout van de Rijt and colleagues (2014) on the keys to success. In many aspects of life, seemingly similar people end up with very different outcomes. One possible explanation for this pattern is that small, and essentially random, advantages can lock in and grow over time, a process that researchers call cumulative advantage. In order to determine whether small initial successes lock in or fade away, van de Rijt and colleagues (2014) intervened in four different systems, bestowing success on randomly selected participants and then measuring the long-term impacts of this arbitrary success.

More specifically, van de Rijt and colleagues 1) pledged money to randomly selected projects on kickstarter.com, a crowdfunding website; 2) positively rated randomly selected reviews on the website epinions; 3) gave awards to randomly chosen contributors to Wikipedia; and 4) signed randomly selected petitions on change.org. The researchers found very similar results across all four systems: in each case, participants who were randomly given some early success went on to have more subsequent success than their otherwise completely indistinguishable peers (Figure 4.13). The fact that the same pattern appeared in many systems increases the external validity of these results because it reduces the chance that this pattern is an artifact of any particular system.
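The logic of this design, randomly bestowing an initial success and then comparing later outcomes between treated and untreated candidates, can be sketched as follows. The candidates, success probabilities, and outcome measure here are all hypothetical; they are not drawn from van de Rijt and colleagues' code or data.

```python
# Illustrative sketch of the experimental logic: randomly bestow an initial
# "success" on half of a set of otherwise similar candidates, then compare
# later outcomes between the treated and control groups.
# All names and numbers are hypothetical; this is not the authors' code or data.
import random

random.seed(1)
candidates = [f"project_{i}" for i in range(200)]
random.shuffle(candidates)
treated = set(candidates[:100])   # receive an arbitrary early success
control = set(candidates[100:])   # receive nothing

# Hypothetical follow-up measurement, simulated here with a modest
# cumulative-advantage effect for the treated group.
later_success = {c: random.random() < (0.45 if c in treated else 0.30)
                 for c in candidates}

def success_rate(group):
    return sum(later_success[c] for c in group) / len(group)

print(f"treated success rate: {success_rate(treated):.2f}")
print(f"control success rate: {success_rate(control):.2f}")
```

Because assignment to the treated group is random, any systematic difference in later success rates between the two groups can be attributed to the arbitrary initial success rather than to pre-existing differences between candidates.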

Figure 4.13: Long-term effects of randomly bestowed success in four different social systems. Arnout van de Rijt and colleagues (2014) 1) pledged money to randomly selected projects on kickstarter.com, a crowdfunding website; 2) positively rated randomly selected reviews on the website epinions; 3) gave awards to randomly chosen contributors to Wikipedia; and 4) signed randomly selected petitions on change.org.

Together, these two examples show that researchers can conduct digital field experiments without the need to partner with companies or to build complex digital systems. Further, Table 4.2 provides even more examples that show the range of what is possible when researchers use the infrastructure of existing systems to deliver treatments and/or measure outcomes. These experiments are relatively cheap for researchers, and they offer a high degree of realism. But they offer researchers limited control over the participants, treatments, and outcomes to be measured. Further, for experiments taking place in only one system, researchers need to be concerned that the effects could be driven by system-specific dynamics (e.g., the way that Kickstarter ranks projects or the way that change.org ranks petitions; for more information, see the discussion about algorithmic confounding in Chapter 2). Finally, when researchers intervene in working systems, tricky ethical questions emerge about possible harm to participants, non-participants, and systems. We will consider these ethical questions in more detail in Chapter 6, and there is an excellent discussion of them in the appendix of van de Rijt et al. (2014). The trade-offs that come with working in an existing system are not ideal for every project, and for that reason some researchers build their own experimental systems, the topic of the next section.

Table 4.2: Examples of experiments in existing systems. These experiments seem to fall into three main categories, and this categorization may help you notice additional opportunities for your own research. First, there are experiments that involve selling or buying something (e.g., Doleac and Stein (2013)). Second, there are experiments that involve delivering a treatment to specific participants (e.g., Restivo and Rijt (2012)). Finally, there are experiments that involve delivering treatments to specific objects such as petitions (e.g., Vaillant et al. (2015)).
Topic | Citation
Effect of barnstars on contributions to Wikipedia | Restivo and Rijt (2012); Restivo and Rijt (2014); Rijt et al. (2014)
Effect of anti-harassment message on racist tweets | Munger (2016)
Effect of auction method on sale price | Lucking-Reiley (1999)
Effect of reputation on price in online auctions | Resnick et al. (2006)
Effect of race of seller on sale of baseball cards on eBay | Ayres, Banaji, and Jolls (2015)
Effect of race of seller on sale of iPods | Doleac and Stein (2013)
Effect of race of guest on Airbnb rentals | Edelman, Luca, and Svirsky (2016)
Effect of donations on the success of projects on Kickstarter | Rijt et al. (2014)
Effect of race and ethnicity on housing rentals | Hogan and Berry (2011)
Effect of positive rating on future ratings on epinions | Rijt et al. (2014)
Effect of signatures on the success of petitions | Vaillant et al. (2015); Rijt et al. (2014)