4.2 What are experiments?

You are reading the Open Review Edition of Bit by Bit.

Randomized controlled experiments have four main ingredients: recruitment of participants, randomization of treatment, delivery of treatment, and measurement of outcomes.

Randomized controlled experiments can take many forms and can be used to study many types of behavior. But, at their core, randomized controlled experiments have four main ingredients: recruitment of participants, randomization of treatment, delivery of treatment, and measurement of outcomes. The digital age does not change the fundamental nature of experimentation, but it does make experiments logistically easier. For example, in the past it might have been difficult to measure the behavior of millions of people, but that now happens routinely in many digital systems. Researchers who can figure out how to harness these new opportunities will be able to run experiments that were previously impossible.

To make this all a bit more concrete (both what has stayed the same and what has changed), let’s consider an experiment by Michael Restivo and Arnout van de Rijt (2012). The researchers wanted to understand the effect of informal peer rewards on editorial contributions to Wikipedia. In particular, they studied the effects of barnstars, an award that any Wikipedian can give to any other Wikipedian to acknowledge hard work and due diligence. Restivo and van de Rijt gave barnstars to 100 deserving Wikipedians. Then they tracked the recipients’ subsequent contributions to Wikipedia over the next 90 days. Much to their surprise, the people to whom they awarded barnstars tended to make fewer edits after receiving one. In other words, the barnstars seemed to be discouraging rather than encouraging contribution.

Fortunately, Restivo and van de Rijt were not running a “perturb and observe” experiment; they were running a randomized controlled experiment. So, in addition to choosing 100 top contributors to receive a barnstar, they also picked 100 top contributors to whom they did not give a barnstar. These 100 editors served as a control group, and who received a barnstar and who did not was determined randomly. When Restivo and van de Rijt looked at the control group, they found that it had a steep drop in contributions too. Finally, when the researchers compared people in the treatment group (i.e., those who received barnstars) with people in the control group, they found that the barnstar caused editors to contribute about 60% more. But this increase in contribution was taking place as part of an overall decline in both groups.
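The contrast between a “perturb and observe” comparison and a treatment-versus-control comparison can be sketched with a small simulation. This is only an illustration under stated assumptions: the function name run_experiment, the edit counts, and the retention rates are all invented here and are not Restivo and van de Rijt’s data or analysis.

```python
import random
import statistics


def run_experiment(n_editors=200, seed=0):
    """Simulate a two-arm randomized experiment with made-up numbers."""
    rng = random.Random(seed)

    # Recruitment: hypothetical pre-period edit counts for each editor.
    before = [rng.randint(80, 120) for _ in range(n_editors)]

    # Randomization: shuffle editor ids and treat the first half.
    ids = list(range(n_editors))
    rng.shuffle(ids)
    treated = set(ids[: n_editors // 2])

    # Delivery and measurement: everyone's activity is assumed to decline,
    # but treated editors are assumed to retain more of it (invented rates).
    after = []
    for i in range(n_editors):
        retention = 0.48 if i in treated else 0.30
        after.append(before[i] * retention + rng.gauss(0, 3))

    treated_before = statistics.mean(before[i] for i in treated)
    treated_after = statistics.mean(after[i] for i in treated)
    control_after = statistics.mean(
        after[i] for i in range(n_editors) if i not in treated
    )

    return {
        "n_treated": len(treated),
        # "Perturb and observe": compare treated editors with their own past.
        "before_after_change": treated_after - treated_before,
        # Randomized comparison: treated versus control in the same period.
        "difference_in_means": treated_after - control_after,
    }


if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(name, round(value, 1))
```

In this simulated setting the before-and-after change for treated editors is negative, while the difference between treated and control editors is positive. That is the same pattern described above: looking only at recipients suggests the award hurt, while the control group reveals that it helped against a background decline.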

As this study illustrates, the control group in experiments is critical in a way that is somewhat paradoxical. In order to precisely measure the effect of barnstars, Restivo and van de Rijt needed to observe people who did not receive barnstars. Researchers who are not familiar with experiments often fail to appreciate the incredible value of the control group. If Restivo and van de Rijt hadn’t had a control group, they would have drawn exactly the wrong conclusion. Control groups are so important that the CEO of a major casino company has said that there are only three ways that employees can be fired from his company: theft, sexual harassment, and running an experiment without a control group (Schrage 2011).

Restivo and van de Rijt’s study illustrates the four main ingredients of an experiment: recruitment, randomization, intervention, and outcomes. Together, these four ingredients allow scientists to move beyond correlations and measure the causal effect of treatments. Specifically, randomization means that when you compare outcomes for the treatment and control groups you get an estimate of the causal effect of that intervention for that set of participants. In other words, with a randomized controlled experiment you can be sure that any differences in outcomes are caused by the intervention and not a confounder, a claim that I make precise in the Technical Appendix using the potential outcomes framework.
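The comparison of treatment and control outcomes described above can be written compactly in conventional potential-outcomes notation. The symbols here are assumptions chosen for illustration and may differ from those used in the Technical Appendix: Y_i(1) and Y_i(0) are the outcomes participant i would have with and without the treatment, W_i indicates whether participant i was treated, and N_t and N_c are the sizes of the treatment and control groups.

```latex
\begin{align*}
\text{ATE} &= \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i(1) - Y_i(0)\bigr) \\
\widehat{\text{ATE}} &= \frac{1}{N_t}\sum_{i:\,W_i=1} Y_i
  \;-\; \frac{1}{N_c}\sum_{i:\,W_i=0} Y_i
\end{align*}
```

Only one of the two potential outcomes is ever observed for each participant, so the first quantity cannot be computed directly. Because random assignment makes W_i independent of the potential outcomes, the difference in group means on the second line is an unbiased estimate of it for that set of participants.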

In addition to being a nice illustration of the mechanics of experiments, Restivo and van de Rijt’s study also shows that the logistics of digital experiments can be completely different from those of analog experiments. In Restivo and van de Rijt’s experiment, it was easy to give the barnstar to anyone in the world and it was easy to track the outcome (number of edits) over an extended period of time, because edit history is automatically recorded by Wikipedia. This ability to deliver treatments and measure outcomes at no cost is qualitatively unlike experiments in the past. Although this experiment involved 200 people, it could have been run with 2,000 or 20,000 people. The main thing preventing the researchers from scaling up their experiment by a factor of 100 was not cost but ethics. That is, Restivo and van de Rijt didn’t want to give barnstars to undeserving editors and they didn’t want their experiment to disrupt the Wikipedia community (Restivo and van de Rijt 2012; Restivo and van de Rijt 2014). So, although the experiment of Restivo and van de Rijt is relatively simple, it clearly shows that some things about experiments have stayed the same and some have changed. In particular, the basic logic of experimentation is the same, but the logistics have changed. Next, in order to more clearly isolate the opportunities created by this change, I’ll compare the experiments that researchers can do now to the kinds of experiments that have been done in the past.