Let’s move beyond simple experiments. Three concepts are useful for designing rich experiments: validity, heterogeneity of treatment effects, and mechanisms.
Researchers who are new to experiments often focus on a very specific, narrow question: does this treatment “work”? For example, does a phone call from a volunteer encourage someone to vote? Does changing a website button from blue to green increase click-through rate? Unfortunately, loose phrasing about what “works” obscures the fact that narrowly focused experiments don’t really tell you whether a treatment “works” in a general sense. Rather, narrowly focused experiments answer a much more specific question: what is the average effect of this specific treatment with this specific implementation for this population of participants at this time? I’ll call experiments that focus on this narrow question simple experiments.
Simple experiments can provide valuable information, but they fail to answer many questions that are both important and interesting, such as: Are there some people for whom the treatment had a larger or smaller effect? Is there another treatment that would be more effective? And how does this experiment relate to broader social theories?
In order to show the value of moving beyond simple experiments, let’s consider one of my favorite analog field experiments, a study by P. Wesley Schultz and colleagues on the relationship between social norms and energy consumption (Schultz et al. 2007). Schultz and colleagues hung doorhangers on 300 households in San Marcos, California, and these doorhangers delivered different messages designed to encourage energy conservation. Then, Schultz and colleagues measured the effect of these messages on electricity consumption after one week and after three weeks; see Figure 4.3 for a more detailed description of the experimental design.
The experiment had two conditions. In the first condition, households received general energy saving tips (e.g., use fans instead of air conditioners) and information about their household’s energy usage compared to the average energy usage in their neighborhood. Schultz and colleagues called this the descriptive normative condition because the information about energy use in their neighborhood provided information about typical behavior (i.e., a descriptive norm). When Schultz and colleagues looked at the resulting energy usage in this group, the treatment appeared to have no effect in either the short term or the long term; in other words, the treatment didn’t seem to “work” (Figure 4.4).
But, fortunately, Schultz et al. (2007) did not settle for this simplistic analysis. Before the experiment began, they reasoned that heavy users of electricity—people above the mean—might reduce their consumption, and that light users of electricity—people below the mean—might actually increase their consumption. When they looked at the data, that’s exactly what they found (Figure 4.4). Thus, what looked like a treatment that had no effect was actually a treatment with two offsetting effects. The researchers called this counterproductive increase among the light users a boomerang effect.
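To see how two offsetting subgroup effects can hide behind a near-zero average effect, here is a minimal simulation sketch. It does not use the Schultz et al. (2007) data; the sample size, effect sizes, and noise levels are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical baseline daily electricity use (kWh); all values are illustrative.
baseline = rng.normal(30, 8, n)
treated = rng.integers(0, 2, n).astype(bool)  # random assignment to treatment
heavy = baseline > baseline.mean()            # above- vs. below-average users

# Assumed offsetting effects: heavy users conserve, light users "boomerang" upward.
effect = np.where(heavy, -2.0, +2.0)
usage_after = baseline + treated * effect + rng.normal(0, 3, n)
change = usage_after - baseline

ate = change[treated].mean() - change[~treated].mean()
ate_heavy = change[treated & heavy].mean() - change[~treated & heavy].mean()
ate_light = change[treated & ~heavy].mean() - change[~treated & ~heavy].mean()

print(f"Overall effect:    {ate:+.2f} kWh")   # close to zero
print(f"Heavy-user effect: {ate_heavy:+.2f} kWh")  # conservation
print(f"Light-user effect: {ate_light:+.2f} kWh")  # boomerang
```

In this sketch the overall estimate is close to zero even though both subgroup effects are real, which is exactly the pattern that an analysis limited to the average treatment effect would miss.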
Further, Schultz and colleagues anticipated this possibility, and in the second condition they deployed a slightly different treatment, one explicitly designed to eliminate the boomerang effect. The households in the second condition received the exact same treatment—general energy saving tips and information about their household’s energy usage compared to the average in their neighborhood—with one tiny addition: for people with below-average consumption, the researchers added a :), and for people with above-average consumption, they added a :(. These emoticons were designed to trigger what the researchers called injunctive norms. Injunctive norms refer to perceptions of what is commonly approved (and disapproved), whereas descriptive norms refer to perceptions of what is commonly done (Reno, Cialdini, and Kallgren 1993).
By adding this one tiny emoticon, the researchers dramatically reduced the boomerang effect (Figure 4.4). Thus, by making this one simple change—a change that was motivated by an abstract social psychological theory (Cialdini, Kallgren, and Reno 1991)—the researchers were able to turn a program from one that didn’t seem to work into one that worked, and, simultaneously, they were able to contribute to the general understanding of how social norms affect human behavior.
At this point, however, you might notice that something is a bit different about this experiment. In particular, the experiment of Schultz and colleagues doesn’t really have a control group in the same way that randomized controlled experiments do. The comparison between this design and the design of Restivo and van de Rijt illustrates the differences between two major designs used by researchers. In between-subjects designs, such as Restivo and van de Rijt, there is a treatment group and a control group, whereas in within-subjects designs the behavior of participants is compared before and after the treatment (Greenwald 1976; Charness, Gneezy, and Kuhn 2012). In a within-subjects experiment it is as if each participant acts as her own control group. The strength of between-subjects designs is that they provide protection against confounders (as I described earlier), and the strength of within-subjects designs is increased precision in estimates: when each participant acts as her own control, between-participant variation is eliminated (see Technical Appendix). To foreshadow an issue that will come up later when I offer advice about designing digital experiments, there is a final design, called a mixed design, that combines the improved precision of within-subjects designs and the protection against confounding of between-subjects designs.
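To see why removing between-participant variation increases precision, here is a minimal simulation sketch. The numbers are invented: it assumes large, stable differences between participants, a small true treatment effect, and modest measurement noise, and it compares how much the two estimators vary across many replications.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_effect, reps = 200, 1.0, 2000

between_ests, within_ests = [], []
for _ in range(reps):
    # Large, stable differences between participants; small measurement noise.
    person = rng.normal(0, 5, n)

    # Between-subjects: half the participants are randomly assigned to treatment.
    treated = rng.permutation(n) < n // 2
    outcome = person + treated * true_effect + rng.normal(0, 1, n)
    between_ests.append(outcome[treated].mean() - outcome[~treated].mean())

    # Within-subjects: every participant is measured before and after treatment.
    pre = person + rng.normal(0, 1, n)
    post = person + true_effect + rng.normal(0, 1, n)
    within_ests.append((post - pre).mean())

print(f"Spread of between-subjects estimates: {np.std(between_ests):.3f}")
print(f"Spread of within-subjects estimates:  {np.std(within_ests):.3f}")
```

The within-subjects estimates are much less variable because differencing each participant against herself removes the stable person-to-person variation. Note, however, that this sketch omits anything else changing over time; such time trends are exactly the kind of confounding that the within-subjects design is vulnerable to and the between-subjects design protects against.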
Overall, the design and results of Schultz et al. (2007) show the value of moving beyond simple experiments. Fortunately, you don’t need to be a genius to create experiments like this. Social scientists have developed three concepts that will guide you toward richer and more creative experiments: 1) validity, 2) heterogeneity of treatment effects, and 3) mechanisms. That is, if you keep these three ideas in mind while you are designing your experiment, you will naturally create more interesting and useful experiments. In order to illustrate these three concepts in action, I’ll describe a number of partially digital follow-up field experiments that built on the elegant design and exciting results of Schultz et al. (2007). As you will see, through more careful design, implementation, analysis, and interpretation, you too can move beyond simple experiments.