4.4.2 Heterogeneity of treatment effects

Experiments normally measure the average effect, but the effect is probably not the same for everyone.

The second key idea for moving beyond simple experiments is heterogeneity of treatment effects. The experiment of Schultz et al. (2007) powerfully illustrates how the same treatment can have different effects on different kinds of people (figure 4.4). In most analog experiments, however, researchers focused on average treatment effects because there were few participants and little was known about them. In digital experiments, by contrast, there are often many more participants and more is known about them. In this different data environment, researchers who continue to estimate only average treatment effects will miss the ways in which estimates of the heterogeneity of treatment effects can provide clues about how a treatment works, how it can be improved, and how it can be targeted to those most likely to benefit.

Two examples of heterogeneity of treatment effects come from additional research on the Home Energy Reports. First, Allcott (2011) used the large sample size (600,000 households) to further split the sample and estimate the effect of the Home Energy Report by decile of pre-treatment energy usage. While Schultz et al. (2007) found differences between heavy and light users, Allcott (2011) found that there were also differences within the heavy- and light-user groups. For example, the heaviest users (those in the top decile) reduced their energy usage twice as much as users in the middle of the heavy-user group (figure 4.8). Further, estimating the effect by pre-treatment behavior also revealed that there was no boomerang effect, even for the lightest users (figure 4.8).

Figure 4.8: Heterogeneity of treatment effects in Allcott (2011). The decrease in energy use was different for people in different deciles of baseline usage. Adapted from Allcott (2011), figure 8.
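
To make the decile analysis concrete, here is a minimal sketch of estimating a treatment effect separately within each decile of pre-treatment usage. It is not Allcott's code or data: the function name `effects_by_baseline_bin`, the simulated households, and the assumed effect size (a reduction proportional to baseline usage) are all hypothetical, and the estimator is a plain difference in means rather than whatever model the original study used.

```python
import random
from statistics import mean


def effects_by_baseline_bin(records, n_bins=10):
    """Difference-in-means estimate within each bin of baseline usage.

    records: list of (baseline_usage, treated, outcome) tuples.
    Returns one estimate per bin, ordered from lightest to heaviest users.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    estimates = []
    for b in range(n_bins):
        # Equal-sized slices of the sorted sample approximate the deciles.
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        treated = [y for _, t, y in chunk if t]
        control = [y for _, t, y in chunk if not t]
        estimates.append(mean(treated) - mean(control))
    return estimates


# Simulated data, NOT Allcott's. Assumption for illustration only:
# the true effect is a reduction proportional to baseline usage.
rng = random.Random(0)
records = []
for _ in range(20000):
    baseline = rng.uniform(10, 100)          # pre-treatment usage
    treated = rng.random() < 0.5             # random assignment
    true_effect = -0.03 * baseline if treated else 0.0
    outcome = true_effect + rng.gauss(0, 1)  # change in usage after treatment
    records.append((baseline, treated, outcome))

estimates = effects_by_baseline_bin(records)
for decile, est in enumerate(estimates, start=1):
    print(f"decile {decile:2d}: estimated effect {est:+.2f}")
```

Because assignment is random within every decile, each within-decile difference in means estimates the effect for that group. A boomerang effect would show up as a positive estimate in one of the lower deciles.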

In a related study, Costa and Kahn (2013) speculated that the effectiveness of the Home Energy Report could vary based on a participant’s political ideology and that the treatment might actually cause people with certain ideologies to increase their electricity use. In other words, they speculated that the Home Energy Reports might be creating a boomerang effect for some types of people. To assess this possibility, Costa and Kahn merged the Opower data with data purchased from a third-party aggregator that included information such as political party registration, donations to environmental organizations, and household participation in renewable energy programs. With this merged dataset, Costa and Kahn found that the Home Energy Reports produced broadly similar effects for participants with different ideologies; there was no evidence that any group exhibited boomerang effects (figure 4.9).

Figure 4.9: Heterogeneity of treatment effects in Costa and Kahn (2013). The estimated average treatment effect for the entire sample is -2.1% [-2.7%, -1.5%]. After combining information from the experiment with information about the households, Costa and Kahn (2013) used a series of statistical models to estimate the treatment effect for very specific groups of people. Two estimates are presented for each group because the estimates depend on the covariates they included in their statistical models (see models 4 and 6 in tables 3 and 4 in Costa and Kahn (2013)). As this example illustrates, treatment effects can be different for different people and estimates of treatment effects that come from statistical models can depend on the details of those models (Grimmer, Messing, and Westwood 2014). Adapted from Costa and Kahn (2013), tables 3 and 4.
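
The merge-then-estimate workflow described above can be sketched in a few lines. This is an illustration under stated assumptions, not Costa and Kahn's procedure: the household identifiers, the `party` attribute with groups "A" and "B", the effect sizes, and the function `merge_and_estimate` are all hypothetical, and the sketch uses simple differences in means where the original study used regression models with covariates.

```python
import random
from statistics import mean


def merge_and_estimate(experiment, covariates, key):
    """Join experimental records to external household data, then
    estimate a difference in means for each value of one attribute.

    experiment: dict of household_id -> (treated, outcome)
    covariates: dict of household_id -> dict of attributes
    Returns a dict of group -> estimated treatment effect.
    """
    groups = {}
    for household_id, (treated, outcome) in experiment.items():
        attrs = covariates.get(household_id)
        if attrs is None:
            continue  # households missing from the external data are dropped
        arms = groups.setdefault(attrs[key], {True: [], False: []})
        arms[treated].append(outcome)
    return {g: mean(a[True]) - mean(a[False]) for g, a in groups.items()}


# Simulated data for illustration only; groups "A" and "B" are hypothetical.
rng = random.Random(1)
assumed_effect = {"A": -2.5, "B": -1.5}  # assumed, not estimates from the paper
experiment, covariates = {}, {}
for household_id in range(10000):
    party = rng.choice(["A", "B"])
    treated = rng.random() < 0.5
    outcome = (assumed_effect[party] if treated else 0.0) + rng.gauss(0, 1)
    experiment[household_id] = (treated, outcome)
    if rng.random() < 0.9:  # assume about 10% of households cannot be matched
        covariates[household_id] = {"party": party}

effects = merge_and_estimate(experiment, covariates, key="party")
for group in sorted(effects):
    print(f"group {group}: estimated effect {effects[group]:+.2f}")
```

One design choice deserves attention: unmatched households are silently dropped here, so the subgroup estimates describe only the households that appear in the external data. If matching is related to the outcome, those estimates may not generalize to the full experimental sample.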

As these two examples illustrate, in the digital age, we can move from estimating average treatment effects to estimating the heterogeneity of treatment effects because we can have many more participants and we know more about those participants. Learning about heterogeneity of treatment effects can enable targeting of a treatment where it is most effective, provide facts that stimulate new theory development, and provide hints about possible mechanisms, the topic to which I now turn.