4.4.2 Heterogeneity of treatment effects

Experiments normally measure the average effect, but the effect can be different for different people.

The second key idea for moving beyond simple experiments is heterogeneity of treatment effects. The experiment of Schultz et al. (2007) powerfully illustrates how the same treatment can have different effects on different kinds of people (Figure 4.4), but this analysis of heterogeneity is actually quite unusual for an analog-age experiment. Most analog-age experiments involve a small number of participants who are treated as interchangeable “widgets” because little is known about them before treatment. In digital experiments, however, these data constraints are less common because researchers tend to have more participants and to know more about them. In this different data environment, we can estimate heterogeneity of treatment effects in order to provide clues about how the treatment works, how it can be improved, and how it can be targeted to those most likely to benefit.

Two examples of heterogeneity of treatment effects in the context of social norms and energy use come from additional research on the Home Energy Reports. First, Allcott (2011) used the large sample size (600,000 households) to further split the sample and estimate the effect of the Home Energy Report by decile of pre-treatment energy usage. While Schultz et al. (2007) found differences between heavy and light users, Allcott (2011) found that there were also differences within the heavy-user and light-user groups. For example, the heaviest users (those in the top decile) reduced their energy usage twice as much as those in the middle of the heavy-user group (Figure 4.7). Further, estimating the effect by pre-treatment behavior also revealed that there was no boomerang effect, even for the lightest users (Figure 4.7).

Figure 4.7: Heterogeneity of treatment effects in Allcott (2011). The decrease in energy use was different for people in different deciles of baseline usage.
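
To make the idea of estimating effects by decile of pre-treatment usage concrete, here is a minimal sketch in Python. It is not Allcott's analysis: the data are simulated, the assumed effect sizes are invented for illustration, and the names assign_deciles, effect_by_group, and pct_change are hypothetical. The sketch simply splits households into deciles of baseline usage and computes a difference in means (treated minus control) within each decile.

```python
import random
from statistics import mean


def assign_deciles(baseline):
    """Label each household 1 (lowest baseline usage) to 10 (highest)."""
    order = sorted(range(len(baseline)), key=lambda i: baseline[i])
    deciles = [0] * len(baseline)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(baseline) + 1
    return deciles


def effect_by_group(groups, treated, outcome):
    """Difference in mean outcome (treated minus control) within each group."""
    effects = {}
    for g in sorted(set(groups)):
        t = [y for gg, d, y in zip(groups, treated, outcome) if gg == g and d]
        c = [y for gg, d, y in zip(groups, treated, outcome) if gg == g and not d]
        effects[g] = mean(t) - mean(c)
    return effects


# Simulated experiment (all numbers are assumptions, not real data).
rng = random.Random(0)
n = 20000
baseline = [rng.uniform(200, 2000) for _ in range(n)]  # pre-treatment usage
treated = [rng.random() < 0.5 for _ in range(n)]       # random assignment

# Assumed truth: the percentage reduction grows with baseline usage,
# from about 1% for the lightest users to about 6% for the heaviest.
pct_change = []
for b, d in zip(baseline, treated):
    true_effect = -(1 + 5 * (b - 200) / 1800) if d else 0.0
    pct_change.append(true_effect + rng.gauss(0, 3))

deciles = assign_deciles(baseline)
effects = effect_by_group(deciles, treated, pct_change)
for decile, effect in effects.items():
    print(f"decile {decile:2d}: estimated effect {effect:+.2f}%")
```

Because each decile is analyzed separately, each estimate rests on only a tenth of the sample, which is one reason this kind of analysis becomes practical only when the experiment is large.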

In a related study, Costa and Kahn (2013) speculated that the effectiveness of the Home Energy Report could vary based on a participant’s political ideology and that the treatment might actually cause people with certain ideologies to increase their electricity use. In other words, they speculated that the Home Energy Reports might be creating a boomerang effect for some types of people. To assess this possibility, Costa and Kahn merged the Opower data with data purchased from a third-party aggregator that included information such as political party registration, donations to environmental organizations, and household participation in renewable energy programs. With this merged dataset, Costa and Kahn found that the Home Energy Reports produced broadly similar effects for participants with different ideologies; there was no evidence that any group exhibited boomerang effects (Figure 4.8).

Figure 4.8: Heterogeneity of treatment effects in Costa and Kahn (2013). The estimated average treatment effect for the entire sample is -2.1% [-1.5%, -2.7%]. By combining information from the experiment with information about the households, Costa and Kahn (2013) used a series of statistical models to estimate the treatment effect for very specific groups of people. Two estimates are presented for each group because the estimates depend on the covariates they included in their statistical models (see model 4 and model 6 in Table 3 and Table 4 in Costa and Kahn (2013)). As this example illustrates, treatment effects can be different for different people and estimates of treatment effects that come from statistical models can depend on the details of those models (Grimmer, Messing, and Westwood 2014).
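
The two steps described here, merging experimental records with outside information about households and then estimating effects for specific groups, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny dataset is made up, the field names (household_id, party, heavy_user, pct_change) are hypothetical, and post-stratification stands in for the regression models that Costa and Kahn actually used. The point is that two reasonable estimators can give different numbers for the same group.

```python
from statistics import mean


def merge_on_id(experiment, covariates):
    """Inner-join two lists of dicts on the hypothetical key 'household_id'."""
    by_id = {row["household_id"]: row for row in covariates}
    return [{**row, **by_id[row["household_id"]]}
            for row in experiment if row["household_id"] in by_id]


def diff_in_means(rows):
    """Mean outcome of treated households minus mean outcome of controls."""
    t = [r["pct_change"] for r in rows if r["treated"]]
    c = [r["pct_change"] for r in rows if not r["treated"]]
    return mean(t) - mean(c)


def subgroup_effect(rows, key, value, adjust_for=None):
    """Estimate the effect for one subgroup, optionally post-stratified."""
    sub = [r for r in rows if r[key] == value]
    if adjust_for is None:
        return diff_in_means(sub)
    # Estimate within each stratum, then weight by the stratum's share.
    estimate = 0.0
    for level in sorted(set(r[adjust_for] for r in sub)):
        stratum = [r for r in sub if r[adjust_for] == level]
        estimate += len(stratum) / len(sub) * diff_in_means(stratum)
    return estimate


# Made-up experimental records (outcome: % change in electricity use).
experiment = [
    {"household_id": 1, "treated": True, "pct_change": -6.0},
    {"household_id": 2, "treated": True, "pct_change": -4.0},
    {"household_id": 3, "treated": False, "pct_change": 0.0},
    {"household_id": 4, "treated": True, "pct_change": -2.0},
    {"household_id": 5, "treated": False, "pct_change": 0.0},
    {"household_id": 6, "treated": False, "pct_change": 2.0},
    {"household_id": 7, "treated": True, "pct_change": -3.0},
    {"household_id": 8, "treated": False, "pct_change": 0.0},
    {"household_id": 9, "treated": True, "pct_change": -1.0},  # no match below
]

# Made-up third-party covariates for the same households.
covariates = [
    {"household_id": i, "party": "A" if i <= 6 else "B", "heavy_user": i <= 3}
    for i in range(1, 9)
]

merged = merge_on_id(experiment, covariates)
unadjusted = subgroup_effect(merged, "party", "A")
adjusted = subgroup_effect(merged, "party", "A", adjust_for="heavy_user")
print(f"party A, unadjusted: {unadjusted:+.2f}%")
print(f"party A, post-stratified on heavy_user: {adjusted:+.2f}%")
```

Both numbers are estimates of the same quantity, yet they differ because the second adjusts for baseline usage. With real data, such differences are a reminder to report how sensitive subgroup estimates are to modeling choices.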

As these two examples illustrate, in the digital age, we can move from estimating average treatment effects to estimating the heterogeneity of treatment effects because we can have many more participants and we know more about those participants. Learning about heterogeneity of treatment effects can enable targeting of a treatment where it is most effective, provide facts that stimulate new theory development, and provide hints about a possible mechanism, the topic to which I now turn.