In this appendix, I will summarize some ideas about making causal inference from non-experimental data in a slightly more mathematical form. There are two main approaches: the causal graph framework, most associated with Judea Pearl and colleagues, and the potential outcomes framework, most associated with Donald Rubin and colleagues. I will introduce the potential outcomes framework because it is more closely connected to the ideas in the mathematical notes at the end of chapters 3 and 4. For more on the causal graph framework, I recommend Pearl, Glymour, and Jewell (2016) (introductory) and Pearl (2009) (advanced). For a book-length treatment of causal inference that combines the potential outcomes framework and the causal graph framework, I recommend Morgan and Winship (2014).

The goal of this appendix is to help you get comfortable with the notation and style of the potential outcomes tradition so that you can transition to some of the more technical material written on this topic. First, I’ll describe the potential outcomes framework. Then, I’ll use it to further discuss natural experiments like the one by Angrist (1990) on the effect of military service on earnings. This appendix draws heavily on Imbens and Rubin (2015).

**Potential outcomes framework**

The potential outcomes framework has three main elements: *units*, *treatments*, and *potential outcomes*. In order to illustrate these elements, let’s consider a stylized version of the question addressed in Angrist (1990): What is the effect of military service on earnings? In this case, we can define the *units* to be people eligible for the 1970 draft in the United States, and we can index these people by \(i = 1, \ldots, N\). The *treatments* in this case can be “serving in the military” or “not serving in the military.” I’ll call these the treatment and control conditions, and I’ll write \(W_i = 1\) if person \(i\) is in the treatment condition and \(W_i = 0\) if person \(i\) is in the control condition. Finally, the *potential outcomes* are a bit more conceptually difficult because they involve “potential” outcomes: things that could have happened. For each person eligible for the 1970 draft, we can imagine the amount that they would have earned in 1978 if they served in the military, which I will call \(Y_i(1)\), and the amount that they would have earned in 1978 if they did not serve in the military, which I will call \(Y_i(0)\). In the potential outcomes framework, \(Y_i(1)\) and \(Y_i(0)\) are considered fixed quantities, while \(W_i\) is a random variable.

The choice of units, treatments, and outcomes is critical because it defines what can—and cannot—be learned from the study. The choice of units—people eligible for the 1970 draft—does not include women, and so without additional assumptions, this study will not tell us anything about the effect of military service on women. Decisions about how to define treatments and outcomes are important as well. For example, should the treatment of interest be focused on serving in the military or experiencing combat? Should the outcome of interest be earnings or job satisfaction? Ultimately, the choice of units, treatments, and outcomes should be driven by the scientific and policy goals of the study.

Given the choices of units, treatments, and potential outcomes, the causal effect of the treatment on person \(i\), \(\tau_i\), is

\[ \tau_i = Y_i(1) - Y_i(0) \qquad(2.1)\]

In other words, we compare how much person \(i\) would have earned after serving to how much person \(i\) would have earned without serving. To me, eq. 2.1 is the clearest way to define a causal effect, and although extremely simple, this framework turns out to be generalizable in many important and interesting ways (Imbens and Rubin 2015).

When using the potential outcomes framework, I often find it helpful to write out a table showing the potential outcomes and the treatment effects for all units (table 2.5). If you are not able to imagine a table like this for your study, then you might need to be more precise in your definitions of your units, treatments, and potential outcomes.

Person | Earnings in treatment condition | Earnings in control condition | Treatment effect |
---|---|---|---|
1 | \(Y_1(1)\) | \(Y_1(0)\) | \(\tau_1\) |
2 | \(Y_2(1)\) | \(Y_2(0)\) | \(\tau_2\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
\(N\) | \(Y_N(1)\) | \(Y_N(0)\) | \(\tau_N\) |
Mean | \(\bar{Y}(1)\) | \(\bar{Y}(0)\) | \(\bar{\tau}\) |
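To make table 2.5 concrete, here is a minimal Python sketch that fills it in for a tiny, hypothetical population of four people. The earnings figures are invented for illustration only. In a real study, both columns are never observed for the same person. The sketch computes each \(\tau_i\) as in eq. 2.1, along with the mean row of the table.

```python
from statistics import mean

# Hypothetical potential outcomes (1978 earnings, in dollars) for N = 4 people.
# These values are invented for illustration; real data never reveal both columns.
y1 = [12_000, 15_000, 9_000, 20_000]   # Y_i(1): earnings if served
y0 = [14_000, 15_500, 10_000, 19_000]  # Y_i(0): earnings if did not serve

# Individual causal effects, eq. 2.1: tau_i = Y_i(1) - Y_i(0)
tau = [a - b for a, b in zip(y1, y0)]

# Mean row of table 2.5
y1_bar, y0_bar, tau_bar = mean(y1), mean(y0), mean(tau)

for i, (a, b, t) in enumerate(zip(y1, y0, tau), start=1):
    print(f"{i:>4} | {a:>7} | {b:>7} | {t:>6}")
print(f"Mean | {y1_bar:>7} | {y0_bar:>7} | {tau_bar:>6}")
```

In this toy table, the mean of the individual effects equals the difference between the two column means, which is the identity stated as eq. 2.3.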

When defining the causal effect in this way, however, we run into a problem. In almost all cases, we don’t get to observe both potential outcomes. That is, a specific person either served or did not serve. Therefore, we observe one of the potential outcomes—\(Y_i(1)\) or \(Y_i(0)\)—but not both. The inability to observe both potential outcomes is such a major problem that Holland (1986) called it the *Fundamental Problem of Causal Inference*.

Fortunately, when we are doing research, we don’t just have one person; rather, we have many people, and this offers a way around the Fundamental Problem of Causal Inference. Instead of attempting to estimate the individual-level treatment effect, we can estimate the *average treatment effect* for all units:

\[ \text{ATE} = \bar{\tau} = \frac{1}{N} \sum_{i=1}^N \tau_i \qquad(2.2)\]

This equation is still expressed in terms of the \(\tau_i\), which are unobservable, but with some algebra (eq. 2.8 of Gerber and Green (2012)), we get

\[ \text{ATE} = \frac{1}{N} \sum_{i=1}^N Y_i(1) - \frac{1}{N} \sum_{i=1}^N Y_i(0) \qquad(2.3)\]

This shows that if we can estimate the population average outcome under treatment (\(N^{-1} \sum_{i=1}^N Y_i(1)\)) and the population average outcome under control (\(N^{-1} \sum_{i=1}^N Y_i(0)\)), then we can estimate the average treatment effect, even without estimating the treatment effect for any particular person.
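The algebra that takes eq. 2.2 to eq. 2.3 is short. We substitute the definition of \(\tau_i\) from eq. 2.1 and then use the linearity of the sum. The last step uses the column means \(\bar{Y}(1)\) and \(\bar{Y}(0)\) from table 2.5:

\[ \begin{aligned} \text{ATE} &= \frac{1}{N} \sum_{i=1}^N \tau_i = \frac{1}{N} \sum_{i=1}^N \bigl[ Y_i(1) - Y_i(0) \bigr] \\ &= \frac{1}{N} \sum_{i=1}^N Y_i(1) - \frac{1}{N} \sum_{i=1}^N Y_i(0) = \bar{Y}(1) - \bar{Y}(0) \end{aligned} \]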

Now that I’ve defined our estimand—the thing we are trying to estimate—I’ll turn to how we can actually estimate it with data. And here we run directly into the problem that we only observe one of the potential outcomes for each person; we see either \(Y_i(0)\) or \(Y_i(1)\) (table 2.6). We could estimate the average treatment effect by comparing the earnings of people who served to the earnings of people who did not serve:

\[ \widehat{\text{ATE}} = \underbrace{\frac{1}{N_t} \sum_{i:W_i=1} Y_i(1)}_{\text{average earnings, treatment}} - \underbrace{\frac{1}{N_c} \sum_{i:W_i=0} Y_i(0)}_{\text{average earnings, control}} \qquad(2.4)\]

where \(N_t\) and \(N_c\) are the numbers of people in the treatment and control conditions. This approach will work well if the treatment assignment is independent of potential outcomes, a condition sometimes called *ignorability*. Unfortunately, in the absence of an experiment, ignorability is often not satisfied, which means that the estimator in eq. 2.4 is not likely to produce a good estimate. One way to think about it is that in the absence of random assignment of treatment, eq. 2.4 is not comparing like with like; it is comparing the earnings of different kinds of people. Or, expressed slightly differently, without random assignment of treatment, the treatment allocation is probably related to potential outcomes.
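A small simulation can make the ignorability point concrete. The sketch below uses a hypothetical setup, not Angrist's data. Every person's two potential outcomes are fixed, so the true ATE is known by construction. We then compare the estimator in eq. 2.4 under (a) coin-flip assignment and (b) an assumed self-selection rule in which people with low \(Y_i(0)\) are more likely to serve. All distributions and probabilities are invented.

```python
import random
from statistics import fmean

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population with fixed potential outcomes and a constant effect of -1,000.
N = 20_000
y0 = [random.gauss(15_000, 4_000) for _ in range(N)]  # Y_i(0)
y1 = [y - 1_000 for y in y0]                          # Y_i(1)
true_ate = fmean(a - b for a, b in zip(y1, y0))       # eq. 2.2; -1,000 by construction

def diff_in_means(w):
    """Estimator in eq. 2.4: mean Y_i(1) among treated minus mean Y_i(0) among controls."""
    treated = [y1[i] for i in range(N) if w[i] == 1]
    control = [y0[i] for i in range(N) if w[i] == 0]
    return fmean(treated) - fmean(control)

# (a) Random assignment: W_i is independent of the potential outcomes.
w_random = [random.randint(0, 1) for _ in range(N)]

# (b) Assumed self-selection: people with low Y_i(0) are more likely to serve,
#     so treatment allocation is related to potential outcomes.
w_selected = [1 if random.random() < (0.8 if y < 15_000 else 0.2) else 0 for y in y0]

print(f"true ATE          : {true_ate:8.0f}")
print(f"random assignment : {diff_in_means(w_random):8.0f}")
print(f"self-selection    : {diff_in_means(w_selected):8.0f}")
```

Under random assignment, the estimate lands close to the true ATE. Under the assumed selection rule, it is far off, because the people who serve and those who do not already differ in \(Y_i(0)\) before anyone serves.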

In chapter 4, I’ll describe how randomized controlled experiments can help researchers make causal estimates, and here I’ll describe how researchers can take advantage of natural experiments, such as the draft lottery.

Person | Earnings in treatment condition | Earnings in control condition | Treatment effect |
---|---|---|---|
1 | ? | \(Y_1(0)\) | ? |
2 | \(Y_2(1)\) | ? | ? |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
\(N\) | \(Y_N(1)\) | ? | ? |
Mean | ? | ? | ? |
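Table 2.6 can be produced mechanically from table 2.5 and the treatment assignment. Each person reveals \(Y_i(1)\) if \(W_i = 1\) and \(Y_i(0)\) if \(W_i = 0\), which can be written as \(Y_i^{\text{obs}} = W_i Y_i(1) + (1 - W_i) Y_i(0)\). The sketch below applies that rule to invented numbers and marks the missing cells with `?`. The helper name `observed_row` is hypothetical.

```python
# Hypothetical potential outcomes and assignments for four people (invented values).
y1 = [12_000, 15_000, 9_000, 20_000]   # Y_i(1)
y0 = [14_000, 15_500, 10_000, 19_000]  # Y_i(0)
w = [0, 1, 0, 1]                       # W_i: 1 = served, 0 = did not serve

def observed_row(y1_i, y0_i, w_i):
    """Return the (treatment, control, effect) cells a researcher actually sees."""
    if w_i == 1:
        return (y1_i, "?", "?")  # Y_i(0) is counterfactual
    return ("?", y0_i, "?")      # Y_i(1) is counterfactual

# Observed outcome: Y_i^obs = W_i * Y_i(1) + (1 - W_i) * Y_i(0)
y_obs = [wi * a + (1 - wi) * b for a, b, wi in zip(y1, y0, w)]

rows = [observed_row(a, b, wi) for a, b, wi in zip(y1, y0, w)]
for i, row in enumerate(rows, start=1):
    print(i, *row, sep=" | ")
```

Whatever the assignment, the treatment-effect column is missing for every person, which is the Fundamental Problem of Causal Inference in tabular form.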

**Natural experiments**

One approach to making causal estimates without running an experiment is to look for something happening in the world that has randomly assigned a treatment for you. This approach relies on what are called *natural experiments*. In many situations, unfortunately, nature does not randomly deliver the treatment that you want to the population of interest. But sometimes, nature randomly delivers a related treatment. In particular, I’ll consider the case where there is some *secondary treatment* that encourages people to receive the *primary treatment*. For example, the draft could be considered a randomly assigned secondary treatment that encouraged some people to take the primary treatment, which was serving in the military. This design is sometimes called an *encouragement design*. And the analysis method that I’ll describe to handle this situation is sometimes called *instrumental variables*. In this setting, with some assumptions, researchers can use the encouragement to learn about the effect of the primary treatment for a particular subset of units.

In order to handle the two different treatments—the encouragement and the primary treatment—we need some new notation. Suppose that some people are randomly drafted (\(Z_i = 1\)) or not drafted (\(Z_i = 0\)); in this situation, \(Z_i\) is sometimes called an *instrument*.

Among those who were drafted, some served (\(Z_i = 1, W_i = 1\)) and some did not (\(Z_i = 1, W_i = 0\)). Likewise, among those who were not drafted, some served (\(Z_i = 0, W_i = 1\)) and some did not (\(Z_i = 0, W_i = 0\)). The potential outcomes for each person can now be expanded to show their status for both the encouragement and the treatment. For example, let \(Y_i(1, W_i(1))\) be the earnings of person \(i\) if he was drafted, where \(W_i(1)\) is his service status if drafted. Further, we can split the population into four groups: compliers, never-takers, defiers, and always-takers (table 2.7).

Type | Service if drafted | Service if not drafted |
---|---|---|
Compliers | Yes, \(W_i(Z_i=1) = 1\) | No, \(W_i(Z_i=0) = 0\) |
Never-takers | No, \(W_i(Z_i=1) = 0\) | No, \(W_i(Z_i=0) = 0\) |
Defiers | No, \(W_i(Z_i=1) = 0\) | Yes, \(W_i(Z_i=0) = 1\) |
Always-takers | Yes, \(W_i(Z_i=1) = 1\) | Yes, \(W_i(Z_i=0) = 1\) |
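Table 2.7 is a lookup from a person's pair of potential treatment statuses \((W_i(1), W_i(0))\) to a compliance type. A minimal Python rendering follows. The function name `compliance_type` is just for illustration.

```python
def compliance_type(w_if_drafted: int, w_if_not_drafted: int) -> str:
    """Map (W_i(1), W_i(0)) to a compliance type, following table 2.7."""
    types = {
        (1, 0): "complier",
        (0, 0): "never-taker",
        (0, 1): "defier",
        (1, 1): "always-taker",
    }
    return types[(w_if_drafted, w_if_not_drafted)]

print(compliance_type(1, 0))  # complier
```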

Before we discuss estimating the effect of the treatment (i.e., military service), we can first define two effects of the encouragement (i.e., being drafted). First, we can define the effect of the encouragement on the primary treatment. Second, we can define the effect of the encouragement on the outcome. It will turn out that these two effects can be combined to provide an estimate of the effect of the treatment on a specific group of people.

First, the effect of the encouragement on treatment can be defined for person \(i\) as

\[ \text{ITT}_{W,i} = W_i(1) - W_i(0) \qquad(2.5)\]

Further, this quantity can be defined over the entire population as

\[ \text{ITT}_{W} = \frac{1}{N} \sum_{i=1}^N [W_i(1) - W_i(0)] \qquad(2.6)\]

Finally, we can estimate \(\text{ITT}_{W}\) using data:

\[ \widehat{\text{ITT}_{W}} = \bar{W}^{\text{obs}}_1 - \bar{W}^{\text{obs}}_0 \qquad(2.7)\]

where \(\bar{W}^{\text{obs}}_1\) is the observed rate of treatment for those who were encouraged and \(\bar{W}^{\text{obs}}_0\) is the observed rate of treatment for those who were not encouraged. \(\text{ITT}_W\) is also sometimes called the *uptake rate*.
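Eq. 2.6 has a useful interpretation in terms of table 2.7. \(W_i(1) - W_i(0)\) equals 1 for compliers, \(-1\) for defiers, and 0 for never-takers and always-takers. So \(\text{ITT}_W\) is the share of compliers minus the share of defiers. The sketch below checks this on a hypothetical population whose type counts are invented.

```python
from fractions import Fraction

# Hypothetical population described by (W_i(1), W_i(0)) pairs; counts are invented.
population = (
    [(1, 0)] * 300    # compliers
    + [(0, 0)] * 550  # never-takers
    + [(1, 1)] * 150  # always-takers
    + [(0, 1)] * 0    # defiers (none, i.e., monotonicity holds)
)
N = len(population)

# Eq. 2.6: population average of W_i(1) - W_i(0), computed exactly with Fractions.
itt_w = Fraction(sum(w1 - w0 for w1, w0 in population), N)

share_compliers = Fraction(population.count((1, 0)), N)
share_defiers = Fraction(population.count((0, 1)), N)

print(itt_w)  # 3/10
```

With no defiers, \(\text{ITT}_W\) is simply the share of compliers, which is why it reappears in the denominator of eq. 2.12.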

Next, the effect of the encouragement on the outcome can be defined for person \(i\) as:

\[ \text{ITT}_{Y,i} = Y_i(1, W_i(1)) - Y_i(0, W_i(0)) \qquad(2.8)\]

Further, this quantity can be defined over the entire population as

\[ \text{ITT}_{Y} = \frac{1}{N} \sum_{i=1}^N [Y_i(1, W_i(1)) - Y_i(0, W_i(0))] \qquad(2.9)\]

Finally, we can estimate \(\text{ITT}_{Y}\) using data:

\[ \widehat{\text{ITT}_{Y}} = \bar{Y}^{\text{obs}}_1 - \bar{Y}^{\text{obs}}_0 \qquad(2.10)\]

where \(\bar{Y}^{\text{obs}}_1\) is the average observed outcome (e.g., earnings) for those who were encouraged (e.g., drafted) and \(\bar{Y}^{\text{obs}}_0\) is the average observed outcome for those who were not encouraged.
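In practice, eqs. 2.7 and 2.10 are both differences in means, split by the encouragement \(Z_i\). Here is a minimal sketch. It assumes the observed data arrive as parallel lists `z`, `w`, and `y`; these names, the helper `itt_estimates`, and the example values are all hypothetical.

```python
from statistics import mean

def itt_estimates(z, w, y):
    """Return (ITT_W_hat, ITT_Y_hat) as in eqs. 2.7 and 2.10.

    z: encouragement (1 = drafted), w: treatment received (1 = served),
    y: observed outcome (e.g., earnings); all lists of equal length.
    """
    enc = [i for i, zi in enumerate(z) if zi == 1]
    not_enc = [i for i, zi in enumerate(z) if zi == 0]
    itt_w = mean(w[i] for i in enc) - mean(w[i] for i in not_enc)
    itt_y = mean(y[i] for i in enc) - mean(y[i] for i in not_enc)
    return itt_w, itt_y

# Invented example: four drafted and four not-drafted people.
z = [1, 1, 1, 1, 0, 0, 0, 0]
w = [1, 1, 1, 0, 1, 0, 0, 0]
y = [10, 12, 11, 15, 14, 15, 16, 15]  # earnings in thousands of dollars

print(itt_estimates(z, w, y))
```

Both quantities compare groups defined by the randomized \(Z_i\), not by the self-selected \(W_i\). This is what keeps them free of the selection problem described for eq. 2.4.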

Finally, we turn our attention to the effect of interest: the effect of the primary treatment (e.g., military service) on the outcome (e.g., earnings). Unfortunately, it turns out that one cannot, in general, estimate this effect on all units. However, with some assumptions, researchers can estimate the effect of the treatment on compliers (i.e., people who will serve if drafted and will not serve if not drafted; see table 2.7). I’ll call this estimand the *complier average causal effect* (CACE) (which is also sometimes called the *local average treatment effect*, LATE):

\[ \text{CACE} = \frac{1}{N_{\text{co}}} \sum_{i:G_i=\text{co}} [Y_i(1, W_i(1)) - Y_i(0, W_i(0))] \qquad(2.11)\]

where \(G_i\) denotes the group of person \(i\) (see table 2.7) and \(N_{\text{co}}\) is the number of compliers. In other words, eq. 2.11 compares the earnings of compliers when they are drafted, \(Y_i(1, W_i(1))\), and when they are not drafted, \(Y_i(0, W_i(0))\). The estimand in eq. 2.11 seems hard to estimate from observed data because it is not possible to identify compliers using only observed data (to know whether someone is a complier, you would need to observe whether he served when drafted and whether he served when not drafted).
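The identification problem can be seen by asking which compliance types are consistent with each observed cell \((Z_i, W_i)\). In the sketch below, `consistent_types` is a hypothetical helper built directly from table 2.7. Every observed cell turns out to be compatible with two types, so no individual can be labeled a complier from observed data alone.

```python
# Table 2.7 as a mapping from type to (W_i(1), W_i(0)).
TYPES = {
    "complier": (1, 0),
    "never-taker": (0, 0),
    "defier": (0, 1),
    "always-taker": (1, 1),
}

def consistent_types(z: int, w: int) -> set:
    """Types whose potential treatment under the realized Z_i matches the observed W_i."""
    return {name for name, (w1, w0) in TYPES.items() if (w1 if z == 1 else w0) == w}

for z in (1, 0):
    for w in (1, 0):
        print((z, w), sorted(consistent_types(z, w)))
```

Suppose one rules out defiers (the monotonicity assumption). Then drafted people who did not serve must be never-takers, and non-drafted people who served must be always-takers. Compliers, however, remain mixed in with other types in the two remaining cells.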

It turns out—somewhat surprisingly—that if there are any compliers, then provided one makes three additional assumptions, it is possible to estimate CACE from observed data. First, one has to assume that the assignment of the encouragement (here, the draft) is random. In the case of the draft lottery this is reasonable. However, in some settings where natural experiments do not rely on physical randomization, this assumption may be more problematic. Second, one has to assume that there are no defiers (this assumption is also sometimes called the monotonicity assumption). In the context of the draft it seems reasonable to assume that there are very few people who will not serve if drafted and will serve if not drafted. Third, and finally, comes the most important assumption, which is called the *exclusion restriction*. Under the exclusion restriction, one has to assume that all of the effect of the treatment assignment is passed through the treatment itself. In other words, one has to assume that there is no direct effect of encouragement on outcomes. In the case of the draft lottery, for example, one needs to assume that draft status has no effect on earnings other than through military service (figure 2.11). The exclusion restriction could be violated if, for example, people who were drafted spent more time in school in order to avoid service or if employers were less likely to hire people who were drafted.

If these three conditions (random assignment to treatment, no defiers, and the exclusion restriction) are met, then

\[ \text{CACE} = \frac{\text{ITT}_Y}{\text{ITT}_W} \qquad(2.12)\]

so we can estimate CACE:

\[ \widehat{\text{CACE}} = \frac{\widehat{\text{ITT}_Y}}{\widehat{\text{ITT}_W}} \qquad(2.13)\]
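Here is a sketch of why eq. 2.12 holds. The population identity itself uses only the exclusion restriction and the absence of defiers. Random assignment is what allows eqs. 2.7 and 2.10 to estimate \(\text{ITT}_W\) and \(\text{ITT}_Y\) without bias. Under the exclusion restriction, \(Y_i(z, w)\) does not depend on \(z\), so we can write \(Y_i(w)\), which matches the notation of eq. 2.1:

\[ \begin{aligned} &\text{Exclusion restriction: } Y_i(1, w) = Y_i(0, w) \equiv Y_i(w), \text{ so } \text{ITT}_{Y,i} = Y_i(W_i(1)) - Y_i(W_i(0)). \\ &\text{Never-takers and always-takers: } W_i(1) = W_i(0) \;\Rightarrow\; \text{ITT}_{Y,i} = 0. \\ &\text{Compliers: } W_i(1) = 1,\ W_i(0) = 0 \;\Rightarrow\; \text{ITT}_{Y,i} = Y_i(1) - Y_i(0). \\ &\text{No defiers: } \text{ITT}_Y = \frac{1}{N} \sum_{i:G_i=\text{co}} [Y_i(1) - Y_i(0)] = \frac{N_{\text{co}}}{N}\,\text{CACE}, \qquad \text{ITT}_W = \frac{N_{\text{co}}}{N}. \\ &\text{Dividing (valid because } N_{\text{co}} > 0\text{): } \frac{\text{ITT}_Y}{\text{ITT}_W} = \text{CACE}. \end{aligned} \]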

One way to think about CACE is that it is the difference in outcomes between those who were encouraged and those not encouraged, divided by (and therefore inflated by) the uptake rate.
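To see eq. 2.13 at work, here is a minimal simulation under assumed conditions. The setup has four ingredients:

- a hypothetical population of 40% compliers, 45% never-takers, 15% always-takers, and no defiers;
- a randomized draft;
- an exclusion restriction that holds by construction, because earnings depend only on service;
- effects of service that deliberately differ by type.

All numbers are invented.

```python
import random
from statistics import fmean

random.seed(7)  # fixed seed so the sketch is reproducible
N = 50_000

# Assumed type shares; there are no defiers, so monotonicity holds by construction.
types = random.choices(["complier", "never-taker", "always-taker"],
                       weights=[0.40, 0.45, 0.15], k=N)
W_OF = {"complier": (1, 0), "never-taker": (0, 0), "always-taker": (1, 1)}  # (W(1), W(0))

# Potential earnings depend on service only, never directly on the draft, so the
# exclusion restriction holds by construction. Effects of service differ by type.
EFFECT = {"complier": -2_000, "never-taker": 500, "always-taker": 1_000}
y_no_service = [random.gauss(15_000, 3_000) for _ in range(N)]
y_service = [y + EFFECT[g] for y, g in zip(y_no_service, types)]

# Oracle CACE (eq. 2.11): computable here only because everything is simulated.
true_cace = fmean(y_service[i] - y_no_service[i] for i in range(N) if types[i] == "complier")

# Randomized draft lottery, then realized service and observed earnings.
z = [random.randint(0, 1) for _ in range(N)]
w = [W_OF[g][0] if zi == 1 else W_OF[g][1] for g, zi in zip(types, z)]
y = [y_service[i] if w[i] == 1 else y_no_service[i] for i in range(N)]

def diff_by_z(v):
    """Mean of v among drafted minus mean among not drafted (eqs. 2.7 and 2.10)."""
    drafted = [v[i] for i in range(N) if z[i] == 1]
    not_drafted = [v[i] for i in range(N) if z[i] == 0]
    return fmean(drafted) - fmean(not_drafted)

itt_w_hat = diff_by_z(w)
itt_y_hat = diff_by_z(y)
cace_hat = itt_y_hat / itt_w_hat  # eq. 2.13

print(f"ITT_W_hat = {itt_w_hat:.3f}   ITT_Y_hat = {itt_y_hat:.0f}")
print(f"CACE_hat  = {cace_hat:.0f}   true CACE = {true_cace:.0f}")
```

The ratio targets the compliers' effect (-2,000 by construction), not the effect for always-takers or never-takers. Changing their entries in `EFFECT` leaves the target unchanged, because their \(\text{ITT}_{Y,i}\) is zero when the exclusion restriction holds.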

There are two important caveats to keep in mind. First, the exclusion restriction is a strong assumption, and it needs to be justified on a case-by-case basis, which frequently requires subject-area expertise. The exclusion restriction cannot be justified with randomization of the encouragement. Second, a common practical challenge with instrumental variable analysis comes when the encouragement has little effect on the uptake of treatment (when \(\text{ITT}_W\) is small). This is called a *weak instrument*, and it leads to a variety of problems (Imbens and Rosenbaum 2005; Murray 2006). One way to think about the problem with weak instruments is that \(\widehat{\text{CACE}}\) can be sensitive to small biases in \(\widehat{\text{ITT}_Y}\)—potentially due to violations of the exclusion restriction—because these biases get magnified by a small \(\widehat{\text{ITT}_W}\) (see eq. 2.13). Roughly, if the treatment that nature assigns doesn’t have a big impact on the treatment you care about, then you are going to have a hard time learning about the treatment you care about.
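The magnification can be written out directly. Suppose, hypothetically, that \(\text{ITT}_Y\) is shifted by a small amount \(b\). This could happen, for example, if the draft has a small direct effect on earnings that violates the exclusion restriction. The ratio in eq. 2.12 then picks up an error of \(b / \text{ITT}_W\). The figures below (in dollars, with \(b = 100\)) are invented purely to show the arithmetic:

\[ \frac{\text{ITT}_Y + b}{\text{ITT}_W} = \text{CACE} + \frac{b}{\text{ITT}_W}, \qquad \frac{b}{\text{ITT}_W} = \begin{cases} 100 / 0.50 = 200 & \text{(strong instrument)} \\ 100 / 0.05 = 2{,}000 & \text{(weak instrument)} \end{cases} \]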

See chapters 23 and 24 of Imbens and Rubin (2015) for a more formal version of this discussion. The traditional econometric approach to instrumental variables is typically expressed in terms of estimating equations, not potential outcomes. For an introduction from this other perspective, see Angrist and Pischke (2009), and for a comparison between the two approaches, see section 24.6 of Imbens and Rubin (2015). An alternative, slightly less formal presentation of the instrumental variables approach is provided in chapter 6 of Gerber and Green (2012). For more on the exclusion restriction, see D. Jones (2015). Aronow and Carnegie (2013) describe an additional set of assumptions that can be used to estimate ATE rather than CACE. For more on how natural experiments can be very tricky to interpret, see Sekhon and Titiunik (2012). For a more general introduction to natural experiments—one that goes beyond just the instrumental variables approach to also include designs such as regression discontinuity—see Dunning (2012).