2.4.2 Forecasting and nowcasting

Predicting the future is hard, but predicting the present is easier.

The second main strategy researchers can use with observational data is forecasting. Making guesses about the future is notoriously difficult, and perhaps for that reason, forecasting is not currently a large part of social research (although it is a small and important part of demography, economics, epidemiology and political science). Here, however, I’d like to focus on a special kind of forecasting called nowcasting—a term derived from combining “now” and “forecasting.” Rather than predicting the future, nowcasting attempts to use ideas from forecasting to measure the current state of the world; it attempts to “predict the present” (Choi and Varian 2012). Nowcasting has the potential to be especially useful to governments and companies that require timely and accurate measures of the world.

One setting where the need for timely and accurate measurement is very clear is epidemiology. Consider the case of influenza (“the flu”). Each year, seasonal influenza epidemics cause millions of illnesses and hundreds of thousands of deaths around the world. Further, each year, there is a possibility that a novel form of influenza could emerge that would kill millions. The 1918 influenza outbreak, for example, is estimated to have killed between 50 and 100 million people (Morens and Fauci 2007). Because of the need to track and potentially respond to influenza outbreaks, governments around the world have created influenza surveillance systems. For example, the US Centers for Disease Control and Prevention (CDC) regularly and systematically collects information from carefully selected doctors around the country. Although this system produces high-quality data, it has a reporting lag. That is, because of the time it takes for the data arriving from doctors to be cleaned, processed, and published, the CDC system releases estimates of how much flu there was two weeks ago. But when handling an emerging epidemic, public health officials don’t want to know how much influenza there was two weeks ago; they want to know how much influenza there is right now.

At the same time that the CDC is collecting data to track influenza, Google is also collecting data about influenza prevalence, although in a quite different form. People from around the world are constantly sending queries to Google, and some of these queries—such as “flu remedies” and “flu symptoms”—might indicate that the person making the query has the flu. But, using these search queries to estimate flu prevalence is tricky: not everyone who has the flu makes a flu-related search, and not every flu-related search is from someone who has the flu.

Jeremy Ginsberg and a team of colleagues (2009), some at Google and some at CDC, had the important and clever idea to combine these two data sources. Roughly, through a kind of statistical alchemy, the researchers combined the fast and inaccurate search data with the slow and accurate CDC data in order to produce fast and accurate measurements of influenza prevalence. Another way to think about it is that they used the search data to speed up the CDC data.

More specifically, using data from 2003 to 2007, Ginsberg and colleagues estimated the relationship between the prevalence of influenza in the CDC data and the search volume for 50 million distinct terms. From this process, which was completely data-driven and did not require specialized medical knowledge, the researchers found a set of 45 different queries that seemed to be most predictive of the CDC flu prevalence data. Then, using the relationships that they learned from the 2003-2007 data, Ginsberg and colleagues tested their model during the 2007-2008 influenza season. They found that their procedures could indeed make useful and accurate nowcasts (figure 2.6). These results were published in Nature and received adoring press coverage. This project—which was called Google Flu Trends—became an often-repeated parable about the power of big data to change the world.
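The logic of this procedure can be illustrated with a toy sketch in Python. It is not the code or the model that Ginsberg and colleagues used: the weekly numbers and the three query names are made up, the function names (`select_queries`, `combined_signal`, `fit_line`) are hypothetical, and an ordinary least-squares line stands in for whatever model the original study fit. The sketch only shows the three steps described above: rank candidate queries by how well they track the official data, keep the best ones, and use the learned relationship to nowcast a week for which official data are not yet available.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def select_queries(queries, official, k):
    """Keep the k queries whose volume correlates most with the official series."""
    ranked = sorted(queries, key=lambda q: pearson(queries[q], official), reverse=True)
    return ranked[:k]


def combined_signal(queries, selected):
    """Average the weekly volume of the selected queries into one series."""
    weeks = len(queries[selected[0]])
    return [sum(queries[q][t] for q in selected) / len(selected) for t in range(weeks)]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Made-up training data: official flu rate and search volume for 8 past weeks.
train_ili = [1.0, 1.5, 2.5, 4.0, 3.0, 2.0, 1.2, 1.0]
train_queries = {
    "flu symptoms": [10, 15, 25, 40, 30, 20, 12, 10],
    "flu remedies": [5, 8, 12, 19, 16, 11, 7, 5],
    "basketball scores": [30, 10, 25, 12, 28, 11, 27, 13],
}

# Steps 1 and 2: pick the most predictive queries and learn the relationship.
selected = select_queries(train_queries, train_ili, k=2)
slope, intercept = fit_line(combined_signal(train_queries, selected), train_ili)

# Step 3: nowcast the current week, for which only search data exist so far.
this_week = {"flu symptoms": [35], "flu remedies": [17], "basketball scores": [14]}
nowcast = slope * combined_signal(this_week, selected)[0] + intercept
print(selected, round(nowcast, 2))
```

In this sketch the query selection uses no medical knowledge at all, which mirrors the data-driven character of the original procedure. It also makes clear why such a model depends on the relationship between searches and illness staying stable over time.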

Figure 2.6: Jeremy Ginsberg and colleagues (2009) combined Google search data with CDC data to create Google Flu Trends, which could nowcast the rate of influenza-like illness (ILI). Results in this figure are for the mid-Atlantic region of the United States in the 2007-2008 influenza season. Although it was initially very promising, the performance of Google Flu Trends decayed over time (Cook et al. 2011; Olson et al. 2013; Lazer et al. 2014). Adapted from Ginsberg et al. (2009), figure 3.

However, this apparent success story eventually turned into an embarrassment. Over time, researchers discovered two important limitations that make Google Flu Trends less impressive than it initially appeared. First, the performance of Google Flu Trends was actually not much better than that of a simple model that estimates the amount of flu based on a linear extrapolation from the two most recent measurements of flu prevalence (Goel et al. 2010). And, over some time periods, Google Flu Trends was actually worse than this simple approach (Lazer et al. 2014). In other words, Google Flu Trends with all its data, machine learning, and powerful computing did not dramatically outperform a simple and easier-to-understand heuristic. This suggests that when evaluating any forecast or nowcast, it is important to compare against a baseline.
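To make the idea of a baseline concrete, here is a minimal sketch of the kind of comparison described above. It assumes the official data arrive with a fixed reporting lag and that the baseline simply extends the straight line through the two most recently published values. The function names, the `lag` parameter, and all numbers (including the "model" nowcasts) are invented for illustration; this is not the exact baseline evaluated by Goel et al. (2010) or Lazer et al. (2014).

```python
def linear_extrapolation(published, steps_ahead=1):
    """Extend the straight line through the two most recent published values."""
    last, prev = published[-1], published[-2]
    return last + steps_ahead * (last - prev)


def baseline_nowcasts(truth, lag):
    """Baseline nowcast for each week t, using only values for weeks up to t - lag."""
    out = {}
    for t in range(lag + 1, len(truth)):
        published = truth[: t - lag + 1]  # what had been released by week t
        out[t] = linear_extrapolation(published, steps_ahead=lag)
    return out


def mean_absolute_error(pred, truth):
    """Average absolute gap between nowcasts and the values published later."""
    return sum(abs(pred[t] - truth[t]) for t in pred) / len(pred)


# Made-up official flu rates for 10 weeks, published with a 2-week lag.
truth = [1.0, 1.2, 1.6, 2.2, 3.0, 3.6, 3.9, 3.5, 2.8, 2.0]
baseline = baseline_nowcasts(truth, lag=2)

# Made-up nowcasts from some more elaborate model, for the same weeks.
model = {3: 2.0, 4: 2.7, 5: 3.9, 6: 4.3, 7: 3.9, 8: 3.0, 9: 2.3}

print("baseline MAE:", round(mean_absolute_error(baseline, truth), 3))
print("model MAE:   ", round(mean_absolute_error(model, truth), 3))
```

The point of the comparison is the gap between the two error numbers, not either number on its own. A complex nowcast is only worth its cost if it beats what can be done with the lagged official data alone.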

The second important caveat about Google Flu Trends is that its ability to predict the CDC flu data was prone to short-term failure and long-term decay because of drift and algorithmic confounding. For example, during the 2009 Swine Flu outbreak, Google Flu Trends dramatically overestimated the amount of influenza, probably because people tend to change their search behavior in response to widespread fear of a global pandemic (Cook et al. 2011; Olson et al. 2013). In addition to these short-term problems, the performance gradually decayed over time. Diagnosing the reasons for this long-term decay is difficult because the Google search algorithms are proprietary, but it appears that in 2011 Google began suggesting related search terms when people searched for flu symptoms like “fever” and “cough” (it also seems that this feature is no longer active). Adding this feature is a totally reasonable thing to do if you are running a search engine, but this algorithmic change had the effect of generating more health-related searches, which caused Google Flu Trends to overestimate flu prevalence (Lazer et al. 2014).
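One simple way to notice this kind of decay is to keep comparing past nowcasts with the official values once they are published and to watch for a sustained bias. The sketch below is only an illustration of that idea under assumed inputs. The function names, the window length, the threshold, and the numbers are all hypothetical, and nothing here is taken from how Google Flu Trends was actually monitored.

```python
def rolling_bias(nowcasts, official, window):
    """Mean signed error (nowcast minus official) over each trailing window of weeks."""
    errors = [n - o for n, o in zip(nowcasts, official)]
    return [
        sum(errors[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(errors))
    ]


def first_drift_week(nowcasts, official, window, threshold):
    """Index of the first week whose trailing bias exceeds the threshold, else None."""
    for offset, bias in enumerate(rolling_bias(nowcasts, official, window)):
        if abs(bias) > threshold:
            return offset + window - 1
    return None


# Made-up example: the nowcast tracks the official rate, then starts to overestimate.
official = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
nowcasts = [2.0, 2.1, 1.9, 2.0, 2.6, 2.8, 3.0, 3.1]

print(first_drift_week(nowcasts, official, window=3, threshold=0.5))
```

A check like this cannot say why the relationship changed, whether because of a change in user behavior or a change in the search engine itself. It can only signal that the model learned from earlier data should no longer be trusted without re-estimation.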

These two caveats complicate future nowcasting efforts, but they do not doom them. In fact, by using more careful methods, Lazer et al. (2014) and Yang, Santillana, and Kou (2015) were able to avoid these two problems. Going forward, I expect that nowcasting studies that combine big data sources with researcher-collected data will enable companies and governments to create more timely and more accurate estimates by essentially speeding up any measurement that is made repeatedly over time with some lag. Nowcasting projects such as Google Flu Trends also show what can happen if big data sources are combined with more traditional data that were created for the purposes of research. Thinking back to the art analogy of chapter 1, nowcasting has the potential to combine Duchamp-style readymades with Michelangelo-style custommades in order to provide decision makers with more timely and more accurate measurements of the present and predictions of the near future.