eBird collects data on birds from birders; volunteers can provide a scale that no research team can match.
Birds are everywhere, and ornithologists would like to know where every bird is at every moment. Given such a perfect dataset, ornithologists could address many fundamental questions in their field. Of course, collecting these data is beyond the scope of any particular researcher. At the same time that ornithologists desire richer and more complete data, “birders”—people who go bird watching for fun—are constantly observing birds and documenting what they see. These two communities have a long history of collaborating, but now these collaborations have been transformed by the digital age. eBird is a distributed data collection project that solicits information from birders around the world, and it has already received over 260 million bird sightings from 250,000 participants (Kelling, Fink, et al. 2015).
Prior to the launch of eBird, most of the data created by birders were unavailable to researchers:
“In thousands of closets around the world today lie countless notebooks, index cards, annotated checklists, and diaries. Those of us involved with birding institutions know well the frustration of hearing over and over again about ‘my late-uncle’s bird records’ [sic] We know how valuable they could be. Sadly, we also know we can’t use them.” (Fitzpatrick et al. 2002)
Rather than having these valuable data sit unused, eBird enables birders to upload them to a centralized, digital database. Data uploaded to eBird contain six key fields: who, where, when, what species, how many, and effort. For non-birding readers, “effort” refers to the methods used while making observations. Data quality checks begin even before the data are uploaded. Unusual reports—such as reports of very rare species, very high counts, or out-of-season sightings—are flagged, and the website automatically requests additional information, such as photographs. After this additional information is collected, the flagged reports are sent to one of hundreds of volunteer regional experts for further review. After investigation by the regional expert—including possible additional correspondence with the birder—the flagged reports are either discarded as unreliable or entered into the eBird database (Kelling et al. 2012). This database of screened observations is then made available to anyone in the world with an Internet connection, and, so far, almost 100 peer-reviewed publications have used it (Bonney et al. 2014). eBird clearly shows that volunteer birders are able to collect data that are useful for real ornithology research.
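The screening step described above can be sketched in code. This is only a toy illustration: the field names, thresholds, and rare-species list below are hypothetical, and the real eBird system uses regional, species-specific filters maintained by volunteer experts.

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    """One eBird-style record with the six key fields."""
    observer: str        # who
    location: str        # where
    date: str            # when
    species: str         # what species
    count: int           # how many
    effort_hours: float  # effort (here simplified to time spent)

def needs_review(s, rare_species, max_expected_count):
    """Flag reports of rare species or unusually high counts.

    Flagged reports would be routed to a volunteer regional expert
    before entering the database; unflagged reports pass through.
    """
    if s.species in rare_species:
        return True
    expected = max_expected_count.get(s.species)
    if expected is not None and s.count > expected:
        return True
    return False

# Example: a report of a very rare species gets flagged.
report = Sighting("jane", "local park", "2024-05-01",
                  "Ivory-billed Woodpecker", 1, 2.0)
print(needs_review(report, {"Ivory-billed Woodpecker"}, {}))  # True
```

In the actual system, a flagged report is not rejected automatically; it triggers a request for evidence (such as photographs) and human review, which keeps rare-but-real observations in the dataset.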
One of the beauties of eBird is that it captures “work” that is already happening—in this case, birding. This feature enables the project to achieve a tremendous scale. However, the “work” done by birders does not exactly match the data needed by ornithologists. For example, in eBird, data collection is determined by the location of birders, not the location of the birds. This means that, for example, most observations tend to occur close to roads (Kelling et al. 2012; Kelling, Fink, et al. 2015). In addition to this unequal distribution of effort over space, the actual observations made by birders are not always ideal. For example, some birders only upload information about species that they consider interesting, rather than information on all species that they observed.
eBird researchers have two main solutions to these data quality issues—solutions that might be helpful in other distributed data collection projects as well. First, eBird researchers are constantly trying to improve the quality of the data submitted by birders. For example, eBird offers education to participants, and it has created visualizations of each participant’s data that, by their design, encourage birders to upload information about all species that they observed, not just the most interesting (Wood et al. 2011; Wiggins 2011). Second, eBird researchers use statistical models that attempt to correct for the noisy and heterogeneous nature of the raw data (Fink et al. 2010; Hurlbert and Liang 2012). It is not yet clear if these statistical models fully remove biases from the data, but ornithologists are confident enough in the quality of adjusted eBird data that, as mentioned earlier, these data have been used in almost 100 peer-reviewed scientific publications.
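To make the second solution concrete, here is a toy illustration of one simple idea behind such corrections: normalizing raw counts by observer effort (e.g., birds seen per party-hour at each site), so that a heavily birded site does not look artificially bird-rich. The actual models (Fink et al. 2010) are far more sophisticated spatiotemporal models; the data below are invented.

```python
def counts_per_hour(observations):
    """Effort-adjusted counts.

    observations: list of (site, count, effort_hours) tuples.
    Returns birds-per-hour for each site, pooling all visits,
    so sites with unequal observer effort become comparable.
    """
    totals = {}
    for site, count, hours in observations:
        c, h = totals.get(site, (0, 0.0))
        totals[site] = (c + count, h + hours)
    return {site: c / h for site, (c, h) in totals.items() if h > 0}

# siteA was visited twice (16 birds over 4 hours); siteB once (3 over 1).
obs = [("siteA", 10, 2.0), ("siteA", 6, 2.0), ("siteB", 3, 1.0)]
print(counts_per_hour(obs))  # {'siteA': 4.0, 'siteB': 3.0}
```

On raw totals, siteA (16 birds) looks far richer than siteB (3 birds); per hour of effort the gap is much smaller, which is the kind of bias effort adjustment is meant to address.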
Many non-ornithologists are extremely skeptical when they first hear about eBird. In my opinion, part of this skepticism comes from thinking about eBird in the wrong way. Many people first think “Are the eBird data perfect?”, and the answer is “absolutely not.” However, that’s not the right question. The right question is “For certain research questions, are the eBird data better than existing ornithology data?” For that question the answer is “definitely yes,” in part because for many questions of interest—such as questions about large-scale seasonal migration—there are no realistic alternatives to distributed data collection.
The eBird project demonstrates that it is possible to involve volunteers in the collection of important scientific data. However, eBird and related projects also show that sampling and data quality are real concerns for distributed data collection projects. As we will see in the next section, however, with clever design and technology, these concerns can be minimized in some settings.