Quality and quantity with citizen science

Citizen science is a range of activities and projects through which people from all walks of life help advance scientific discovery. Citizen scientists bring science into the mainstream and make science relevant to their lives. As a scientist, I rely on citizen scientists as research collaborators. As a blogger, I’ve become a citizen science advocate, giving three cheers to discoveries and projects. Every time I share stories about citizen science, the most frequent response I receive is skepticism about data quality. How could people – veritable strangers – without formal training in the sciences be of any authentic use to professional scientific research projects? How can people without scientific credentials do work of sufficient quality to result in products of genuine scientific value? Is it really possible for science-society collaborations involving individuals with highly variable levels of expertise to produce reliable and trustworthy knowledge?

My short answer is that the proof of the pudding is in the eating, by which I mean the resume of citizen science boasts a wide range of discoveries that have already passed the rigor of peer-reviewed publication. Unfortunately, the scientific contributions from citizen science are almost impossible to see because they are rarely labelled as such. As one example, colleagues and I looked at a 2011 review paper by Knudsen et al. in which they synthesized what’s known about migratory birds and climate change and gave their expert opinions on the reliability of the state of knowledge for particular claims. Note that those authors synthesized and evaluated the literature without considering the sources of data upon which studies were based. We found that half of the studies they referenced about migratory birds and climate change relied on citizen science observations. Yet not even one of those publications used the term “citizen science.” Furthermore, there was no relationship between the expert opinions and citizen science: Some of the claims the experts agreed were most reliable turned out to be those dominated by citizen science, but an equal number were not. With its long history of engaged birdwatchers, ornithology might be on the high end of the large but invisible contribution citizen science makes to our knowledge base, but numerous other disciplines have some discoveries attributable to citizen science as well.

My long (280+ page) answer is that I’ve been so fascinated by citizen science that I wrote a book to share stories of seemingly ordinary people helping uncover extraordinary findings. We should think twice about how discoveries are made, who is involved, and who science serves. How we answer these questions of knowledge production has big implications for the role of science in society and how we might solve some of humanity’s pressing challenges.

To satisfy those who want some nitty-gritty about how citizen science projects actually address data quality, here is my medium-length answer, a brief review of the technical aspects of designing and implementing citizen science to ensure the data are fit for intended uses. When it comes to crowd-driven citizen science, it makes sense to assess whether those data are handled and used appropriately. Rather than question whether citizen science data quality is low or high, ask whether it is fit or unfit for a given purpose. For example, in studies of species distributions, presence-only data will fit fewer purposes (like invasive species monitoring) than data on presence and absence, which are more powerful. Designing protocols so that citizen scientists report what they do not see can be challenging, which is why some projects place special emphasis on the importance of “zero data.”

It is a misconception that the quality of each individual data point can be assessed without context. Yet one of the most common ways to examine citizen science data quality has been to compare volunteer data to those collected by trained technicians and scientists. Even a few years ago I’d noticed over 50 papers making these types of comparisons, and the overwhelming evidence suggested that volunteer data are fine. And in those few instances when volunteer observations did not match those of professionals, that was evidence of poor project design. While these studies can be reassuring, they are not always necessary, nor would they ever be sufficient. Because citizen science makes different discoveries than science carried out by professionals alone, comparing data quality in these two systems can be like comparing apples and oranges. First, the research questions are different, often in terms of scale. My rule of thumb is that if a research question can be answered by scientists alone, then citizen science is not appropriate. There is more unknown to us than known. Citizen science should focus on discovering the enormous fraction of unknowns that scientists can’t uncover by themselves. Second, given the different contexts, the data can be handled in different ways. When one field technician misidentifies a species, that’s a problem with particular solutions (e.g., train field technicians better). When one in twenty volunteers misidentifies a species, there are multiple other ways to handle the collective data to address the problem (see below).

Scientists designing and implementing citizen science projects are concerned about and aware of data quality and fitness for use. From the start of a project to its completion, there are multiple ways to deal with data quality issues. For example, another common way to assess data quality (whether from citizen science or professional science) is to examine sources of bias. Data from birdwatchers typically have a weekend bias, and once it is identified, it can be quantified and corrected statistically or methodologically.
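To make the weekend-bias idea concrete, here is a minimal sketch in Python. It is only an illustration, not the method of any particular project: the function names, the sample dates, and the simple reweighting scheme are all assumptions chosen for clarity. It measures how over-represented weekends are compared with the 2/7 share expected if effort were spread evenly, then computes one naive set of weights that gives each observed weekday the same total influence.

```python
from collections import Counter
from datetime import date


def weekend_share(dates):
    """Fraction of observations made on a Saturday or Sunday."""
    if not dates:
        raise ValueError("need at least one observation date")
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekend = sum(1 for d in dates if d.weekday() >= 5)
    return weekend / len(dates)


def weekend_bias_ratio(dates):
    """Observed weekend share divided by the 2/7 expected under even effort."""
    return weekend_share(dates) / (2 / 7)


def weekday_weights(dates):
    """Naive correction: weights giving every observed weekday equal total influence."""
    if not dates:
        raise ValueError("need at least one observation date")
    counts = Counter(d.weekday() for d in dates)
    per_day_target = len(dates) / len(counts)
    return {weekday: per_day_target / n for weekday, n in counts.items()}


# Hypothetical checklist dates: 7 and 8 January 2017 fell on a weekend.
sample = [
    date(2017, 1, 7), date(2017, 1, 7), date(2017, 1, 8), date(2017, 1, 8),
    date(2017, 1, 9), date(2017, 1, 10), date(2017, 1, 11),
]
print(weekend_bias_ratio(sample))
print(weekday_weights(sample))
```

In this made-up sample, four of seven checklists come from the weekend, so the ratio is 2: weekends are twice as represented as even effort would predict. Real projects would more likely model observer effort directly, but the principle is the same: a bias that can be measured can be accounted for.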

Wiggins and colleagues surveyed a range of projects to learn how they address data quality issues. Most responses came from medium-sized projects in North America that focus on monitoring and gathering observations. The Wiggins group found at least 18 different approaches to handling data contributions so that they are fit for as many purposes as possible. I’m coarsely lumping (and excluding) categories into five common approaches below. (See their more recent summary here: http://onlinelibrary.wiley.com/wol1/doi/10.1002/fee.1436/full)

Expert review – over three-quarters of projects in the Wiggins et al. sample included some form of expert review to validate observations. An iconic example is eBird, which has over 500 volunteer reviewers. eBird reviewers each have reputations as excellent birders. They work with a filtering system in which every species reported to eBird is automatically cross-checked based on the number observed, location, and date of observation. If anything exceeds what’s typical for a given species at a given location on a given date, then the observation is “flagged” for review. Reviewers then decide whether to follow up with the volunteer to request more evidence. In Project FeederWatch, semi-automated requests are sent encouraging photographic evidence.
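The flagging logic described above can be sketched in a few lines of Python. This is a toy version under stated assumptions, not eBird’s actual filter: the Observation fields, the needs_review function, the lookup table keyed by species, region, and month, and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    species: str
    region: str
    month: int  # 1 through 12
    count: int


def needs_review(obs, max_expected):
    """Return True when a count exceeds what is typical for that species, region, and month.

    A species with no entry for that region and month gets an expected
    maximum of 0, so any report of it is flagged for a reviewer.
    """
    limit = max_expected.get((obs.species, obs.region, obs.month), 0)
    return obs.count > limit


# Hypothetical limits; a real filter would be tuned by regional experts.
limits = {("Eastern Bluebird", "NC", 1): 20}

print(needs_review(Observation("Eastern Bluebird", "NC", 1, 5), limits))
print(needs_review(Observation("Eastern Bluebird", "NC", 1, 50), limits))
print(needs_review(Observation("Snowy Owl", "NC", 7, 1), limits))
```

A flag is not a rejection. It only routes the record to a human reviewer, who decides whether to ask the volunteer for more evidence.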

Photo submission – 40% of projects in the Wiggins et al. sample include photo vouchers for validating data. One community-based citizen science project discovered the need for photographs firsthand. Its members began as volunteers combing the beach for sea turtle tracks for the NC Wildlife Resources Commission. After they began their own citizen science project (called Wrightsville Beach Keep it Clean!) to monitor garbage on the beach, the validity of their findings was questioned by residents of Wrightsville Beach. The volunteers had to modify their protocols so that each person now brings home all the garbage they collect, rinses off the sand, sorts it by type of trash, and photographs it. With photo-documentation, no one doubts their claims.

Training and testing – over 20% reported having a training program related to quality assurance and quality control. For some projects, it is essential to train volunteers and/or require evidence of skills before they can make meaningful contributions. For example, in NC Candid Critters all volunteers must take an online training module to learn to use motion-sensitive “camera traps” before they can borrow such a camera from their local library. To play the online citizen science game Foldit, which involves solving 3-D puzzles of protein folding, participants have to complete a series of tutorial puzzles, and then play games designated only for beginners, until they gain the ability (as demonstrated in points) to solve harder puzzles. Hobby communities are great for citizen science because they help people teach each other the skills needed. For example, for my ornithology citizen science, I’m thankful for local Audubon groups bringing up birdwatchers, and state and local bluebird societies teaching enthusiasts to install and monitor nest boxes.

Replication by multiple participants – almost one-quarter of projects in the Wiggins et al. sample use redundancy to validate data. Redundancy is relevant to all citizen science projects that rely on crowds, whether online or geographically scattered in the field. For online projects, redundancy is in the form of behind-the-scenes consensus. For example, every image in Galaxy Zoo is tagged by multiple volunteers independently until a trustworthy level of consensus has been reached. For field observations, big data allows researchers to place less emphasis on outliers (or completely eliminate them) and look for consistent patterns within the core of observations. Redundancy in protocols is also a way of double-checking for errors, such as by requiring data entry both on paper forms and online.
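Behind-the-scenes consensus is easy to picture with a small Python sketch. The consensus_label function and its thresholds (at least five votes, at least 80% agreement) are invented for illustration and are not Galaxy Zoo’s actual rules. The idea is simply that an image keeps collecting independent tags until enough volunteers agree.

```python
from collections import Counter


def consensus_label(votes, min_votes=5, min_agreement=0.8):
    """Return the majority label once enough volunteers agree, otherwise None.

    None means the image should stay in the queue and gather more tags.
    """
    if len(votes) < min_votes:
        return None
    label, n = Counter(votes).most_common(1)[0]
    return label if n / len(votes) >= min_agreement else None


# Hypothetical tags from independent volunteers for three images.
print(consensus_label(["spiral", "spiral", "elliptical"]))
print(consensus_label(["spiral"] * 3 + ["elliptical"] * 2))
print(consensus_label(["spiral"] * 4 + ["elliptical"]))
```

Only the last image reaches consensus here. The first has too few tags and the second too little agreement, so both would be shown to more volunteers. This is why one volunteer in twenty making a mistake need not harm the final dataset.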

A good match – People who are not scientists have lots of skills and expertise of value to citizen science. In natural history fields in particular, expertise on species taxonomy is held almost exclusively (other than by museum curators) by amateurs.

A project requiring participants with such expertise will fail if its participants are poorly matched to the project. This happened in the very early days of citizen science. In the late 1800s, shortly before the creation of the Christmas Bird Count, the newly formed American Ornithologists’ Union created a citizen science project in which lighthouse keepers were asked to report on fatalities of migrating birds hitting the lighthouses. Most lighthouse keepers weren’t birdwatchers and were unfamiliar with the taxonomy and nomenclature of bird species. Their observations were indecipherable as they reported species like sea robins, mother-careys chickens, black sea duckes, and bee martins. Unfortunately, ornithologists didn’t know how to translate those colloquial names into species.

Matching people and projects appropriately is essential to project success. That’s the goal of Scistarter.com.  With thousands of citizen science projects and opportunities, the SciStarter team is launching a new version of SciStarter with new tools to help people navigate the world of citizen science.

The frequency with which I encounter “the data quality question,” asked by general and scientific audiences alike, reveals that science resides unnecessarily on too high a pedestal in our society. Citizen science brings science within reach. By bridging the gap between science and society, citizen science may also bridge the divide between growing pro-science and anti-science sentiments.

In January 2017, Cooper begins as co-chair of the CODATA-WDS Task Group on Citizen Science and the Validation, Curation, and Management of Crowdsourced Data. Cooper’s book with The Overlook Press, Citizen Science: How Ordinary People are Changing the Face of Discovery, is in bookstores now.


About the Author

Caren Cooper

Dr. Caren Cooper is an associate professor in Forestry and Environmental Resources at NCSU in the Chancellor's Faculty Excellence program on Leadership in Public Science, and assistant head of the Biodiversity Research Lab at the North Carolina Museum of Natural Sciences. She is co-editor-in-chief of Citizen Science: Theory & Practice, a journal of the Citizen Science Association. She has authored over 50 scientific papers, co-developed software to automate metrics of incubation rhythms, and co-created NestWatch, CamClickr, Celebrate Urban Birds, YardMap, and Sparrow Swap. She is a blogger with SciStarter, and author of Citizen Science: How Ordinary People are Changing the Face of Discovery. She likes to propel herself on one-wheel, two-wheel, and eight-wheel devices. Follow her @CoopSciScoop. She hosts periodic Twitter discussions with panelists at #CitSciChat and runs @IamCitSci, a Twitter account with rotating weekly guest hosts.