Disclosure Avoidance in the 2020 Census: What Should Data Users Know About Respondent Privacy and Data Accuracy?
PRB partnered with the Census Bureau to release a new series of materials designed to provide concise, reader-friendly information to data users about the new privacy protection methods being used for the 2020 Census.
How do you collect information for all 330 million people in the United States, then summarize and publish that information, all while keeping responses confidential? Disclosure avoidance—or ensuring data can’t be tied back to individual people—is a challenge the U.S. Census Bureau has struggled with for decades.
Beginning with the 1930 Census, the Census Bureau stopped publishing certain tables for small geographic areas to protect respondents’ confidential data. For the 1970 and 1980 Censuses, the Census Bureau did not publish certain tables based on the number of people or households in a given area. Starting with the 1990 Census, the Census Bureau began using more sophisticated techniques, such as data swapping (swapping the geographic identifiers on records for certain households with the identifiers from nearby households with similar characteristics) to protect against disclosure.
The swapping method was meant to protect the privacy of people whose characteristics set them apart from their neighbors. In an infamous example, the 2010 Census data for a white husband and wife living on Liberty Island, New York, were swapped with those of an Asian household.
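To make the idea of data swapping concrete, here is a minimal Python sketch. It is not the Census Bureau's actual procedure: the household records, the `swap_rate` parameter, the random choice of which households to swap, and the rule of matching only on household size are all assumptions made for illustration.

```python
import random
from collections import Counter


def swap_blocks(households, swap_rate, rng):
    """Return a copy of the records with some block identifiers swapped.

    A selected household trades its block with a household from a different
    block that has the same size (the only matching rule in this sketch).
    """
    records = [dict(h) for h in households]  # leave the input untouched
    swapped = set()
    for i, household in enumerate(records):
        if i in swapped or rng.random() >= swap_rate:
            continue
        partners = [
            j for j, other in enumerate(records)
            if j != i
            and j not in swapped
            and other["block"] != household["block"]
            and other["size"] == household["size"]
        ]
        if not partners:
            continue  # no similar household elsewhere, so nothing to swap
        j = rng.choice(partners)
        household["block"], records[j]["block"] = records[j]["block"], household["block"]
        swapped.update((i, j))
    return records


def persons_per_block(records):
    """Total number of people in each block."""
    totals = Counter()
    for record in records:
        totals[record["block"]] += record["size"]
    return dict(totals)


# Hypothetical households, invented for this example.
households = [
    {"id": 1, "block": "A", "size": 2, "tenure": "owner"},
    {"id": 2, "block": "A", "size": 4, "tenure": "renter"},
    {"id": 3, "block": "B", "size": 2, "tenure": "renter"},
    {"id": 4, "block": "B", "size": 4, "tenure": "owner"},
    {"id": 5, "block": "C", "size": 1, "tenure": "owner"},
]

swapped = swap_blocks(households, swap_rate=1.0, rng=random.Random(1990))
print(persons_per_block(households))
print(persons_per_block(swapped))
```

Because partners are matched on household size, each block keeps the same number of people after swapping, while the other characteristics attached to a given location may change.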
But older disclosure avoidance methods, such as suppression and swapping, were not designed to defend against new types of privacy risks. New computing techniques broaden privacy risks even for seemingly de-identified data.
How could summary data pose a privacy risk?
Advances in computing power and methods have opened new possibilities for database reconstruction: using the information in published tables to rebuild the original census responses, minus names and addresses. Once the data are reconstructed, individual people could be reidentified by linking the reconstructed records to external databases (or to personal knowledge about a person).
For example, recent research has shown that transgender teens could be identified by “linking seemingly anonymized information such as their neighborhood and age to discover that their sex was reported differently in successive censuses.”
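A toy example can show how database reconstruction works in principle. The Python sketch below assumes a hypothetical block of three people and four invented published statistics about their ages; it then lists every combination of ages consistent with those statistics. The numbers, the `MAX_AGE` bound, and the function names are illustrative assumptions, not actual census tables or methods.

```python
from itertools import combinations_with_replacement

MAX_AGE = 115  # assumed upper bound on a reported age


def mean_minor_age_is_8(ages):
    """True if the people under 18 in `ages` have a mean age of exactly 8."""
    minors = [a for a in ages if a < 18]
    return bool(minors) and sum(minors) == 8 * len(minors)


# Hypothetical published statistics for one block of three people.
# Each candidate `ages` is a tuple of three ages in non-decreasing order.
CONSTRAINTS = [
    ("mean age is 44", lambda ages: sum(ages) == 44 * 3),
    ("median age is 30", lambda ages: ages[1] == 30),
    ("one person is under 18", lambda ages: sum(a < 18 for a in ages) == 1),
    ("mean age of people under 18 is 8", mean_minor_age_is_8),
]


def reconstruct(constraints):
    """Return every sorted triple of ages consistent with all constraints."""
    candidates = list(combinations_with_replacement(range(MAX_AGE + 1), 3))
    for label, is_consistent in constraints:
        candidates = [ages for ages in candidates if is_consistent(ages)]
        print(f"after '{label}': {len(candidates)} candidate(s) remain")
    return candidates


print(reconstruct(CONSTRAINTS))
```

Each statistic on its own leaves many possibilities open, but together they narrow the candidates sharply. Real reconstruction attacks apply the same logic to far more tables and far more people.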
To protect the privacy of all respondents, for the 2020 Census, the Census Bureau implemented new disclosure avoidance methods based on the principle of differential privacy.
Differential privacy is a scientific framework for processing data to protect the identities and personal information of the people in the data. It works by adding statistical noise—small, random additions or subtractions—to published statistics so that no one can reidentify a specific person or household with any certainty using any combination of the published data.
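As a rough illustration of adding statistical noise, the Python sketch below perturbs a few counts with the Laplace mechanism, a standard textbook technique in differential privacy. It is not the Census Bureau's production algorithm. The block counts, the `epsilon` values (the privacy-loss parameter), and the function names are assumptions chosen for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise with scale sensitivity / epsilon.

    A smaller epsilon means more noise and stronger privacy protection.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2020)

# Hypothetical block-level population counts, invented for this example.
true_counts = {"Block A": 3, "Block B": 47, "Block C": 150}

for epsilon in (0.1, 1.0):
    published = {
        block: round(noisy_count(count, epsilon, rng))
        for block, count in true_counts.items()
    }
    print(f"epsilon={epsilon}: {published}")
```

In this sketch the noise has the same typical size whatever the true count is, so small counts are affected proportionally more than large ones. The noisy values can also come out negative, because the sketch does no further processing of the results.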
What do data users need to know?
These new privacy protection methods mean that data users need new guidance on how to work with information from the 2020 Census. How and where was noise added? How does that noise affect error and bias in the published statistics? And how can you use the data in ways that reduce noise and bias?
PRB partnered with the Census Bureau to release a new series of materials designed to provide concise, reader-friendly information to data users about the new disclosure avoidance methods being used for the 2020 Census. The first briefs are:
- Disclosure Avoidance and the 2020 Redistricting Data provides a list of dos and don’ts for working with the 2020 Census Redistricting Data, as well as information about how much noise and bias were introduced to the data to protect privacy.
- Why the Census Bureau Chose Differential Privacy explains the tradeoffs between disclosure avoidance methods and why the Census Bureau chose this new method for disclosure avoidance.
- Disclosure Avoidance and the 2020 Census: How the TopDown Algorithm Works details how the disclosure avoidance system works for the 2020 Census Redistricting Data and Demographic and Housing Characteristics File.
More briefs, workshops, and materials are in development.