The coronavirus pandemic and multiple natural disasters created unprecedented challenges for U.S. Census 2020 data collection, resulting in schedule delays and limiting essential time for data processing. Earlier this month, a congressional hearing described internal Census Bureau documents revealing 15 different data errors affecting the accuracy of census data across many states.1 With a growing chorus of census stakeholders and government watchdogs raising concerns about the 2020 Census, it is important for data users, stakeholders, and the public to understand how and when census accuracy and data quality are evaluated.
No census is perfect, but we have measures that capture how close each census gets to a complete and accurate count. These data quality measures help us answer questions such as: How many households did not self-respond to the census? How many people were counted twice? How many young children were left off their household’s census form?
Getting an accurate 2020 Census count is important because census numbers impact daily life in the United States in many ways. For example, census data are used to allocate more than one trillion dollars in federal funding each year for important projects and services that benefit local communities.2 Census data are used by state and local governments to plan for schools, roads, and hospitals. The census also plays a vital role in our nation’s system of government by determining how many representatives will be sent to Congress from each state. So, how can we be confident that the Census Bureau got it right despite 2020’s unprecedented hurdles?
How Is Census Accuracy Measured?
One of the most important measures of census accuracy is called coverage. It indicates how close the census came to enumerating all persons living in the United States on April 1 who should have been included in the count. In every census, some people are missed (referred to as omissions), and some are counted more than once or included when they shouldn’t be, such as foreign tourists (called erroneous enumerations). In addition, the Census Bureau may add people to the census count (called whole-person imputations) when housing units appear to be occupied but the residents don’t return a census form or respond to visits from a Census Bureau enumerator.
The sum of all these errors (omissions, erroneous enumerations, and whole-person imputations) is termed gross coverage error. However, some of these errors offset one another in the final count. The difference that remains after omissions are offset by erroneous enumerations and whole-person imputations is called net coverage error. Net coverage error can be positive or negative.
If the number of omissions exceeds the combined number of erroneous enumerations and whole-person imputations, the difference represents a net undercount. On the other hand, if the sum of erroneous enumerations and whole-person imputations is larger than the number of omissions, the difference represents a net overcount. In the 2010 Census, 16 million omissions were offset by more than 10 million erroneous enumerations and nearly 6 million whole-person imputations, resulting in a net undercount that was virtually zero. However, a low net undercount can mask dramatic differences across groups and geographic areas. For example, renters were more likely to be undercounted in the 2010 Census, as were young children, Blacks and Latinos, and American Indian and Alaska Native people living on tribal reservations. Conversely, homeowners, non-Hispanic whites, college-age young adults, and people ages 50 to 84 were overcounted in the 2010 Census.3 Such differences in net coverage across groups are called differential undercount. Differential undercount matters because it means that some groups and geographic areas may not receive their fair share of federal resources or political representation.
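The arithmetic behind gross and net coverage error can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's estimation method. The function name, the sign convention (positive means net undercount), and the rounded inputs of 16, 10, and 6 million, which approximate the 2010 figures cited above, are all assumptions made for the example.

```python
def coverage_errors(omissions, erroneous_enumerations, imputations):
    """Return (gross, net) coverage error in the same units as the inputs.

    Assumed sign convention: net > 0 is a net undercount,
    net < 0 is a net overcount.
    """
    # Gross error adds up every kind of error, regardless of direction.
    gross = omissions + erroneous_enumerations + imputations
    # Net error is what remains after omissions are offset by
    # erroneous enumerations and whole-person imputations.
    net = omissions - (erroneous_enumerations + imputations)
    return gross, net


# Rounded approximations of the 2010 figures, in millions (illustrative only).
gross, net = coverage_errors(16.0, 10.0, 6.0)
print(f"gross coverage error: {gross} million")
print(f"net coverage error: {net} million")
```

With these rounded inputs the net error is zero even though the gross error is 32 million, which is why a low net undercount alone says little about how many individual records were wrong.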
How Does the Census Bureau Estimate Coverage Errors?
The Census Bureau uses two primary tools to measure census accuracy—Demographic Analysis and the Post-Enumeration Survey (PES). Each of these tools provides an independent estimate of the size and characteristics of the U.S. population that can be compared to the census results.
Demographic Analysis (DA) is a program the Census Bureau has used since 1960 to estimate population size and selected characteristics based on historical population data, birth and death records, Medicare enrollment records, and estimates of international migration. These national estimates of the population by age, sex, race, and Hispanic origin are produced independently of the decennial census, and the differences between these two sets of data are used to estimate net undercounts or overcounts of the population. For example, DA has shown that, of all age groups, children under age five face the highest risk of being undercounted in the census. In the 2010 Census, the net undercount rate for children under age five was nearly 5%. However, because DA only provides national-level estimates, these data cannot help us understand net and differential undercounts for subnational geographic areas.
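The DA comparison reduces to simple arithmetic. The sketch below assumes one common convention: the difference between the DA estimate and the census count, expressed as a percentage of the DA estimate. The function name and the population figures are hypothetical, chosen only to show the calculation. They are not actual census or DA results.

```python
def net_undercount_rate(da_estimate, census_count):
    """Percent by which the census count falls short of the DA estimate.

    Positive values indicate a net undercount; negative values
    indicate a net overcount.
    """
    if da_estimate <= 0:
        raise ValueError("da_estimate must be positive")
    return (da_estimate - census_count) / da_estimate * 100


# Hypothetical figures for a single age group (not real data).
rate = net_undercount_rate(da_estimate=1_000_000, census_count=954_000)
print(f"net undercount rate: {rate:.1f}%")
```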
Data users and stakeholders eager to use DA to assess 2020 net undercounts and overcounts for racial and ethnic groups should note that the DA data for race and Hispanic origin are much more limited than the data collected in the census. Because the DA method uses historical data with racial and ethnic categories that differ from the decennial census, DA only provides estimates for the following groups: Black; Black Alone or in Combination; Non-Black; Non-Black Alone or in Combination; and Hispanic Origin (only for ages 0 to 29). In addition, because the census includes the category “Some Other Race,” the DA data cannot be directly compared to the corresponding 2020 Census data until those who marked Some Other Race are allocated to the census racial categories that match those in DA. For the 2010 Census, the data reconciling the census categories with those in DA were not released until 2012.
The Census Bureau released its 2020 Demographic Analysis results on December 15, but we won’t know if they show net undercounts or overcounts until 2020 Census data are released. Although state population totals from the 2020 Census are due by law on December 31, the Census Bureau has not yet announced when those data will be released. Because the Census Bureau needs time to fix the problems already uncovered during census data processing, these data may not be released until late January or early February 2021. The first 2020 Census data on age and racial/ethnic distributions will be included in the redistricting data product (known as PL 94-171). The redistricting data file is due by law on March 31, 2021, but it is not clear whether this data release will also be delayed because of problems uncovered during data processing.
The Post-Enumeration Survey (PES) is the second primary tool the Census Bureau uses to evaluate census accuracy. For the PES, the Census Bureau conducts an independent data collection from a sample of households across the nation approximately five months after the April 1 census count. The results from the PES are then compared with census records to determine both the number and demographic characteristics of those who were counted correctly and those who were missed or erroneously included. The PES is a critical tool for measuring census accuracy for three reasons: 1) It not only provides estimates of total coverage error like DA, but also of the components of that error—namely omissions, erroneous enumerations, and whole-person imputations; 2) It provides estimates for all the racial and ethnic categories included in the census; and 3) It provides these estimates at both the national and state level.
When will the Census Bureau release results from the PES? Again, data users and census stakeholders will have to be patient. The first PES results at the national level are slated for release in November 2021, with state-level results following in February 2022. However, it is important to recognize that the coronavirus pandemic has also impacted the PES schedule and delayed completion of PES in-person interviews. As a result, these data may also be delayed beyond their targeted release dates.
How Can Census Stakeholders Learn More?
Given the relatively long wait for DA and PES results, what can concerned data users and census stakeholders do now to assess census accuracy and data quality? Fortunately, other data can provide clues. For instance, research has shown that the most reliable information comes from households that respond on their own to the census. Self-response rates do not tell us how accurate the census count is, but people living in areas with low self-response rates are more likely to be undercounted. A PRB analysis showed that self-response rates are lowest in communities of color, increasing undercount risk for Black and Latino children.
Low self-response rates also mean more households must be enumerated using one of the following four methods: 1) by a Census Bureau enumerator during nonresponse followup (NRFU); 2) by proxy (meaning a third party, such as a neighbor, told a census taker about a household’s occupants); 3) through the use of administrative records; or 4) by statistical models (imputation). All four methods yield less precise data than self-response.
Normally the Census Bureau works with state demographers to review group quarters count data. While this review is imprecise, it provides valuable information about potential discrepancies. Unfortunately, in the crunch to meet reporting deadlines, the review was canceled for the 2020 Census, eliminating an important opportunity to assess data quality.
In response to the unprecedented disruptions to 2020 Census data collection, stakeholder groups and state and local demographers are calling for the Census Bureau to provide more data quality measures now, in advance of census data releases. For example, in October, the American Statistical Association (ASA) issued a report recommending the Census Bureau release additional quality indicators including the percentage of housing units visited by a census taker, counted by proxy, counted using administrative records (such as IRS 1040 and IRS 1099 filings), or included by imputation. ASA also recommended evaluating the percentage of records missing critical information such as name or date of birth.4 Indicators like these all help paint a picture of the completeness and accuracy of the 2020 Census.
New metrics will also help data users assess census data accuracy. The Census Bureau announced this month that, for the first time, it will release information for the nation, states, the District of Columbia, and Puerto Rico detailing the way in which households were counted (self-response, interviews by census takers, proxy interviews, and administrative records), the number of count imputations, and metrics for addresses that were resolved as occupied, vacant, or “delete,” meaning the Census Bureau couldn’t find a physical housing unit associated with the address or the address was a duplicate listing.5 Because accuracy varies across these counting methods, the opportunity to compare differences in their distribution should help data users and census stakeholders flag areas of concern and get an initial read on the quality of 2020 Census data.