Senior Research Associate
California Doesn’t Really Have 40 Times More Hispanic/Latino COVID-19 Cases Than Texas—They Just Have Better Data
May 19, 2021
The ongoing coronavirus pandemic has highlighted the challenge of data gaps in tracking the pandemic’s impact on racial and ethnic populations across the United States. This challenge was underscored when UnidosUS asked PRB for help tracking trends in COVID-19 cases and deaths for the Hispanic/Latino population in six states (Arizona, California, Colorado, Florida, Nevada, Texas).
To tackle this project, we needed reliable time-series data on COVID-19 cases and deaths by race and ethnicity. These data would be important for trend analysis, serve as a benchmark for projections, and allow us to calculate COVID-19 case fatality rates (CFRs).
These data didn’t seem like too much to ask for, but what we found was a hodgepodge of reporting and data quality standards that varied widely from state to state. And we’re not alone in uncovering less-than-ideal state-level demographic reporting. Ishaan Pathak and his colleagues found that among the 50 states, only California, Illinois, and Ohio had sufficient age and racial/ethnic detail to investigate disparities in CFRs, controlling for age.
California’s Comprehensive Data Reporting
Throughout the pandemic, California has offered comprehensive, timely reporting on coronavirus cases and deaths by race/ethnicity. As of January 2021, nearly all California coronavirus-related deaths were reported by race/ethnicity, and three quarters of cases were reported with racial/ethnic detail (see Table 1).
TABLE 1. Comprehensive Racial/Ethnic Reporting for California COVID-19 Cases and Deaths
|Race/Ethnicity|Cases|Deaths|
|---|---|---|
|American Indian/Alaska Native|5,750|90|
|Native Hawaiian/Pacific Islander|10,337|147|
|Total with race/ethnicity|1,828,872|27,314|
|Total (including records missing race/ethnicity)|2,482,226|27,462|
Note: Racial and ethnic-group titles appear as they are reported by the source and are mutually exclusive.
Source: California Department of Public Health, COVID-19 Race and Ethnicity Data, as of Jan. 7, 2021.
If we look at California data for early January 2021, we see considerable racial/ethnic detail, with coverage for 99% of death records and 74% of case records. While the case coverage isn’t perfect (26% missing), the resulting CFRs aren’t wildly out of alignment with what we’d expect. For example, dividing the number of deaths by the number of cases among African Americans yields a CFR of around 2,500 deaths per 100,000 cases.
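The coverage shares and CFRs quoted here are simple ratios, and a minimal Python sketch makes the arithmetic explicit. The counts come from Table 1; the function names `coverage_pct` and `cfr_per_100k` are illustrative choices, not part of any published PRB code.

```python
def coverage_pct(with_race: int, total: int) -> float:
    """Percent of records that carry race/ethnicity detail."""
    return 100 * with_race / total


def cfr_per_100k(deaths: int, cases: int) -> float:
    """Case fatality rate, expressed as deaths per 100,000 cases."""
    return 100_000 * deaths / cases


# Counts from Table 1 (California, as of Jan. 7, 2021).
case_coverage = coverage_pct(1_828_872, 2_482_226)   # rounds to 74
death_coverage = coverage_pct(27_314, 27_462)        # rounds to 99

# Group-level CFRs for the two groups whose rows appear in Table 1.
aian_cfr = cfr_per_100k(90, 5_750)       # American Indian/Alaska Native
nhpi_cfr = cfr_per_100k(147, 10_337)     # Native Hawaiian/Pacific Islander

print(f"Case coverage:  {case_coverage:.0f}%")
print(f"Death coverage: {death_coverage:.0f}%")
print(f"AIAN CFR: {aian_cfr:,.0f} per 100,000 cases")
print(f"NHPI CFR: {nhpi_cfr:,.0f} per 100,000 cases")
```

Because deaths and cases are covered at different rates (99% versus 74%), these group CFRs are likely somewhat overstated, but the gap is small enough that the results stay in a plausible range.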
Texas, on the other hand… has some egregious problems with missing data.
Data Problems Are Bigger in Texas
Looking at the same time period (early January 2021), we see that in Texas the death data are reasonably complete (96% of deaths are identified by race/ethnicity). But there’s almost no tracking of race/ethnicity for cases (see Table 2). Texas reports racial/ethnic case detail for fewer than 5% of cases, and even within that small share, a substantial proportion are labeled “Unknown.” Based on these reported numbers, the CFR for Black Texans would be estimated at almost 25,000 deaths per 100,000 cases—10 times the estimated rate for African Americans in California.
TABLE 2. Almost No Race/Ethnicity Reporting for Texas COVID-19 Cases
|Race/Ethnicity|Cases|Deaths|
|---|---|---|
|Total with race/ethnicity|67,613|27,771|
|Total (including records missing race/ethnicity)|1,563,758|28,877|
|Percent with race/ethnicity reported|4%|96%|
Note: Racial and ethnic group titles appear as they are reported by the source and are mutually exclusive.
Source: Texas Department of State Health Services, Texas COVID-19 Data, as of Jan. 8, 2021.
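To see why mismatched coverage distorts the Texas CFRs, it may help to compare the rate computed from all records with the rate computed only from records that carry race/ethnicity. The sketch below uses the totals from Table 2; the variable names are illustrative, and the comparison is a rough diagnostic rather than the method used in the analysis.

```python
def cfr_per_100k(deaths: int, cases: int) -> float:
    """Case fatality rate, expressed as deaths per 100,000 cases."""
    return 100_000 * deaths / cases


# Totals from Table 2 (Texas, as of Jan. 8, 2021).
tx_cases_with_race, tx_cases_total = 67_613, 1_563_758
tx_deaths_with_race, tx_deaths_total = 27_771, 28_877

case_coverage = tx_cases_with_race / tx_cases_total      # about 0.04
death_coverage = tx_deaths_with_race / tx_deaths_total   # about 0.96

# CFR using every record versus only records with race/ethnicity.
cfr_all = cfr_per_100k(tx_deaths_total, tx_cases_total)
cfr_race_only = cfr_per_100k(tx_deaths_with_race, tx_cases_with_race)

# The inflation factor equals death coverage divided by case coverage.
inflation = cfr_race_only / cfr_all

print(f"CFR, all records:          {cfr_all:,.0f} per 100,000")
print(f"CFR, race/ethnicity only:  {cfr_race_only:,.0f} per 100,000")
print(f"Inflation factor:          {inflation:.1f}x")
```

With nearly complete deaths in the numerator and only about 4% of cases in the denominator, any CFR built from the race/ethnicity records alone is inflated by a factor of roughly 22.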
How Did We Deal With the Data Gap?
To produce realistic case trend data for UnidosUS, we tested a variety of estimation methods, ranging from presenting the data as reported to using the (more reliable) death data to reverse engineer an estimate of what the case numbers might have been. Each of the alternatives has pros and cons, as illustrated in our decision matrix.
|Option|Pros|Cons|
|---|---|---|
|1. Use reported totals|None|Massive underestimate|
|2. Apply % reported race/ethnicity cases to total cases|Straightforward; consistent with reported case totals|Doesn’t account for bias in reported race/ethnicity data|
|3. Apply % reported race/ethnicity deaths to total cases|Straightforward; consistent with reported case totals|Doesn’t account for differences in mortality by race/ethnicity|
|4. Reverse engineer number of cases based on death data and other-state case fatality rates|Maintains consistency between case and death data|Other-state case fatality rates are not the same; may over- or under-estimate cases in Texas|
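Option 4 in the matrix amounts to dividing a group's deaths by a CFR borrowed from another state. The sketch below is one possible reading of that idea: the death count is hypothetical, the function name is invented for illustration, and the reference rates simply bracket the roughly 2,500 per 100,000 figure cited for California.

```python
def reverse_engineer_cases(deaths: int, reference_cfr_per_100k: float) -> float:
    """Estimate cases as deaths divided by a borrowed case fatality rate."""
    if reference_cfr_per_100k <= 0:
        raise ValueError("reference CFR must be positive")
    return deaths * 100_000 / reference_cfr_per_100k


# Hypothetical death count for one group; NOT taken from the Texas data.
hypothetical_deaths = 1_000

# Show how sensitive the estimate is to the borrowed rate.
for reference_cfr in (2_000, 2_500, 3_000):
    estimate = reverse_engineer_cases(hypothetical_deaths, reference_cfr)
    print(f"CFR {reference_cfr:,} per 100,000 -> about {estimate:,.0f} cases")
```

The spread across the three reference rates illustrates the drawback listed in the matrix: if the borrowed CFR does not match the true Texas rate, the case estimate shifts in proportion.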
We quickly ruled out using the data as is. Presenting Hispanic/Latino COVID-19 case counts drawn from records that cover just 4% of the universe would be misleading at best. Of the remaining alternatives, we chose option 2: estimate Hispanic/Latino cases by taking the racial/ethnic distribution of reported cases and applying it to all cases. While this method has some potential for bias, further analysis (presented at the Population Association of America Applied Demography Conference in February 2021) suggested the approach was reasonable.
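As a rough sketch of option 2, the reported distribution can be converted to shares and multiplied by the statewide case total. The group labels and reported counts below are hypothetical placeholders, since the group-level Texas figures are not shown here; only the total of 1,563,758 cases comes from Table 2. The function name is likewise an illustrative assumption.

```python
def allocate_total_by_reported_shares(
    reported: dict[str, int], total_cases: int
) -> dict[str, float]:
    """Spread total cases across groups in proportion to reported counts."""
    known = sum(reported.values())
    if known == 0:
        raise ValueError("no records with race/ethnicity to build shares from")
    return {group: total_cases * n / known for group, n in reported.items()}


# Hypothetical reported counts by group; NOT actual Texas figures.
reported_cases = {"Group A": 30_000, "Group B": 20_000, "Group C": 10_000}

# Statewide total from Table 2 (Texas, as of Jan. 8, 2021).
total_cases = 1_563_758

estimates = allocate_total_by_reported_shares(reported_cases, total_cases)
for group, estimate in estimates.items():
    print(f"{group}: about {estimate:,.0f} estimated cases")
```

The estimates sum to the reported statewide total by construction, which is the consistency advantage noted in the matrix. The method assumes that records with race/ethnicity are representative of all cases, which is the source of the potential bias.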
The main takeaway here is that regardless of the specific technique, estimating to fill in data gaps is a band-aid solution at best. Ideally, we would have better demographic data on COVID-19 cases.
Visit the UnidosUS website to learn more about the methods we used for these trends and view an interactive data visualization of the results.