March 24, 2022
If the mantra in real estate is “location, location, location,” the mantra for public data users is—or should be—“disaggregation, disaggregation, disaggregation.”
Why Does Disaggregation Matter?
The average income in your neighborhood will go up if a billionaire moves in, but that rising average doesn’t tell you anything about whether income rose, fell, or remained the same for everyone else. A new billionaire neighbor is a dramatic example but one that illustrates the point: Disaggregated data are crucial for understanding how people are doing.
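To make the billionaire example concrete, here is a minimal sketch in Python. The ten household incomes are invented for illustration; they are not real data. The sketch compares the neighborhood's mean and median income before and after the move, then recomputes the mean for the original residents alone.

```python
from statistics import mean, median

# Hypothetical annual incomes for ten households (illustrative numbers only).
incomes_before = [38_000, 42_000, 45_000, 51_000, 55_000,
                  58_000, 62_000, 67_000, 74_000, 88_000]

# A billionaire moves in; nobody else's income changes.
incomes_after = incomes_before + [1_000_000_000]


def summarize(incomes):
    """Return the mean and median of a list of incomes."""
    return {"mean": mean(incomes), "median": median(incomes)}


before = summarize(incomes_before)
after = summarize(incomes_after)

# Disaggregate: look at the original residents separately from the newcomer.
original_residents_after = summarize(incomes_after[:-1])

print("Before:", before)
print("After (everyone):", after)
print("After (original residents only):", original_residents_after)
```

The overall mean jumps by orders of magnitude, while the figures for the original residents are exactly what they were before. Only the disaggregated view shows that nothing changed for them.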
We couldn’t uncover these truths without disaggregated data:
- Even as suicide rates fell between 2019 and 2020, rates rose for women ages 15-24.
- Even though job recovery has been robust since the recession in 2020, Black women have not seen equivalent job gains.
- While more households nationwide have high-speed internet, rural areas lag behind urban areas in broadband access.
During the 2021 conference of the Association of Public Data Users (APDU), speaker Rhonda Vonshay Sharpe provided numerous examples of how disaggregating data—by gender, race and ethnicity, and education—provides crucial insights for improving public health and well-being. Her talk was inspiring and also left many in the audience wondering…
If Disaggregation Is So Important, Why Isn’t It More Common?
To be fair, some people probably just don’t think about disaggregation. But there are bigger, systemwide challenges.
Sometimes survey sample sizes are too small to produce reliable estimates for a population of interest. When this happens, researchers—hoping to provide some data rather than none—may group smaller demographic groups together so they have enough combined survey responses to get an estimate they can report.
I have done this kind of aggregation in my own work (grouping across income levels, sexual orientations, racial/ethnic groups, geographies, or ages) because, in the context of the work I was doing, aggregated data were preferable to tables full of missing data. If you’re considering aggregating groups, the Urban Institute provides some handy guidelines. And remember that it is sometimes important to note in your work that the sample size is small or estimates are unreliable, because doing so signals that there’s a data gap.
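The trade-off behind aggregation can be sketched with a little arithmetic. The example below assumes a simple random sample and uses the textbook normal-approximation margin of error for a proportion. Real surveys generally have complex designs and weights, so treat this as a rough illustration only. The group names and counts are hypothetical.

```python
import math


def margin_of_error(successes, n, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes simple random sampling (an assumption made for this sketch).
    """
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical counts: (respondents answering "yes", total respondents).
groups = {"Group A": (18, 30), "Group B": (27, 40), "Group C": (21, 35)}

# Margin of error for each small group on its own.
group_moe = {name: margin_of_error(yes, n) for name, (yes, n) in groups.items()}
for name, (yes, n) in groups.items():
    print(f"{name}: {yes / n:.1%} +/- {group_moe[name]:.1%} (n={n})")

# Pool the three groups into one aggregate estimate.
pooled_yes = sum(yes for yes, _ in groups.values())
pooled_n = sum(n for _, n in groups.values())
pooled_moe = margin_of_error(pooled_yes, pooled_n)
print(f"Pooled: {pooled_yes / pooled_n:.1%} +/- {pooled_moe:.1%} (n={pooled_n})")
```

The pooled estimate is more precise than any single group's estimate, but it can no longer show how the groups differ from one another. That is the cost of aggregating.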
Speaking of data gaps: Sometimes data are only reported for aggregate groups. A visitor to federal statistical websites will often find data for just five racial groups (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White) and two ethnic groups (Hispanic or Latino and Not Hispanic or Latino). These groups reflect minimum standards set by the U.S. Office of Management and Budget (OMB) in 1997. Even with those standards in place, it was just last month that the Bureau of Labor Statistics began publishing jobs data for American Indians and Alaska Natives.
While many agencies in the federal statistical system go beyond the minimum standards—including reporting data for multiracial populations—the standards (in my opinion) are overdue for an overhaul.
What Can Be Done to Make Disaggregated Data More Widely Available?
Recently, PRB joined more than 150 other signatories in a letter to the acting director of the Office of Management and Budget requesting that the OMB minimum standards be revised. The letter included requests, developed in collaboration with community groups and based on the latest research on self-identification, such as the following:
- “The use of a combined question versus separate questions to measure race and ethnicity and question phrasing as a solution to race/ethnicity question nonresponse;
- The classification of a Middle Eastern and North African (MENA) group and distinct ethnic reporting category;
- The description of the intended use of minimum reporting categories; and
- The salience of terminology used for race and ethnicity classifications and other language in the standard.”
The letter included specific suggestions focused on the data needs of Asian American populations, Native Hawaiian and Pacific Islander populations, Hispanic/Latino populations, Middle Eastern and North African populations, and Black and African American populations.
But racial demographics are just the tip of the disaggregation iceberg. Sexual orientation and gender identity, age, geography, education, and other topics also deserve data systems that are robust enough to support disaggregation. The solution for survey data is to structure—and fund—surveys that have enough records to support detailed disaggregation. This could be achieved through larger sample sizes overall, as proposed by The Census Project for the American Community Survey, or through strategic oversampling of specific smaller populations of interest.
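A back-of-the-envelope calculation shows why oversampling can be attractive. The sketch below again assumes simple random sampling and the standard sample-size formula for a proportion. The 2% population share and the 5-point target margin of error are hypothetical values chosen for illustration, not figures from any actual survey.

```python
import math


def required_n(moe, p=0.5, z=1.96):
    """Respondents needed for a given margin of error on a proportion.

    Uses p = 0.5 as the most conservative case; assumes simple random sampling.
    """
    return math.ceil(z**2 * p * (1 - p) / moe**2)


group_share = 0.02  # hypothetical: the group is 2% of the population
target_respondents = required_n(moe=0.05)

# With a proportional sample, the whole survey must be large enough
# that 2% of it reaches the target number of respondents.
total_if_proportional = math.ceil(target_respondents / group_share)

print(f"Respondents needed from the group: {target_respondents}")
print(f"Total sample needed without oversampling: {total_if_proportional}")
```

Under these assumptions, reaching the target through a larger overall sample requires a survey roughly 50 times the size of the group sample. Oversampling aims to collect those group responses directly.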
For administrative data, such as birth and death records and education statistics, many agencies already collect more data on race/ethnicity, age, income, and sexual orientation and gender identity than they report. Reporting is often limited by staff time, data quality issues, and, in some cases, privacy and confidentiality concerns. In these cases, newer tools, such as synthetic estimation or noise infusion, may help achieve a balance between reporting disaggregated data and protecting individual privacy.
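To give a feel for noise infusion, here is a minimal sketch that adds random noise to small table cells before publication. It uses Laplace-distributed noise, which is one common choice and my assumption for this example, not necessarily what any particular agency does. The cell labels, counts, and noise scale are all hypothetical.

```python
import random


def add_laplace_noise(count, scale, rng):
    """Return the count plus Laplace(0, scale) noise, rounded and floored at zero.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return max(0, round(count + noise))


rng = random.Random(2022)  # fixed seed so the sketch is reproducible

# Hypothetical true counts for three table cells.
true_counts = {"Cell 1": 4, "Cell 2": 57, "Cell 3": 1203}

noisy_counts = {
    cell: add_laplace_noise(count, scale=2.0, rng=rng)
    for cell, count in true_counts.items()
}

for cell, count in true_counts.items():
    print(f"{cell}: true={count}, published={noisy_counts[cell]}")
```

The same amount of noise barely affects the large cell but can noticeably change the small one. That is the balance the paragraph above describes: more protection for individuals in small groups, at some cost in accuracy for those same groups.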
What Can a Data Geek Do in the Meantime?
There is no one perfect answer, but here are some suggestions:
- Disaggregate data when you can.
- Consider whether reporting “data not available,” rather than aggregating, could be a powerful advocacy tool to spotlight data gaps.
- Be clear about which groups you’re aggregating and why.
- When reporting data for larger groups, speak to what is known about how smaller groups may differ from the aggregate trend.
- Communicate with data providers about data gaps and advocate for more funding for federal and state agencies to collect and disseminate the data you need.
Only by breaking down the data can we understand enough to make wise policy decisions that build up our communities.
Note: A version of this piece first appeared in the Association of Public Data Users blog. It has been modified slightly for PRB.