Building Up Communities by Breaking Down Data
Only by disaggregating data can we understand enough to make wise policy decisions that build up our communities.
If the mantra in real estate is "location, location, location," the mantra for public data users is (or should be) "disaggregation, disaggregation, disaggregation."
Why Does Disaggregation Matter?
The average income in your neighborhood will go up if a billionaire moves in, but that rising average doesn't tell you anything about whether income rose, fell, or remained the same for everyone else. A new billionaire neighbor is a dramatic example but one that illustrates the point: Disaggregated data are crucial for understanding how people are doing.
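The billionaire example can be worked through with a few lines of Python. This is only a sketch: the ten household incomes and the newcomer's $50 million annual income are made-up numbers chosen for illustration, not real data.

```python
from statistics import mean, median

# Hypothetical annual household incomes (dollars) for a ten-household neighborhood.
incomes_before = [42_000, 48_000, 51_000, 55_000, 58_000,
                  61_000, 64_000, 70_000, 75_000, 82_000]

# Assumption for illustration: the new neighbor has an annual income of $50 million.
newcomer_income = 50_000_000
incomes_after = incomes_before + [newcomer_income]

# Aggregate view: one number for the whole neighborhood.
mean_before = mean(incomes_before)
mean_after = mean(incomes_after)

# The median is less sensitive to a single extreme value.
median_before = median(incomes_before)
median_after = median(incomes_after)

# Disaggregated view: look at the original residents separately from the newcomer.
mean_original_residents = mean(incomes_after[:-1])

print(f"Mean before:   ${mean_before:,.0f}")
print(f"Mean after:    ${mean_after:,.0f}")
print(f"Median before: ${median_before:,.0f}")
print(f"Median after:  ${median_after:,.0f}")
print(f"Mean for original residents after the move: ${mean_original_residents:,.0f}")
```

With these made-up numbers, the neighborhood mean jumps from $60,600 to roughly $4.6 million, while the mean for the original ten households stays at $60,600. Only the disaggregated figure shows that nothing changed for them.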
We couldn't uncover these truths without disaggregated data:
- Even as suicide rates fell between 2019 and 2020, rates rose for women ages 15-24.
- Even though job recovery has been robust since the recession in 2020, Black women have not seen equivalent job gains.
- While more households nationwide have high-speed internet, rural areas lag behind urban areas in broadband access.
During the 2021 conference of the Association of Public Data Users (APDU), speaker Rhonda Vonshay Sharpe provided numerous examples of how disaggregating data (by gender, race and ethnicity, and education) provides crucial insights for improving public health and well-being. Her talk was inspiring and also left many in the audience wondering...
If Disaggregation Is So Important, Why Isn't It More Common?
To be fair, some people probably just don't think about disaggregation. But there are bigger, systemwide challenges.
Sometimes survey sample sizes are too small to produce reliable estimates for a population of interest. When this happens, researchers, hoping to provide some data rather than none, may group smaller demographic groups together so they have enough combined survey responses to get an estimate they can report.
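To see why small samples push researchers toward grouping, here is a rough sketch using the textbook margin-of-error formula for a proportion. The group labels and response counts are hypothetical, and the formula assumes a simple random sample; real surveys with complex designs need design-based variance estimates.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical survey results: (group, respondents, respondents reporting the outcome).
groups = [("Group A", 40, 12), ("Group B", 55, 22), ("Group C", 30, 6)]

group_moes = {}
for label, n, k in groups:
    p = k / n
    group_moes[label] = margin_of_error(p, n)
    print(f"{label}: estimate {p:.0%}, margin of error +/- {group_moes[label]:.1%} (n={n})")

# Pooling the three groups gives a larger sample and a tighter estimate...
pooled_n = sum(n for _, n, _ in groups)
pooled_k = sum(k for _, _, k in groups)
pooled_p = pooled_k / pooled_n
pooled_moe = margin_of_error(pooled_p, pooled_n)
print(f"Pooled: estimate {pooled_p:.0%}, margin of error +/- {pooled_moe:.1%} (n={pooled_n})")
```

In this made-up example, each group's margin of error is roughly 13 to 14 percentage points, while the pooled estimate's is about 8. The cost is that the pooled figure of 32% hides group estimates that range from 20% to 40%.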
I have done this kind of aggregation in my own work (grouping across income levels, sexual orientations, racial/ethnic groups, geographies, or ages) because in the context of the work I was doing, aggregated data were preferable to tables full of missing data. If you're considering aggregating groups, the Urban Institute provides some handy guidelines. And remember that sometimes noting in your work that the sample size is small or estimates are unreliable is important because it signals that there's a data gap.
Speaking of data gaps: Sometimes data are only reported for aggregate groups. A visitor to federal statistical websites will often find data for just five racial groups (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White) and two ethnic groups (Hispanic or Latino and Not Hispanic or Latino). These groups reflect minimum standards set by the U.S. Office of Management and Budget (OMB) in 1997. Even with those standards in place, it was just last month that the Bureau of Labor Statistics began publishing jobs data for American Indians and Alaska Natives.
While many agencies in the federal statistical system go beyond the minimum standards, including reporting data for multiracial populations, the standards (in my opinion) are overdue for an overhaul.
What Can Be Done to Make Disaggregated Data More Widely Available?
Recently, PRB joined more than 150 other signatories in a letter to the acting director of the Office of Management and Budget requesting that the OMB minimum standards be revised. The letter included requests, developed in collaboration with community groups and based on the latest research on self-identification, such as the following:
- "The use of a combined question versus separate questions to measure race and ethnicity and question phrasing as a solution to race/ethnicity question nonresponse;
- The classification of a Middle Eastern and North African (MENA) group and distinct ethnic reporting category;
- The description of the intended use of minimum reporting categories; and
- The salience of terminology used for race and ethnicity classifications and other language in the standard."
The letter included specific suggestions focused on the data needs of Asian American populations, Native Hawaiian and Pacific Islander populations, Hispanic/Latino populations, Middle Eastern and North African populations, and Black and African American populations.
But racial demographics are just the tip of the disaggregation iceberg. Sexual orientation and gender identity, age, geography, education, and other topics also deserve data systems that are robust enough to support disaggregation. The solution for survey data is to structure (and fund) surveys that have enough records to support detailed disaggregation. This could be achieved through larger sample sizes overall, as proposed by The Census Project for the American Community Survey, or through strategic oversampling of specific smaller populations of interest.
For administrative data such as birth and death records, education statistics, and others, many agencies already collect more data on race/ethnicity, age, income, and sexual orientation and gender identity than they report. Reporting is often limited by staff time, data quality issues, and, in some cases, privacy and confidentiality concerns. In these cases, newer tools, such as synthetic estimation or noise infusion, may help achieve a balance between reporting disaggregated data and protecting individual privacy.
What Can a Data Geek Do in the Meantime?
There is no one perfect answer, but here are some suggestions:
- Disaggregate data when you can.
- Consider whether reporting "data not available," rather than aggregating, could be a powerful advocacy tool to spotlight data gaps.
- Be clear about which groups you're aggregating and why.
- When reporting data for larger groups, speak to what is known about how smaller groups may differ from the aggregate trend.
- Communicate with data providers about data gaps and advocate for more funding for federal and state agencies to collect and disseminate the data you need.
Only by breaking down the data can we understand enough to make wise policy decisions that build up our communities.
Note: A version of this piece first appeared in the Association of Public Data Users blog. It has been modified slightly for PRB.
