Technical Director, Demographic Research
Analyzing Big Data on a Shoestring Budget
March 27, 2023
Luis Gabriel Cuervo, Universitat Autònoma de Barcelona
Joshua Wilde, Max Planck Institute for Demographic Research and Portland State University
Diego Alburez-Gutierrez, Max Planck Institute for Demographic Research
Big data has opened a new world for demographers and public health scientists, who can gain insights into social and health phenomena from the myriad digital traces we leave behind in our daily lives. But is analyzing big data practical and affordable? Researchers and organizations that have not made the leap might wonder: Do we need a lot more funding? Supercomputers? Armies of data scientists?
Three studies, presented recently in a PRB Demography Talk, show the feasibility of conducting research on a proverbial shoestring—using big data that are publicly, freely available to anyone with a personal computer and Wi-Fi connection.
Study 1: Can Google data help measure health care access more accurately?
The first study, presented by Luis Gabriel Cuervo of the Universitat Autònoma de Barcelona and the AMORE project, used Google mobility data to assess the effect of traffic congestion on people’s ability to access health services in Cali, Colombia, a city of 2.3 million. The study aimed to improve how health care accessibility is measured and communicated, to inform urban and health services planning.
Cuervo assembled a multidisciplinary research team, including mobility experts, to examine travel times from where people live to urgent and frequently used health services. The team used Google’s Distance Matrix API, which provides travel times and distance between origins and destinations, accounting for changing traffic conditions. The data are generated from Google Maps on people’s cell phones.
Combining this information with census and health services data, the study measured travel times repeatedly and revealed significant inequality by sociodemographic characteristics. On typical days, 60% of the city’s population lived more than 15 minutes by car from emergency care, with those in the poorest neighborhoods facing the longest travel times and a greater impact from traffic congestion.
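The measurement described above can be sketched in a few lines of Python. This is a minimal illustration, not the AMORE team's code: it assumes the public Distance Matrix JSON endpoint with its `origins`, `destinations`, `mode`, `departure_time`, and `key` parameters and the `duration_in_traffic` response field. The sample response, zone populations, and function names are invented for the example.

```python
from urllib.parse import urlencode

# Public Distance Matrix endpoint; check Google's current documentation before use.
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_request_url(origins, destinations, departure_time, api_key):
    """Build a request URL for driving travel times (no network call is made)."""
    params = {
        "origins": "|".join(origins),
        "destinations": "|".join(destinations),
        "mode": "driving",
        "departure_time": departure_time,  # Unix timestamp or "now"
        "key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def shortest_minutes(response):
    """For each origin row, return the shortest travel time (minutes) to any destination."""
    result = []
    for row in response["rows"]:
        seconds = [
            el.get("duration_in_traffic", el["duration"])["value"]
            for el in row["elements"]
            if el.get("status") == "OK"
        ]
        result.append(min(seconds) / 60 if seconds else None)
    return result


def share_beyond(minutes, populations, threshold=15):
    """Population-weighted share living more than `threshold` minutes away.

    Origins with no reachable destination (None) are counted as beyond the threshold.
    """
    far = sum(p for m, p in zip(minutes, populations) if m is None or m > threshold)
    return far / sum(populations)


# Hypothetical response for two residential zones and two emergency facilities.
sample_response = {
    "rows": [
        {"elements": [
            {"status": "OK", "duration": {"value": 600}, "duration_in_traffic": {"value": 780}},
            {"status": "OK", "duration": {"value": 1200}, "duration_in_traffic": {"value": 1500}},
        ]},
        {"elements": [
            {"status": "OK", "duration": {"value": 900}, "duration_in_traffic": {"value": 1260}},
            {"status": "ZERO_RESULTS"},
        ]},
    ]
}
populations = [40000, 60000]  # invented census counts per zone

minutes = shortest_minutes(sample_response)
share = share_beyond(minutes, populations)
print(f"Share of population beyond 15 minutes: {share:.0%}")
```

Repeating the request at different departure times, as the study did, would show how the share shifts between free-flowing and congested hours.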
Studies 2 and 3: Can Google data help predict changes in birth rates and examine excess deaths from COVID-19 related shutdowns?
In another study, Joshua Wilde from the Max Planck Institute for Demographic Research (MPIDR) and Portland State University asked: Can Google search data predict whether COVID-related shutdowns will lead to a baby boom or bust? In 2020, early in the pandemic, Wilde and team constructed a forecasting model based on volumes of Google searches with keywords related to conception, pregnancy, childbirth, and economic stability. Their thinking was that if searches increased sharply for keywords such as “pregnancy test” and “missed period,” one might expect higher birth rates seven to nine months later. On the other hand, prior research had associated unemployment with lower birth rates, so if unemployment-related searches climbed, one might expect a baby bust.
After selecting the most salient among 40 possible keywords from Google Trends, their model predicted a 12% decline in monthly birth rates between November 2020 and February 2021. The team was able to compare their results to official U.S. birth statistics in 2022. While not a perfect match, they found the model’s predictions remarkably close to actual declines in birth rates, with Google searches related to unemployment being the strongest predictor of pregnancies and subsequent birth rates.
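The core idea, that search volumes today carry information about births several months later, can be illustrated with a single-predictor lagged regression. This is a simplified sketch and not the authors' model, which selected among many keywords. The monthly values, the eight-month lag, and the function names are all assumptions made for the example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lagged_pairs(search_index, births, lag):
    """Pair each month's search index with births `lag` months later."""
    return search_index[:len(search_index) - lag], births[lag:]


# Invented monthly series: an unemployment-related search index (0-100)
# and births in thousands. Neither is real data.
search_index = [20, 22, 21, 25, 60, 80, 75, 70, 65, 60, 55, 50, 48, 45]
births = [312, 311, 310, 311, 309, 310, 311, 310,
          310, 309, 309.5, 307.5, 290, 280]

LAG_MONTHS = 8  # assumed; the article mentions a seven-to-nine-month window

x, y = lagged_pairs(search_index, births, LAG_MONTHS)
intercept, slope = fit_line(x, y)

# Forecast births for the month after the series ends, using the search
# index observed LAG_MONTHS earlier.
next_month_search = search_index[len(births) - LAG_MONTHS]
forecast = intercept + slope * next_month_search
print(f"slope={slope:.2f}, forecast={forecast:.1f} thousand births")
```

A negative slope here corresponds to the pattern the article describes: more unemployment-related searches, fewer births some months later.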
In the third study, Diego Alburez-Gutierrez and colleagues from MPIDR used Google mobility data to examine how restrictions on people’s movements during the pandemic affected death rates in England and Wales between February and August 2020. Google’s COVID-19 Community Mobility Reports contain aggregated records of movements tracked on cell phones with Google Maps (through October 2022), which researchers can search by mode of travel and type of activity.
The research team looked specifically at the extent to which stay-at-home measures during the first wave of COVID-19 affected excess mortality—the difference between current death rates and those that would be expected if mortality followed the patterns of prior years. Examining mobility patterns, they were able to estimate excess deaths with and without government-mandated restrictions on movement.
They estimated that in London, about 22,000 deaths were averted due to mobility restrictions; in England and Wales as a whole, more than 94,000 deaths were averted over the seven-month period. In other words, without shutdown measures, the number of excess deaths could have more than doubled during this period, especially in the London area. These findings are highly relevant for policymakers considering responses to future disease outbreaks.
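The arithmetic behind excess and averted deaths can be sketched as follows. This is an illustration under stated assumptions, not the MPIDR team's method: it uses death counts rather than rates, takes a simple average of prior years as the expected baseline, and treats the no-restrictions scenario as a given input, whereas the study estimated it from mobility patterns. All numbers are invented.

```python
def expected_deaths(prior_years):
    """Baseline: average deaths in each period across prior years."""
    n_years = len(prior_years)
    return [sum(period) / n_years for period in zip(*prior_years)]


def excess_deaths(observed, expected):
    """Excess = observed minus expected, period by period."""
    return [o - e for o, e in zip(observed, expected)]


# Invented counts for three periods in each of three prior years.
prior_years = [
    [100, 110, 105],
    [104, 108, 103],
    [102, 112, 107],
]
observed = [150, 190, 140]         # deaths with restrictions in place (invented)
no_restrictions = [210, 290, 180]  # modeled counterfactual deaths (invented)

baseline = expected_deaths(prior_years)
excess_observed = sum(excess_deaths(observed, baseline))
excess_counterfactual = sum(excess_deaths(no_restrictions, baseline))

# Deaths averted = excess deaths without restrictions minus excess deaths with them.
averted = excess_counterfactual - excess_observed
print(f"Excess with restrictions: {excess_observed:.0f}")
print(f"Excess without restrictions: {excess_counterfactual:.0f}")
print(f"Deaths averted: {averted:.0f}")
```

In this toy example the counterfactual excess is more than twice the observed excess, which mirrors the kind of comparison reported for England and Wales.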
Studies show promises and pitfalls of big data
What lessons can we draw about using publicly available, low-cost big data? Google data are surprisingly easy to find and use. For example, researchers can access the search data by selecting topics of interest and downloading the desired data as a spreadsheet. The data are nationally and internationally comparable due to Google’s global reach and, as of now, its generous sharing of data files through a standard interface. By relying on these tools, the research teams had access to millions of data points that would have been nearly impossible to collect using more conventional methods.
The PRB Demography Talk also brought several pitfalls of Google data to light, some of which apply to all big data:
- Data drawn from online activity do not represent the whole population, only those using electronic devices and Google apps.
- The data are aggregated and anonymized so that the identities of individuals are not disclosed. While essential to protect people’s privacy, such aggregation prevents researchers from analyzing specific characteristics of the people studied.
- Google data are not available where the number of searches or map users is too low to be aggregated, such as in remote locations or places with poor connectivity.
- Google (or any other company, for that matter) can change the data it collects or discontinue access at any time without notifying data users, adding uncertainty for those planning research studies. Also, data may not be archived and therefore cannot be retrieved for later analyses.
Moving forward with big data
Cuervo’s experience in Colombia showed that multidisciplinary research teams are essential to bring different skill sets, interests, and resources to a study. Collaboration from the outset among scientists in various disciplines, governments, health service providers, and end-users of data also made it more likely that research results would be used and acted upon.
Alburez-Gutierrez observed that relying on “the kindness of strangers” (in this case, technology platforms like Google) to provide digital trace data is not enough to build solid research infrastructure. Instead, he argued, “We should have forward-looking rather than reactive policies that allow researchers to access data, and regulation may be a way of achieving this.” In other words, research institutions and governments need to negotiate agreements in which Google and other companies create systems that provide secure access to their data for researchers.
Greater access to these types of big data will enable researchers to explore many other pressing issues in the future, such as predicting the potential impact of climate change on births, deaths, migration, and other trends. Wilde and colleagues are already investigating some of these questions, for example, looking at how warming temperatures could drive changes in Google searches, and how these reflect changes in behaviors, such as dating, that are linked to fertility change. These and other innovative studies will enable policymakers to develop better-informed, real-time responses to the rapidly changing conditions of today’s world.
2:16 Presentation by Luis Gabriel Cuervo
19:18 Presentation by Joshua Wilde
40:00 Presentation by Diego Alburez-Gutierrez
55:37 Q&A session
For more information on this topic, see PRB’s Population Bulletin, “Demystifying Big Data for Demography and Global Health.” PRB Demography Talks is a webinar series exploring innovative demographic research.