- Data note
Brazilian spatial, demographic, and socioeconomic data from 1996 to 2020
BMC Research Notes volume 15, Article number: 159 (2022)
We present a database on Brazilian spatial, demographic, and socioeconomic characteristics from 1996 to 2020. This database aims for integration and harmonization with epidemiological data from two major studies. It can also be a valuable database for designing and conducting various types of epidemiologic research, such as health inequality studies, ecological studies (mapping and time-trends), and multi-level analysis.
The database gathers official information obtained via open sources from the Brazilian Institute of Geography and Statistics, the Institute for Applied Economic Research, and the Ministry of Health. It includes 139,153 observations and 26 attributes aggregated by years and policy-relevant geographic units on geocoding of municipality centroids, total population size, child population by age-group, birth and mortality measures, Brazilian Municipal Human Development Index, Gini coefficient, Gross Domestic Product, and sanitation. We automated all data processing and curation in the free and open software R.
Spatial, demographic, and socioeconomic information is crucial for research, planning, and policy development in health and other sectors. It helps countries compute many health indicators, optimize budgeting and resource allocation, measure and track progress toward international goals and national priorities, and support effective decision-making [1, 2]. Brazil is a federative republic with more than 211 million inhabitants, representing 47% of South America's total population, and has a well-developed national statistical system in which 98% of births and 96% of deaths are registered [3].
We present a database on Brazilian spatial, demographic, and socioeconomic characteristics from 1996 to 2020. This database aims for integration and harmonization with epidemiological data from two major studies [4, 5], including avoidable child mortality, hospitalization, immunization, breastfeeding, and primary health care resources [6]. It can also be a valuable database for designing and conducting various types of epidemiologic research, such as health inequality studies, ecological studies (mapping and time-trends), and multi-level analysis.
The database gathers official information obtained via open sources from the Brazilian Institute of Geography and Statistics (IBGE) [7, 8], the Institute for Applied Economic Research (IPEA) [9], and the Ministry of Health (MoH) [10, 11]. Data extraction occurred on November 18, 2021. The database has 139,153 observations and 26 attributes aggregated by years (1996–2020) and policy-relevant geographic units (country, macroregions, states, municipalities, and capitals). It includes geocoding of municipality centroids, total population size, child population by age group, birth and mortality measures, Brazilian Municipal Human Development Index (MHDI), Gini coefficient, Gross Domestic Product (GDP), and sanitation. We automated all data processing and curation in the free and open software R. The code can be audited, replicated, and reused to produce alternative analyses.
Table 1 provides an overview of the report’s files and datasets stored in Synapse. The R scripts contain the code for the data extraction (data files 1–5) and the transformation and loading (data files 6–11) routines. We extracted the data in its original format (datasets 1–5) and separately saved each workflow endpoint’s processed data (datasets 6–11). The HTML files show type-specific information for all attributes of the treated datasets, including statistical summaries and missing-value frequencies (data files 12–17). Data file 11 builds the database (dataset 11), and data file 18 documents its metadata and attribute descriptions.
The data workflow comprises two main steps. The first step covered the extraction, transformation, and loading routines for data obtained from primary sources of information. The data extraction resulted in 1452 raw files, including spatial data of the Brazilian municipalities, individual data on births and deaths, and aggregated data on population size and socioeconomic characteristics. The key features of data transformation were (i) selecting and renaming variables and filtering observations, (ii) calculating municipality centroids, (iii) correcting the codes and names that identify geographic units, (iv) cleansing numeric values, e.g., excluding special characters, and (v) enriching the municipal datasets with data aggregated by states, macroregions, and country. This step produced five treated datasets ready for use in the database construction.
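Transformation steps (iv) and (v) can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the toy values, and the assumption that raw numbers use Brazilian formatting ("." for thousands, "," for decimals) are all hypothetical. The aggregation relies on the structure of the 7-digit IBGE municipality code, whose first digit identifies the macroregion and whose first two digits identify the state.

```python
import re
from collections import defaultdict


def clean_numeric(raw):
    """Convert a raw string such as '1.234,5' or ' 12 345* ' to a float.

    Assumption: Brazilian formatting ('.' thousands, ',' decimals).
    Returns None when nothing numeric remains.
    """
    text = re.sub(r"[^0-9,.\-]", "", raw)  # drop special characters and spaces
    text = text.replace(".", "").replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None


def aggregate_by_prefix(rows, n_digits):
    """Sum municipal values by the first n_digits of the IBGE code.

    rows: iterable of (ibge_code, year, value); None values are skipped.
    n_digits=2 aggregates to states, n_digits=1 to macroregions.
    """
    totals = defaultdict(float)
    for code, year, value in rows:
        if value is not None:
            totals[(code[:n_digits], year)] += value
    return dict(totals)


# Toy counts (not real data) for Sao Paulo, Rio de Janeiro, and Manaus.
rows = [
    ("3550308", 2010, clean_numeric("1.000")),
    ("3304557", 2010, clean_numeric("2.500")),
    ("1302603", 2010, clean_numeric("500*")),
]
states = aggregate_by_prefix(rows, 2)
regions = aggregate_by_prefix(rows, 1)
```

Summing in this way suits counts such as population, births, and deaths; indices such as the Gini coefficient or MHDI cannot be aggregated by simple addition and would need values published for the higher-level units.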
The second step in the workflow involved data integration, harmonization, and enrichment. The IBGE treated dataset defined the final database structure, into which we merged the other treated datasets according to the years and codes of geographic units. As socioeconomic data were not available for all time points, we applied a simple imputation method for missing data using the next or previous observation of the same geographic unit. Furthermore, we created the following variables: mortality rate, infant mortality rate, birth rate, and the estimated populations of children under 1 year old and of 1-year-old children. We estimated the number of children by age group using two business rules. For children under 1 year old, we used the MoH estimates in 1996–2005 and the number of live births in 2006–2020. For 1-year-old children, we used the MoH estimates in 1996–2005 and our estimates in 2006–2020 (calculation method: the difference between live births and infant deaths that occurred in the previous year). R codes and data processing/curation were peer-reviewed, and their results were compared to the information presented on official sites.
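The imputation and the 1-year-old estimate can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R implementation: the text does not say whether the previous or the next observation takes priority, so the sketch assumes the previous one is used first and the next one only for leading gaps. The per-1,000 scaling of the infant mortality rate is the conventional definition and is also an assumption here, as are all function names and toy values.

```python
def impute_nearest(series):
    """Fill None entries for one geographic unit.

    series: dict mapping year -> value or None.
    Assumption: carry the previous observation forward, then use the
    next observation for years before the first observed value.
    """
    years = sorted(series)
    filled = dict(series)
    last = None
    for y in years:  # forward pass: previous observation
        if filled[y] is None:
            filled[y] = last
        else:
            last = filled[y]
    nxt = None
    for y in reversed(years):  # backward pass: next observation
        if filled[y] is None:
            filled[y] = nxt
        else:
            nxt = filled[y]
    return filled


def estimate_one_year_olds(live_births, infant_deaths, year):
    """Rule for 2006-2020: live births minus infant deaths of the previous year."""
    return live_births[year - 1] - infant_deaths[year - 1]


def infant_mortality_rate(live_births, infant_deaths, year):
    """Infant deaths per 1,000 live births (conventional scaling, assumed)."""
    return 1000 * infant_deaths[year] / live_births[year]


# Toy values (not real data) for a single municipality.
gini = {1996: None, 2000: 0.6, 2005: None, 2010: 0.7, 2015: None}
gini_filled = impute_nearest(gini)

births = {2009: 1000, 2010: 980}
deaths_under_1 = {2009: 20, 2010: 15}
one_year_olds_2010 = estimate_one_year_olds(births, deaths_under_1, 2010)
imr_2010 = infant_mortality_rate(births, deaths_under_1, 2010)
```

With indicators observed only in census years, this kind of fill holds a value constant across a decade or more, which is one of the intrinsic limitations of a simple imputation method.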
The database has some limitations that users should keep in mind. First, our eight socioeconomic indicators have different timeframes because of their availability at the municipal level: total and per capita GDP are available from 1999 to 2018, whereas MHDI (global, education, longevity, and income dimensions), Gini coefficient, and sanitation are available only for 1991, 2000, and 2010. The Brazilian National Household Sample Survey provides some of these indicators for capitals, states, macroregions, and Brazil over a longer timeframe. Moreover, we adopted a simple imputation method for missing data, which has several intrinsic limitations, and we presented GDP indicators in Brazilian reais, unadjusted for purchasing power parity. Second, total population size came from the results of demographic censuses (2000, 2010), inter-census counts (1996, 2007), and population estimates (other years), the only ways to capture these data at the municipal level. Our results for states, macroregions, and Brazil may diverge somewhat from population projections, which do not incorporate post-baseline territorial boundary updates. Finally, the Live Birth Information System (SINASC) and the Mortality Information System (SIM), used to collect live births and deaths data, have variable coverage over time and across geographic units: coverage is lower at the beginning of the historical series and in underserved areas. Nevertheless, overall SINASC and SIM coverage is high (98% and 96%, respectively) [3].
Availability of data and materials
The data described in this Data note are freely and openly available on the Synapse repository at https://doi.org/10.7303/syn26525521. Anyone can browse the content on the Synapse website, but you must register for an account using your email address to download the files and datasets. Please see Table 1 and references [4, 5, 12] for details and links to the data.
Abbreviations
GDP: Gross domestic product
IBGE: Brazilian Institute of Geography and Statistics
IPEA: Institute for Applied Economic Research
MHDI: Brazilian Municipal Human Development Index
MoH: Ministry of Health
SINASC: Live Birth Information System
SIM: Mortality Information System
References
1. World Health Organization. SCORE for health data technical package: global report on health data systems and capacity, 2020. Geneva: WHO; 2021.
2. World Health Organization. World health statistics 2021: monitoring health for the SDGs, sustainable development goals. Geneva: WHO; 2021.
3. Ministry of Health (Brazil). Health Brazil 2020/2021: an analysis of the health situation and the quality of information. Brasília: Ministry of Health; 2021.
4. Boccolini CS. Breastfeeding in Brazil in the MATRECI model: mapping, trending, clustering, and impact. 2021. https://doi.org/10.7303/syn25049520. Accessed 25 Nov 2021.
5. Boccolini PMM. COVAC: the role of social media, Bolsa Familia program, and Primary Health Care in vaccination coverage for children under five in Brazil. https://doi.org/10.7303/syn25148356. Accessed 25 Nov 2021.
6. Baroni L, Alves RFS, Boccolini CS, et al. Database on the coverage of the “Bolsa-Família” conditioning cash-transfer program: Brazil, 2005 to 2021. BMC Res Notes. 2021;14:435.
7. Brazilian Institute of Geography and Statistics. https://ftp.ibge.gov.br/ (2021). Accessed 25 Nov 2021.
8. Pereira RHM, Gonçalves CN. geobr: download official spatial data sets of Brazil. 2021. https://CRAN.R-project.org/package=geobr. Accessed 25 Nov 2021.
9. Institute for Applied Economic Research (Brazil). Atlas human development in Brazil. 2021. http://www.atlasbrasil.org.br/. Accessed 25 Nov 2021.
10. Ministry of Health (Brazil). https://datasus.saude.gov.br/populacao-residente (2021). Accessed 25 Nov 2021.
11. Ministry of Health (Brazil). https://datasus.saude.gov.br/transferencia-de-arquivos/ (2021). Accessed 25 Nov 2021.
12. Alves RFS, et al. Data resource profile: BASICS—spatial, demographic, and socioeconomic data for epidemiologic research Brazil 1996–2020. Synapse. 2021. https://doi.org/10.7303/syn26525521.
This work was supported, in whole or in part, by the Bill & Melinda Gates Foundation [Grant ID INV 027961] and National Council for Scientific and Technological Development (CNPq). Under the grant conditions of the Foundation, a Creative Commons Attribution 4.0 Generic License has already been assigned to the Author Accepted Manuscript version that might arise from this submission.
Ethics approval and consent to participate
We used data from open sources. The Brazilian Institute of Geography and Statistics, the Institute for Applied Economic Research, and the Ministry of Health of Brazil are committed to respecting the ethical precepts and ensuring data privacy and security. The Brazilian legislation exempts the use of public and anonymized secondary data from ethical approval.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Cite this article
Fernandes Santos Alves, R., de Moraes Mello Boccolini, P., Baroni, L.R. et al. Brazilian spatial, demographic, and socioeconomic data from 1996 to 2020. BMC Res Notes 15, 159 (2022). https://doi.org/10.1186/s13104-022-06044-w
- Population characteristics
- Socioeconomic factors
- Vital statistics
- Routinely collected health data
- Health information systems