A geodatabase of blood pressure level and the associated factors including lifestyle, nutritional, air pollution, and urban greenspace

Hypertension is a prevalent chronic disease globally, and it is associated with a multifaceted combination of risk factors. The scientific literature has shown associations of individual and environmental factors with hypertension; however, comprehensive databases that combine demographic, environmental, individual and nutritional attributes are rare. Moreover, integrated spatial-epidemiological approaches have scarcely been applied. Therefore, this study aims to provide and describe a geodatabase of individual-based and socio-environmental data on people living in the city of Mashhad, Iran, in 2018. The database has been extracted from the PERSIAN Organizational Cohort study at Mashhad University of Medical Sciences. The data note includes three shapefiles and a help file. The shapefile format is a digital vector storage format for storing geometric location and associated attribute information. The first shapefile includes population, air pollutant and available green space data for each census block of the city. The second shapefile consists of blood pressure data aggregated to the census blocks of the city. The third shapefile comprises individual characteristics (i.e., demographic, clinical, and lifestyle data). Finally, the fourth file is a guide to the preceding data files for users.


Objective
Hypertension is a prevalent health problem that causes almost 10.4 million deaths worldwide [1] and is known as a growing challenge in countries with ageing populations [2]. It is estimated that more than three-quarters of hypertensive patients live in developing countries [3]. A recent study has shown that the prevalence of hypertension in Iran was 25% between 2004 and 2018 [4] and that it varies across the country [5]. This geographical variation underlines the importance of geospatial analyses. Hypertension causes severe complications such as cardiovascular and kidney diseases [6]; moreover, it is one of the risk factors for worse COVID-19 outcomes [7,8]. Hypertension occurrence is associated with a combination of genetic factors, lifestyle-related factors such as lack of physical activity [9], smoking [10], obesity [9,11] and alcohol consumption [11], and environmental factors, including the amount of available green space and air pollution [12–20]. The relationship between particulate air pollution (PM1, PM2.5, PM10) and hypertension has been confirmed [12,13,21–24]. Targeted strategies should be applied to prevent and control hypertension [25]. However, spatial analyses of all the mentioned risk factors are needed to provide evidence-based information for developing appropriate interventions in high-priority areas. Geospatial analysis conducted with a geographical information system (GIS) helps researchers and policymakers to analyze, identify, and visualize geographical patterns of diseases [26–28]. Thus, in this study, we used GIS to link and quantify risk factors in order to provide and describe a geodatabase of the PERSIAN Organizational Cohort study [29], as a set of individual and socio-environmental factors related to the blood pressure level of people living in the city of Mashhad in 2018.
This geodatabase is a practical tool for identifying high-risk areas of hypertension and exploring socio-environmental factors in future studies, with a view to managing resources and implementing targeted interventions [30].

Data description
In this study, 5938 samples were obtained through the PERSIAN Cohort study in Mashhad [29]. Mashhad is the second most populous city in Iran and is located in the northeast of the country [31]. The data have been linked to the census tract level, which is the finest available geographical level in Iran [32]. Hypertension is the most significant risk factor for cardiovascular disease, which accounted for 32% of all deaths worldwide in 2019 [33]. It is also associated with severity and mortality in patients with coronavirus disease 2019 (COVID-19) [7,8]. This connection with two main global causes of death underlines the importance of this database.
This data note includes three data files and a help file (Table 1). Data file 1 contains population and air pollutant data that have been aggregated to the 2018 census tracts. This data file contains the census tract identification number, total population, male and female population, air quality index (AQI) and green space per capita. It also contains the amount of each air pollutant in each census tract, including sulfur dioxide (SO2), particulate matter less than 10 microns (PM10), carbon monoxide (CO), nitrogen dioxide (NO2), particulate matter less than 2.5 microns (PM2.5) and ozone (O3).
Data file 2 includes blood pressure data aggregated to the census tracts: the number of individuals without hypertension, the number of cases in the pre-hypertensive (elevated) stage, the numbers of cases with stage 1 and stage 2 hypertension [34], and the total number of cases. According to the definition of hypertension [34], the participants were categorized as normal (systolic blood pressure (SBP) less than 120 mm Hg and diastolic blood pressure (DBP) less than 80 mm Hg), elevated (SBP between 120 and 129 mm Hg and DBP less than 80 mm Hg), stage 1 hypertension (SBP between 130 and 139 mm Hg or DBP between 80 and 89 mm Hg) or stage 2 hypertension (SBP of 140 mm Hg or more or DBP of 90 mm Hg or more).
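As an illustration only, the categorization and tract-level aggregation described above can be sketched in a few lines of Python. This is not the authors' processing code: the function names and the (tract_id, sbp, dbp) record layout are hypothetical, and assigning readings of exactly 140 mm Hg SBP or 90 mm Hg DBP to stage 2 is an assumption. The most severe category is checked first because the stage definitions are joined by "or".

```python
from collections import Counter


def classify_bp(sbp, dbp):
    """Return the blood pressure category for one reading (values in mm Hg)."""
    # Check the most severe category first: stage definitions use "or",
    # so a single high value is enough to place a reading in a stage.
    if sbp >= 140 or dbp >= 90:  # assumption: boundary values count as stage 2
        return "stage2"
    if sbp >= 130 or dbp >= 80:
        return "stage1"
    if sbp >= 120:  # DBP < 80 is already guaranteed at this point
        return "elevated"
    return "normal"


def aggregate_by_tract(records):
    """Count categories per census tract.

    records: iterable of (tract_id, sbp, dbp) tuples (hypothetical layout).
    Returns a dict mapping tract_id to a Counter of category counts.
    """
    counts = {}
    for tract_id, sbp, dbp in records:
        counts.setdefault(tract_id, Counter())[classify_bp(sbp, dbp)] += 1
    return counts
```

The total number of cases in a tract is then the sum of that tract's Counter values.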
Data file 3 includes the individual data of participants and their nutritional habits. The data include age, sex, SBP, DBP, anthropometric measures (height, weight and body mass index) and lifestyle information (amount of meat and vegetable intake, alcohol intake, smoking and salt consumption).
Data file 4 is a guide in Excel format to using the data in data files 1-3. The data do not include any direct identifiers of participants. In data file 3, the point locations have been jittered within a 500-m circle to protect the individuals' privacy.
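The exact jittering procedure is not specified here, so the following Python sketch shows only one common way to displace a point at random within a circle. It assumes that 500 m is the radius of the circle, that coordinates are projected and expressed in metres (for example UTM), and that displacement is uniform over the disc; the function name jitter_point is hypothetical.

```python
import math
import random


def jitter_point(x, y, max_radius_m=500.0, rng=random):
    """Move a point to a uniformly random location within max_radius_m.

    x, y are assumed to be projected coordinates in metres, not degrees.
    rng can be the random module or a random.Random instance.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Taking the square root of the uniform draw makes the points uniform
    # over the disc area instead of clustered near the centre.
    radius = max_radius_m * math.sqrt(rng.random())
    return x + radius * math.cos(angle), y + radius * math.sin(angle)
```

Passing a seeded random.Random instance as rng makes the displacement reproducible, which may or may not be desirable when the goal is privacy protection.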
These data files can be used by researchers in different disciplines such as health geography, urban planning, medicine and healthcare ecosystem research. A multilevel regression model can be used to quantify the impact of area-level characteristics on individuals' hypertension [35]. GIS has a great capacity to integrate diverse data from different sources, including spatial, temporal and descriptive components, into one framework [36–38]. These data can help to investigate the potential relationship of lifestyle and environmental risk factors with hypertension. Table 1 shows the details of each dataset and provides access links to these data.
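The multilevel regression mentioned above could, for instance, take the form of a two-level random-intercept logistic model, with individuals (data file 3) nested within census tracts (data files 1 and 2). The specification below is a generic sketch and not a model fitted in this data note; all symbols are introduced here for illustration.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0
  + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij}
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_{j}
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here y_ij indicates whether individual i in census tract j is hypertensive, x_ij holds individual-level covariates (for example age, sex and body mass index), z_j holds tract-level covariates (for example PM2.5 and green space per capita), and u_j is a tract-specific random intercept whose variance captures between-tract variation not explained by the covariates.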

Limitations
One of the limitations of the data is that the PERSIAN Cohort study may not be generalizable to the total population of Mashhad. Further, we obtained the air pollution