Data
The INDEPTH iSHARE2 data repository (Footnote 1) went online in 2014 and provides a unique resource of high-quality, fully documented HDSS longitudinal datasets available for download to a wide range of users, including HDSS-linked scientists and analysts, researchers and students [5]. The repository, which is growing over time, holds, among others, core micro datasets describing the key demographic events of more than 25 HDSS populations and unique data on cause-specific mortality [5]. Recently, the first of a series of multi-centre core micro datasets attached to the MADIMAH project was released; it is structured to examine determinants of in- and out-migration, particularly the education status of the migrant [6].
Methods
The efficient use of these micro datasets requires that users are able to handle HDSS data structures (such as the residency episode files) and understand the range of core events that alter residency status in the HDSS, especially in- and out-migration, births and deaths. These data structures and properties form the necessary foundation for the statistical analyses of population dynamics. To address these requirements, the MADIMAH group developed a manual based on the group’s experience of conducting comparative analyses across multiple HDSS sites and of training HDSS data scientists and analysts in these methods. The intention was to provide data managers and analysts who manage raw questionnaire data with a step-by-step description of the process of structuring and preparing a dataset for the calculation of demographic rates and event history analysis (EHA). The approach was to create a common language and set of codes that can create synergies and enable communication across larger communities of data managers and analysts working with longitudinal research designs.
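To make the idea of a residency episode concrete, the following minimal sketch shows how a list of dated core events for one individual could be turned into episodes of residence. It is written in Python for illustration only and is not taken from the manual, which uses Stata. The event codes (ENU, BTH, IMG, OMG, DTH, OBE) and the function name build_episodes are assumptions made for this example.

```python
from datetime import date

# Illustrative (assumed) event codes, not necessarily those of the manual.
START_EVENTS = {"ENU", "BTH", "IMG"}  # enumeration, birth, in-migration
END_EVENTS = {"OMG", "DTH", "OBE"}    # out-migration, death, end of observation


def build_episodes(events):
    """Turn one individual's (date, code) events into residency episodes.

    Raises ValueError when the sequence is inconsistent, for example two
    consecutive in-migrations or an exit without a preceding entry.
    """
    episodes = []
    start = None  # (date, code) of the currently open episode, if any
    for when, code in sorted(events):
        if code in START_EVENTS and start is None:
            start = (when, code)
        elif code in END_EVENTS and start is not None:
            episodes.append({
                "start": start[0],
                "start_event": start[1],
                "end": when,
                "end_event": code,
            })
            start = None
        else:
            raise ValueError(f"unexpected event {code} on {when}")
    if start is not None:
        raise ValueError(f"episode opened on {start[0]} is never closed")
    return episodes


if __name__ == "__main__":
    events = [
        (date(2010, 1, 1), "ENU"),
        (date(2012, 6, 30), "OMG"),
        (date(2014, 3, 1), "IMG"),
        (date(2015, 12, 31), "OBE"),
    ]
    for episode in build_episodes(events):
        print(episode)
```

In this sketch each episode is bounded by one event that starts residence and one that ends it, so an individual who leaves and later returns contributes two separate episodes.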
Results: training manual
The training manual is available online as Additional file 1 to this note. It provides a general introduction to event history data management and leads the user through all the procedures necessary to format and analyse longitudinal data. It demonstrates how to create a core residency file suitable for EHA and how to check for inconsistencies in the data. The approach is flexible and covers the calculation of basic demographic rates, as well as more complex determinants analysis through the addition of individual and household attributes. The manual illustrates how to enrich the database with new events with precise or imputed dates of occurrence. Finally, the manual explains how to create duration events of several types. The methods outlined in the manual are implemented in detailed Stata code. Each section starts with an example of an output file, continues with a checklist, and concludes with further examples or programmes needed to solve specific technical issues. Longer, more detailed Stata programmes are available in Additional file 1: Appendix.
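As an illustration of the kind of consistency check and basic rate calculation described above, the following sketch flags overlapping residency episodes for the same individual and computes a crude event rate per person-year. It is a simplified Python example under stated assumptions and does not reproduce the manual's Stata programmes. The field names (id, start, end, end_event), the event codes and the function names are hypothetical.

```python
from datetime import date


def find_overlaps(episodes):
    """Return pairs of episodes of the same individual that overlap in time."""
    ordered = sorted(episodes, key=lambda e: (e["id"], e["start"]))
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if a["id"] == b["id"] and b["start"] < a["end"]
    ]


def person_years(episodes):
    """Total time under observation, in years of 365.25 days."""
    return sum((e["end"] - e["start"]).days for e in episodes) / 365.25


def crude_rate(episodes, event_code):
    """Number of episodes ending with event_code per person-year observed."""
    count = sum(1 for e in episodes if e["end_event"] == event_code)
    return count / person_years(episodes)


if __name__ == "__main__":
    # Made-up episodes for two individuals, for illustration only.
    episodes = [
        {"id": 1, "start": date(2010, 1, 1), "end": date(2012, 1, 1), "end_event": "OMG"},
        {"id": 1, "start": date(2013, 1, 1), "end": date(2014, 1, 1), "end_event": "OBE"},
        {"id": 2, "start": date(2010, 1, 1), "end": date(2011, 1, 1), "end_event": "DTH"},
    ]
    print("overlapping pairs:", len(find_overlaps(episodes)))
    print("person-years:", round(person_years(episodes), 3))
    print("crude death rate:", round(crude_rate(episodes, "DTH"), 3))
```

The rate here is deliberately crude: it divides the number of events by the total exposure time without splitting by age, sex or period, which a full analysis would normally require.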
This manual is the first comprehensive guide to HDSS longitudinal data management and has become a standard for INDEPTH member HDSS Centres. It can be implemented on longitudinal data from other sources, including register-based, retrospective, or cohort data. It forms the first part of a two-part series. The second manual will guide analysts through the computation of demographic rates and the analysis of determinants and outcomes of demographic processes, using the longitudinal dimension in the data.