Validation of anaemia, haemorrhage and blood disorder reporting in hospital data in New South Wales, Australia

Objective Hospital data are a useful resource for studying pregnancy complications, including bleeding-related conditions; however, the reliability of these data is unclear. This study aims to examine the reliability of reporting of bleeding-related conditions, including anaemia, obstetric haemorrhage and blood disorders, and of procedures such as blood transfusion and hysterectomy. It compares coded hospital records with obstetric data from two large tertiary hospitals in New South Wales.
Results There were 36,051 births between 2011 and 2015 included in the analysis. Anaemia and blood disorders were poorly reported in the hospital data, with sensitivity ranging from 2.5% to 24.8% (positive predictive value (PPV) 12.0–82.6%). Reporting of postpartum haemorrhage, transfusion and hysterectomy showed high sensitivity (82.8–96.0%; PPV 78.0–89.6%). Moderate consistency with the obstetric data was observed for other types of obstetric haemorrhage (sensitivity 41.9–65.1%; PPV 50.0–56.8%) and for placental complications (sensitivity 68.2–81.3%; PPV 20.3–72.3%). Our findings suggest that hospital data may be a reliable source of information on postpartum haemorrhage, transfusion and hysterectomy. However, they highlight the need for caution in studies of anaemia and blood disorders, given high rates of uncoded and ‘false’ cases, and suggest that other sources of data should be sought where possible.
Supplementary Information The online version contains supplementary material available at 10.1186/s13104-021-05584-x.


Introduction
Hospital data are an efficient, cost-effective resource for studying trends and outcomes of medical conditions and procedures [1][2][3][4][5], including during pregnancy. The range of procedures and conditions that can affect pregnant women and their babies is not always captured in obstetric data. Obstetric data are often recorded in a form that is difficult to analyse, such as free text, and are less commonly available at a population level than hospital data. Hospital data consist of diagnoses and procedures coded to international standards, and provide a useful alternative or supplement to obstetric data for research [1][2][3][4][5][6]. However, the reliability of findings depends on how accurately hospital data identify patients with and without the condition or procedure. Discrepancies in hospital data can arise at various stages of the recording process, including initial documentation, coding and data entry. Accuracy may also change over time, influenced by changes in practice, in guidelines and in the focus on particular conditions. Anaemia, obstetric haemorrhage, blood disorders such as coagulation and platelet disorders, and related procedures are associated with adverse maternal and neonatal outcomes [7][8][9][10]. Previous studies have shown poor sensitivity in hospital data for nutritional anaemia [11], placental abruption [12,13] and hysterectomy [13]. They have shown good sensitivity for haemolytic anaemia [11], transfusion [13] and coagulation disorders [11]. However, these studies were based on births in 2000 [12] and 2002 [11,13], and it is not known whether reporting has changed in the intervening years.
Validation studies comparing coded hospital data with medical charts can be used to assess the reliability of data sources. However, they are time-consuming and expensive, and tend to review a small number of records over a short time period. An alternative approach is to compare two independent databases [14][15][16]. Here, we compare reporting of bleeding-related conditions and procedures in pregnancy between two sources, using ObstetriX as the reference standard. The first is coded hospital data extracted from the electronic medical record. The second is obstetric data from the ObstetriX database.
The hospital data are coded by trained clinical coders according to the International Classification of Diseases (ICD) and the Australian Classification of Health Interventions. Coders use the clinical documentation from an inpatient episode of care to assign the appropriate diagnosis code(s) and, where relevant, procedure code(s). Government policy and ICD coding standards largely limit coding to conditions affecting the current admission, require substantiation by clear medical record documentation, and prohibit coders from interpreting results [17]. Coded data support activity-based funding and healthcare management and planning. They also inform the population-level New South Wales Admitted Patient Data Collection.
ObstetriX is a clinical database covering pregnancy, birth and the early postnatal period. Its data are collected by midwives, and a subset of them forms the statewide New South Wales Perinatal Data Collection. This population-based collection has shown high accuracy for reporting of diagnoses and procedures during labour and delivery [18]. Validation studies of similar Australian collections, such as the Victorian Perinatal Data Collection, have shown high accuracy for most data items [19][20][21]. Because the two databases serve different purposes and take different perspectives, ObstetriX may be considered an imperfect reference standard. However, few reference standards are free of error and uncertainty, particularly where self-report is relied upon [22][23][24]. In addition, large population datasets have been shown to be robust to the introduction of random errors and omissions [25].
The aim of this study is to determine the consistency of reporting of anaemia, haemorrhage and blood disorders during pregnancy, and related procedures including blood transfusion and hysterectomy, in hospital records compared with an obstetric database. We aimed to compare reporting between two large tertiary hospitals, and to determine whether patient or pregnancy characteristics affect reporting.

Methods
Women giving birth to singleton infants (≥ 24 weeks gestation) between 2011 and 2015 were included. Births took place in two tertiary hospitals in the Sydney metropolitan area, New South Wales (NSW), Australia. In NSW, all births in a hospital or birth centre are treated as inpatient admissions and are assigned both an electronic medical record and an obstetric record. Births in hospitals and birth centres account for 99% of births in the state [26]. Some women had booked to give birth at a hospital other than the one where they gave birth, and these women were excluded. Their antenatal data would have been collected at the booking hospital and may be incomplete.
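The inclusion and exclusion criteria above amount to a simple record filter. The following sketch shows one way they could be applied. It assumes a hypothetical record layout: the `Birth` fields are illustrative names, not ObstetriX variables, and this is not the study's extraction code.

```python
from dataclasses import dataclass


@dataclass
class Birth:
    # Hypothetical record layout for illustration only (not ObstetriX field names).
    record_id: str
    plurality: int          # number of babies; 1 = singleton
    gestation_weeks: int    # completed weeks of gestation at birth
    birth_year: int
    booking_hospital: str
    birth_hospital: str


def is_eligible(b: Birth) -> bool:
    """Inclusion criteria as described in the Methods."""
    return (
        b.plurality == 1
        and b.gestation_weeks >= 24
        and 2011 <= b.birth_year <= 2015
        # Booked elsewhere: antenatal data held at another hospital, so exclude.
        and b.booking_hospital == b.birth_hospital
    )


def select_cohort(births: list[Birth]) -> list[Birth]:
    return [b for b in births if is_eligible(b)]


births = [
    Birth("a", 1, 39, 2012, "H1", "H1"),  # included
    Birth("b", 2, 36, 2013, "H1", "H1"),  # twins
    Birth("c", 1, 23, 2014, "H2", "H2"),  # under 24 weeks
    Birth("d", 1, 40, 2015, "H1", "H2"),  # booked at another hospital
    Birth("e", 1, 38, 2016, "H2", "H2"),  # outside study period
]
print([b.record_id for b in select_cohort(births)])  # ['a']
```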
ObstetriX contains maternal health and demographic data, obstetric history and pregnancy details. Initial information on pregnancy and medical history is obtained at the face-to-face booking consultation with a midwife (an outpatient encounter, by 16 weeks gestation). The record is updated with labour, birth and postnatal information collected during the birth admission. Midwives enter the data using checkboxes or drop-down menus, with a small amount of free text available. Most procedures and conditions are recorded as present, absent or unknown/missing. Midwives do not have access to the hospital codes, as coding is performed four to six weeks after discharge.
The coded hospital data contain diagnoses coded according to the International Classification of Diseases, Tenth Revision, Australian Modification (ICD-10-AM) with a small number coded using Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) codes. Procedures are coded following the Australian Classification of Health Interventions, Eighth Edition. Coding is performed by trained clinical coders based on clinical documentation in the electronic medical record. Medical records for admissions throughout pregnancy and the birth were searched for the relevant diagnoses and procedures (codes provided in Additional file 1: Table S1).
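Searching all admissions in a pregnancy for relevant codes can be sketched as a prefix match over every diagnosis code on every admission. The prefixes below are a few ICD-10-AM categories chosen for illustration: O46 antepartum haemorrhage, O67 intrapartum haemorrhage, O72 postpartum haemorrhage and D50 iron deficiency anaemia. They are not the study's code list, which is given in Additional file 1: Table S1. The input layout (one list of codes per admission) is also an assumption.

```python
# Illustrative ICD-10-AM category prefixes only; NOT the study's code list (Table S1).
ILLUSTRATIVE_PREFIXES = {
    "antepartum_haemorrhage": ("O46",),
    "intrapartum_haemorrhage": ("O67",),
    "postpartum_haemorrhage": ("O72",),
    "iron_deficiency_anaemia": ("D50",),
}


def flag_conditions(admissions: list[list[str]]) -> dict[str, bool]:
    """Flag a condition if any admission in the pregnancy carries a matching code.

    `admissions` is an assumed layout: one list of diagnosis codes per admission.
    """
    codes = [code for admission in admissions for code in admission]
    return {
        condition: any(code.startswith(prefixes) for code in codes)
        for condition, prefixes in ILLUSTRATIVE_PREFIXES.items()
    }


# One hypothetical woman: an antenatal admission and the birth admission.
pregnancy = [["D50.9"], ["O72.1", "Z37.0"]]
print(flag_conditions(pregnancy))
```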
Records were deterministically linked using patient Medical Record Numbers by personnel external to the project, and links were checked against other personal identifiers. Data were de-identified prior to analysis.
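Deterministic linkage means that records are joined only where a key agrees exactly, here the Medical Record Number (MRN), with other identifiers used as a check. The sketch below is minimal and rests on several assumptions:

* There is one record per MRN.
* The check field is maternal date of birth, a hypothetical choice.
* Women with more than one birth in the study period would, in practice, need an additional episode-level key.

This is not the procedure used by the linkage personnel.

```python
def link_records(hospital: dict, obstetric: dict, check_field: str = "maternal_dob"):
    """Deterministic linkage on MRN, with a second-identifier check.

    Both inputs map MRN -> dict of identifiers. Assumes one record per MRN;
    `check_field` is a hypothetical identifier name.
    Returns (linked MRNs, MRNs whose check field disagrees, MRNs in only one source).
    """
    shared = hospital.keys() & obstetric.keys()
    linked = sorted(m for m in shared if hospital[m][check_field] == obstetric[m][check_field])
    conflicts = sorted(shared - set(linked))  # candidates for manual review
    unmatched = sorted(hospital.keys() ^ obstetric.keys())
    return linked, conflicts, unmatched


hospital = {"A1": {"maternal_dob": "1985-02-01"}, "B2": {"maternal_dob": "1990-07-15"},
            "C3": {"maternal_dob": "1979-11-30"}}
obstetric = {"A1": {"maternal_dob": "1985-02-01"}, "B2": {"maternal_dob": "1991-07-15"},
             "D4": {"maternal_dob": "1988-03-09"}}
print(link_records(hospital, obstetric))  # (['A1'], ['B2'], ['C3', 'D4'])
```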
Reporting in hospital data was compared to that in obstetric data as the reference standard. Sensitivity, specificity, positive predictive values (PPV) and negative predictive values (NPV) are reported with exact confidence intervals. These measures were also calculated separately for the two hospitals. Analyses were performed in SAS 9.3.
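The four validation measures come from a 2x2 table of hospital coding against the ObstetriX reference. The sketch below computes them with exact binomial confidence intervals. It assumes the Clopper-Pearson method, which is what "exact" intervals for a proportion usually denote. The study's analysis was run in SAS 9.3, so this Python version is illustrative only. To stay dependency-free, it inverts the beta distribution function numerically, evaluating the regularised incomplete beta function with the standard continued-fraction method; `scipy.stats.beta.ppf` would serve the same purpose. The counts in the usage example are hypothetical.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 1000, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def step(aa: float, c: float, d: float) -> tuple[float, float]:
        # One Lentz update, guarding against division by (near) zero.
        d = 1.0 + aa * d
        d = tiny if abs(d) < tiny else d
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        return c, 1.0 / d

    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, max_iter + 1):
        c, d = step(m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)), c, d)
        h *= d * c
        c, d = step(-(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)), c, d)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularised incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(q: float, a: float, b: float) -> float:
    """Beta(a, b) quantile by bisection (the CDF is monotone in x)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if betainc(a, b, mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion x/n."""
    lower = 0.0 if x == 0 else beta_ppf(alpha / 2, x, n - x + 1)
    upper = 1.0 if x == n else beta_ppf(1 - alpha / 2, x + 1, n - x)
    return lower, upper


def validation_measures(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[float, float, float]]:
    """Estimate and 95% CI for each measure, with ObstetriX as the reference standard.

    tp: coded in hospital data and recorded in ObstetriX; fp: coded only;
    fn: recorded in ObstetriX only; tn: recorded in neither.
    """
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (x / n, *clopper_pearson(x, n)) for name, (x, n) in counts.items()}


# Hypothetical counts, for illustration only.
for name, (est, lo, hi) in validation_measures(tp=120, fp=30, fn=40, tn=9810).items():
    print(f"{name}: {100 * est:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})")
```

For rare conditions, true negatives dominate the table, so specificity and NPV tend to sit close to 100% whatever the quality of coding. Sensitivity and PPV are usually the more informative pair.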

Results
Of the 38,343 singleton births of at least 24 weeks gestation between 2011 and 2015, 36,051 (94.0%) were to women who gave birth at their booking hospital, and these were included in the analysis.
Rates of diagnoses were similar across the two data sources, with the exception of anaemia (haemoglobin < 110 g/L) in pregnancy, B12 deficiency anaemia and coagulation disorders (Table 1). For anaemia, platelet disorders, coagulation disorders and, to a lesser extent, placenta accreta, case numbers were similar between the databases. However, overlap was low, with different women identified in each source.
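Similar case numbers can coexist with low overlap, as the definitions of the validation measures show. Using notation introduced here for illustration, let a be the number of women identified in both sources, b the number identified only in the hospital data, and c the number identified only in ObstetriX:

```latex
\text{Sensitivity} = \frac{a}{a + c}, \qquad
\text{PPV} = \frac{a}{a + b}, \qquad
a + b = a + c \;\Rightarrow\; b = c \;\Rightarrow\; \text{Sensitivity} = \text{PPV}.
```

When both sources identify the same number of women, sensitivity and PPV are therefore equal, and both depend only on the overlap. Take a hypothetical case in which each source identifies 100 women but only 20 appear in both. Sensitivity and PPV are then both 20%, even though the two sources give identical prevalence.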
Rates of diagnoses and procedures were similar between hospitals, with the exception of postpartum haemorrhage (PPH), antepartum haemorrhage (APH) and nutritional anaemia, which were higher at Hospital One (Additional file 1: Table S2).
Sensitivity, specificity, PPV and NPV for reporting of anaemia were similar for nulliparous and parous women (Table 3). Sensitivity and PPV for anaemia tended to increase with year of birth. They were also higher for hospital medical, private obstetrician or general practitioner-focussed models of care than for midwife-centred care. Among women who received a blood transfusion, sensitivity tended to increase, while specificity and NPV decreased, compared with women who did not. The exceptions were intrapartum haemorrhage (IPH) and APH, for which transfusion was associated with decreased sensitivity. Potential misclassification between IPH and PPH was identified:

* IPH was not identified in the hospital data for 90 women (58.1%). Of these, 29 (32.2%) were coded with PPH, compared with 11 of the 65 (16.9%) whose IPH was identified.
* PPH was not identified in the hospital data for 745 of 4330 women. Of these, 22 (3.0%) were coded with IPH, compared with 20 of 3585 (0.6%) whose PPH was identified in the hospital data.
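As an arithmetic check, the percentages in the misclassification analysis can be recomputed from the reported counts. The sketch takes the denominator for PPH cases identified in the hospital data as the complement of the missed cases (4330 minus 745). The resulting PPH sensitivity of 82.8% agrees with the figure given in the abstract. Variable names are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the Results."""
    return round(100 * part / whole, 1)


# Women with IPH recorded in ObstetriX (reference standard).
iph_missed, iph_identified = 90, 65
iph_total = iph_missed + iph_identified
iph_missed_coded_pph, iph_identified_coded_pph = 29, 11

# Women with PPH recorded in ObstetriX.
pph_total, pph_missed = 4330, 745
pph_identified = pph_total - pph_missed  # complement of the missed cases
pph_missed_coded_iph, pph_identified_coded_iph = 22, 20

summary = {
    "IPH sensitivity": pct(iph_identified, iph_total),
    "IPH missed": pct(iph_missed, iph_total),
    "missed IPH coded as PPH": pct(iph_missed_coded_pph, iph_missed),
    "identified IPH also coded as PPH": pct(iph_identified_coded_pph, iph_identified),
    "PPH sensitivity": pct(pph_identified, pph_total),
    "missed PPH coded as IPH": pct(pph_missed_coded_iph, pph_missed),
    "identified PPH also coded as IPH": pct(pph_identified_coded_iph, pph_identified),
}
for label, value in summary.items():
    print(f"{label}: {value}%")
```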

Discussion
All types of anaemia were under-ascertained in ICD-coded hospital data. In addition, a high proportion of cases reported in the hospital data were not reported in the obstetric data. Using combined categories for anaemia did not meaningfully improve sensitivity or PPV. Thalassaemia, platelet disorders and coagulation disorders were similarly underreported. Reliability for APH and IPH was moderate, and using a broad category for any bleeding before birth improved sensitivity and PPV. Postpartum haemorrhage, transfusion, placental complications (including accreta, praevia and abruption) and hysterectomy were reported with moderate to high consistency with the obstetric data. Compared with previous studies, we found better reporting of hysterectomy [13] and PPH [13]. We found poorer reporting of APH [13], coagulation disorders [11], placenta praevia [12,13], haemolytic anaemia [11] and placenta accreta [13]. Accuracy was similar to that of previous studies for transfusion [13,14], placental abruption [12,13] and nutritional anaemia [11].
Consistency of reporting of anaemia improved over time. It was also higher where antenatal care was provided by a hospital medical team, private obstetrician or general practitioner than where care was midwife-centred. This may reflect the lack of an agreed definition of anaemia for pregnant women, given discrepancies between the National Blood Authority and World Health Organisation definitions [27]. Having a transfusion affected the reliability of reporting. Anaemia showed higher sensitivity among patients who received a blood transfusion, although specificity was reduced and PPV remained low.
Having a transfusion made little difference to sensitivity for PPH, although it was associated with a reduction in specificity. Transfusion was associated with reduced sensitivity for APH and IPH. Having a transfusion slightly increased sensitivity for thalassaemia, although thalassaemia remained poorly reported in that group. While data on bleeding severity were not available, previous studies have shown that haemorrhage and transfusion are more likely to be reported in hospital data as severity increases [13,14].
The inconsistent reporting of anaemia and blood disorders reflects the difficulty of capturing chronic but minor conditions that do not affect the current admission. Serious, persistent conditions that are more likely to affect the admission, including diabetes and hypertension, have shown better correlation between datasets [28]. Differences in the timing of data collection also likely affected consistency. Conditions may arise or resolve between early pregnancy and birth, particularly fluctuating conditions such as iron-deficiency anaemia. Indeed, a recent study of the same population and datasets examined women recorded with low haemoglobin (< 110 g/L) in the first 20 weeks of pregnancy. Of these women, 38% had their haemoglobin level restored (110 g/L or higher) after 20 weeks gestation, 38% were not restored and 24% had no level recorded [29]. This likely reflects good antenatal care and the recent emphasis on treating anaemia as a pillar of patient blood management [29]. However, anaemia prevalence was 3.9% when based on clinical diagnosis in the hospital data but 19.4% when based on pathology results. This gap highlights discrepancies between the language clinicians use in the medical notes and the language required for clinical coding. For example, a coder cannot interpret Hb = 69 g/L as anaemia unless the words 'anaemia' or 'low Hb' are written in the notes [17]. Additionally, a code may not be assigned for a queried diagnosis.
Regarding obstetric haemorrhage, bleeding may occur before labour (APH), during labour but before the birth (IPH) or after the birth (PPH), and should be recorded accordingly. In practice, however, conflation of IPH and PPH is not uncommon, and it is the total blood loss rather than the exact timing of bleeding that guides the clinical response. Progressive loss during labour may not be separated into discrete time periods in the documentation or data entry, particularly when IPH occurs close to the time of birth. Hence, a composite measure may be more useful.
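A small sketch can illustrate how a composite measure affects timing misclassification. Everything in it is hypothetical: the `BleedingFlags` record and the four example women are invented for illustration. The first woman has IPH recorded in one source and PPH in the other. She counts as a disagreement on each component but as agreement on a combined "IPH or PPH" indicator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BleedingFlags:
    # Hypothetical per-woman flags from one data source.
    iph: bool  # intrapartum haemorrhage recorded
    pph: bool  # postpartum haemorrhage recorded


def any_iph_or_pph(f: BleedingFlags) -> bool:
    # Composite measure: ignores whether blood loss was attributed to before or after the birth.
    return f.iph or f.pph


def agreement(reference, coded, measure: Callable[[BleedingFlags], bool]) -> float:
    """Proportion of women for whom the two sources agree on `measure`."""
    return sum(measure(r) == measure(c) for r, c in zip(reference, coded)) / len(reference)


# Hypothetical women: the first has IPH in ObstetriX but PPH in the hospital data.
obstetrix = [BleedingFlags(True, False), BleedingFlags(False, True),
             BleedingFlags(True, False), BleedingFlags(False, False)]
hospital = [BleedingFlags(False, True), BleedingFlags(False, True),
            BleedingFlags(True, False), BleedingFlags(False, False)]

for label, measure in [("IPH", lambda f: f.iph), ("PPH", lambda f: f.pph),
                       ("IPH or PPH", any_iph_or_pph)]:
    print(label, agreement(obstetrix, hospital, measure))
```

The composite involves a trade-off. It corrects misallocation between time periods but not omission: a woman whose bleeding is missed entirely in one source still counts as a disagreement. It also can no longer distinguish the timing of blood loss.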
There was little overlap between the two databases in the patients recorded with anaemia, platelet disorders, coagulation disorders and placenta accreta. For anaemia, this may relate to the timing of data collection, as discussed above, which may contribute to the high proportion of "false" cases in the hospital data. One strength of ObstetriX is its use of systematic, pre-specified fields, which make omissions less likely than in hospital data. Even so, obstetric data are likely incomplete for some conditions. For transfusion, hospital data are likely more complete than obstetric data, because blood must be dispensed and recorded in the patient notes. Previous studies have shown high reliability for transfusion in hospital data compared with blood pack information from transfusion laboratories [14]. For placenta accreta, hospital data are likely more accurate, because coding takes account of placental histopathology and procedures such as manual removal of the placenta. Obstetric data are largely self-reported and may contain inaccuracies [24]. Further, midwives enter the data into ObstetriX, and accuracy is sometimes sacrificed when they are busy providing clinical care. The person entering the data is also generally not present for the entire episode of care. In light of our findings, we suggest that hospital data may be a more appropriate source than obstetric data for identifying transfusion. For the remaining conditions, we advise using both data sources to identify cases where possible, to improve ascertainment; these counts could be corrected with capture-recapture models [30]. This supports recommendations made elsewhere [18,31].
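Capture-recapture methods use the overlap between sources to estimate how many cases neither source found. The sketch below uses the simplest two-source model, the Lincoln-Petersen estimator with Chapman's correction. The models cited in [30] may be more elaborate, and the counts in the example are hypothetical.

```python
def chapman_estimate(n_hospital: int, n_obstetric: int, n_both: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of the total number of cases.

    Assumes the two sources identify cases independently in a closed population.
    """
    return (n_hospital + 1) * (n_obstetric + 1) / (n_both + 1) - 1


def ascertainment(n_hospital: int, n_obstetric: int, n_both: int) -> dict[str, float]:
    """Estimated total cases and the completeness of each source and of their union."""
    total = chapman_estimate(n_hospital, n_obstetric, n_both)
    union = n_hospital + n_obstetric - n_both  # cases found by either source
    return {
        "estimated_total": total,
        "observed_in_either": union,
        "hospital_completeness": n_hospital / total,
        "obstetric_completeness": n_obstetric / total,
        "combined_completeness": union / total,
    }


# Hypothetical counts: similar totals in each source but little overlap.
for key, value in ascertainment(n_hospital=200, n_obstetric=180, n_both=40).items():
    print(f"{key}: {value:.2f}")
```

The estimator assumes that the two sources identify cases independently. Hospital coding and ObstetriX both draw on the same episodes of care, so positive dependence between the sources is plausible. Such dependence inflates the overlap and would make the estimated total too low. Models with three or more sources can relax this assumption.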
Slight differences in the rates of diagnoses between hospitals likely relate to differences in demographic composition. Hospital Two tended to have a younger obstetric population with a different ethnic and comorbidity mix. However, the overall consistency between the two hospitals suggests that the results may hold across different socioeconomic and ethnic patient populations, locations and facilities.

Conclusion
We found poor reporting of anaemia and blood disorders. Ante- and intrapartum haemorrhage and placental complications were moderately well reported, while hysterectomy, transfusion and PPH showed high consistency between datasets. Procedures were better reported than conditions. Conditions occurring around the time of birth, such as PPH, were better reported than pre-existing conditions, such as blood disorders. Caution should be exercised in using hospital data for studies of anaemia and blood disorders, given the high rates of both missed cases and cases unconfirmed in obstetric data. Other sources of data should be sought where possible. These findings are likely to be reasonably generalizable, given that ICD-10 is widely used [31,32]. Previous studies have also shown similar accuracy for hospital data from NSW and from elsewhere [33].

Limitations
Obstetric data are an imperfect reference standard. The two datasets are collected at different times, which may have affected the results, because conditions may arise or resolve between early pregnancy and birth. ObstetriX data are largely self-reported, which may be inaccurate [24], and for some conditions may be less complete than hospital data. The implications of these limitations are discussed in relation to the findings above. Data were available from two hospitals only.