The effect of maternity waiting homes on perinatal mortality is inconclusive: a critical appraisal of existing evidence from Sub-Saharan Africa

Objectives To assess the appropriateness of the statistical methodology used in a recent meta-analysis investigating the effect of maternity waiting homes (MWHs) on perinatal mortality in Sub-Saharan Africa. Results A recent meta-analysis published in BMC Research Notes used a fixed-effect model to generate an unadjusted summary estimate of the effectiveness of MWHs in reducing perinatal mortality in Africa using ten observational studies (pooled odds ratio 0.15, 95% confidence interval 0.14–0.17). The authors concluded that MWHs reduce perinatal mortality by over 80% and should be incorporated into routine maternal health care services. In the present article, we illustrate that due to the contextual and methodological heterogeneity present in existing studies, the authors’ conclusions about the effectiveness of MWHs in reducing perinatal mortality were likely overstated. Additionally, we argue that because of the selection bias and confounding inherent in observational studies, unadjusted pooled estimates provide little causal evidence for effectiveness. Additional studies with robust designs are required before an appropriately designed meta-analysis can be conducted; until then, the ability to draw causal inferences regarding the effectiveness of MWHs in reducing perinatal mortality is limited.


Introduction
There is renewed interest in maternity waiting homes (MWHs) as a strategy to increase facility-based obstetric care and reduce maternal and perinatal mortality. MWHs provide temporary accommodation near a health facility prior to birth for women with high-risk pregnancies and/or living far away from health facilities [1]. Several African and Asian countries are investing in MWH scale-up as part of their national health strategies [2][3][4][5].
While observational studies have reported some benefits [6][7][8], there is still insufficient evidence that MWHs reduce mortality [3,9] or impact newborn outcomes [10]. The quality of available evidence is also low, yet a recently published meta-analysis has drawn strong conclusions in favour of MWHs. An 82.5% reduction in perinatal mortality was attributed to MWH use and, consequently, the authors recommend that "all pregnant women be admitted to MWHs before delivery" [11]. This review has been cited repeatedly to advocate MWH use [12][13][14][15][16][17][18][19] despite limitations in the original studies and the review. In this research note, we critically assess the analytic approach employed in this meta-analysis [11] and discuss important considerations when pooling observational data on complex interventions such as MWHs.
Open Access, BMC Research Notes. *Correspondence: zohra.lassi@adelaide.edu.au

Features of the recent meta-analysis on MWHs and perinatal mortality
The meta-analysis by Bekele and colleagues included ten observational studies from six countries [7,8,20–27] after 31% (n = 73/236) were excluded because full texts were unavailable [11]. Most of these studies included women who delivered at hospitals offering some level of comprehensive emergency obstetric care [7,8,20,21,23,26,27]. The numbers of perinatal deaths abstracted for MWH users and for women admitted directly to hospitals were reported [11], but there were abstraction errors for two studies [21,27] and some overlap in data from two studies conducted at Attat Hospital in Ethiopia [8,24]. Three studies [7,8,23] reported stillbirths but not early neonatal deaths, and in two others it was difficult to distinguish outcomes for MWH users and non-users [22,26]. The authors used a fixed-effect model to generate an unadjusted pooled odds ratio estimating the association between MWH use and perinatal mortality. The authors reported conducting sub-group analyses by study design due to the high degree of heterogeneity detected (I² = 97%), but no sub-group estimates were reported or discussed [11].

Choice of model for meta-analysis of complex interventions
Decisions about which statistical model to use in a meta-analysis depend on the type of effect expected and the goal of the analysis [28]. Using a fixed-effect model conveys the belief that there is one common true effect size estimated by all individual studies, and that differences in observed effect sizes are a result of sampling error [28][29][30]. When a fixed-effect model is used, the goal is not to extrapolate findings beyond the included set of studies [28,31]. In contrast, random-effects models are suitable when a distribution of true effects exists, and included studies represent a random sample of possibilities; in this case, findings may be generalized to other similar scenarios [29].
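In standard notation, the two models can be sketched as follows. The symbols are introduced here for illustration and are not taken from the review under discussion: y_i is the observed effect in study i (for example, a log odds ratio), v_i is its within-study sampling variance, θ is the single common true effect, μ is the mean of the distribution of true effects, and τ² is the between-study variance.

```latex
\begin{align*}
\text{Fixed-effect:}   \quad & y_i = \theta + \varepsilon_i,    & \varepsilon_i &\sim N(0, v_i) \\
\text{Random-effects:} \quad & y_i = \mu + u_i + \varepsilon_i, & u_i &\sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i)
\end{align*}
```

Under this formulation the fixed-effect model is the special case τ² = 0, which is why it is only plausible when studies can reasonably be regarded as estimating the same quantity.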
Heterogeneity is the variability in true effects underlying different studies [32,33]. The I² statistic, which indicates the proportion of variance in observed effects due to variance in true effects and is a "measure of inconsistency" [32,33], is often used to decide whether sufficient heterogeneity exists to run a random-effects model, but this is not recommended as it has low power [28]. What may be more useful is to assess whether it is likely that the included studies are "functionally identical" [29], as assumed under a fixed-effect model. Widespread differences in participant characteristics, intervention designs, settings and outcomes make the absence of heterogeneity unlikely [28,33,34]. Public health interventions are even less likely to be homogeneous; they often have interacting components targeting multiple groups, accommodate flexible delivery, and are embedded within complex systems [35]. Given the considerable variation in MWH implementation [36], random-effects models are likely more suitable for meta-analyses involving MWHs.
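As a rough illustration of how I² is obtained, the sketch below computes Cochran's Q and I² = max(0, (Q − df)/Q) from study-level effects and variances. The function name and the input values are hypothetical and chosen only to show the arithmetic; they are not data from any of the MWH studies.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, I2 in percent) from study effects and within-study variances."""
    # Inverse-variance weights, as used for the fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of Q in excess of its expected value (df), floored at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical log odds ratios and their variances (illustration only).
example_log_or = [-2.0, -0.3, -1.1, 0.1, -0.7]
example_var = [0.02, 0.10, 0.08, 0.15, 0.05]

q_stat, i2_pct = cochran_q_and_i2(example_log_or, example_var)
print(f"Q = {q_stat:.2f}, I2 = {i2_pct:.1f}%")
```

Because I² depends on Q and the number of studies, it describes inconsistency in the observed effects and is not by itself a rule for choosing between models.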
On its own, however, the estimated mean effect provides an incomplete picture [37], because how effect sizes vary under different conditions and populations is often of interest [38]. With sufficient numbers of studies, sub-group analysis within a few important, pre-specified subgroups (to avoid issues with multiple testing) [28,39] is one way to explore heterogeneity. Sub-group results need to be interpreted cautiously due to the observational nature of the analysis [30].
Finally, in fixed-effect models, larger studies are weighted more heavily [30] as they have smaller sampling error and higher precision. The pooled estimate reported by Bekele et al. [11] was, thus, largely influenced by one study [8] (weight: ~74%). In random-effects models, each study provides unique information about the distribution of true effect sizes; therefore, weights are more evenly distributed across studies [29].
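The difference in weighting can be sketched numerically. The example below compares inverse-variance fixed-effect weights, 1/v_i, with random-effects weights, 1/(v_i + τ²), where τ² is estimated with the DerSimonian-Laird method. That estimator is assumed here as one common choice and is not confirmed as the method used in either analysis. All function names and input values are hypothetical.

```python
def normalise(raw):
    """Scale raw weights so that they sum to 1."""
    total = sum(raw)
    return [w / total for w in raw]


def fixed_effect_weights(variances):
    """Relative inverse-variance weights under a fixed-effect model."""
    return normalise([1.0 / v for v in variances])


def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments (DerSimonian-Laird) estimate of between-study variance."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - df) / c)


def random_effects_weights(effects, variances):
    """Relative weights 1 / (v_i + tau2) under a random-effects model."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    return normalise([1.0 / (v + tau2) for v in variances])


# Hypothetical data: the first study is very large (tiny variance).
example_log_or = [-2.0, -0.4, -1.0, -0.2]
example_var = [0.005, 0.10, 0.12, 0.15]

fe = fixed_effect_weights(example_var)
re = random_effects_weights(example_log_or, example_var)
for i, (wf, wr) in enumerate(zip(fe, re), start=1):
    print(f"study {i}: fixed {wf:.1%}, random {wr:.1%}")
```

In this made-up example the large study dominates the fixed-effect estimate, while the random-effects weights are much closer to one another because τ² is added to every study's variance.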

Methodology for the present study
In light of the methodological considerations outlined above, we sought to critically assess the methodology employed by Bekele and colleagues, and explore whether heterogeneity may be better accounted for using a random-effects model. For illustrative purposes, we reabstracted information from the seven studies [7,8,20,21,23,25,27] from the review that had appropriate data available, as well as three additional eligible studies [40][41][42] identified from reference lists (Table 1). We calculated a summary estimate in Review Manager version 5.4 using a random-effects model for stillbirths and perinatal mortality separately, using unadjusted outcome events reported for MWH users and women directly admitted to hospital.
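For readers unfamiliar with how an unadjusted study-level estimate is derived from event counts, the sketch below computes an odds ratio and a Wald-type 95% confidence interval from a 2×2 table. It is an illustration only: the function name and counts are hypothetical, the 0.5 continuity correction for zero cells is one common convention assumed here, and the source does not state which pooling options were selected in Review Manager.

```python
import math


def unadjusted_odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Return (OR, lower, upper) comparing group A with group B."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    # Assumed convention: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts: 10 deaths among 100 MWH users, 20 among 100 direct admissions.
or_est, ci_low, ci_high = unadjusted_odds_ratio(10, 100, 20, 100)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An estimate of this kind takes no account of differences between the two groups of women, which is why pooled unadjusted odds ratios carry the confounding of the underlying studies.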
To explore heterogeneity, we conducted sub-group analysis for stillbirths to demonstrate how country and type of managing authority may change effect estimates. While no definitive conclusions can be made, the results provide insight into sources of heterogeneity.

Random effects model findings and implications
The pooled estimates are suggestive of an association between MWH use and lower stillbirths (Fig. 1). The comparative similarity in weights calculated for stillbirths points to higher between-study than within-study variance [29]; this is also reflected in the high values of I² (I² = 93%, indicating 93% of the total variation is attributable to heterogeneity [33]) and τ² (τ² = 0.97). The lower I² values suggest that there is more consistency among studies conducted in Ethiopia (I² = 86%) and even more among those conducted in other countries (I² = 35%) than when all studies are considered together (I² = 93%) (Table 2).
Overall, the reduction in the between-study variance for country sub-groups (τ² = 0.10–0.28 for subgroups versus τ² = 0.97 for all studies) suggests that between-country contextual differences could be one source of heterogeneity. The between-study variance was also lower when the type of managing authority was considered. There was more consistency among studies with government-run facilities (I² = 42%, τ² = 0.47) than overall (I² = 93%, τ² = 0.97). While the test for subgroup differences was not statistically significant, the existence of heterogeneity due to managing authority cannot be ruled out.

Conclusion
Given the complexity of MWH interventions and the variation in contextual factors, heterogeneity must be appropriately addressed when conducting meta-analysis on MWH effects. More robustly designed studies with adequate reporting are needed to enable exploration of heterogeneity in effects. Careful consideration of the quality of evidence and specific conditions required to improve outcomes for women and babies is required before implementing further scale-up of MWHs.

Limitations
Firstly, meta-analyses produce "observational" results even if randomized controlled trials (RCTs) are included, as random allocation is not preserved [43]. Observational studies, where assignment to comparison groups is not random, are considered to be at even higher risk for selection bias and confounding than RCTs [34]. While a random-effects model is more suitable for MWH studies, the pooled estimates presented here may still be compromised by bias and confounding inherent to observational designs. Future analyses may consider meta-regression to assess the effect of study-level covariates on effect sizes [28] when at least ten studies are available [30]. If available, adjusted analyses with comparable adjustment variables can also be used to generate adjusted pooled estimates. Ideally, however, additional individual studies using robust designs are required for results from meta-analyses to be more informative. RCTs are generally accepted as providing the highest quality evidence [34] if well designed, conducted and reported. Where it is not feasible or ethical to conduct trials, longitudinal studies with careful participant selection, adequate confounder information, sufficient follow-up and suitable data analysis may be acceptable alternatives. Availability of additional studies would also improve estimates of between-study variance (τ²), which tend to be imprecise with fewer available studies [28]. Precision, in random-effects models, is enhanced by the number of studies included, not study sample sizes [29].
Secondly, while there is an urgent need to improve methodological reporting in primary studies as illustrated in Table 1, there is an equal necessity to provide more details about MWH models themselves. Specifically, information on referral criteria and practices, community outreach activities to raise awareness and facilitate women's access to MWHs, duration of stay and gestational age at admission, accommodation services available at MWHs, associated costs, level of monitoring of MWHs by health workers, the stage of labour when women are transferred to the health facility, and level of obstetric care available are needed to have a clear understanding of what is required to achieve reported reductions in mortality. This information could support a more comprehensive exploration of heterogeneity which we were not able to do due to the small number of studies and insufficient reporting in individual studies.
Thirdly, a better understanding of modifiable risk factors associated with stillbirths and neonatal deaths is required to assess the extent to which MWHs could potentially facilitate improved perinatal outcomes. A study investigating modifiable health-system risk factors reported that having to wait more than 10 min to receive care after reaching a facility was associated with higher odds of stillbirth [44]. Other modifiable risk factors for stillbirths include maternal infections and prolonged pregnancy [45], which may be addressed through quality antenatal and intrapartum care, irrespective of MWH use. Reporting the type of stillbirth (intrapartum or antepartum) in future studies may help to disentangle stillbirths that can be averted through access to timely obstetric care (intrapartum stillbirths) from those which result from more long-term issues such as foetal growth restriction [45]. Only one of the studies included in the review made this distinction [21], making it impossible to explore.
Stillbirths and neonatal deaths are also relatively rare events, which would make it difficult for studies with small sample sizes to detect meaningful changes in outcomes. Any reported associations between MWH use and stillbirth rates or perinatal mortality should, thus, be interpreted with caution.
A defining feature of systematic reviews is the use of clearly articulated, well-documented, comprehensive search strategies targeting multiple sources that are designed to capture the highest proportion of eligible studies in a transparent and reproducible fashion. In this way, bias is minimized and more reliable estimates are generated [30]. Since our aim was to illustrate the issues associated with statistical modelling, we did not repeat the search but largely relied on studies identified by Bekele and colleagues [11].
Finally, no firm conclusions can be drawn about the effectiveness of MWHs in reducing perinatal mortality from meta-analyses that do not employ methods that appropriately incorporate contextual variation and adequately consider the quality of included studies. The need to update evidence on MWH effectiveness using well-designed studies from diverse settings that reflect current levels of service use and quality remains.