Measuring improvement in fracture risk prediction for a new risk factor: a simulation
BMC Research Notes volume 11, Article number: 62 (2018)
Improvements in clinical risk prediction models for osteoporosis-related fracture can be evaluated using area under the receiver operating characteristic (AUROC) curve and calibration, as well as reclassification statistics such as the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) statistics. Our objective was to compare the performance of these measures for assessing improvements to an existing fracture risk prediction model. We simulated the effect of a new, randomly-generated risk factor on prediction of major osteoporotic fracture (MOF) for the internationally-validated FRAX® model in a cohort from the Manitoba Bone Mineral Density (BMD) Registry.
The study cohort comprised 31,999 women 50+ years of age; 9.9% sustained at least one MOF over a mean follow-up of 8.4 years. The original prediction model had good discriminative performance, with AUROC = 0.706 and calibration (ratio of observed to predicted risk) of 0.990. The addition of the simulated risk factor resulted in improvements in NRI and IDI for most investigated conditions, while AUROC decreased and changes in calibration were negative. Reclassification measures may give different information than discrimination and calibration about the performance of new clinical risk factors.
Methods to predict the risk of an outcome are receiving considerable attention in the clinical literature. The incremental improvement in risk prediction when a new risk factor is added to an existing model is of particular interest because new measures of risk are continually being defined and collected in an attempt to refine prediction models. This is an important topic for osteoporosis-related fracture risk prediction, where a number of models have been proposed and numerous studies have examined the incremental improvement in prediction when biomarkers or other clinical characteristics of patients are introduced to existing models [3, 4].
Improvements in predictive model performance have traditionally been assessed using area under the receiver operating characteristic (AUROC) curve, which measures model discrimination, and calibration, which measures prediction error. Using these measures, a new risk factor is considered to be a beneficial model addition if the AUROC and calibration statistics for the new model, which includes the predictors in the original model plus the new risk factor, are better than the corresponding statistics for the original model. However, AUROC and calibration statistics are summary measures that may not provide a complete picture of the change in predicted risk for all individuals, particularly those in the lowest and highest risk categories.
Clinicians have given increased attention to reclassification tables and statistics such as the net reclassification improvement (NRI), which summarize the change in risk probability or the frequency (percentage) of individuals who will move from one risk category to another based on the addition of a new risk factor to the original prediction model. Reclassification statistics are increasingly used to describe the performance of risk prediction models [4, 6, 7]. However, few studies, particularly in the area of fracture risk prediction, have compared the performance of different measures.
Our purpose was to compare conventional AUROC and calibration statistics with newer reclassification statistics for fracture risk prediction. We did this within the context of the internationally-validated Fracture Risk Assessment Tool (FRAX®), which predicts risk of a major osteoporotic fracture (MOF). Our hypothesis was that conventional measures and newer reclassification statistics would not lead to the same conclusions about the incremental improvement in model performance when a new risk factor was added to the FRAX® model.
Study design and cohort development
The study was conducted by combining analyses of a real dataset with simulation. The real data were from the province of Manitoba, Canada for the period from 1987 to 2011 and came from the Manitoba Bone Mineral Density (BMD) Program and administrative health databases, including hospital separation records, physician billing claims, prescription drug records, and population registry.
The Manitoba BMD Program database is a regionally-based clinical database that captures dual energy X-ray absorptiometry (DXA) results for the entire provincial population since the program’s inception in 1996. Hospital abstracts are completed at the point of discharge from acute care facilities and contain diagnoses coded using the World Health Organization’s International Classification of Diseases (ICD). Physician claims are submitted to the provincial ministry of health by physicians paid on a fee-for-service basis; they capture virtually all outpatient services and contain a single ICD code. Prescription drug records are from the Drug Program Information Network, a centralized, electronic, point-of-sale database connecting all retail pharmacies. The population registry captures information on all provincial residents eligible to receive publicly-insured health services, including dates of health insurance coverage and demographics.
The study cohort included women aged 50+ years who had a BMD test between 1996 and 2011. If an individual had more than one BMD test during this period, only the first one was used. The BMD test date was the index date for creating predictors for the FRAX® model: age, body mass index, prior fracture, parental hip fracture, chronic obstructive pulmonary disease, rheumatoid arthritis, alcohol or substance use, recent glucocorticoid use, and femoral neck T-score. These measures were defined from the Manitoba BMD Program database and codes in administrative health databases [11,12,13,14,15].
MOF encompasses fractures of the spine, hip, forearm, and humerus. Fractures that occurred after the index BMD test and up to March 31, 2011, death, or migration out of province, were identified from hospital and physician billing claims databases. Health service records were assessed for fractures not associated with trauma using established methods.
Socio-demographic and clinical characteristics of the study cohort were described using means, standard deviations, and percentages. The 10-year MOF risk was estimated for each cohort member using the FRAX® Canada calculator (FRAX® Desktop Multi-Patient Entry, version 3.7). FRAX® uses a continuous hazard function based on Poisson regression to produce risk estimates.
In the computer simulation, new risk predictions were generated based on the addition of one multiplicative risk factor to the original FRAX® estimates. This risk factor was simulated from a Bernoulli distribution and was independent of other predictors; Kooter et al. demonstrate formulae to estimate the impact of simulated risk factors on predicted risk. Relative risk (RR), which quantifies the independent association between this simulated variable and the outcome, varied from 1.25 to 3.50 in increments of 0.25. Prevalence varied from 10 to 100% in increments of 10%. The intervention threshold, which was used to construct the reclassification tables, ranged from 5 to 50% in increments of 5%.
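The simulation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's own code: the function name `simulate_new_risk`, the seed handling, and the rescaling rule are hypothetical. In particular, the sketch divides by 1 + prevalence × (RR − 1) so that the cohort-average predicted risk is preserved, which is one way of applying a multiplicative binary risk factor; the study may have applied the factor differently.

```python
import random


def simulate_new_risk(baseline_risks, relative_risk, prevalence, seed=0):
    """Return (has_factor, new_risk) lists for a simulated binary risk factor.

    Assumption (not confirmed by the source): risks are rescaled so that the
    cohort-average risk is unchanged, i.e. carriers are multiplied by
    RR / (1 + prevalence * (RR - 1)) and non-carriers by
    1 / (1 + prevalence * (RR - 1)).
    """
    rng = random.Random(seed)
    denom = 1.0 + prevalence * (relative_risk - 1.0)
    has_factor, new_risk = [], []
    for p in baseline_risks:
        # Bernoulli(prevalence) draw, independent of the baseline risk.
        exposed = rng.random() < prevalence
        multiplier = relative_risk / denom if exposed else 1.0 / denom
        has_factor.append(exposed)
        new_risk.append(min(1.0, p * multiplier))  # keep a valid probability
    return has_factor, new_risk


# Parameter grids as described in the text.
RELATIVE_RISKS = [1.25 + 0.25 * i for i in range(10)]  # 1.25 to 3.50
PREVALENCES = [0.10 * i for i in range(1, 11)]         # 10% to 100%
THRESHOLDS = [0.05 * i for i in range(1, 11)]          # 5% to 50%
```

Calling `simulate_new_risk(frax_risks, 2.0, 0.3)` on a list of baseline probabilities (here a hypothetical `frax_risks`) would return the simulated indicator and the adjusted risks for one parameter combination.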
For each combination of simulation parameters, AUROC and calibration statistics were computed for the original FRAX® model and the new model. AUROC was calculated for both the original and the simulated risk predictions, and the difference between the two was computed. Calibration was the ratio of the observed cumulative fracture incidence at 10 years to the average predicted risk probability.
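The two conventional measures can be illustrated with a short sketch. The function names `auroc` and `calibration_ratio` are hypothetical, and the AUROC is computed here as the proportion of case/non-case pairs in which the case has the higher predicted risk (ties counted as one half), which is a standard interpretation of the statistic rather than necessarily the routine used in the study. The observed cumulative incidence is taken as an input because the source does not say how it was estimated.

```python
from bisect import bisect_left, bisect_right


def auroc(risks, events):
    """Proportion of (case, non-case) pairs ranked correctly; ties count 1/2."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = sorted(r for r, e in zip(risks, events) if not e)
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for c in cases:
        lower = bisect_left(controls, c)          # non-cases with lower risk
        ties = bisect_right(controls, c) - lower  # non-cases with equal risk
        total += lower + 0.5 * ties
    return total / (len(cases) * len(controls))


def calibration_ratio(risks, observed_incidence):
    """Observed cumulative incidence divided by the mean predicted risk."""
    return observed_incidence / (sum(risks) / len(risks))
```

The change in AUROC for one simulation condition would then be `auroc(new_risks, events) - auroc(old_risks, events)`, where `old_risks`, `new_risks`, and `events` are placeholder names for the two sets of predictions and the fracture indicator.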
The NRI compares the frequency of appropriate reclassification with that of inappropriate reclassification when the new model is used in place of the original model. The predicted probabilities based on the two models are assigned to ordinal risk categories and cross-tabulated. We defined upward movement as a change into a higher risk category based on the new model and downward movement as a change in the opposite direction. As per convention, for individuals with a fracture event, a score of 1 was assigned for upward movement, a score of −1 was assigned for downward movement, and a score of zero was assigned for no change. The opposite scoring was used for cohort members who did not experience a fracture event. The NRI is the sum of individual scores, divided by the number of cohort members. The IDI is based on the change in calculated risk; specifically, it quantifies the increment in the predicted probabilities for the cohort members experiencing an event and the decrement for the cohort members who do not experience an event [5, 18]. Conventional and reclassification statistics for the original and new models were descriptively analysed. Statistical analyses were conducted using SPSS for Windows, Version 22.0.
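The scoring rules above can be written out as a short sketch. The function names (`risk_category`, `nri`, `idi`) and the example cut-offs are hypothetical; the NRI follows the description in the text (sum of individual scores divided by the number of cohort members), and the IDI is computed as the mean change in predicted risk among those with an event minus the mean change among those without.

```python
def risk_category(risk, cutoffs):
    """Ordinal category index; cutoffs are ascending lower bounds of categories."""
    return sum(risk >= c for c in cutoffs)


def nri(old_risks, new_risks, events, cutoffs):
    """Sum of +1/-1/0 reclassification scores divided by the cohort size."""
    score = 0
    for old, new, event in zip(old_risks, new_risks, events):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        direction = (move > 0) - (move < 0)  # +1 up, -1 down, 0 no change
        # Upward movement is appropriate for events, downward for non-events.
        score += direction if event else -direction
    return score / len(events)


def idi(old_risks, new_risks, events):
    """Mean risk change in events minus mean risk change in non-events."""
    ev = [n - o for o, n, e in zip(old_risks, new_risks, events) if e]
    ne = [n - o for o, n, e in zip(old_risks, new_risks, events) if not e]
    return sum(ev) / len(ev) - sum(ne) / len(ne)
```

For example, `nri(old, new, events, cutoffs=[0.20])` would use a single 20% intervention threshold. Because `idi` takes no cut-offs, it is unaffected by the choice of threshold. A commonly cited alternative formulation of the NRI divides the scores separately within the event and non-event groups before summing, which gives different values from the pooled version sketched here.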
Study cohort characteristics
The study cohort (Table 1) comprised 31,999 women 50+ years of age. A total of 9.9% sustained a MOF; 17.2% were censored at death. There were differences between cohort members with and without a MOF on most variables in the original FRAX® model. The 10-year estimated MOF risk from the original model was 10.5% [standard deviation (SD) = 6.8%] for cohort members without a MOF and 16.3% (SD = 9.6%) for cohort members with a MOF.
Risk prediction model characteristics and computer simulation
The estimated AUROC of the original model was 0.706 [95% confidence interval (95% CI) 0.697–0.716] and calibration was 0.990. Using this model, 6.8% of cohort members were classified as having low fracture risk (i.e., < 10%), 13.9% as having moderate fracture risk (i.e., 10–20%), and 27.3% as having high fracture risk (i.e., > 20%).
The results obtained after introducing the new simulated risk factor into the original model are reported in Table 2. The first set of results was obtained when the RR varied and other parameters were held constant. Across the investigated RR values, the NRI demonstrated an inverted U-shaped pattern; it was low for small values of the RR, increased for moderate values of RR, and then decreased for higher RR values. In fact, for RR > 3.0, the NRI attained a small negative value. In contrast, the IDI demonstrated small incremental increases as RR increased. The change in AUROC between the original and new model was negative; the AUROC decreased to a low of 0.642, and in general, values less than 0.70 indicate poor discriminant performance. Calibration also decreased as RR increased.
The second set of results was obtained when the prevalence of the new simulated risk factor was varied and other simulation parameters were held constant. The NRI increased from 0.015 to 0.120 as prevalence increased from 10 to 100%, while the IDI showed a more modest increase, from 0.006 to 0.058. The change in the AUROC was negative for all except the largest prevalence values and the change in calibration was negative for all conditions.
In the final set of results, obtained by varying the intervention threshold for treatment, NRI values ranged from −0.041 to 0.063. Given that the IDI is based on continuous values of the risk probabilities, it did not change with variations in the intervention threshold, nor did the AUROC and calibration statistics.
Discussion and conclusions
Several multivariable Fracture Risk Assessment Tools have been proposed, and there is continual exploration of new clinical risk factors that may improve fracture risk prediction in these tools. There are multiple measures of improvement in predictive performance for a new risk factor. These measures will not always produce consistent results, confirming our hypothesis.
Our results show that a risk factor with a moderate to strong independent association with the outcome simultaneously resulted in decreases in model discrimination and calibration (demonstrating that a new risk factor does not always incrementally improve risk prediction) and positive changes in the NRI and IDI, indicating improvements in risk classification. However, the NRI and IDI did not always produce consistent results. For example, as the RR of the new risk factor increased to its higher values, the NRI decreased while the IDI increased. In contrast, when prevalence of the new risk factor increased, both the NRI and IDI increased. These findings are consistent with previous simulations.
This study, along with previous research about fracture risk prediction, underscores the importance of examining multiple performance measures in the development and refinement of fracture risk prediction models [7, 22]. Reclassification tables and statistics such as the NRI and IDI provide clinicians and researchers with supplementary statistical indicators about the potential uncertainty in risk estimates and the net effect of a new risk factor on predictive performance. The NRI and IDI can provide insights about scenarios for which appropriate reclassification occurs relative to inappropriate reclassification with the introduction of a new risk factor.
The benefits of adding a new risk factor to a prediction model such as FRAX® will depend on a number of considerations, including cost, availability, and clinical relevance. Clinicians working in the area of fracture risk prediction, as in other risk prediction areas, must keep abreast of developments in risk modeling and continually look to new opportunities to add to their toolbox of relevant statistical methods.
The limitations of this study relate to the simulation and choice of statistical procedures. We manipulated a single risk factor in the simulation even though multiple risk factors might have been manipulated. However, researchers interested in improving risk prediction often focus on new risk factors one at a time. We considered a dichotomous risk factor; ordinal or continuously-distributed risk factors could also be investigated. However, calculation of the potential impact on risk is more complicated for the latter scenario and will depend on a number of features of the measure, including shape of the population distribution. The new risk factor was independently associated with the outcome; in real-world settings risk factors are often correlated and this will affect their impact on risk estimation. We gave equal weighting to false positive and negative values, which may not always be realistic and may not reflect clinical practice, in which greater weight may be assigned to one type of error.
We examined only a single fracture risk prediction model, although there have been a number of different models proposed; the choice of models will affect AUROC and calibration statistics. Finally, there are other reclassification statistics that have been proposed and may produce different results than the NRI and IDI.
AUROC: area under the receiver operating characteristic
BMD: bone mineral density
COPD: chronic obstructive pulmonary disease
FRAX: Fracture Risk Assessment Tool
ICD: International Classification of Diseases
MOF: major osteoporotic fracture
NRI: net reclassification improvement
IDI: integrated discrimination improvement
1. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–38.
2. Leslie WD, Lix LM. Comparison between various Fracture Risk Assessment Tools. Osteoporos Int. 2014;25:1–21.
3. Poku EK, Towler MR, Cummins NM, Newman JD. Developing novel prognostic biomarkers for multivariate fracture risk prediction algorithms. Calcif Tissue Int. 2012;91:204–14.
4. Iki M, Tamaki J, Kadowaki E, Sato Y, Dongmei N, Winzenrieth R, Okamoto N, Kurumatani N. Trabecular bone score (TBS) predicts vertebral fractures in Japanese women over 10 years independently of bone density and prevalent vertebral deformity: the Japanese Population-Based Osteoporosis (JPOS) cohort study. J Bone Miner Res. 2014;29:399–407.
5. Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27:157–72.
6. Chan MY, Nguyen ND, Center JR, Eisman JA, Nguyen TV. Absolute fracture-risk prediction by a combination of calcaneal quantitative ultrasound and bone mineral density. Calcif Tissue Int. 2012;90:128–36.
7. Leslie WD, Berger C, Langsetmo L, Lix LM, Adachi JD, Hanley DA, et al. Construction and validation of a simplified Fracture Risk Assessment Tool for Canadian women and men: results from the CaMos and Manitoba cohorts. Osteoporos Int. 2011;22:1873–83.
8. Cook NR, Paynter NP. Performance of reclassification statistics in comparing risk prediction models. Biom J. 2011;53:237–58.
9. Kanis JA, McCloskey EV, Johansson H, Oden A, Strom O, Borgstrom F. Development and use of FRAX in osteoporosis. Osteoporos Int. 2010;21(Suppl 2):S407–13.
10. Leslie WD, Caetano PA, Macwilliam LR, Finlayson GS. Construction and validation of a population-based bone densitometry database. J Clin Densitom. 2005;8:25–30.
11. Morin SN, Lix LM, Leslie WD. The importance of previous fracture site on osteoporosis diagnosis and incident fractures in women. J Bone Miner Res. 2014;29:1675–80.
12. Lix LM, Azimaee M, Acan B, Caetano P, Morin S, Metge C, Goltzman D, Kreiger N, Prior J, Leslie WD. Osteoporosis-related fracture case definitions for administrative data. BMC Public Health Res. 2012;12:301.
13. Leslie WD, Lix LM, Johansson H, Oden A, McCloskey E, Kanis JA. Independent clinical validation of a Canadian FRAX tool: fracture prediction and model calibration. J Bone Miner Res. 2010;25:2350–8.
14. Binkley N, Kiebzak GM, Lewiecki EM, Krueger D, Gangnon RE, Miller PD, Shepherd JA, Drezner MK. Recalculation of the NHANES database SD improves T-score agreement and reduces osteoporosis prevalence. J Bone Miner Res. 2005;20:195–201.
15. Kanis JA, McCloskey EV, Johansson H, Oden A, Melton LJ, Khaltaev N. A reference standard for the description of osteoporosis. Bone. 2008;42:467–75.
16. Siminoski K, Leslie WD, Frame H, Hodsman A, Josse RG, Khan A, Lentle BC, Levesque J, Lyons DJ, Tarulli G, Brown JP. Recommendations for bone mineral density reporting in Canada. Can Assoc Radiol J. 2005;56:178–88.
17. Kooter AJ, Kostense PJ, Groenewold J, Thijs A, Sattar N, Smulders YM. Integrating information from novel risk factors with calculated risks: the critical impact of risk factor prevalence. Circulation. 2011;124:741–5.
18. Steyerberg EW, Pencina MJ, Lingsma HF, Kattan MW, Vickers AJ, Van Calster B. Assessing the incremental value of diagnostic and prognostic markers: a review and illustration. Eur J Clin Invest. 2012;42:216–28.
19. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
20. Van Calster B, Vickers AJ, Pencina MJ, Baker SG, Timmerman D, Steyerberg EW. Evaluation of markers and risk prediction models: overview of relationships between NRI and decision-analytic measures. Med Decis Making. 2013;33:490–501.
21. Donaldson MG, Cawthon PM, Schousboe JT, Ensrud KE, Lui LY, Cauley JA, Hillier TA, Taylor BC, Hochberg MC, Bauer DC, Cumming SR, for the Study of Osteoporotic Fracture (SOF). Novel methods to evaluate fracture risk models. J Bone Miner Res. 2011;26:1767–73.
22. Pressman AR, Lo JC, Chandra M, Ettinger B. Methods for assessing fracture risk prediction models: experience with FRAX in a large integrated health care delivery system. J Clin Densitom. 2011;14:407–15.
23. Kerr KF, Wang Z, Janes H, McClelland RL, Psaty BM, Pepe MS. Net reclassification indices for evaluating risk prediction instruments: a critical review. Epidemiology. 2014;25:114–21.
LML, WDL, and SRM contributed to study conception, design, and interpretation of findings, and participated in manuscript preparation and revision. WDL contributed to data analyses and generation of the simulated data. All authors read and approved the final manuscript.
The authors are indebted to Manitoba Health for provision of data (HIPC 2011/12-31). Results and conclusions are those of the authors. No official endorsement by Manitoba Health is intended or should be inferred.
The authors declare that they have no competing interests.
Availability of data and materials
The data that support the findings of this study can be accessed via the Manitoba Centre for Health Policy at the University of Manitoba, but restrictions apply to the availability of these data, which were used under license for the current study, and are not publicly available. Data access approvals are given by the Manitoba Health Information Privacy Committee upon application receipt and review (http://www.gov.mb.ca/health/hipc/index.html). Ethics approval by the University of Manitoba Health Research Ethics Board is required as part of the data access approval process.
Consent for publication
Ethics approval and consent to participate
This study was reviewed and approved by the Health Research Ethics Board for the University of Manitoba. Study cohort members were not required to provide consent for participation in this study; access to anonymized administrative health data for this study was provided by the Health Information Privacy Committee of Manitoba.
LML was supported by a Manitoba Research Chair from Research Manitoba during the completion of this research. SRM holds the Endowed Chair in Patient Health Management (Faculty of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta). No other funding was received to support this research.
Lix, L.M., Leslie, W.D. & Majumdar, S.R. Measuring improvement in fracture risk prediction for a new risk factor: a simulation. BMC Res Notes 11, 62 (2018). https://doi.org/10.1186/s13104-018-3178-z