Our examples show that, in addition to other factors, policymakers and clinicians should consider local epidemiologic factors such as disease prevalence when choosing between tests and when setting positive test thresholds, both in clinical practice and in policy guidelines. The interplay of sensitivity, specificity, and prevalence determines the balance of true and false results associated with a given diagnostic test strategy, and all three factors should be explicitly incorporated in evaluating testing programs. With any given test, as disease prevalence varies, the trade-off between the numbers of false-positive and false-negative results will shift, producing substantial variation in health and economic consequences across settings.
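This interplay can be stated explicitly. Writing $p$ for prevalence, $Se$ for sensitivity, and $Sp$ for specificity, the standard expressions for the positive and negative predictive values are:

```latex
\[
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)},
\qquad
\mathrm{NPV} = \frac{Sp \cdot (1 - p)}{Sp \cdot (1 - p) + (1 - Se) \cdot p}.
\]
```

Because both quantities depend on $p$ as well as on $Se$ and $Sp$, the downstream consequences of a test cannot be judged from its operating characteristics alone.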
Other investigators have considered the importance of incorporating prevalence in analyses of diagnostic tests. Sackett and colleagues demonstrated early on that, for any given sensitivity and specificity, the false-to-true-positive ratio will decrease and the positive predictive value will increase with increasing prevalence. It is also well recognized that sensitivity and specificity may themselves change with varying prevalence, although this may be a manifestation of a changing patient spectrum, with prevalence playing a secondary role. The incorporation of prevalence along with sensitivity and specificity has further been described in conducting meta-analyses of diagnostic tests. In addition, the issue of pre-test probability has been considered particularly relevant when using tests with an implicit or subjective threshold, where clinicians may move their subjective threshold in response to the perception of increased prevalence[16, 17].
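Sackett's observation can be verified directly from the definitions above. The following sketch (not part of the original analysis; the sensitivity and specificity values are hypothetical round numbers, not measured IGRA characteristics) shows how the positive predictive value rises and the false-to-true-positive ratio falls as prevalence increases for a fixed test:

```python
# Illustrative sketch: PPV and the false-to-true-positive ratio as functions
# of prevalence, holding sensitivity and specificity fixed.
# The 0.90/0.95 operating characteristics below are hypothetical.

def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value via Bayes' rule."""
    tp = sens * prev              # expected true-positive fraction
    fp = (1 - spec) * (1 - prev)  # expected false-positive fraction
    return tp / (tp + fp)

def fp_to_tp_ratio(sens: float, spec: float, prev: float) -> float:
    """Expected false positives per true positive."""
    return ((1 - spec) * (1 - prev)) / (sens * prev)

for prev in (0.001, 0.01, 0.10, 0.30):
    print(f"prevalence {prev:>6.1%}: PPV = {ppv(0.90, 0.95, prev):.3f}, "
          f"FP:TP = {fp_to_tp_ratio(0.90, 0.95, prev):.2f}")
```

At 0.1% prevalence the hypothetical test yields far more false positives than true positives; at 30% prevalence the same test yields mostly true positives, which is the monotone relationship Sackett and colleagues described.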
In the current analysis, we used screening for LTBI to demonstrate the importance of considering disease prevalence when evaluating such trade-offs in testing strategy decisions. We chose TB as an example because of its growing worldwide importance, its variation in prevalence (Figure 1), its diagnostic complexities such as comorbidities and the distinction between latent and active disease, and the critical role of health systems and resources in determining optimal screening and treatment programs[3, 4]. Although early detection and effective screening are critical to TB treatment and prevention, improvements in detection have recently slowed, with close to 40% of infections worldwide still not properly detected or treated. This slowing is due in part to the lower sensitivity and specificity of the standard TB skin test among certain populations[6–8], as well as to the challenges in determining appropriate testing strategies across settings with highly varied levels of disease prevalence and resource constraints.
For example, in lower-burden, higher-resource countries such as the US, TB-control strategies target high-specificity LTBI screening and treatment to prevent later conversion to active TB. Such a strategy, however, is less common in higher-prevalence settings, where resources are more often allocated to treating those with active infection, and especially in poorer settings, where treatment costs may be more than double a household’s monthly income. Accordingly, although the introduction of newer tests such as QFT-IT and T-Spot can offer improved operating characteristics, improvements in outcomes depend on the establishment of testing strategies that are specific to each setting.
Thus, with the recent introduction of new tests for TB and the publication of WHO and FDA guidelines regarding their implementation, this analysis provides a timely demonstration that highly specific IGRA tests cause more harm and generate fewer benefits when used in high-prevalence countries, where they would produce too many false negatives, too little treatment of diseased individuals, and more future illness and disease spread. In addition to confirming the WHO’s recommendation that IGRAs not be used in developing countries, our analysis also shows that, when making LTBI testing decisions within lower-prevalence developed countries, there may be benefits to using one IGRA test over another, depending on prevalence: the lower the prevalence, the more specific the test should be.
Our analysis is useful not only for making decisions between tests but also for determining setting-specific positive test thresholds. In its 2009 decision to change the T-Spot cutoff for a positive result from six to eight spots[19, 20], the FDA actively weighed the sensitivity-specificity balance in the face of the lower prevalence of TB in the US. This decision recognized that raising the threshold would decrease test sensitivity, yet it would increase specificity and result in improved outcomes for this setting. However, because of the inherent nature of the test, this revised threshold may not increase specificity to the levels of the QFT-IT. Therefore, if the revised T-Spot sensitivity and specificity values were known and included in the current analysis, the magnitude of the differences in outcomes between the two tests would clearly diminish; however, the degree of decline is uncertain, and the differences are unlikely to disappear entirely. More research is needed to clearly determine the specificity of IGRAs in settings of varying prevalence, and in particular that of the T-Spot assay at the revised US threshold.
Our TB example illustrates well the challenge of determining appropriate diagnostic testing strategies when the optimal sensitivity-specificity balance varies throughout the world. Consider, for example, two countries: one developed, the other not. Healthcare in the developed country is generally good, multi-drug-resistant TB is relatively rare, and TB prevalence is relatively low. In the developing country, healthcare access is more limited, resistant TB is more common, and TB prevalence is higher.
An LTBI test such as T-Spot that offers a significant increase in sensitivity – compared with QFT-IT – at the cost of a 13% decrease in specificity may be valued differently in the two countries. The developed country may find that the 7% increase in early case detection benefits too few people to justify the high burden of false positives. The developing country may find that with higher disease prevalence, the greater increase in early detection is worth the increased treatment of false-positive cases, especially given the poorer access to medical services. This is not to say that the trade-off is not worthwhile in the developed country or that it is worthwhile in the developing country. Resources and local priorities and values should determine that. Rather, one should not expect the trade-off to be similar in different areas; indeed, it may differ by orders of magnitude as prevalence varies.
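The scale of this divergence can be sketched numerically. The operating characteristics below are illustrative round numbers chosen only to match the approximate 7-point sensitivity gain and 13-point specificity loss discussed above; they are not the measured characteristics of T-Spot or QFT-IT:

```python
# Hypothetical sketch: extra true positives gained vs extra false positives
# incurred per 100,000 people tested when switching from a more specific
# test (A) to a more sensitive test (B), at two prevalence levels.
# Sensitivity/specificity values are illustrative, not measured IGRA data.

def confusion_counts(sens, spec, prev, n=100_000):
    """Expected TP, FN, FP, TN counts among n people tested."""
    diseased = prev * n
    healthy = n - diseased
    tp = sens * diseased
    fn = diseased - tp
    fp = (1 - spec) * healthy
    tn = healthy - fp
    return tp, fn, fp, tn

A = dict(sens=0.80, spec=0.98)  # more specific test
B = dict(sens=0.87, spec=0.85)  # more sensitive test (+7 pts Se, -13 pts Sp)

for prev in (0.01, 0.30):       # low- vs high-prevalence setting
    tp_a, _, fp_a, _ = confusion_counts(prev=prev, **A)
    tp_b, _, fp_b, _ = confusion_counts(prev=prev, **B)
    extra_tp = tp_b - tp_a
    extra_fp = fp_b - fp_a
    print(f"prevalence {prev:.0%}: +{extra_tp:.0f} true positives at a cost "
          f"of +{extra_fp:.0f} false positives "
          f"({extra_fp / extra_tp:.1f} FP per extra TP)")
```

Under these assumed values, each additional case detected costs roughly 180 false positives at 1% prevalence but only about 4 at 30% prevalence, a difference of nearly two orders of magnitude from prevalence alone.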
Despite this differential impact between settings, testing decisions do not always consider specific populations and disease characteristics; as with QFT-IT and T-Spot, positive-result thresholds are usually set at a global level by manufacturers and applied uniformly across countries[12, 13]. Given that the prevalence of many diseases varies worldwide, encouraging policymakers to explicitly incorporate disease prevalence in their testing decisions and allowing them to choose setting-specific thresholds – or to choose from a menu of possible choices – could increase the value of a given test by optimizing test performance and improving health and economic outcomes.
Tuberculosis is a good example for demonstrating the impact of prevalence in decisions regarding positive thresholds and test strategies because of issues such as the challenges of estimating accurate test operating characteristics, the varying disease prevalence, and the differences between active and latent infection. Although such issues apply when testing for any disease, they must be taken into account when interpreting the implications of our analysis. For example, the impact of incorrect LTBI diagnoses can be particularly difficult to estimate because of low treatment compliance and the challenge of estimating the impact of delayed diagnoses. This analysis also ignores other issues involved in testing for the less-prevalent active TB[6–8].
In addition, test sensitivity and specificity and the impact of prevalence are not the only determinants of a test’s usefulness, and decisions regarding positive test thresholds and test usefulness in different settings must consider a multitude of factors. To name but a few: variation in estimates of test sensitivity and specificity (e.g., as determined by factors such as study methodology); balance of risks and benefits; reason for testing (screening or diagnosis); population-specific geography and demographics; patient preference; and patient values for different outcomes (e.g., associated with culture). Testing programs may maximize benefit, minimize risk, and successfully prevent and treat disease only when all such factors are considered. Although the examples discussed herein come from only one disease (TB), this should not be considered a limitation of the study. Rather, this analysis demonstrates an epidemiologic principle that holds true for any disease, even though the magnitude of effect will vary from one disease to another.