 Research article
 Open Access
Biomarker selection for medical diagnosis using the partial area under the ROC curve
BMC Research Notes volume 7, Article number: 25 (2014)
Abstract
Background
A biomarker is usually used as a diagnostic or assessment tool in medical research. Finding an ideal biomarker is not easy and combining multiple biomarkers provides a promising alternative. Moreover, some biomarkers based on the optimal linear combination do not have enough discriminatory power. As a result, the aim of this study was to find the significant biomarkers based on the optimal linear combination maximizing the pAUC for assessment of the biomarkers.
Methods
Under the binormality assumption we obtain the optimal linear combination of biomarkers maximizing the partial area under the receiver operating characteristic curve (pAUC). Related statistical tests are developed for assessment of a biomarker set and of an individual biomarker. Stepwise biomarker selections are introduced to identify those biomarkers of statistical significance.
Results
A simulation study and three real examples (Duchenne muscular dystrophy, heart disease, and breast tissue data) show that our methods are most suitable for biomarker selection in data sets with a moderate number of biomarkers.
Conclusions
Our proposed biomarker selection approaches can be used to find the significant biomarkers based on hypothesis testing.
Background
A biomarker is a biological indicator showing the absence, presence, or the condition of a disease, and it can be used to determine the status of a subject, the effectiveness of a treatment, and so on. Ideally, a biomarker with both high sensitivity and specificity for accurate prediction is preferred. However, it is not easy to find such a biomarker in practice. Combining biomarkers provides an alternative to improve the performance of those individual biomarkers that are currently available. The serum prostate-specific antigen (PSA) is a typical example. It is a well-accepted prognostic biomarker used to screen for prostate cancer. However, this test has a low specificity and therefore might lead to overdiagnosis and overtreatment. In addition to PSA, several other alternatives have also been investigated [1]. Nevertheless, there is no single alternative which outperforms PSA, and therefore most investigators propose the use of a combination of PSA and other biomarkers. The combination of PSA and percent-free PSA is an alternative method [2]. Recently, due to significant advances in biotechnology, many genetic and genomic biomarkers have been discovered that could be potential candidates [3]. Once their clinical evidence is validated, integrating multiple biomarkers in order to obtain a better prediction will become an essential and important task.
The ROC curve is the most popular graphical tool for evaluating the diagnostic power of a biomarker. It provides an exhaustive look at the trend of sensitivity over all cutoffs, and thus provides information about the relationship between the sensitivity and the specificity of a biomarker. However, the abundance of information it provides makes the comparison between biomarkers difficult, because the underlying ROC curves are often likely to cross. The area under the ROC curve (AUC), which integrates the curve over all cutoffs, is proposed for an efficient summarization. This criterion can be extended by giving different weights at various cutoffs according to, for example, the cost resulting from the prediction error in the diseased or in the nondiseased population, and the prevalence rate of the disease [4]. In some applications, investigators focus only on a part of the curve. For example, a high level of specificity is required for a biomarker serving as a population screening tool. As a consequence, a biomarker is assessed on the partial area under the ROC curve (pAUC) in a region of specificity above a certain level [5–7].
This study focuses on combining multiple continuous-scaled biomarkers into one single diagnostic or predictive rule for a disease, with emphasis on the assessment of each biomarker. For better interpretability, we propose the use of a linear combination for summarization. The discriminatory power of a linear combination of biomarkers is evaluated based on the pAUC. The optimal linear combination, which provides the best discriminatory power among all combinations, is the target solution of research interest.
In the presence of multiple biomarkers, a traditional method of medical diagnosis is to fit a multiple logistic regression model to the data set. An example of this is the study of outcome prediction of aneurysmal subarachnoid hemorrhage (aSAH) patients [8]. Alternatively, seeking the maximal discriminatory power, the explicit form of the best linear combination in terms of AUC under a binormal model is derived [9]. Following their study, a solution that is superior to all others in certain scenarios when a high specificity or a high sensitivity is required was found [10]. Nevertheless, these scenarios are not universal. The use of empirical AUC estimates in finding the optimal linear combination was proposed [11, 12]. In our earlier study, we found that not only the analytical derivation, but also the computation, became much more complicated with the use of the pAUC criterion [13].
When an optimal linear combination is available, the solution is useful in evaluating either the entire biomarker set or one specific biomarker in the set. For example, the maximal pAUC of a biomarker set provides the best discriminatory power that the biomarker set can achieve. If even the best linear combination does not have a significant discriminatory power, none of the biomarkers should be considered to be associated with the disease. In addition to the global predictability, some insights on the importance of an individual biomarker can be obtained from the coefficients in the optimal linear combination. If a coefficient is nearly zero, the corresponding biomarker contributes little to disease diagnosis and is regarded as less important. In this study, we propose three testing procedures based on the optimal linear combination maximizing the pAUC for assessment of the biomarkers.
The proposed statistical tests will be embedded in two stepwise biomarker selection methods to identify biomarkers of statistical significance. It is known that classification is analogous to a diagnostic rule. Recently, in order to deal with big data, several algorithm-based classification approaches have been proposed which also directly use either the AUC or the pAUC as the objective function [14–21]. Computational feasibility and efficiency are usually the major considerations in the development of these methods. One popular way is to add a penalty to the optimization to stabilize the calculation. The penalization naturally leads to variable selection, which is a desirable outcome in an analysis of a huge data set. In contrast, we consider the conventional stepwise selection methods, which select or discard a biomarker on the basis of statistical significance. However, acquiring the evidence of significance necessitates intensive computation. Therefore, our methods are most suitable for data sets with a moderate number of biomarkers.
The paper is organized as follows: In the first part of Section (Methods), the sample version of the optimal linear combination will be defined. The testing procedures for the global and individual discriminatory power will be proposed in the second part of Section (Methods). Furthermore, two biomarker selection approaches adopting the proposed tests will be developed in the third part of Section (Methods). Numerical results, including an intensive simulation and real example analysis, are given in the first part and the second part of Section (Results). We then conclude this paper with a discussion in Section (Discussions). Finally, conclusions are given in Section (Conclusion).
Methods
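Let X be a random vector of p biomarkers related to the disease of a subject, and D be the binary disease status, where D = 1 indicates a subject from the diseased population, and D = 0 indicates a subject from the nondiseased population. Suppose
$\mathit{X}\mid D=d\sim N\left({\mathit{\mu}}_{d},{\mathbf{\Sigma}}_{d}\right),\phantom{\rule{1em}{0ex}}d=0,1,$
where the covariance matrices Σ_{0} and Σ_{1} are positive definite. For any given real vector a ∈ ℝ^{p}, the linear combination of p biomarkers, a^{T}X, has a distribution as follows:
${\mathit{a}}^{\mathrm{T}}\mathit{X}\mid D=d\sim N\left({\mathit{a}}^{\mathrm{T}}{\mathit{\mu}}_{d},{Q}_{d}\right),$
where Q_{d} = a^{T}Σ_{d}a, for d = 0,1. Let Φ(·) denote the cumulative distribution function of N(0,1) and Φ^{-1}(·) be its inverse function. Also let c(u) = Φ^{-1}(1 - u) and Δ_{μ} = μ_{1} - μ_{0}. Then, for a given threshold at specificity (1 - u), the sensitivity of a^{T}X is equal to
$\Phi \left(\frac{{\mathit{a}}^{\mathrm{T}}{\mathbf{\Delta}}_{\mu }-c\left(u\right)\sqrt{{Q}_{0}}}{\sqrt{{Q}_{1}}}\right).$
Therefore, for a given specificity region (1 - t, 1) for some predetermined t ∈ (0,1), the partial area under the ROC curve (pAUC) of the linear combination, a^{T}X, is equal to
$\mathit{pAUC}\left(\mathit{a}\right)={\int }_{0}^{t}\Phi \left(\frac{{\mathit{a}}^{\mathrm{T}}{\mathbf{\Delta}}_{\mu }-c\left(u\right)\sqrt{{Q}_{0}}}{\sqrt{{Q}_{1}}}\right)du.$ (1)
Similar to the AUC, the pAUC is scale invariant: multiplying a by any positive constant leaves pAUC(a) unchanged. For identification purposes, in this study the search for the optimal linear combination vector is restricted to the hypersphere with a unit radius. Let a^{*} be such a pAUC maximizer; that is,
${\mathit{a}}^{*}=\underset{\mathit{a}\in {E}_{p}}{\mathrm{arg\,max}}\;\mathit{pAUC}\left(\mathit{a}\right),$
where E_{p} = {a : ‖a‖ = 1, a ∈ ℝ^{p}}.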
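As a numerical illustration, the pAUC of a linear combination under the binormal model can be approximated by integrating the sensitivity over 1 - specificity in (0, t). The sketch below is not the authors' implementation: the function names, the midpoint rule, and the number of integration steps are assumptions, and it takes larger values of a^{T}X to indicate disease, so that the sensitivity at specificity 1 - u is Φ((a^{T}Δ_{μ} - c(u)√Q_{0})/√Q_{1}).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): cdf is Phi, inv_cdf is Phi^{-1}


def quad_form(a, S):
    """Return a^T S a for a vector a and a square matrix S (nested lists)."""
    p = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(p) for j in range(p))


def binormal_pauc(a, delta, S0, S1, t=0.1, n_steps=2000):
    """Midpoint-rule approximation of the pAUC of a^T X on (0, t).

    Assumption: larger a^T X indicates disease, so the sensitivity at
    specificity 1 - u is Phi((a^T delta - c(u) sqrt(Q0)) / sqrt(Q1)).
    """
    shift = sum(ai * di for ai, di in zip(a, delta))   # a^T (mu_1 - mu_0)
    s0, s1 = sqrt(quad_form(a, S0)), sqrt(quad_form(a, S1))
    h = t / n_steps
    total = 0.0
    for k in range(n_steps):
        u = (k + 0.5) * h                              # midpoint, never exactly 0
        c_u = STD_NORMAL.inv_cdf(1.0 - u)              # c(u) = Phi^{-1}(1 - u)
        total += STD_NORMAL.cdf((shift - c_u * s0) / s1)
    return total * h


identity = [[1.0, 0.0], [0.0, 1.0]]
corr = [[1.0, 0.5], [0.5, 1.0]]
# No group difference: the value is close to the reference t**2 / 2 = 0.005.
null_value = binormal_pauc([1.0, 0.0], [0.0, 0.0], identity, identity)
# Rescaling a by a positive constant does not change the pAUC.
unit = binormal_pauc([0.6, 0.8], [0.5, 1.0], identity, corr)
scaled = binormal_pauc([6.0, 8.0], [0.5, 1.0], identity, corr)
```

The last two calls illustrate why the search can be restricted to unit vectors: `unit` and `scaled` agree up to rounding.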
Assume two independent random samples are drawn from the nondiseased and diseased populations. Let n_{0} and n_{1} be the sample sizes of the nondiseased and diseased groups, respectively, and denote their minimum as n = min {n_{0},n_{1}}. Under the normality assumption, the maximum likelihood estimates (MLEs) are employed in a sample version of the optimization problem when the population parameters are unknown. The estimated mean vectors and covariance matrices are respectively denoted as follows: ${\widehat{\mathit{\mu}}}_{0}$, ${\widehat{\mathit{\mu}}}_{1}$, and ${\widehat{\mathbf{\Sigma}}}_{0}$, ${\widehat{\mathbf{\Sigma}}}_{1}$. Moreover, let ${\widehat{\mathbf{\Delta}}}_{\mathrm{\mu}}={\widehat{\mathit{\mu}}}_{1}-{\widehat{\mathit{\mu}}}_{0}$ and ${\widehat{\mathit{Q}}}_{d}={\mathit{a}}^{\mathrm{T}}{\widehat{\mathbf{\Sigma}}}_{d}\mathit{a}$, for d = 0,1. Replacing the unknown parameters in Equation (1) by their corresponding MLEs, we have a sample version of the pAUC below:
${\widehat{\mathit{pAUC}}}_{n}\left(\mathit{a}\right)={\int }_{0}^{t}\Phi \left(\frac{{\mathit{a}}^{\mathrm{T}}{\widehat{\mathbf{\Delta}}}_{\mathrm{\mu}}-c\left(u\right)\sqrt{{\widehat{\mathit{Q}}}_{0}}}{\sqrt{{\widehat{\mathit{Q}}}_{1}}}\right)du,$ (2)
where
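Thus, the coefficients a^{*} are estimated by the maximizer of Equation (2):
${\widehat{\mathit{a}}}_{n}=\underset{\mathit{a}\in {E}_{p}}{\mathrm{arg\,max}}\;{\widehat{\mathit{pAUC}}}_{n}\left(\mathit{a}\right).$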
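The plug-in step can be sketched as follows: estimate the group means and covariance matrices by maximum likelihood (divisor n rather than n - 1) and substitute them into the binormal pAUC formula. The helper names `mle_mean_cov` and `sample_pauc`, the midpoint rule, and the assumption that larger values of a^{T}X indicate disease are illustrative choices, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mle_mean_cov(rows):
    """MLE of the mean vector and covariance matrix (divisor n, not n - 1)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(p)] for i in range(p)]
    return mean, cov


def sample_pauc(a, x0, x1, t=0.1, n_steps=2000):
    """Plug-in pAUC of a^T X from nondiseased rows x0 and diseased rows x1."""
    mu0, S0 = mle_mean_cov(x0)
    mu1, S1 = mle_mean_cov(x1)
    p = len(a)
    shift = sum(a[j] * (mu1[j] - mu0[j]) for j in range(p))      # a^T Delta-hat
    q0 = sum(a[i] * S0[i][j] * a[j] for i in range(p) for j in range(p))
    q1 = sum(a[i] * S1[i][j] * a[j] for i in range(p) for j in range(p))
    h = t / n_steps
    total = 0.0
    for k in range(n_steps):
        c_u = STD_NORMAL.inv_cdf(1.0 - (k + 0.5) * h)            # c(u) at the midpoint
        total += STD_NORMAL.cdf((shift - c_u * sqrt(q0)) / sqrt(q1))
    return total * h


# Toy data: the diseased group is the nondiseased group shifted by 1 in biomarker 1.
x0 = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
x1 = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
informative = sample_pauc([1.0, 0.0], x0, x1)     # uses the shifted biomarker
uninformative = sample_pauc([0.0, 1.0], x0, x1)   # close to t**2 / 2 = 0.005
```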
The next theorem shows that the sample pAUC maximizer, ${\widehat{\mathit{a}}}_{n}$, is strongly consistent.
Theorem 1: Suppose that the conditional distribution of X | D = d follows N(μ_{d}, Σ_{d}) and Σ_{d} is positive definite for d = 0,1. Assume that pAUC(a) in Equation (1) has a unique maximizer a^{*} in E_{p}. Then the maximizer, ${\widehat{\mathit{a}}}_{n}$, of the sample pAUC, ${\widehat{\mathit{pAUC}}}_{n}\left(\mathit{a}\right)$, in Equation (2) converges to a^{*} with probability 1 as n → ∞. (The proof is given in Additional file 1).
Previously, we found that the pAUC function sometimes has local extrema or multiple maxima [13]. Therefore, we proposed a multiple-initial algorithm, which utilizes multiple initial points in a conventional optimization algorithm, to reduce the risk of not finding the global maximum. The uniqueness of the maximum is assumed in Theorem 1 to ease the complications brought on by the existence of multiple maxima.
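The idea of restarting a local optimizer from several initial points can be illustrated for p = 2, where a unit vector is a = (cos θ, sin θ). The sketch below is not the multiple-initial algorithm of [13], whose details are not given here; it swaps in a simple step-halving hill climb on θ, and all names, the number of starts, and the tolerances are assumptions. Covariance matrices are taken to be symmetric 2 × 2 nested lists.

```python
from math import cos, sin, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
T, N_STEPS = 0.1, 200
# c(u) = Phi^{-1}(1 - u) at the midpoints of the integration grid on (0, T)
C_VALUES = [STD_NORMAL.inv_cdf(1.0 - (k + 0.5) * T / N_STEPS) for k in range(N_STEPS)]


def pauc_at_angle(theta, delta, S0, S1):
    """Binormal pAUC of a^T X for a = (cos theta, sin theta)."""
    a0, a1 = cos(theta), sin(theta)
    shift = a0 * delta[0] + a1 * delta[1]
    s0 = sqrt(a0 * a0 * S0[0][0] + 2 * a0 * a1 * S0[0][1] + a1 * a1 * S0[1][1])
    s1 = sqrt(a0 * a0 * S1[0][0] + 2 * a0 * a1 * S1[0][1] + a1 * a1 * S1[1][1])
    return sum(STD_NORMAL.cdf((shift - c * s0) / s1) for c in C_VALUES) * T / N_STEPS


def local_search(theta, objective, step=0.5, tol=1e-6):
    """Hill climb on theta: move while the objective improves, else halve the step."""
    best = objective(theta)
    while step > tol:
        moved = False
        for candidate in (theta + step, theta - step):
            value = objective(candidate)
            if value > best:
                theta, best, moved = candidate, value, True
                break
        if not moved:
            step /= 2.0
    return theta, best


def multi_start_maximizer(delta, S0, S1, n_starts=8):
    """Run the local search from equally spaced angles and keep the best result."""
    def objective(theta):
        return pauc_at_angle(theta, delta, S0, S1)
    runs = [local_search(2 * pi * k / n_starts, objective) for k in range(n_starts)]
    theta, value = max(runs, key=lambda run: run[1])
    return (cos(theta), sin(theta)), value
```

With identity covariance matrices and Δ = (0, 1)^{T}, every start should end near a = (0, 1); with unequal covariance matrices the starts may end at different local maxima, which is the situation the restarts are meant to guard against.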
In real applications, occasionally the calculated best linear combination had a low pAUC value, or some coefficients in the best linear combination were found to be nearly zero. Numerically, the relevant biomarkers might have a limited contribution to the disease prediction. In the following section, we will discuss how to assess the significance of biomarkers in terms of their discriminatory power. The proposed testing procedures will be utilized in our biomarker selection approaches to find a compact biomarker set which consists of only significant biomarkers for disease diagnosis.
Hypothesis testing and biomarker selection
Testing the discriminatory power
When an optimal linear combination is available, the solution is useful in evaluating either the entire biomarker set or one specific biomarker in the set. The first hypothesis testing problem of interest is to assess the overall discriminatory power of a biomarker set through its maximal pAUC, which is the best discriminatory power that the biomarker set can achieve. Once the overall diagnostic power is “statistically confirmed,” the next important issue is to evaluate the contribution of each biomarker. This type of information can provide more insight about the causal relationship between each biomarker and the disease. In this subsection, the statistical procedures for testing the discriminatory power of a set or of an individual biomarker are developed.
Considering only the class of linear combinations, we evaluate the global discriminatory power of a set of p ≥ 1 biomarkers, X, by testing the following hypotheses:
H_{0,g}: The biomarker set has no discriminatory power to the disease
versus
H_{1,g}: The biomarker set has a discriminatory power to the disease.
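The null hypothesis H_{0,g} is true if the optimal linear combination of the biomarker set has no discriminatory power. Equivalently, the maximal pAUC that the set can achieve through its linear combinations is not greater than the reference limit t^{2}/2, which is the pAUC value of the noninformative diagnosis with a diagonal ROC curve. That is,
${H}_{0,g}:\underset{\mathit{a}\in {E}_{p}}{\max }\;\mathit{pAUC}\left(\mathit{a}\right)=\mathit{pAUC}\left({\mathit{a}}^{*}\right)\le {t}^{2}/2.$
By maximizing the sample pAUC defined in Equation (2), we obtain the maximal sample pAUC and use it as the test statistic. That is,
${T}_{g}=\underset{\mathit{a}\in {E}_{p}}{\max }\;{\widehat{\mathit{pAUC}}}_{n}\left(\mathit{a}\right)={\widehat{\mathit{pAUC}}}_{n}\left({\widehat{\mathit{a}}}_{n}\right).$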
In fact, T_{ g } is the estimated pAUC of the best linear combination ${{\widehat{\mathit{a}}}_{n}}^{\mathit{T}}\mathit{X}$. The null hypothesis H_{0,g} is rejected if T_{ g } is sufficiently large.
Due to the complex formulation of the test statistic, the null distribution and the right-tailed critical value are estimated by a parametric bootstrapping method. Under H_{0,g}, X has a common multivariate normal distribution in the two population groups. The common mean and covariance matrix are estimated from the pooled sample, and are denoted as ${\tilde{\mathit{\mu}}}_{p},{\tilde{\mathbf{\Sigma}}}_{p}$. Consider drawing two independent random samples of size n_{1} and n_{0} from the estimated common null distribution, $\mathit{MVN}\left({\tilde{\mathit{\mu}}}_{p},{\tilde{\mathbf{\Sigma}}}_{p}\right)$. Then use the bootstrap samples to find the test statistic, say ${{T}_{\mathrm{g}}}^{\left(b\right)}.$ Repeat the sampling B times. The critical value at the significance level α is then equal to the 100(1 - α)^{th} percentile among these ${{T}_{\mathrm{g}}}^{\left(b\right)}$ values. The null hypothesis H_{0,g} is rejected if T_{g} is greater than or equal to the critical value.
When a set consists of only one biomarker, say X_{i}, the global effect becomes the marginal discriminatory power of X_{i} alone. Using the corresponding pAUC to describe its discriminatory power, we can assess the biomarker by testing the following hypothesis:
${H}_{0,m}:\mathit{pAUC}\left({\mathbf{1}}_{i}\right)\le {t}^{2}/2,$
where 1_{i} is the vector having zero components, except for a 1 in the position corresponding to X_{i}. Again, we use the estimated pAUC value as the test statistic,
${T}_{m,i}={\widehat{\mathit{pAUC}}}_{n}\left({\mathbf{1}}_{i}\right)={\int }_{0}^{t}\Phi \left(\frac{{\widehat{\mu}}_{1,i}-{\widehat{\mu}}_{0,i}-c\left(u\right)\sqrt{{\widehat{\Sigma}}_{0,i}}}{\sqrt{{\widehat{\Sigma}}_{1,i}}}\right)du,$
where ${\widehat{\mu}}_{1,i},{\widehat{\Sigma}}_{1,i}$ and ${\widehat{\mu}}_{0,i},{\widehat{\Sigma}}_{0,i}$ are the MLEs of the mean and variance of X_{ i } in the two groups. The critical value is determined by the parametric bootstrapping method described previously. Here, only one single biomarker is involved, so the computation is even simpler.
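For a single biomarker, the parametric bootstrap described above can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the function names, the midpoint integration, the nearest-rank percentile convention, and the fixed random seed are assumptions, and larger biomarker values are taken to indicate disease.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def pauc_single(x0, x1, t=0.1, n_steps=200):
    """Plug-in pAUC of one biomarker using MLE means and variances."""
    m0, m1 = fmean(x0), fmean(x1)
    s0 = sqrt(sum((x - m0) ** 2 for x in x0) / len(x0))
    s1 = sqrt(sum((x - m1) ** 2 for x in x1) / len(x1))
    h = t / n_steps
    total = 0.0
    for k in range(n_steps):
        c_u = STD_NORMAL.inv_cdf(1.0 - (k + 0.5) * h)
        total += STD_NORMAL.cdf((m1 - m0 - c_u * s0) / s1)
    return total * h


def marginal_bootstrap_test(x0, x1, t=0.1, alpha=0.05, B=500, seed=0):
    """Right-tailed parametric bootstrap test of the marginal pAUC."""
    rng = random.Random(seed)
    stat = pauc_single(x0, x1, t)
    pooled = list(x0) + list(x1)
    m = fmean(pooled)                                   # common mean under the null
    s = sqrt(sum((x - m) ** 2 for x in pooled) / len(pooled))
    boot = []
    for _ in range(B):
        b0 = [rng.gauss(m, s) for _ in x0]              # size n0 from the pooled fit
        b1 = [rng.gauss(m, s) for _ in x1]              # size n1 from the pooled fit
        boot.append(pauc_single(b0, b1, t))
    boot.sort()
    # Nearest-rank 100(1 - alpha)th percentile; other conventions exist.
    idx = min(B - 1, max(0, ceil((1 - alpha) * B) - 1))
    critical = boot[idx]
    return stat, critical, stat >= critical
```

A call such as `marginal_bootstrap_test(x0, x1, B=500)` returns the observed statistic, the bootstrap critical value, and the rejection decision.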
When multiple biomarkers, X, are simultaneously taken into account, we consider assessing one specific biomarker given the existence of other biomarkers. Let ${\mathit{X}}^{T}=\left({\mathit{X}}_{-i}^{T},{X}_{i}\right)$, where X_{i} denotes the target biomarker and X_{-i} includes the remaining ones in the set. Now the goal is to test the following hypothesis:
H_{0,c}: Given X_{-i}, X_{i} has no discriminatory power to the disease.
The coefficients of the optimal linear combination of X are written as ${\mathit{a}}^{\phantom{\rule{.07em}{0ex}}*T}=\left({\mathit{a}}_{-i}^{*T},{a}_{i}^{*}\right)$, where ${a}_{i}^{*}$ is the corresponding coefficient of X_{i}. In this problem, we propose evaluating the biomarker X_{i} from ${a}_{i}^{*}$. Given X_{-i}, this biomarker has no discriminatory power to the disease if it does not contribute to the linear combination, that is, if its coefficient is zero. Hence H_{0,c} is equivalent to
${H}_{0,c}:{a}_{i}^{*}=0.$
The test statistic is the estimator of ${a}_{i}^{*},$ denoted by ${T}_{c,i}={\widehat{a}}_{n,i}$. The null hypothesis H_{0,c} is then rejected if T_{ c, i } is either too small or too large.
To generate the bootstrap samples, the null scenario under H_{0,c} is discussed. Under the normality assumption, given D = d, d ∈ {0, 1},
Then under H_{0,c}, P(X_{i} | D, X_{-i}) = P(X_{i} | X_{-i}), which holds provided that for each realization, X_{-i} = x_{-i},
Therefore, estimating the null distribution involves a nontrivial constrained inference. For simplicity, we consider a narrower null scenario, where P(X_{i} | D, X_{-i}) = P(X_{i}). That is, within the two groups, not only does X_{i} have a common distribution, but X_{i} is also independent of X_{-i}. As a consequence, we then consider the following model for bootstrap samples: for d = 0,1,
$\mathit{X}\mid D=d\sim \mathit{MVN}\left(\left(\begin{array}{c}{\widehat{\mathit{\mu}}}_{d,-i}\\ {\tilde{\mu}}_{p,i}\end{array}\right),\left(\begin{array}{cc}{\widehat{\mathbf{\Sigma}}}_{d,-i}& \mathbf{0}\\ {\mathbf{0}}^{T}& {\tilde{\sigma}}_{p,i}\end{array}\right)\right).$
Notations ${\widehat{\mathit{\mu}}}_{d,-i}$ and ${\widehat{\mathbf{\Sigma}}}_{d,-i}$ represent the MLEs of the mean and covariance matrix of X_{-i}, respectively, from the two samples; ${\tilde{\mu}}_{p,i},{\tilde{\sigma}}_{p,i}$ are estimates of the mean and variance of X_{i} from the pooled sample; 0 is the (p - 1) × 1 zero vector. Repeat the bootstrap sampling B times, find the sample pAUC maximizers of the bootstrap samples, and record the B estimated coefficients ${\widehat{a}}_{n,i}^{\left(b\right)}$ corresponding to X_{i}. The critical values are then the 100(α/2)^{th} and the 100(1 - α/2)^{th} percentiles among the B coefficients. The null hypothesis is rejected if the test statistic T_{c,i} is greater than or equal to the 100(1 - α/2)^{th} percentile, or is less than or equal to the 100(α/2)^{th} percentile.
Note that this conditional test is powerless to detect the significance of X_{i} when X_{-i} alone is independent of the disease D. Under H_{0,c}, it is known that
$P\left({X}_{i}\mid D,{\mathit{X}}_{-i}\right)=P\left({X}_{i}\mid {\mathit{X}}_{-i}\right).$
Combining this with the fact that P(X_{-i} | D) = P(X_{-i}) leads to the complete null scenario in which all biomarkers are independent of the disease. Under this circumstance, the estimated coefficients have great variability subject to the requirement of unit length in the algorithm. As a consequence, the critical values become so extreme that obtaining a significant finding is unlikely, even when in fact X_{i} is strongly correlated with the disease.
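The two-sided percentile rule at the end of this procedure can be sketched as below, given the B bootstrap coefficients for the target biomarker. The function names and the nearest-rank percentile convention are assumptions made for illustration; producing the bootstrap coefficients themselves requires the pAUC maximizer and is not shown.

```python
from math import ceil


def nearest_rank_percentile(values, q):
    """Return the q-th percentile (0 < q <= 100) by the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def two_sided_bootstrap_decision(t_obs, boot_coefs, alpha=0.05):
    """Reject when t_obs lies at or beyond either bootstrap critical value."""
    lower = nearest_rank_percentile(boot_coefs, 100.0 * alpha / 2.0)
    upper = nearest_rank_percentile(boot_coefs, 100.0 * (1.0 - alpha / 2.0))
    reject = t_obs >= upper or t_obs <= lower
    return lower, upper, reject
```

When the bootstrap coefficients spread over most of [-1, 1], as in the complete null scenario, the two critical values sit near the ends of that interval and few observed coefficients can be declared significant.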
Biomarker selection
We now turn to the biomarker selection problem. By using the statistical tests in the last subsection, we are able to determine the significance of a biomarker. The amount of data is reduced by selecting the significant biomarkers.
Assume that X is the vector of the full biomarker set and let ${\widehat{\mathit{a}}}_{n}^{\mathrm{T}}=\left({\widehat{a}}_{n,1},\dots ,{\widehat{a}}_{n,p}\right)$ be the estimate of the optimal linear combination as before. We then employ the idea of a classical stepwise variable selection method. First, an ordering criterion for all biomarkers is determined. Here, the biomarkers are rearranged according to their corresponding $\left|{\widehat{a}}_{n,i}\right|$ values in ascending order. The ordered biomarker set is denoted by X^{T} = (X_{(1)},…, X_{(p)}). Hence, X_{(1)} is potentially the least important biomarker and X_{(p)} is potentially the most important one. Note that the ordering criterion is reasonable only when all biomarkers are expressed in a common unit; hence an adequate standardization should be applied before we proceed to the selection procedure.
We consider two stepwise selection methods: the Forward and the Backward approaches. For convenience, define A as the set of biomarkers under consideration for the disease diagnosis in each step. The Forward procedure starts with a null A, and tests the contribution of the potentially most discriminatory biomarker X_{(p)}. The biomarker is added to A if it is significant. Then it consecutively assesses X_{(p-1)}, X_{(p-2)} and so on. On the other hand, the Backward procedure begins with testing the overall discriminatory power of A = {X}. If there is a significant global effect, one further determines whether the potentially least discriminatory biomarker X_{(1)} is significant. Remove the biomarker from A if an insignificant result is present. Given the result, this procedure consecutively assesses the conditional contribution of X_{(2)}, of X_{(3)} and so on. The details are presented below:
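The ordering criterion amounts to sorting biomarker indices by the absolute value of their estimated coefficients. A minimal sketch, with a hypothetical function name and zero-based indices:

```python
def order_by_abs_coefficient(coefs):
    """Return biomarker indices sorted by ascending |coefficient|.

    The first index plays the role of X_(1) (potentially least important)
    and the last index the role of X_(p) (potentially most important).
    """
    return sorted(range(len(coefs)), key=lambda i: abs(coefs[i]))


order = order_by_abs_coefficient([0.1, -0.9, 0.4])   # → [0, 2, 1]
```

The sign of a coefficient does not affect the ordering, so a biomarker with a large negative weight is still ranked as important.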
Forward method
Step 1. Set A = Ø. Test the marginal effect of X_{(p)} with respect to
H_{0,(p)}: X_{(p)} has no discriminatory power.
If H_{0,(p)} is rejected, add X_{(p)} to A.
Go to the next step.
Step 2. Test the significance of X_{(p-1)} with respect to
H_{0,(p-1)}: Given A, X_{(p-1)} has no discriminatory power.
If H_{0,(p-1)} is rejected, add X_{(p-1)} to A.
Go to the next step.
⋮
Step p. Test the significance of X_{(1)} with respect to
H_{0,(1)}: Given A, X_{(1)} has no discriminatory power.
If H_{0,(1)} is rejected, add X_{(1)} to A.
Stop.
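The Forward steps can be sketched as a loop over the ordered biomarkers. The two callables `marginal_test` and `conditional_test` are hypothetical placeholders for the bootstrap tests; each is assumed to return True when its null hypothesis is rejected. The marginal test is used while A is empty and the conditional test once A is non-empty.

```python
def forward_selection(order, marginal_test, conditional_test):
    """Forward selection over indices sorted by ascending |coefficient|.

    order            -- indices for X_(1), ..., X_(p)
    marginal_test    -- callable(j) -> True if the marginal null is rejected
    conditional_test -- callable(j, A) -> True if the conditional null is rejected
    """
    selected = []
    for j in reversed(order):                 # X_(p), X_(p-1), ..., X_(1)
        if selected:
            rejected = conditional_test(j, tuple(selected))
        else:
            rejected = marginal_test(j)       # A is still empty
        if rejected:
            selected.append(j)
    return selected
```

Exactly one test is run per biomarker, so the procedure always uses p tests.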
Backward method
Step 0. Set A = {X}. Test the global effect of A with respect to
H_{0,(0)}: A has no discriminatory power.
If H_{0,(0)} is rejected, go to the next step; otherwise, stop and conclude A = Ø.
Step 1. Assess X_{(1)} by removing X_{(1)} from A and test the hypothesis,
H_{0,(1)}: Given A, X_{(1)} has no discriminatory power.
If H_{0,(1)} is rejected, add X_{(1)} to A.
Go to the next step.
Step 2. Assess X_{(2)} by removing X_{(2)} from A and test the hypothesis,
H_{0,(2)}: Given A, X_{(2)} has no discriminatory power.
If H_{0,(2)} is rejected, add X_{(2)} to A.
Go to the next step.
⋮
Step p. Assess the effect of X_{(p)}. If A = {X_{(p)}}, stop; otherwise, remove X_{(p)} from A and test the following null hypothesis,
H_{0,(p)}: Given A, X_{(p)} has no discriminatory power.
If H_{0,(p)} is rejected, add X_{(p)} to A.
Stop.
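The Backward steps can be sketched in the same style. Again, `global_test` and `conditional_test` are hypothetical callables standing in for the bootstrap tests and are assumed to return True when the corresponding null hypothesis is rejected.

```python
def backward_selection(order, global_test, conditional_test):
    """Backward selection over indices sorted by ascending |coefficient|.

    order            -- indices for X_(1), ..., X_(p)
    global_test      -- callable(A) -> True if the global null is rejected
    conditional_test -- callable(j, A) -> True if the conditional null is rejected
    """
    A = list(order)
    if not global_test(tuple(A)):             # Step 0: no global effect
        return []
    for j in order:                           # X_(1), X_(2), ..., X_(p)
        if A == [j]:                          # only the candidate is left: keep it untested
            break
        A.remove(j)
        if conditional_test(j, tuple(A)):     # significant given the rest: put it back
            A.append(j)
    return [j for j in order if j in A]       # report in the original ordering
```

Depending on the outcomes, the loop uses 1, p, or p + 1 tests: one if the global test is not rejected, p if the first p - 1 biomarkers are all discarded, and p + 1 otherwise.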
At the end of the selection process, we conclude that the biomarkers in A have a significant contribution to disease diagnosis. At Step 0 of the Backward approach, the global test is conducted; see H_{0,g} and T_{g} in Section 3.1. Moreover, during the selection, in testing the contribution of a specific biomarker, two different tests are applied depending on whether A is empty or not. If A = Ø, this is the problem of testing the marginal contribution of the target biomarker; see H_{0,m} and T_{m,i} in Section 3.1. If A ≠ Ø, then the conditional contribution of the target biomarker is tested; see H_{0,c} and T_{c,i} in Section 3.1.
For a study of p biomarkers, the Forward approach needs p tests for the final conclusion. However, the Backward approach is not that simple. It might stop immediately at Step 0 if an insignificant global discriminatory power is obtained. When the global significance is achieved and the first p - 1 biomarkers have all been concluded to be insignificant, we directly draw the conclusion of selecting only X_{(p)} without verifying its significance. If none of the above is the case, the evaluation of X_{(p)} is necessary. Hence, the Backward approach may take 1, p or p + 1 test(s) to reach its final conclusion. The stepwise method, which combines the forward and the backward selections, is another potential approach. However, it requires a much longer computational time.
Sometimes a biomarker has no discriminatory power by itself, but has a contribution given the existence of other biomarkers. The contribution mainly comes from high correlations with other major biomarkers. In a selection procedure, this biomarker is likely to be selected. However, given this biomarker, the conditional test is powerless to detect other important biomarkers, as described in the last subsection. As a consequence, the Backward approach may produce a confusing conclusion: select a minor biomarker but discard a major one. On the other hand, because the Forward approach starts by assessing the marginal contribution of every biomarker, it tends to yield fewer positive findings if the effect sizes or the pAUCs of the biomarkers are small to moderate. In the next section, we will further explain these findings by way of a simulation study and real examples.
Results
In this section, we present simulation results to validate our proposed procedures, including the estimation of the best linear combination of the biomarkers, the global test of the discriminatory power of a set of biomarkers, and the two biomarker selection approaches. We generate samples of two, three and four biomarkers (p = 2,3,4) in various scenarios. To prevent the report from becoming too lengthy, we only provide a discussion on the case of two biomarkers and partial results for the cases of three and four biomarkers. More numerical results are provided in the additional files (see Additional file 1).
In the following, given the parameter values, the true best linear combinations maximizing the pAUC are found via grid search with 10^{6} grid points. When the data dimension p ≤ 2, fixed grids are considered. When the data dimension is greater than two, the grid points are drawn uniformly on the surface of a sphere [22, 23]. On the other hand, based on the sample data, the estimated best linear combinations are computed via the multiple-initial algorithm proposed in our previous study [13].
Assume that the two biomarkers X = (X_{1},X_{2})^{T}, given D = d, follow a bivariate normal distribution with mean μ_{d} and covariance Σ_{d}, where d = 0 or 1 indicates a nondiseased or diseased group, respectively. Suppose that μ_{0} = 0 and consequently, μ_{1} is equal to the mean difference, μ_{1} = Δ = (Δ_{1},Δ_{2})^{T}. Three values, 0.3, 0.5, and 1, are considered for the Δ_{i}'s. To mimic a standardized data set, the two biomarkers have unit variance and correlation coefficient ρ_{d}. The correlation coefficient ρ_{d} takes on one of three values: 0, 0.5 or 0.9; see Table 1. Consider the pAUC with t = 0.1. Table 1 also reports the distribution of a^{*T}X in the two groups. Further, the last column displays the true maximal pAUC values attained.
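For dimensions above two, one standard way to draw directions uniformly on the unit sphere is to normalize a vector of independent standard normal draws. The sketch below combines this with an exhaustive evaluation of the binormal pAUC. It is an illustration under assumptions: the function names are hypothetical, the number of points is far smaller than the 10^{6} used in the study, and larger values of a^{T}X are taken to indicate disease.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def uniform_on_sphere(p, rng):
    """Draw one direction uniformly from the unit sphere in R^p."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = sqrt(sum(v * v for v in z))
        if norm > 0.0:
            return [v / norm for v in z]


def pauc(a, delta, S0, S1, c_values, t):
    """Midpoint-rule binormal pAUC of a^T X given precomputed c(u) values."""
    p = len(a)
    shift = sum(a[j] * delta[j] for j in range(p))
    s0 = sqrt(sum(a[i] * S0[i][j] * a[j] for i in range(p) for j in range(p)))
    s1 = sqrt(sum(a[i] * S1[i][j] * a[j] for i in range(p) for j in range(p)))
    return sum(STD_NORMAL.cdf((shift - c * s0) / s1) for c in c_values) * t / len(c_values)


def sphere_search(delta, S0, S1, n_points=2000, t=0.1, n_steps=100, seed=0):
    """Evaluate the pAUC at random unit vectors and return the best one found."""
    rng = random.Random(seed)
    c_values = [STD_NORMAL.inv_cdf(1.0 - (k + 0.5) * t / n_steps) for k in range(n_steps)]
    best_a, best_value = None, -1.0
    for _ in range(n_points):
        a = uniform_on_sphere(len(delta), rng)
        value = pauc(a, delta, S0, S1, c_values, t)
        if value > best_value:
            best_a, best_value = a, value
    return best_a, best_value
```

The accuracy of such a search is limited by the spacing of the sampled directions, which is why a very large number of points is needed for a reliable reference value.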
The first case is the complete null scenario, where the two biomarkers have the same distribution in the diseased and nondiseased groups. Each linear combination provides no discriminatory power to the disease and has the reference pAUC value t^{2} /2 = 0.005. Define a^{*} = 0 in this case. In Case 2–22, Δ_{1} = 0, Δ_{2} > 0, hence the second biomarker is the dominant biomarker. In Case 2–4, the two biomarkers are conditionally independent, and thus the first biomarker is completely uncorrelated with the disease while the second biomarker is the only contributor to the disease diagnosis. In Case 5–10, we find that the first biomarker can provide a nonignorable contribution when it is correlated with the major contributor. Comparing this with Case 2–4, we observe that the global discriminatory power is significantly increased by the presence of the positive correlation. To further investigate the effect of correlation, we consider various covariance matrices. The two biomarkers are correlated only in the nondiseased group in Case 11–16, and only in the diseased group in Case 17–22. It can be seen that the existence of a positive correlation in the nondiseased group has a greater improvement in pAUC than in the diseased group. In the last three cases, Δ_{1} = Δ_{2}, ρ_{ d } = 0, and hence both biomarkers are of equal importance. The pAUC of the best linear combination increases with the common mean difference as expected.
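The gain from correlation can be checked in the special case where both groups share one covariance matrix Σ. In that case the ROC curve of a^{T}X depends on a only through a^{T}Δ/√(a^{T}Σa), which is maximized by a ∝ Σ^{-1}Δ with maximal value √(Δ^{T}Σ^{-1}Δ). The sketch below evaluates this for Δ = (0, 1)^{T} and a common correlation ρ; the function names are hypothetical, and the common-covariance setting is an assumption made for the illustration rather than a reproduction of Table 1.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def equal_cov_optimum(delta, rho):
    """Unit-length optimal direction and effect size for Sigma = [[1, rho], [rho, 1]].

    Requires a nonzero delta and |rho| < 1.
    """
    d1, d2 = delta
    det = 1.0 - rho * rho
    w = [(d1 - rho * d2) / det, (d2 - rho * d1) / det]   # Sigma^{-1} delta
    effect = sqrt(d1 * w[0] + d2 * w[1])                 # sqrt(delta^T Sigma^{-1} delta)
    norm = sqrt(w[0] ** 2 + w[1] ** 2)
    return [w[0] / norm, w[1] / norm], effect


def pauc_from_effect(effect, t=0.1, n_steps=2000):
    """pAUC when both groups have unit variance along a and mean shift `effect`."""
    h = t / n_steps
    return h * sum(STD_NORMAL.cdf(effect - STD_NORMAL.inv_cdf(1.0 - (k + 0.5) * h))
                   for k in range(n_steps))


results = {}
for rho in (0.0, 0.5, 0.9):
    direction, effect = equal_cov_optimum([0.0, 1.0], rho)
    results[rho] = (direction, effect, pauc_from_effect(effect))
```

For ρ > 0 the first biomarker receives a negative weight even though its own mean difference is zero, and the effect size grows as 1/√(1 - ρ^{2}), so the maximal pAUC increases with ρ.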
Next, we study the empirical performances of the proposed estimated best linear combination $\left({\widehat{\mathit{a}}}_{n}\right)$ and the correspondent pAUC $\left(\mathit{pAUC}\left({\widehat{\mathit{a}}}_{n}\right)\right)$. Consider a balanced study, in which n_{0} = n_{1} = 100. In Table 2, the empirical mean and standard error of these estimators among 1,000 replicates, denoted by Ave and SE, are reported.
In estimating the best linear combination, we find that it tends to give conservative results that are biased towards zero. The estimators have the greatest variations in the complete null scenario, and the variations decrease as the discriminating power of the two biomarkers increases. The estimated pAUC tends to overestimate the true value, and similarly this tendency increases as the set of the two biomarkers have a greater diagnostic power. As suggested by a referee, the use of an independent validation test set can be expected to reduce the overestimation. The last column displays the empirical power of the global discriminatory power test at significance level α = 5% with bootstrapping size 500. We find that the test controls the type I error rate well and has satisfactory performance in alternative cases.
Next, we apply the two biomarker selection approaches. At each step, the significance level is α = 5% and the bootstrapping size is 500. There are four possible conclusions: (i) (c_{ 1 },c_{ 2 }), if both biomarkers are selected; (ii) (1,0), if only the first biomarker is selected; (iii) (0,1), if only the second is selected; (iv) (0,0), if both are discarded. If at least one biomarker is selected, the best linear combination of the reduced biomarker set, as well as its correspondent pAUC value, is solved. The mean and the standard error of the maximal pAUC among the nonempty reduced sets are reported in Table 3. Table 4 lists the proportions of the four possible conclusions of the two approaches among the 1,000 replications. In each scenario, the figure in boldface corresponds to the most likely outcome.
From Table 3, we can see that the Forward approach generally outperforms the Backward approach except in the null case. When the first biomarker has a nonignorable contribution mainly due to the existence of a positive correlation between the two biomarkers, such as in Case 7–16, the Backward approach has unsatisfactory performance. From Table 4, we find that in these cases, a considerable proportion of samples select only the first biomarker, which in fact has no marginal discriminatory power at all. More specifically, after obtaining a significant global effect at step 0, the potentially less important biomarker, which is likely the first one in the simulation, is assessed. We often obtain significance due to the obvious decrease in pAUC caused by removing the biomarker. Next, the conditional discriminatory power of the second biomarker, given the first biomarker, is assessed. As explained in Section 3, the conditional test is powerless when the given biomarker is independent of the disease. Thus, this major biomarker is likely discarded after the minor biomarker is selected.
On the other hand, in these scenarios the Forward approach, which begins by assessing the most discriminatory biomarker, is not able to derive the benefits from the correlation, and has fewer positive discoveries, as seen in Case 8–9, 11–12 and 14–15. However, as the effect size of the biomarker increases, the Forward approach has adequate power to identify both important biomarkers, and hence it performs better in terms of the pAUC achieved, as seen in Table 3.
To investigate the robustness of our methods with respect to deviation from the binormality assumption, we generate 1,000 random samples of two biomarkers from multivariate t distributions with 3 degrees of freedom. In Table 5, the true maximal pAUC value, pAUC(a^{*}), is found via a grid search under the multivariate t distribution. Additionally, we report the average and the standard error of the estimated maximal pAUC value of the reduced biomarker set, which is selected via our proposed methods on the basis of binormality. We find that in this case, our methods tend to produce optimistic conclusions. The proposed pAUC estimation and the resultant biomarker selection procedures are sensitive to the binormality assumption.
Next, we study the cases consisting of three and four biomarkers (p = 3 or 4). Again, assume μ_{0} = 0 and μ_{1} = Δ = (Δ_{1},…,Δ_{ p })^{T}. Further, the covariance matrices are of the following form: for d = 0,1,
The performance of the estimated pAUC of the best linear combination of the full biomarker set, and that of the reduced biomarker set found from the two biomarker selection approaches, are presented in Table 6. As in the cases of p = 2, we can see that the estimated pAUC tends to overestimate the true value. By using the Backward approach, we are less likely to obtain a confusing conclusion than in the case of p = 2. Here, the two selection approaches have comparable performance in most cases, except Case 11 of p = 3 and Case 8 of p = 4.
Applications to real data sets
We apply our procedures to the real examples in [10, 24, 25]. The upper limit of 1 − specificity is t = 0.1, the stepwise significance level is α = 5%, and the bootstrap size is 500 during the biomarker selection. We use a multiple-initial algorithm to find the estimated best linear combinations for these real examples [13]. Before the biomarker selection, standardization is conducted: after the nondiseased group mean is subtracted, every biomarker is divided by its pooled sample standard deviation from the two groups, so that the units are more comparable across biomarkers. In addition, the analytical results for the data without standardization can be found in the additional files (see Additional file 1). With regard to the distributional assumption, the original papers concluded that the first two example data sets do not deviate significantly from binormality [10, 24]. However, in the last example, we obtain significant evidence (p-value < 0.0000) against the normality hypothesis for both samples via the package mvShapiroTest of R software. Although the binormality assumption fails, this data set is still analyzed to demonstrate the applicability of our proposed methods to larger data sets. The well-known algorithm-based variable selection method, LASSO, is also applied to this example for comparison.
The first example is a study of Duchenne Muscular Dystrophy (DMD) [24]. DMD carriers are generally characterized by elevated levels of certain serum enzymes rather than by physical symptoms. Measurements of three DMD biomarkers were collected from 87 normal and 38 carrier females in this data set. The sample means of the three biomarkers in the normal and carrier groups are, respectively,
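The standardization step described above can be sketched for a single biomarker as follows. This is a minimal illustration, assuming that the pooled standard deviation is the usual degrees-of-freedom-weighted combination of the two group variances (the text does not spell out the formula); the function name `standardize` and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardize(nondiseased, diseased):
    """Center one biomarker at the nondiseased mean and scale by the pooled SD.

    Assumption: pooled variance = ((n0-1)*s0^2 + (n1-1)*s1^2) / (n0+n1-2).
    """
    n0, n1 = len(nondiseased), len(diseased)
    m0 = mean(nondiseased)
    pooled_var = ((n0 - 1) * variance(nondiseased)
                  + (n1 - 1) * variance(diseased)) / (n0 + n1 - 2)
    sd = sqrt(pooled_var)
    z0 = [(x - m0) / sd for x in nondiseased]
    z1 = [(y - m0) / sd for y in diseased]
    return z0, z1


# Toy values only; both groups have sample variance 1, so the pooled SD is 1.
z0, z1 = standardize([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```

After this transformation the nondiseased group has mean zero for every biomarker, so coefficients of a linear combination are compared on a common scale.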
and the sample covariance matrices are
Table 7 presents the results of biomarker selection. Both the Forward and Backward approaches select the first and the third biomarkers. We find that the decrease in the pAUC, which occurs when removing the second biomarker, is slim. The stepwise details are provided in Table 8.
In another real example, four biomarkers (lutein, TBARS, HDL cholesterol, and uric acid) are used to construct a classification tool for atherosclerotic coronary heart disease [10]. A cohort of 434 subjects, which includes 72 cases and 362 controls, was selected for the analysis. The test of the null hypothesis of normality is not significant. For the nondiseased and diseased groups, the estimated means of the four markers are
and the two sample covariance matrices are
From Table 7, we obtain a different optimal linear combination of the full data set, in which the impact of the first biomarker lutein is diminished, while those of the other three are increased. Before the biomarker selection, the first two biomarkers, lutein and TBARS, seem to be important to the disease, as evidenced by the magnitudes of their coefficients. However, after the biomarker selection, the two stepwise selections produce the same conclusion that only the biomarker lutein achieves statistical significance, as seen in Tables 7 and 8.
The third example consists of 106 breast tissue samples [25]. Among them, 54 are classified as diseased and 52 as nondiseased. Nine biomarkers are available. The data can be downloaded from the additional files (see Additional file 2, [26]). Table 9 reports the results of the two biomarker selections for the standardized data. The biomarker set selected by the Forward method surpasses the set selected by the Backward method. Further, the two methods select two different sets of significant biomarkers. While the Backward approach discards the biomarkers more likely to be in the bottom group (in terms of the magnitude of the corresponding coefficient in the optimal linear combination of the full data set), the Forward approach does not select the four biomarkers with the largest coefficients in the full model. The latter implies an inconsistency between the coefficient of the optimal linear combination and the marginal discriminatory power of a biomarker. From an in-depth investigation, we found that for these top four biomarkers the nondiseased population is far more varied than the diseased population (see Additional file 1). This leads to a low pAUC value and hence to insignificance in testing the marginal discriminatory power. In contrast, a biomarker with a more homogeneous nondiseased population is preferred under the pAUC criterion. Since our proposed methods do not terminate after an insignificant finding, the impact of the variable ordering during selection is reduced.
For comparison, we also report the result of the optimal linear combination of the reduced biomarker sets selected using the LASSO. Two different values of λ are used: the one achieving the minimum mean cross-validation error, denoted as λ_{min}; and the maximal value such that the corresponding mean error is within 1 standard error of the minimum, denoted as λ_{1SE}. From Table 9, we find that using λ_{min} in the LASSO produces the most conservative selection, in which none of the biomarkers are discarded. Using λ_{1SE}, the LASSO selects a quite different biomarker set from those selected by our two approaches. In terms of the sample maximal pAUC of the selected biomarker set, this method is better than the Backward method but is surpassed by the Forward method for this application. The analyses were performed using the function cv.glmnet in the R package glmnet, with deviance loss and 10-fold cross-validation.
Three biomarkers of the third example, I0, A/DA and MAX IP, were considered the most discriminatory biomarkers in the original paper [25]. From Table 9, we can observe that none of the biomarker sets selected by the discussed methods include all three biomarkers at the same time. One major reason for this is that the response, which originally had a more detailed categorization of six classes, is condensed into a binary variable here. Further, the objective function of the original paper was the accuracy, while we consider the pAUC in this study [25]. Thus, different relevant statistical information is captured.
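The two choices of λ described above can be illustrated with a short sketch. It assumes that the cross-validation has already produced, for each candidate λ, a mean error and its standard error; the function name `select_lambdas` and the numbers are hypothetical and are not taken from Table 9.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error; lambda_1se is the largest
    lambda whose mean CV error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lam_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lam_1se


# Illustrative values only.
lam_min, lam_1se = select_lambdas(
    lambdas=[1.0, 0.5, 0.1, 0.01],
    cv_mean=[1.20, 1.00, 0.95, 0.97],
    cv_se=[0.10, 0.08, 0.06, 0.07],
)
```

Because a larger λ shrinks more coefficients to zero, λ_{1SE} never keeps more penalty-free freedom than λ_{min}, which is consistent with λ_{min} giving the more conservative selection (fewer biomarkers discarded) in the text.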
Discussion
In this study, we focus on disease diagnosis in the presence of multiple biomarkers. We consider the class of linear combinations for an effective and easy-to-interpret summarization of the multiple biomarkers. The diagnostic power of a linear combination is evaluated based upon its pAUC over a clinically relevant threshold region. To be more precise, we consider the requirement of a high specificity for the purpose of population screening.
Under the binormality assumption, the pAUC of a linear combination is estimated via the employment of MLEs of the population parameters. In addition, the strong consistency of the estimated optimal linear combination is proved. We also introduce a testing procedure to assess the overall diagnostic power of a set of biomarkers based on the greatest pAUC it can achieve in the class of linear combinations. Furthermore, a testing procedure for determining the conditional contribution of a single biomarker given the existence of other biomarkers is developed. The parametric bootstrap method is applied to find the critical value(s) of the tests. These proposed tests are then embedded in two biomarker selection approaches. The finite sample performance of the proposed methods is studied by using both synthetic and real data sets. In addition, the robustness of our approaches with regard to the deviation from the binormality assumption is investigated via a simulation, and a comparison of our biomarker selection methods with the LASSO is conducted in a real data analysis.
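As an illustration of how the pAUC of a linear combination can be evaluated under binormality, the sketch below uses the standard binormal ROC expression ROC_a(u) = Φ((a^{T}(μ_{1} − μ_{0}) + σ_{0}Φ^{−1}(u))/σ_{1}), with σ_{d}^{2} = a^{T}Σ_{d}a, and integrates it over u in (0, t) by the midpoint rule. It is a minimal sketch, not the authors' implementation: the function names are hypothetical, larger scores are assumed to indicate disease, and in practice the population parameters would be replaced by their MLEs.

```python
from math import sqrt
from statistics import NormalDist


def quad_form(a, S):
    """Compute a' S a for a vector a and a square matrix S (nested lists)."""
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))


def binormal_pauc(a, mu0, mu1, S0, S1, t=0.1, n_steps=2000):
    """pAUC over false positive rates in (0, t) for the score a'X."""
    nd = NormalDist()
    delta = sum(ai * (m1 - m0) for ai, m0, m1 in zip(a, mu0, mu1))
    s0, s1 = sqrt(quad_form(a, S0)), sqrt(quad_form(a, S1))
    h = t / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * h  # midpoint, avoids evaluating inv_cdf at 0
        total += nd.cdf((delta + s0 * nd.inv_cdf(u)) / s1)
    return total * h


identity = [[1.0, 0.0], [0.0, 1.0]]
# A marker with no separation lies on the chance line, so pAUC = t^2 / 2.
chance = binormal_pauc([1.0, 0.0], [0.0, 0.0], [0.0, 0.0], identity, identity)
# Using the second marker (mean shift 1.5) gives a larger pAUC.
better = binormal_pauc([0.0, 1.0], [0.0, 0.0], [0.0, 1.5], identity, identity)
```

With t = 0.1 the chance-level value is 0.005, which gives a natural reference point when judging whether a biomarker set has any discriminatory power in the high-specificity region.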
Our methods differ from other algorithm-based marker-selection approaches in that we propose to select or discard a biomarker based upon evidence of statistical significance. As a tradeoff, our methods involve many computations in order to acquire statistical evidence. This decreases the feasibility of applying these methods to larger data sets. Consequently, our methods are less appropriate in an exploratory study. We suggest the application of adequate data filtering for dimension reduction prior to advanced statistical confirmatory analysis, such as the construction of a diagnostic rule.
One common issue in selecting biomarkers based on the observed data is overfitting. To prevent such a problem, one may use cross-validation, which can be easily applied to our proposed procedure. Hence, if prediction power is the primary goal and overfitting is a concern in a real application, the investigators can easily integrate cross-validation into our procedure. Although we did not discuss overfitting further in this paper, the bootstrap resampling method used in our procedure, which takes the sampling variation into account, can guard against overfitting to some extent.
This research is conducted under the assumption that the biomarkers follow a multivariate normal distribution. The proposed statistical procedures are shown to be moderately sensitive to the distributional assumption via a numerical study. By using a nonparametric estimation of the pAUC as an alternative (for example, the empirical pAUC), the proposed methods can be generalized. However, theoretical verifications are still necessary for the resultant estimation of the optimizer, and the nonsmooth functional form greatly increases the computational difficulty. Development of nonparametric approaches may be more challenging, yet they can be more broadly applied. This topic is beyond the scope of our study.
Conventionally, a biomarker is often characterized by its mean and variance. However, from the simulation, we find that the correlation between biomarkers can play a critical role yet is often less emphasized. The pAUC of the linear combination of a set of biomarkers may be increased by including another biomarker, which is individually independent of the disease but highly correlated with other important biomarkers. The improvement of the pAUC can be substantial. Further, we observe that the correlation between biomarkers in the nondiseased group has a greater effect than that in the diseased group. On the other hand, from the real example we observe that a biomarker with a more homogeneous nondiseased population is more likely to have a greater pAUC.
Before proceeding to the proposed test-based biomarker selection, suitable data standardization is recommended in order to have a fair ordering of the biomarkers by their coefficients in the best linear combination. Different standardizations can lead to different results in the best linear combination and hence differences in the ordering. However, in our methods, because all biomarkers enter the evaluation process and are assessed by incorporating their sampling variations, the effect of standardization is minimized. In fact, in the first two real examples of this study, the same conclusions are obtained with or without the standardization, which shows that our test-based procedures are robust with respect to the choice of standardization. The analysis of the raw data is provided in the additional files (see Additional file 1).
There are other options for ranking the biomarkers. For example, consider a ranking based on the association between every individual biomarker and the disease response, measured by the p-value of a univariate t-test under the normality assumption. Or, because our article emphasizes the pAUC criterion, another possible ranking can be based upon the estimated marginal pAUC, as well as the sampling error, of a biomarker. However, these methods are more computationally intensive, and furthermore, they are unable to recognize associations between a biomarker and the disease in the presence of other biomarkers. Here, we propose using the coefficients of the optimal linear combination of the complete biomarker set as a ranking criterion. Our ranking criterion is relatively simple and roughly maps out biomarkers based on their importance. The limitation of this method is that, in order to avoid the computational difficulty, the sampling error is not taken into consideration. We learn from one of the examples that an inconsistency between the coefficient of the optimal linear combination and the marginal discriminatory power may occur. Despite this, there is no early stopping criterion, and every biomarker is evaluated throughout the biomarker selection procedure in order to minimize the ranking effect.
As in a conventional regression analysis, we do not apply any multiplicity adjustment to strictly control a familywise type I error rate in the selection procedures. However, if the investigators require a more confirmatory conclusion, a multiplicity adjustment may be necessary. The Forward selection has a fixed number of steps, and hence it involves a simple multiple comparison problem. The conventional Bonferroni adjustment, using the significance level α/p at each step, can be applied directly. The Backward selection may take 1, p or p + 1 step(s) to reach the final conclusion. Then, the simplest and most conservative way is to use the significance level α/(p + 1) at each step to control the familywise error rate. Of course, with multiplicity adjustment, the comparison of the two biomarker selection approaches may yield different results.
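For readers who want to see what an empirical pAUC looks like, one common form is sketched below. It is not part of the proposed procedure: the function name `empirical_pauc` is hypothetical, ties are ignored, and the area is taken under the empirical ROC step function up to the largest attainable false positive rate not exceeding t.

```python
def empirical_pauc(controls, cases, t=0.1):
    """Empirical pAUC for false positive rates in [0, k/n0], k = floor(t*n0).

    Each of the k highest control scores contributes the fraction of cases
    that exceed it, weighted by 1/n0. With t = 1 this reduces to the
    Mann-Whitney estimate of the full AUC.
    """
    n0, n1 = len(controls), len(cases)
    k = int(t * n0 + 1e-9)  # small offset guards against float rounding
    top_controls = sorted(controls, reverse=True)[:k]
    wins = sum(1 for x in top_controls for y in cases if y > x)
    return wins / (n0 * n1)


# Toy scores: with t = 0.5 only the two highest controls (3.0 and 2.0) count.
value = empirical_pauc([0.0, 1.0, 2.0, 3.0], [2.5, 3.5], t=0.5)  # 3/8 = 0.375
```

The indicator `y > x` makes this estimator a step function of the combination coefficients, which is the nonsmoothness that complicates optimization in the nonparametric setting.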
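The role of the correlation can be made concrete in a simplified setting. The sketch below assumes two biomarkers with unit variances, a common covariance matrix in both groups with correlation ρ, and mean difference Δ = (0, δ), so that the first biomarker has no marginal effect. Under these assumptions (which are simpler than the simulation settings of this study) the ROC curve of a'X is Φ(a^{T}Δ/σ_{a} + Φ^{−1}(u)), which is maximized at every u by a proportional to Σ^{−1}Δ, with separation sqrt(Δ^{T}Σ^{−1}Δ) = δ/sqrt(1 − ρ^{2}). All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def separation(delta, rho):
    """sqrt(Delta' Sigma^{-1} Delta) for Delta = (0, delta), correlation rho."""
    return delta / sqrt(1.0 - rho ** 2)


def optimal_weights(delta, rho):
    """Sigma^{-1} Delta up to the positive factor 1 / (1 - rho^2)."""
    return (-rho * delta, delta)


def pauc_equal_cov(d, t=0.1, n_steps=2000):
    """Midpoint-rule integral of Phi(d + Phi^{-1}(u)) over u in (0, t)."""
    nd = NormalDist()
    h = t / n_steps
    return h * sum(nd.cdf(d + nd.inv_cdf((i + 0.5) * h)) for i in range(n_steps))


# Second biomarker alone versus both biomarkers with correlation 0.8.
alone = pauc_equal_cov(separation(1.0, 0.0))
both = pauc_equal_cov(separation(1.0, 0.8))
weights = optimal_weights(1.0, 0.8)  # first biomarker enters with weight -0.8
```

With δ = 1 and ρ = 0.8 the separation grows from 1 to about 1.67, even though the first biomarker is individually independent of the disease; it contributes by removing noise shared with the second biomarker.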
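The per-step significance levels mentioned above amount to simple arithmetic, sketched here for concreteness. The function name `stepwise_alpha` and the method labels are hypothetical; the divisors p and p + 1 follow the text.

```python
def stepwise_alpha(alpha, p, method):
    """Bonferroni-adjusted per-step level for a stepwise selection.

    Forward selection: alpha / p.
    Backward selection: alpha / (p + 1), covering the worst case of p + 1 steps.
    """
    if method == "forward":
        return alpha / p
    if method == "backward":
        return alpha / (p + 1)
    raise ValueError("method must be 'forward' or 'backward'")


# With alpha = 5% and p = 4 biomarkers:
forward_level = stepwise_alpha(0.05, 4, "forward")    # 0.0125
backward_level = stepwise_alpha(0.05, 4, "backward")  # about 0.01
```

The Backward level is always the smaller of the two for the same α and p, which is one reason the comparison of the two approaches may change once the adjustment is applied.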
Conclusions
Our proposed biomarker selection approaches can be used to find the significant biomarkers based on hypothesis testing.
Abbreviations
aSAH: Aneurysmal subarachnoid hemorrhage
PSA: Prostate-specific antigen
DMD: Duchenne muscular dystrophy
References
1. National Cancer Institute: PDQ® Prostate Cancer Screening. Bethesda, MD: National Cancer Institute, Date last modified 06/08/2012. Available at: http://www.cancer.gov/cancertopics/pdq/screening/prostate/HealthProfessional/Page3#Section_67. Accessed 06/08/2012.
2. Etzioni R, Kooperberg C, Pepe M, Smith R, Gann PH: Combining biomarkers to detect disease with application to prostate cancer. Biostatistics. 2003, 4: 523-538. 10.1093/biostatistics/4.4.523.
3. Madu CO, Lu Y: Novel diagnostic biomarkers for prostate cancer. J Cancer Educ. 2010, 1: 150-177.
4. Weng CG, Poon J: Proceedings of the Seventh Australasian Data Mining Conference. A new evaluation measure for imbalanced datasets. 2008, Glenelg, South Australia: Roddick JF, Li J, Christen P, Kennedy PJ: ACS, 27-32.
5. Pepe MS, Longton G, Anderson GL, Schummer M: Selecting differentially expressed genes from microarray experiments. Biometrics. 2003, 59: 133-142. 10.1111/1541-0420.00016.
6. Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L: The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform. 2005, 38: 404-415. 10.1016/j.jbi.2005.02.008.
7. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Muller M: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma. 2011, 12: 77-84. 10.1186/1471-2105-12-77.
8. Turck N, Vutskits L, Sanchez-Pena P, Robin X, Hainard A, Gex-Fabry M, Fouda C, Bassem H, Muller M, Lisacek F, Puybasset L, Sanchez JC: A multiparameter panel method for outcome prediction following aneurysmal subarachnoid hemorrhage. Intensive Care Med. 2010, 36: 107-115. 10.1007/s00134-009-1641-y.
9. Su JQ, Liu JS: Linear combinations of multiple diagnostic markers. J Am Stat Assoc. 1993, 88: 1350-1355. 10.1080/01621459.1993.10476417.
10. Liu A, Schisterman EF, Zhu Y: On linear combinations of biomarkers to improve diagnostic accuracy. Stat Med. 2005, 24: 37-47. 10.1002/sim.1922.
11. Pepe MS, Thompson ML: Combining diagnostic test results to increase accuracy. Biostatistics. 2000, 1: 123-140. 10.1093/biostatistics/1.2.123.
12. Pepe MS, Cai T, Longton G: Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics. 2006, 62: 221-229. 10.1111/j.1541-0420.2005.00420.x.
13. Hsu MJ, Hsueh HM: The linear combinations of biomarkers which maximize the partial area under the ROC curves. Comput Stat. 2013, 28: 647-666. 10.1007/s00180-012-0321-5.
14. Ma S, Huang J: Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics. 2005, 21: 4356-4362. 10.1093/bioinformatics/bti724.
15. Ma S, Huang J: Combining multiple markers for classification using ROC. Biometrics. 2007, 63: 751-757. 10.1111/j.1541-0420.2006.00731.x.
16. Zhou XH, Chen B, Xie YM, Tian F, Liu H, Liang X: Variable selection using the optimal ROC curve: An application to a traditional Chinese medicine study on osteoporosis disease. Stat Med. 2012, 31: 628-635.
17. Lin H, Zhou L, Peng H, Zhou XH: Selection and combination of biomarkers using ROC method for disease classification and prediction. Can J Stat. 2011, 39: 324-343. 10.1002/cjs.10107.
18. Marrocco C, Duin RPW, Tortorella F: Maximizing the area under the ROC curve by pairwise feature combination. Pattern Recogn. 2008, 41: 1961-1974. 10.1016/j.patcog.2007.11.017.
19. Ricamato MT, Tortorella F: Partial AUC maximization in a linear combination of dichotomizers. Pattern Recogn. 2011, 44: 2669-2677. 10.1016/j.patcog.2011.03.022.
20. Komori O, Eguchi S: A boosting method for maximizing the partial area under the ROC curve. BMC Bioinforma. 2010, 11: 314-330. 10.1186/1471-2105-11-314.
21. Wang Z, Chang YCI: Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics. 2011, 12: 369-385. 10.1093/biostatistics/kxq052.
22. Marsaglia G: Choosing a point from the surface of a sphere. The Annals of Mathematical Statistics. 1972, 43: 645-646. 10.1214/aoms/1177692644.
23. Muller M: A note on a method for generating points uniformly on n-dimensional spheres. Commun ACM. 1959, 2: 19-20.
24. Tian L: Confidence interval estimation of partial area under curve based on combined biomarkers. Computational Statistics & Data Analysis. 2010, 54: 466-472. 10.1016/j.csda.2009.09.016.
25. Silva JE, Marques JP, Jossinet J: Classification of breast tissue by electrical impedance spectroscopy. Med Biol Eng Comput. 2000, 38: 26-30. 10.1007/BF02344684.
26. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/datasets/Breast+Tissue
Acknowledgments
The authors sincerely thank the referee for their helpful suggestions in improving their manuscript. The authors would also like to thank Drew McNeil for his careful editing of their manuscript. This work was supported by the National Science Council of Taiwan, R.O.C. under the grants (NSC 1012118M004 004) and (NSC 1012118M001 001 MY2).
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors participated in the design and interpretation of the study. MH proved Theorem 1 and performed the simulation study. All authors contributed to the draft and have approved the final manuscript.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords
 Discriminatory power
 Hypothesis testing
 Optimal linear combination
 Partial area under ROC curve
 Stepwise biomarker selection