
Biomarker selection for medical diagnosis using the partial area under the ROC curve



A biomarker is usually used as a diagnostic or assessment tool in medical research. Finding an ideal biomarker is not easy, and combining multiple biomarkers provides a promising alternative. However, some biomarkers in the optimal linear combination may not have enough discriminatory power. The aim of this study was therefore to identify the significant biomarkers on the basis of the optimal linear combination that maximizes the partial area under the ROC curve.


Under the binormality assumption, we obtain the optimal linear combination of biomarkers that maximizes the partial area under the receiver operating characteristic curve (pAUC). Related statistical tests are developed for assessing a biomarker set and an individual biomarker. Two stepwise biomarker selection procedures are introduced to identify biomarkers of statistical significance.


Results from a simulation study and three real examples (Duchenne Muscular Dystrophy, heart disease, and breast tissue data) show that our methods are most suitable for biomarker selection in data sets with a moderate number of biomarkers.


Our proposed biomarker selection approaches can be used to find the significant biomarkers based on hypothesis testing.


A biomarker is a biological indicator of the absence, presence, or condition of a disease, and it can be used to determine the status of a subject, the effectiveness of a treatment, and so on. Ideally, a biomarker with both high sensitivity and high specificity is preferred for accurate prediction, but such a biomarker is hard to find in practice. Combining biomarkers offers a way to improve on the individual biomarkers currently available. Serum prostate-specific antigen (PSA) is a typical example. It is a well-accepted prognostic biomarker used to screen for prostate cancer, but the test has low specificity and may therefore lead to over-diagnosis and over-treatment. Several alternatives to PSA have also been investigated [1]. Nevertheless, no single alternative outperforms PSA, so most investigators propose combining PSA with other biomarkers; the combination of PSA and percent-free PSA is one such approach [2]. Recently, owing to significant advances in biotechnology, many genetic and genomic biomarkers that could be potential candidates have been discovered [3]. Once their clinical value is validated, integrating multiple biomarkers to obtain better predictions will become an essential task.

The ROC curve is the most popular graphical tool for evaluating the diagnostic power of a biomarker. It displays sensitivity over all cutoffs and thus describes the relationship between the sensitivity and the specificity of a biomarker. However, this abundance of information makes comparisons between biomarkers difficult, because their ROC curves often cross. The area under the ROC curve (AUC), which integrates the curve over all cutoffs, was proposed as an efficient summary. This criterion can be extended by weighting different cutoffs according to, for example, the cost of prediction errors in the diseased or non-diseased population and the prevalence of the disease [4]. In some applications, investigators focus on only part of the curve. For example, a biomarker serving as a population screening tool requires high specificity. Consequently, such a biomarker is assessed by the partial area under the ROC curve (pAUC) over a region where specificity exceeds a certain level [5–7].

This study focuses on combining multiple continuous biomarkers into a single diagnostic or predictive rule for a disease, with an emphasis on assessing each biomarker. For better interpretability, we use a linear combination as the summary. The discriminatory power of a linear combination of biomarkers is evaluated by its pAUC, and the optimal linear combination, which has the best discriminatory power among all combinations, is the target of interest.

In the presence of multiple biomarkers, a traditional approach to medical diagnosis is to fit a multiple logistic regression model to the data, as in a study predicting the outcome of patients with aneurysmal subarachnoid hemorrhage (aSAH) [8]. Alternatively, to seek maximal discriminatory power, the explicit form of the best linear combination in terms of the AUC under a binormal model was derived [9]. Building on that work, a solution superior to all others was found for certain scenarios in which high specificity or high sensitivity is required [10]; nevertheless, these scenarios are not universal. The use of empirical AUC estimates to find the optimal linear combination has also been proposed [11, 12]. In our earlier study, we found that both the analytical derivation and the computation become much more complicated under the pAUC criterion [13].

When an optimal linear combination is available, it is useful for evaluating either the entire biomarker set or a specific biomarker in the set. For example, the maximal pAUC of a biomarker set is the best discriminatory power that the set can achieve. If even the best linear combination does not have significant discriminatory power, none of the biomarkers should be considered associated with the disease. Beyond this global predictability, the coefficients of the optimal linear combination give insight into the importance of individual biomarkers: a biomarker whose coefficient is nearly zero contributes little to disease diagnosis and is regarded as less important. In this study, we propose three testing procedures based on the optimal linear combination maximizing the pAUC to assess the biomarkers.

The proposed statistical tests are embedded in two stepwise biomarker selection methods to identify biomarkers of statistical significance. A classification rule is analogous to a diagnostic rule, and recently, to handle big data, several algorithm-based classification approaches that directly use either the AUC or the pAUC as the objective function have been proposed [14–21]. Computational feasibility and efficiency are usually the major considerations in developing these methods. One popular strategy is to add a penalty to the optimization to stabilize the computation; penalization naturally leads to variable selection, which is desirable when analyzing a huge data set. In contrast, we consider conventional stepwise selection methods, which select or discard a biomarker on the basis of statistical significance. However, acquiring evidence of significance requires intensive computation, so our methods are most suitable for data sets with a moderate number of biomarkers.

The paper is organized as follows. The first part of the Methods section defines the sample version of the optimal linear combination; the second part proposes the tests for global and individual discriminatory power; and the third part develops two biomarker selection approaches based on these tests. Numerical results, including an extensive simulation study and analyses of real examples, are given in the first and second parts of the Results section. We then discuss the findings in the Discussion section, and conclusions are given in the Conclusion section.


Let X be a random vector of p biomarkers related to the disease of a subject, and D be the binary disease status, where D = 1 indicates a subject from the diseased population, and D = 0 indicates a subject from the non-diseased population. Suppose

$$X \mid D = d \sim \mathrm{MVN}(\mu_d, \Sigma_d), \quad d = 0, 1,$$

where the covariance matrices $\Sigma_0$ and $\Sigma_1$ are positive definite. For any given real vector $a \in \mathbb{R}^p$, the linear combination of the $p$ biomarkers, $a^T X$, has the following distribution:

$$a^T X \mid D = d \sim N(a^T \mu_d, Q_d),$$

where $Q_d = a^T \Sigma_d a$ for $d = 0, 1$. Let $\Phi(\cdot)$ denote the cumulative distribution function of $N(0,1)$ and $\Phi^{-1}(\cdot)$ its inverse. Also let $c(u) = \Phi^{-1}(1-u)$ and $\Delta\mu = \mu_1 - \mu_0$. Then, at the threshold giving specificity $1-u$, the sensitivity of $a^T X$ is

$$F(a, u) = \Phi\left(\frac{a^T \Delta\mu - c(u)\sqrt{Q_0}}{\sqrt{Q_1}}\right).$$

Therefore, for a given specificity region $(1-t, 1)$ with a predetermined $t \in (0,1)$, the partial area under the ROC curve (pAUC) of the linear combination $a^T X$ is

$$\mathrm{pAUC}(a) = \int_0^t F(a, u)\, du. \qquad (1)$$
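As a rough numerical illustration of Equation (1), the sketch below evaluates F(a, u) and integrates it over (0, t) with composite Simpson's rule, using only Python's standard library. The function names (`quad_form`, `sensitivity`, `pauc`), the choice of Simpson quadrature, and the number of subintervals `m` are our own illustrative assumptions, not part of the method; parameters are passed as plain nested lists.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}


def quad_form(a, S):
    """Return a^T S a for a vector a and a p x p matrix S given as nested lists."""
    p = len(a)
    return sum(a[j] * S[j][k] * a[k] for j in range(p) for k in range(p))


def sensitivity(a, u, mu0, mu1, S0, S1):
    """F(a, u): sensitivity of a^T X at specificity 1 - u, for 0 < u < 1."""
    shift = sum(aj * (m1 - m0) for aj, m0, m1 in zip(a, mu0, mu1))  # a^T (mu1 - mu0)
    c = N01.inv_cdf(1.0 - u)                                          # c(u)
    return N01.cdf((shift - c * quad_form(a, S0) ** 0.5) / quad_form(a, S1) ** 0.5)


def pauc(a, mu0, mu1, S0, S1, t=0.1, m=200):
    """pAUC(a): integral of F(a, u) over (0, t) by composite Simpson's rule."""
    if m % 2:
        raise ValueError("m must be even for Simpson's rule")
    h = t / m
    total = 0.0
    for k in range(m + 1):
        # c(u) -> +infinity as u -> 0, so F(a, u) -> 0; use 0 at the left endpoint
        f = 0.0 if k == 0 else sensitivity(a, k * h, mu0, mu1, S0, S1)
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * f
    return total * h / 3.0
```

For example, with μ0 = (0, 0)^T, μ1 = (0, 1)^T and identity covariances, `pauc([1.0, 0.0], ...)` is approximately t²/2 = 0.005 because X1 carries no signal, while `pauc([0.0, 1.0], ...)` and `pauc([0.0, 2.0], ...)` agree, reflecting the scale invariance noted below.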

Like the AUC, the pAUC is scale invariant. For identifiability, in this study the search for the optimal linear combination vector is restricted to the unit hypersphere. Let $a^*$ be such a pAUC maximizer; that is,

$$a^* = \arg\max_{a \in E_p} \mathrm{pAUC}(a),$$

where $E_p = \{a : \|a\| = 1,\ a \in \mathbb{R}^p\}$.

Assume that two independent random samples are drawn from the non-diseased and diseased populations. Let $n_0$ and $n_1$ be the sample sizes of the non-diseased and diseased groups, respectively, and denote their minimum by $n = \min\{n_0, n_1\}$. When the population parameters are unknown, the maximum likelihood estimates (MLEs) under the normality assumption are used in a sample version of the optimization problem. The estimated mean vectors and covariance matrices are denoted by $\hat\mu_0, \hat\mu_1$ and $\hat\Sigma_0, \hat\Sigma_1$, respectively. Moreover, let $\hat\Delta\mu = \hat\mu_1 - \hat\mu_0$ and $\hat Q_d = a^T \hat\Sigma_d a$ for $d = 0, 1$. Replacing the unknown parameters in Equation (1) by their MLEs gives the sample pAUC:

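$$\widehat{\mathrm{pAUC}}_n(a) = \int_0^t \hat F_n(a, u)\, du, \qquad (2)$$

where

$$\hat F_n(a, u) = \Phi\left(\frac{a^T \hat\Delta\mu - c(u)\sqrt{\hat Q_0}}{\sqrt{\hat Q_1}}\right).$$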

Thus, the coefficients a* are estimated by the maximizer of Equation (2):

$$\hat a_n = \arg\max_{a \in E_p} \widehat{\mathrm{pAUC}}_n(a).$$
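The plug-in estimator in Equation (2) only needs the group-wise MLEs. The sketch below is a minimal illustration under that reading: the helper names `mle_mean_cov` and `sample_pauc` are hypothetical, covariances use divisor n (the MLE) rather than n - 1, and Simpson's rule again stands in for the integral.

```python
from statistics import NormalDist, fmean

N01 = NormalDist()


def mle_mean_cov(rows):
    """MLEs of the mean vector and covariance matrix (divisor n, not n - 1)."""
    n, p = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(p)]
    cov = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / n
            for k in range(p)] for j in range(p)]
    return mean, cov


def sample_pauc(a, x0, x1, t=0.1, m=200):
    """Plug-in pAUC of a^T X: Equation (2) with MLEs in place of the parameters."""
    (mu0, S0), (mu1, S1) = mle_mean_cov(x0), mle_mean_cov(x1)
    p = len(a)
    shift = sum(a[j] * (mu1[j] - mu0[j]) for j in range(p))
    sd0 = sum(a[j] * S0[j][k] * a[k] for j in range(p) for k in range(p)) ** 0.5
    sd1 = sum(a[j] * S1[j][k] * a[k] for j in range(p) for k in range(p)) ** 0.5
    h = t / m  # m must be even for Simpson's rule
    total = 0.0
    for k in range(1, m + 1):  # Simpson's rule; the u = 0 term is 0
        f = N01.cdf((shift - N01.inv_cdf(1.0 - k * h) * sd0) / sd1)
        total += (1 if k == m else (4 if k % 2 else 2)) * f
    return total * h / 3.0
```

Here `x0` and `x1` are lists of rows (one row per subject) from the non-diseased and diseased samples.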

The next theorem shows that the sample pAUC maximizer $\hat a_n$ is strongly consistent.

Theorem 1: Suppose that the conditional distribution of $X \mid D = d$ is $N(\mu_d, \Sigma_d)$ with $\Sigma_d$ positive definite for $d = 0, 1$. Assume that $\mathrm{pAUC}(a)$ in Equation (1) has a unique maximizer $a^*$ in $E_p$. Then the maximizer $\hat a_n$ of the sample pAUC $\widehat{\mathrm{pAUC}}_n(a)$ in Equation (2) converges to $a^*$ with probability 1 as $n \to \infty$. (The proof is given in Additional file 1.)

Previously, we found that the pAUC function sometimes has local extrema or multiple maxima [13]. Therefore, we proposed a multiple-initial algorithm, which utilizes multiple initial points in a conventional optimization algorithm, to reduce the risk of not finding the global maximum. The uniqueness of the maximum is assumed in Theorem 1 to ease the complications brought on by the existence of multiple maxima.
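The multiple-initial idea can be sketched generically: run a local optimizer from several starting points on the unit sphere and keep the best result. The optimizer used in [13] is a conventional optimization algorithm; purely as an assumption for illustration, a simple random-perturbation hill climb stands in for it here, and all names (`local_search`, `multi_start_maximize`) are hypothetical. The objective `f` would be the sample pAUC of Equation (2).

```python
import math
import random


def normalize(a):
    """Scale a to unit Euclidean length (a point of E_p)."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]


def random_unit_vector(p, rng):
    """A normalized standard-normal vector is uniform on the unit sphere."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(p)])


def local_search(f, a, rng, step=0.5, tol=1e-6, tries=50, max_evals=20000):
    """Random-perturbation hill climbing on the unit sphere.

    Stand-in for a conventional local optimizer: accept any improving
    perturbation; halve the step after `tries` consecutive failures.
    """
    best, evals = f(a), 1
    while step > tol and evals < max_evals:
        for _ in range(tries):
            cand = normalize([x + step * rng.gauss(0.0, 1.0) for x in a])
            val = f(cand)
            evals += 1
            if val > best:
                a, best = cand, val
                break
        else:  # no improvement in `tries` attempts: refine the step size
            step /= 2.0
    return a, best


def multi_start_maximize(f, p, n_starts=10, seed=0):
    """Run the local search from several random starts and keep the best run."""
    rng = random.Random(seed)
    runs = [local_search(f, random_unit_vector(p, rng), rng) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[1])
```

Restarting from several points does not guarantee the global maximum, but it lowers the risk of returning a local one, which is the motivation given above.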

In real applications, the calculated best linear combination occasionally has a low pAUC value, or some of its coefficients are nearly zero. In such cases, the corresponding biomarkers may contribute little to disease prediction. In the following section, we discuss how to assess the significance of biomarkers in terms of their discriminatory power. The proposed testing procedures are then used in our biomarker selection approaches to find a compact biomarker set consisting only of significant biomarkers for disease diagnosis.

Hypothesis testing and biomarker selection

Testing the discriminatory power

When an optimal linear combination is available, the solution is useful in evaluating either the entire biomarker set or one specific biomarker in the set. The first hypothesis testing problem of interest is to assess the overall discriminatory power of a biomarker set through its maximal pAUC, which is the best discriminatory power that the biomarker set can achieve. Once the overall diagnostic power is “statistically confirmed,” the next important issue is to evaluate the contribution of each biomarker. This type of information can provide more insight about the causal relationship between each biomarker and the disease. In this subsection, the statistical procedures for testing the discriminatory power of a set or of an individual biomarker are developed.

Considering only the class of linear combinations, we evaluate the global discriminatory power of a set of p ≥ 1 biomarkers, X, by testing the following hypotheses:

H0,g: The biomarker set has no discriminatory power to the disease


H1,g: The biomarker set has a discriminatory power to the disease.

The null hypothesis H0,g holds if the optimal linear combination of the biomarker set has no discriminatory power or, equivalently, if the maximal pAUC that the set can achieve through its linear combinations is not greater than the reference limit $t^2/2$, the pAUC of a non-informative diagnosis whose ROC curve is the diagonal. That is,

$$H_{0,g}: \mathrm{pAUC}(a^*) \le \frac{t^2}{2} \quad \text{versus} \quad H_{1,g}: \mathrm{pAUC}(a^*) > \frac{t^2}{2}.$$
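As a quick check on the reference limit: when a combination carries no information, so that its mean shift is zero and its variances agree in the two groups, F(a, u) reduces to u and the integral in Equation (1) equals t²/2.

```latex
a^T\Delta\mu = 0,\ Q_0 = Q_1
\;\Longrightarrow\;
F(a,u) = \Phi\bigl(-c(u)\bigr) = 1 - \Phi\bigl(\Phi^{-1}(1-u)\bigr) = u,
\qquad
\int_0^t F(a,u)\,du = \int_0^t u\,du = \frac{t^2}{2}.
```

With t = 0.1 this gives 0.005, the reference value used in the simulation study.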

By maximizing the sample pAUC defined in Equation (2), we obtain the maximal sample pAUC and use it as the test statistic. That is,

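$$T_g = \max_{a \in E_p} \widehat{\mathrm{pAUC}}_n(a) = \widehat{\mathrm{pAUC}}_n(\hat a_n) = \int_0^t \Phi\left(\frac{\hat a_n^T \hat\Delta\mu - c(u)\sqrt{\hat a_n^T \hat\Sigma_0 \hat a_n}}{\sqrt{\hat a_n^T \hat\Sigma_1 \hat a_n}}\right) du.$$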

In fact, $T_g$ is the estimated pAUC of the best linear combination $\hat a_n^T X$. The null hypothesis H0,g is rejected if $T_g$ is sufficiently large.

Because of the complex form of the test statistic, its null distribution and right-tailed critical value are estimated by a parametric bootstrap. Under H0,g, X has a common multivariate normal distribution in the two populations. The common mean and covariance matrix are estimated from the pooled sample and denoted by $\tilde\mu_p$ and $\tilde\Sigma_p$. Draw two independent random samples of sizes $n_1$ and $n_0$ from the estimated common null distribution $\mathrm{MVN}(\tilde\mu_p, \tilde\Sigma_p)$ and compute the test statistic from them, say $T_g^b$. Repeat the sampling $B$ times. The critical value at significance level $\alpha$ is the $100(1-\alpha)$th percentile of the $T_g^b$ values, and H0,g is rejected if $T_g$ is greater than or equal to this critical value.
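A minimal sketch of this parametric bootstrap, restricted to p = 2 for brevity. To keep it short and self-contained, it replaces the multiple-initial optimizer with a search over a fixed grid of unit vectors, an assumption made only for this illustration; `max_sample_pauc`, `draw_mvn` and `global_test` are hypothetical names.

```python
import math
import random
from statistics import NormalDist, fmean

N01 = NormalDist()


def mle_mean_cov(rows):
    """MLE mean vector and covariance matrix (divisor n)."""
    n, p = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(p)]
    cov = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / n
            for k in range(p)] for j in range(p)]
    return mean, cov


def max_sample_pauc(x0, x1, t=0.1, n_angles=72, m=40):
    """T_g for p = 2: largest plug-in pAUC over a fixed grid of unit vectors."""
    (mu0, S0), (mu1, S1) = mle_mean_cov(x0), mle_mean_cov(x1)
    h = t / m
    cs = [N01.inv_cdf(1.0 - k * h) for k in range(1, m + 1)]  # c(u) at u = h, ..., t
    best = 0.0
    for j in range(n_angles):
        a = (math.cos(2 * math.pi * j / n_angles), math.sin(2 * math.pi * j / n_angles))
        shift = a[0] * (mu1[0] - mu0[0]) + a[1] * (mu1[1] - mu0[1])
        sd0 = math.sqrt(sum(a[i] * S0[i][k] * a[k] for i in range(2) for k in range(2)))
        sd1 = math.sqrt(sum(a[i] * S1[i][k] * a[k] for i in range(2) for k in range(2)))
        # Simpson's rule over (0, t]; the u = 0 term is 0
        total = sum((1 if k == m else (4 if k % 2 else 2)) * N01.cdf((shift - c * sd0) / sd1)
                    for k, c in enumerate(cs, start=1))
        best = max(best, total * h / 3.0)
    return best


def draw_mvn(mean, cov, n, rng):
    """n draws from a bivariate normal via the 2 x 2 Cholesky factor of cov."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append([mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2])
    return rows


def global_test(x0, x1, B=200, alpha=0.05, seed=1):
    """Parametric bootstrap test of H0,g; returns (T_g, critical value, reject?)."""
    t_obs = max_sample_pauc(x0, x1)
    mu_p, S_p = mle_mean_cov(x0 + x1)  # common null distribution from the pooled sample
    rng = random.Random(seed)
    boot = sorted(max_sample_pauc(draw_mvn(mu_p, S_p, len(x0), rng),
                                  draw_mvn(mu_p, S_p, len(x1), rng)) for _ in range(B))
    crit = boot[math.ceil((1 - alpha) * B) - 1]  # empirical 100(1 - alpha)th percentile
    return t_obs, crit, t_obs >= crit
```

A finer grid (or a proper optimizer) would be needed for accurate coefficients; for the global test only the maximal value matters, so a coarse grid is a reasonable shortcut in a sketch.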

When a set consists of only one biomarker, say $X_i$, the global effect reduces to the marginal discriminatory power of $X_i$ alone. Using the corresponding pAUC to describe its discriminatory power, we assess the biomarker by testing the hypothesis

$$H_{0,m}: \mathrm{pAUC}(\mathbf{1}_i) \le \frac{t^2}{2},$$

where $\mathbf{1}_i$ is the vector whose components are all zero except for a 1 in the position corresponding to $X_i$. Again, we use the estimated pAUC as the test statistic:

$$T_{m,i} = \widehat{\mathrm{pAUC}}_n(\mathbf{1}_i) = \int_0^t \Phi\left(\frac{\hat\mu_{1,i} - \hat\mu_{0,i} - c(u)\sqrt{\hat\sigma_{0,i}}}{\sqrt{\hat\sigma_{1,i}}}\right) du,$$

where $\hat\mu_{1,i}, \hat\sigma_{1,i}$ and $\hat\mu_{0,i}, \hat\sigma_{0,i}$ are the MLEs of the mean and variance of $X_i$ in the two groups. The critical value is determined by the parametric bootstrap described above; since only one biomarker is involved, the computation is even simpler.
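For a single biomarker the test is cheap enough to sketch directly. This is a minimal illustration under the stated binormal model, with hypothetical names `t_marginal` and `marginal_test`; `pstdev` (divisor n) gives the MLE standard deviation, so dividing by it matches the square roots in the formula for $T_{m,i}$.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

N01 = NormalDist()


def t_marginal(x0, x1, t=0.1, m=40):
    """T_m,i: plug-in pAUC of a single biomarker (MLE means and SDs)."""
    mu0, sd0 = fmean(x0), pstdev(x0)  # pstdev divides by n, matching the MLE
    mu1, sd1 = fmean(x1), pstdev(x1)
    h = t / m
    total = 0.0
    for k in range(1, m + 1):  # Simpson's rule; the u = 0 term is 0
        c = N01.inv_cdf(1.0 - k * h)
        total += (1 if k == m else (4 if k % 2 else 2)) * N01.cdf((mu1 - mu0 - c * sd0) / sd1)
    return total * h / 3.0


def marginal_test(x0, x1, B=500, alpha=0.05, seed=1):
    """Parametric bootstrap test of H0,m; returns (T_m,i, critical value, reject?)."""
    t_obs = t_marginal(x0, x1)
    pooled = list(x0) + list(x1)
    mu_p, sd_p = fmean(pooled), pstdev(pooled)  # common null N(mu_p, sd_p^2)
    rng = random.Random(seed)
    boot = sorted(
        t_marginal([rng.gauss(mu_p, sd_p) for _ in x0], [rng.gauss(mu_p, sd_p) for _ in x1])
        for _ in range(B))
    crit = boot[math.ceil((1 - alpha) * B) - 1]  # empirical 100(1 - alpha)th percentile
    return t_obs, crit, t_obs >= crit
```

When the two samples are identical, `t_marginal` returns approximately t²/2 = 0.005, the non-informative reference value.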

When multiple biomarkers $X$ are considered simultaneously, we assess one specific biomarker given the others. Let $X^T = (X_{i-}^T, X_i)$, where $X_i$ denotes the target biomarker and $X_{i-}$ contains the remaining ones. The goal is to test the hypothesis

H0,c: Given $X_{i-}$, $X_i$ has no discriminatory power to the disease.

The coefficients of the optimal linear combination of $X$ are written as $a^{*T} = (a_{i-}^{*T}, a_i^*)$, where $a_i^*$ is the coefficient of $X_i$. We propose evaluating $X_i$ through $a_i^*$: given $X_{i-}$, the biomarker has no discriminatory power if it does not contribute to the linear combination, that is, if its coefficient is zero. Thus H0,c is equivalent to

$$H_{0,c}: a_i^* = 0.$$

The test statistic is the estimator of $a_i^*$, $T_{c,i} = \hat a_{n,i}$. The null hypothesis H0,c is rejected if $T_{c,i}$ is either too small or too large.

To generate the bootstrap samples, we first examine the null scenario under H0,c. Under the normality assumption, given $D = d$, $d \in \{0, 1\}$,

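$$X = \begin{pmatrix} X_{i-} \\ X_i \end{pmatrix} \Bigm| D = d \sim \mathrm{MVN}\left( \begin{pmatrix} \mu_{d,i-} \\ \mu_{d,i} \end{pmatrix}, \begin{pmatrix} \Sigma_{d,i-} & \Sigma_{d,i-i} \\ \Sigma_{d,i-i}^T & \sigma_{d,i} \end{pmatrix} \right).$$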

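Under H0,c, $P(X_i \mid D, X_{i-}) = P(X_i \mid X_{i-})$, which holds provided that, for each realization $X_{i-} = x_{i-}$, the conditional mean and variance of $X_i$ given $X_{i-}$ coincide in the two groups: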

$$\mu_{1,i} + \Sigma_{1,i-i}^T \Sigma_{1,i-}^{-1}(x_{i-} - \mu_{1,i-}) = \mu_{0,i} + \Sigma_{0,i-i}^T \Sigma_{0,i-}^{-1}(x_{i-} - \mu_{0,i-}),$$

$$\sigma_{1,i} - \Sigma_{1,i-i}^T \Sigma_{1,i-}^{-1} \Sigma_{1,i-i} = \sigma_{0,i} - \Sigma_{0,i-i}^T \Sigma_{0,i-}^{-1} \Sigma_{0,i-i}.$$

Estimating the null distribution therefore involves non-trivial constrained inference. For simplicity, we consider a narrower null scenario in which $P(X_i \mid D, X_{i-}) = P(X_i)$; that is, $X_i$ not only has a common distribution in the two groups but is also independent of $X_{i-}$. Accordingly, bootstrap samples are drawn from the following model: for $d = 0, 1$,

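$$X \mid D = d \sim \mathrm{MVN}\left( \begin{pmatrix} \hat\mu_{d,i-} \\ \tilde\mu_{p,i} \end{pmatrix}, \begin{pmatrix} \hat\Sigma_{d,i-} & \mathbf{0} \\ \mathbf{0}^T & \tilde\sigma_{p,i} \end{pmatrix} \right).$$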

Here $\hat\mu_{d,i-}$ and $\hat\Sigma_{d,i-}$ are the MLEs of the mean and covariance matrix of $X_{i-}$ from group $d$; $\tilde\mu_{p,i}$ and $\tilde\sigma_{p,i}$ are the estimates of the mean and variance of $X_i$ from the pooled sample; and $\mathbf{0}$ is the $(p-1) \times 1$ zero vector. Repeat the bootstrap sampling $B$ times, find the sample pAUC maximizer of each bootstrap sample, and record the $B$ estimated coefficients $\hat a_{n,i}^b$ corresponding to $X_i$. The critical values are the $100(\alpha/2)$th and $100(1-\alpha/2)$th percentiles of these $B$ coefficients. The null hypothesis is rejected if $T_{c,i}$ is greater than or equal to the $100(1-\alpha/2)$th percentile or less than or equal to the $100(\alpha/2)$th percentile.
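The narrow-null bootstrap for the conditional test can be sketched for p = 2, taking column 1 as the target $X_i$ and column 0 as $X_{i-}$. With a single remaining biomarker, the block-diagonal model amounts to drawing the two columns independently. As an assumption for illustration only, a fixed angular grid stands in for the multiple-initial optimizer, so the estimated coefficients are discretized; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

N01 = NormalDist()


def mle_mean_cov(rows):
    """MLE mean vector and covariance matrix (divisor n)."""
    n, p = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(p)]
    cov = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / n
            for k in range(p)] for j in range(p)]
    return mean, cov


def fit_unit_coefficients(x0, x1, t=0.1, n_angles=180, m=20):
    """Estimated pAUC maximizer for p = 2, searched over a fixed angular grid."""
    (mu0, S0), (mu1, S1) = mle_mean_cov(x0), mle_mean_cov(x1)
    h = t / m
    cs = [N01.inv_cdf(1.0 - k * h) for k in range(1, m + 1)]
    best_val, best_a = -1.0, None
    for j in range(n_angles):
        theta = 2 * math.pi * j / n_angles
        a = (math.cos(theta), math.sin(theta))
        shift = a[0] * (mu1[0] - mu0[0]) + a[1] * (mu1[1] - mu0[1])
        sd0 = math.sqrt(sum(a[i] * S0[i][k] * a[k] for i in range(2) for k in range(2)))
        sd1 = math.sqrt(sum(a[i] * S1[i][k] * a[k] for i in range(2) for k in range(2)))
        val = sum((1 if k == m else (4 if k % 2 else 2)) * N01.cdf((shift - c * sd0) / sd1)
                  for k, c in enumerate(cs, start=1))
        if val > best_val:
            best_val, best_a = val, a
    return best_a


def conditional_test(x0, x1, B=500, alpha=0.05, seed=1):
    """Bootstrap test of H0,c for target column 1 given column 0 (two-sided)."""
    t_obs = fit_unit_coefficients(x0, x1)[1]  # T_c,i
    target = [r[1] for r in x0 + x1]
    mu_p, sd_p = fmean(target), pstdev(target)  # pooled estimates for X_i
    others = [(fmean(col), pstdev(col)) for col in ([r[0] for r in x0], [r[0] for r in x1])]
    rng = random.Random(seed)
    coefs = []
    for _ in range(B):
        # narrow null: X_i- keeps its group-specific MLEs; X_i is common and independent
        b0 = [[rng.gauss(*others[0]), rng.gauss(mu_p, sd_p)] for _ in x0]
        b1 = [[rng.gauss(*others[1]), rng.gauss(mu_p, sd_p)] for _ in x1]
        coefs.append(fit_unit_coefficients(b0, b1)[1])
    coefs.sort()
    lower = coefs[math.ceil(alpha / 2 * B) - 1]        # 100(alpha/2)th percentile
    upper = coefs[math.ceil((1 - alpha / 2) * B) - 1]  # 100(1 - alpha/2)th percentile
    return t_obs, (lower, upper), t_obs <= lower or t_obs >= upper
```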

Note that this conditional test is powerless to detect the significance of $X_i$ when $X_{i-}$ alone is independent of the disease $D$. Under H0,c,

$$P(X_i, X_{i-} \mid D) = P(X_i \mid X_{i-})\, P(X_{i-} \mid D).$$

Combined with $P(X_{i-} \mid D) = P(X_{i-})$, this leads to the complete null scenario in which all biomarkers are independent of the disease. In this circumstance, the estimated coefficients are highly variable, constrained only by the unit-length requirement in the algorithm. As a consequence, the critical values become so extreme that a significant finding is unlikely, even when $X_i$ is in fact strongly correlated with the disease.

Biomarker selection

We now turn to the biomarker selection problem. The statistical tests in the last subsection allow us to determine the significance of a biomarker, and retaining only the significant biomarkers reduces the dimension of the data.

Assume that $X$ is the vector of the full biomarker set, and let $\hat a_n^T = (\hat a_{n,1}, \ldots, \hat a_{n,p})$ be the estimate of the optimal linear combination as before. We adopt the idea of classical stepwise variable selection. First, an ordering criterion for all biomarkers is determined: here, the biomarkers are rearranged in ascending order of their corresponding $\hat a_{n,i}$ values. The ordered biomarker set is denoted by $X^T = (X_{(1)}, \ldots, X_{(p)})$. Hence X(1) is potentially the least important biomarker and X(p) potentially the most important one. Note that this ordering criterion is reasonable only when all biomarkers are expressed in a common unit; hence an adequate standardization should be applied before the selection procedure.

We consider two stepwise selection methods: the Forward and the Backward approaches. For convenience, let A denote the set of biomarkers under consideration for disease diagnosis at each step. The Forward procedure starts with an empty A and tests the contribution of the potentially most discriminatory biomarker, X(p); the biomarker is added to A if it is significant. It then assesses X(p-1), X(p-2), and so on. The Backward procedure instead begins by testing the overall discriminatory power of A = {X}, the full set. If there is a significant global effect, it then determines whether the potentially least discriminatory biomarker, X(1), is significant, and removes it from A if the result is not significant. Given that result, the procedure assesses the conditional contribution of X(2), then X(3), and so on. The details are presented below:

Forward method

Step 1. Set A = Ø. Test the marginal effect of X(p) with respect to

H0,(p): X(p) has no discriminatory power.

If H0,(p) is rejected, add X(p) to A.

Go to the next step.

Step 2. Test the significance of X(p-1) with respect to

H0,(p-1): Given A, X(p-1) has no discriminatory power.

If H0,(p-1) is rejected, add X(p-1) to A.

Go to the next step.

Step p. Test the significance of X(1) with respect to

H0,(1): Given A, X(1) has no discriminatory power.

If H0,(1) is rejected, add X(1) to A.
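The Forward method's control flow can be written independently of how each test is carried out. In this sketch, the two test callables are hypothetical placeholders that return True when the corresponding null hypothesis (H0,m when A is empty, H0,c otherwise) is rejected, and biomarkers are identified by integer indices listed as X(1), ..., X(p).

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    order: Sequence[int],
    marginal_rejects: Callable[[int], bool],
    conditional_rejects: Callable[[int, List[int]], bool],
) -> Tuple[List[int], int]:
    """Forward method. `order` lists biomarker indices as X_(1), ..., X_(p).

    Returns the selected set A and the number of tests performed (always p).
    """
    A: List[int] = []
    n_tests = 0
    for j in reversed(order):  # assess X_(p), X_(p-1), ..., X_(1)
        n_tests += 1
        # empty A: marginal test H0,m; otherwise: conditional test H0,c given A
        significant = marginal_rejects(j) if not A else conditional_rejects(j, list(A))
        if significant:
            A.append(j)
    return A, n_tests
```

With toy callables in which only X(p) is marginally significant and only X(1) is conditionally significant, `forward_select([0, 1, 2], ...)` returns `([2, 0], 3)`; if no biomarker is marginally significant, A stays empty, illustrating why this approach can yield fewer positive findings.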


Backward method

Step 0. Set A = {X}. Test the global effect of A with respect to

H0,(0): A has no discriminatory power.

If H0,(0) is rejected, go to the next step; otherwise, stop and conclude A = Ø.

Step 1. Assess X(1) by removing X(1) from A and test the hypothesis,

H0,(1): Given A, X(1) has no discriminatory power.

If H0,(1) is rejected, add X(1) to A.

Go to the next step.

Step 2. Assess X(2) by removing X(2) from A and test the hypothesis,

H0,(2): Given A, X(2) has no discriminatory power.

If H0,(2) is rejected, add X(2) to A.

Go to the next step.

Step p. Assess the effect of X(p). If A = {X(p)}, stop; otherwise, remove X(p) from A and test the following null hypothesis,

H0,(p): Given A, X(p) has no discriminatory power.

If H0,(p) is rejected, add X(p) to A.
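The Backward method can be sketched in the same style, again with hypothetical test callables that return True on rejection. After a biomarker is removed at any step, A still contains at least one biomarker, so only the global and conditional tests are needed; the counter makes the 1, p, or p + 1 test budget explicit.

```python
from typing import Callable, List, Sequence, Tuple


def backward_select(
    order: Sequence[int],
    global_rejects: Callable[[List[int]], bool],
    conditional_rejects: Callable[[int, List[int]], bool],
) -> Tuple[List[int], int]:
    """Backward method. `order` lists biomarker indices as X_(1), ..., X_(p).

    Returns the selected set A and the number of tests performed (1, p, or p + 1).
    """
    A = list(order)
    n_tests = 1
    if not global_rejects(list(A)):  # Step 0: H0,g not rejected, so A is empty
        return [], n_tests
    p = len(order)
    for step, j in enumerate(order, start=1):
        if step == p and A == [j]:
            break  # only X_(p) remains: it is selected without a test
        A.remove(j)
        n_tests += 1
        # A is never empty here, so the conditional test H0,c is always used
        if conditional_rejects(j, list(A)):
            A.append(j)
    return A, n_tests
```

The toy case where only X(1) is conditionally significant returns `([0], 4)`: the Backward method keeps X(1), drops X(p), and uses p + 1 tests, the kind of confusing outcome described in the paragraph on minor and major biomarkers.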


At the end of the selection process, we conclude that the biomarkers in A contribute significantly to disease diagnosis. Step 0 of the Backward approach uses the global test (H0,g and $T_g$ in the subsection Testing the discriminatory power). During the selection, two different tests are used to assess a specific biomarker, depending on whether A is empty. If A = Ø, the marginal contribution of the target biomarker is tested (H0,m and $T_{m,i}$ in the same subsection); if A ≠ Ø, its conditional contribution is tested (H0,c and $T_{c,i}$).

For a study of p biomarkers, the Forward approach requires p tests to reach its final conclusion. The Backward approach is less straightforward. It stops immediately at Step 0 if the global discriminatory power is not significant. If global significance is achieved and the first p - 1 biomarkers are all concluded to be insignificant, we directly select only X(p) without verifying its significance. Otherwise, X(p) must be evaluated. Hence, the Backward approach takes 1, p, or p + 1 tests to reach its final conclusion. A bidirectional stepwise method combining forward and backward selection is another possible approach, but it would require considerably more computation time.

Sometimes a biomarker has no discriminatory power by itself but contributes given the other biomarkers, mainly through high correlations with the major biomarkers. Such a biomarker is likely to be selected by a selection procedure. However, given this biomarker, the conditional test is powerless to detect other important biomarkers, as described in the last subsection. Consequently, the Backward approach may reach a confusing conclusion: selecting a minor biomarker while discarding a major one. On the other hand, because the Forward approach starts by assessing marginal contributions, it tends to yield fewer positive findings when the effect sizes (pAUCs) of the biomarkers are small to moderate. In the next section, we illustrate these findings with a simulation study and real examples.


In this section, we present simulation results to validate our proposed procedures, including the estimation of the best linear combination of the biomarkers, the global test of the discriminatory power of a biomarker set, and the two biomarker selection approaches. We generate samples of two, three, and four biomarkers (p = 2, 3, 4) under various scenarios. For brevity, we discuss the case of two biomarkers in detail and give partial results for three and four biomarkers; more numerical results are provided in Additional file 1.

In the following, given the parameter values, the true best linear combinations maximizing the pAUC are found via a grid search with $10^6$ grid points. When p ≤ 2, a fixed grid is used; when p > 2, the grid points are drawn uniformly on the surface of the unit sphere [22, 23]. Based on the sample data, the estimated best linear combinations are computed via the multiple-initial algorithm proposed in our previous study [13].
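The grid search for the true a* can be sketched as follows: a fixed angular grid for p = 2 and, for p > 2, points drawn uniformly on the unit sphere by normalizing standard-normal vectors, one standard way to sample the sphere that we assume matches the intent of [22, 23]. The default grid size is far smaller than the grid used in the study, purely to keep the example fast; all names are hypothetical.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()


def pauc(a, mu0, mu1, S0, S1, t=0.1, m=40):
    """Population pAUC(a) under binormality, by Simpson's rule on (0, t]."""
    p, h = len(a), t / m
    shift = sum(a[j] * (mu1[j] - mu0[j]) for j in range(p))
    sd0 = math.sqrt(sum(a[j] * S0[j][k] * a[k] for j in range(p) for k in range(p)))
    sd1 = math.sqrt(sum(a[j] * S1[j][k] * a[k] for j in range(p) for k in range(p)))
    total = 0.0
    for k in range(1, m + 1):  # the u = 0 term is 0
        c = N01.inv_cdf(1.0 - k * h)
        total += (1 if k == m else (4 if k % 2 else 2)) * N01.cdf((shift - c * sd0) / sd1)
    return total * h / 3.0


def grid_points(p, n_grid, rng):
    """Fixed angular grid (p = 2) or random uniform points on the unit sphere (p > 2)."""
    if p == 2:
        return [(math.cos(2 * math.pi * j / n_grid), math.sin(2 * math.pi * j / n_grid))
                for j in range(n_grid)]
    points = []
    for _ in range(n_grid):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in z))
        points.append(tuple(x / norm for x in z))
    return points


def true_best_combination(mu0, mu1, S0, S1, n_grid=3600, seed=0):
    """Grid-search approximation of a* and pAUC(a*) from known parameters."""
    rng = random.Random(seed)
    return max(((a, pauc(a, mu0, mu1, S0, S1)) for a in grid_points(len(mu0), n_grid, rng)),
               key=lambda pair: pair[1])
```

For instance, with Δ = (0, 1)^T and identity covariances (a setting of the Cases 2–4 type), the search returns a* ≈ (0, 1), since only the second biomarker is informative.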

Assume that the two biomarkers $X = (X_1, X_2)^T$, given $D = d$, follow a bivariate normal distribution with mean $\mu_d$ and covariance $\Sigma_d$, where $d = 0$ or 1 indicates the non-diseased or diseased group, respectively. Suppose that $\mu_0 = 0$, so that $\mu_1$ equals the mean difference $\Delta = (\Delta_1, \Delta_2)^T$. Three values, 0.3, 0.5, and 1, are considered for the $\Delta_i$'s. To mimic a standardized data set, both biomarkers have unit variance, with correlation coefficient $\rho_d$ taking one of three values, 0, 0.5, or 0.9 (see Table 1). The pAUC is computed with t = 0.1. Table 1 also reports the distribution of $a^{*T} X$ in the two groups, and its last column displays the true maximal pAUC values.

Table 1 The setting of populations

The first case is the complete null scenario, in which the two biomarkers have the same distribution in the diseased and non-diseased groups. Every linear combination then has no discriminatory power and attains the reference pAUC value $t^2/2 = 0.005$; we define $a^* = 0$ in this case. In Cases 2–22, $\Delta_1 = 0$ and $\Delta_2 > 0$, so the second biomarker is dominant. In Cases 2–4, the two biomarkers are conditionally independent, so the first biomarker is completely uncorrelated with the disease and the second is the only contributor to diagnosis. In Cases 5–10, the first biomarker provides a non-ignorable contribution when it is correlated with the major contributor: compared with Cases 2–4, the global discriminatory power increases markedly in the presence of a positive correlation. To further investigate the effect of correlation, we consider various covariance matrices: the two biomarkers are correlated only in the non-diseased group in Cases 11–16, and only in the diseased group in Cases 17–22. A positive correlation in the non-diseased group improves the pAUC more than one in the diseased group. In the last three cases, $\Delta_1 = \Delta_2$ and $\rho_d = 0$, so both biomarkers are equally important, and, as expected, the pAUC of the best linear combination increases with the common mean difference.

Next, we study the empirical performance of the estimated best linear combination $\hat a_n$ and its pAUC, $\mathrm{pAUC}(\hat a_n)$. Consider a balanced design with $n_0 = n_1 = 100$. Table 2 reports the empirical mean (Ave) and standard error (SE) of these estimators over 1,000 replicates.

Table 2 The related optimal coefficients $a^*$, $\mathrm{pAUC}(a^*)$, and the power of the global test

The estimated best linear combination tends to be conservative, with coefficients biased towards zero. The estimators vary most in the complete null scenario, and their variation decreases as the discriminatory power of the two biomarkers increases. The estimated pAUC tends to overestimate the true value, and this tendency grows as the diagnostic power of the biomarker set increases. As suggested by a referee, using an independent validation set can be expected to reduce the overestimation. The last column displays the empirical power of the global test at significance level α = 5% with a bootstrap size of 500. The test controls the type I error rate well and performs satisfactorily in the alternative cases.

Next, we apply the two biomarker selection approaches, with significance level α = 5% and a bootstrap size of 500 at each step. There are four possible conclusions: (i) $(c_1, c_2)$, if both biomarkers are selected; (ii) (1,0), if only the first biomarker is selected; (iii) (0,1), if only the second is selected; and (iv) (0,0), if both are discarded. If at least one biomarker is selected, the best linear combination of the reduced biomarker set and its pAUC value are computed. Table 3 reports the mean and standard error of the maximal pAUC over the non-empty reduced sets, and Table 4 lists the proportions of the four possible conclusions for the two approaches over the 1,000 replications. In each scenario, the figure in boldface corresponds to the most likely outcome.

Table 3 The pAUC and pAUC estimate after the biomarker-selection
Table 4 The proportion of outcomes from the two biomarker selection methods among 1000 replications

Table 3 shows that the Forward approach generally outperforms the Backward approach except in the null case. When the first biomarker makes a non-ignorable contribution mainly because of a positive correlation between the two biomarkers, as in Cases 7–16, the Backward approach performs unsatisfactorily. Table 4 shows that in these cases a considerable proportion of samples select only the first biomarker, which in fact has no marginal discriminatory power at all. More specifically, after a significant global effect is obtained at Step 0, the potentially less important biomarker, which in the simulation is likely the first one, is assessed. Significance is often obtained because removing this biomarker noticeably decreases the pAUC. Next, the conditional discriminatory power of the second biomarker given the first is assessed. As explained in the Methods section, the conditional test is powerless when the given biomarker is independent of the disease. Thus, the major biomarker is likely to be discarded after the minor one is selected.

On the other hand, in these scenarios the Forward approach, which begins by assessing the most discriminatory biomarker, cannot benefit from the correlation and makes fewer positive discoveries, as seen in Cases 8–9, 11–12, and 14–15. However, as the effect sizes of the biomarkers increase, the Forward approach has adequate power to identify both important biomarkers, and hence performs better in terms of the achieved pAUC (Table 3).

To investigate the robustness of our methods to departures from the binormality assumption, we generate 1,000 random samples of two biomarkers from multivariate t distributions with 3 degrees of freedom. In Table 5, the true maximal pAUC value, pAUC(a*), is found via a grid search under the multivariate t distribution. We also report the average and standard error of the estimated maximal pAUC of the reduced biomarker set selected by our proposed methods, which assume binormality. In this case, our methods tend to produce optimistic conclusions: the proposed pAUC estimation and the resulting biomarker selection procedures are sensitive to the binormality assumption.

Table 5 The related pAUCs based on multivariate t distribution with degree of freedom 3

Next, we study cases with three and four biomarkers (p = 3 or 4). Again, assume $\mu_0 = 0$ and $\mu_1 = \Delta = (\Delta_1, \ldots, \Delta_p)^T$. The covariance matrices take the following form: for d = 0, 1,

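$$\text{if } p = 3:\quad \Sigma_d = \begin{pmatrix} 1 & \rho_d & 0 \\ \rho_d & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \text{and if } p = 4:\quad \Sigma_d = \begin{pmatrix} 1 & \rho_d & 0 & 0 \\ \rho_d & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$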
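Under binormality, the true pAUC of any fixed linear combination in this design can be computed directly. The sketch below assumes the standard binormal form of the ROC curve of a score a^T X, ROC(u) = Φ(A + BΦ^{-1}(u)) with A = a^T(μ1 − μ0)/√(a^TΣ1a) and B = √(a^TΣ0a / a^TΣ1a), and integrates it numerically over (0, t]. It uses only the Python standard library; the helper names and the values of Δ, ρ0 and ρ1 in the usage lines are hypothetical, and the estimation, testing and selection steps of the simulation are not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf, inv_cdf


def sim_covariance(p, rho):
    """Sigma_d of the p = 3 or 4 design: biomarkers 1 and 2 have correlation rho,
    all other pairs are uncorrelated, and every variance is 1."""
    sigma = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    sigma[0][1] = sigma[1][0] = rho
    return sigma


def quad_form(a, m):
    """a^T M a for a vector a and a square matrix M given as nested lists."""
    p = len(a)
    return sum(a[i] * m[i][j] * a[j] for i in range(p) for j in range(p))


def binormal_pauc(a, mu0, mu1, sigma0, sigma1, t=0.1, n=2000):
    """pAUC over false positive rates (0, t] of the score a^T X when
    X ~ N(mu0, sigma0) (non-diseased) and X ~ N(mu1, sigma1) (diseased).

    Substituting u = Phi(z) gives
        pAUC = integral from -inf to Phi^{-1}(t) of Phi(A + B z) phi(z) dz,
    evaluated with the composite Simpson rule on n (even) intervals.
    """
    s1 = math.sqrt(quad_form(a, sigma1))
    big_a = sum(ai * (m1 - m0) for ai, m0, m1 in zip(a, mu0, mu1)) / s1
    big_b = math.sqrt(quad_form(a, sigma0)) / s1
    lo = -10.0  # the mass of phi below -10 is negligible
    hi = 10.0 if t >= 1.0 else STD_NORMAL.inv_cdf(t)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        z = lo + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 == 1 else 2)
        total += weight * STD_NORMAL.cdf(big_a + big_b * z) * STD_NORMAL.pdf(z)
    return total * h / 3.0


# Hypothetical illustration: rho_0 = 0.5, rho_1 = 0.2, Delta = (0, 1, 0.5)^T.
S0, S1 = sim_covariance(3, 0.5), sim_covariance(3, 0.2)
delta = [0.0, 1.0, 0.5]
print(binormal_pauc([0.0, 1.0, 0.0], [0.0] * 3, delta, S0, S1))   # second biomarker alone
print(binormal_pauc([-0.5, 1.0, 0.5], [0.0] * 3, delta, S0, S1))  # an arbitrary combination
```

Two quick sanity checks follow from the formula: a useless score has ROC(u) = u and hence pAUC = t²/2, and with t = 1 the integral reduces to the familiar binormal AUC Φ(A/√(1 + B²)).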

The performance of the estimated pAUC of the best linear combination of the full biomarker set, and that of the reduced biomarker sets found by the two biomarker selection approaches, are presented in Table 6. As in the case p = 2, the estimated pAUC tends to overestimate the true value. With the Backward approach, we are less likely to reach a confusing conclusion than in the case p = 2. Here, the two selection approaches have comparable performance in most cases, except Case 11 of p = 3 and Case 8 of p = 4.

Table 6 The related pAUCs and the global test for three and four dimensions

Applications to real data sets

We apply our procedures to the real examples in [10, 24, 25]. During biomarker selection, the upper limit of the false positive rate (1 − specificity) is t = 0.1, the stepwise significance level is α = 5%, and the bootstrap size is 500. We use a multiple-initial algorithm [13] to find the estimated best linear combinations in these examples. The biomarkers are standardized before selection: after subtracting the non-diseased group mean, every biomarker is divided by its pooled sample standard deviation from the two groups, so that all biomarkers are on a comparable scale. The results for the data without standardization can be found in the additional files (see Additional file 1). Regarding the distributional assumption, the original papers [10, 24] concluded that the first two data sets do not deviate significantly from binormality. However, in the last example, we obtain significant evidence (p-value < 0.0001) against the normality hypothesis for both samples using the R package mvShapiroTest. Although the binormality assumption fails, this data set is still analyzed to demonstrate the applicability of our proposed methods to larger data sets. The well-known algorithm-based variable selection method LASSO is also applied to this example for comparison.
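The standardization step can be written in a few lines. This sketch assumes the usual pooled variance with n0 + n1 − 2 degrees of freedom (the text does not state the divisor) and stores each group as a list of rows, one per subject; the function name `standardize` is hypothetical.

```python
import math


def standardize(x0, x1):
    """Subtract the non-diseased group mean from every biomarker and divide by
    the pooled two-group sample standard deviation (hypothetical helper).

    x0, x1: lists of rows (subjects) x columns (biomarkers) for the
    non-diseased and diseased groups. Returns new, transformed (x0, x1).
    """
    n0, n1, p = len(x0), len(x1), len(x0[0])
    out0 = [list(r) for r in x0]  # copies, so the inputs are not modified
    out1 = [list(r) for r in x1]
    for j in range(p):
        c0 = [r[j] for r in x0]
        c1 = [r[j] for r in x1]
        m0 = sum(c0) / n0
        m1 = sum(c1) / n1
        ss0 = sum((v - m0) ** 2 for v in c0)
        ss1 = sum((v - m1) ** 2 for v in c1)
        # pooled variance with n0 + n1 - 2 degrees of freedom (assumption)
        sd = math.sqrt((ss0 + ss1) / (n0 + n1 - 2))
        for r in out0:
            r[j] = (r[j] - m0) / sd
        for r in out1:
            r[j] = (r[j] - m0) / sd  # the diseased group is centered at the non-diseased mean too
    return out0, out1
```

After this transformation every biomarker has non-diseased mean 0 and pooled standard deviation 1, so the coefficients of the best linear combination are on a common scale.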

The first example is a study of Duchenne Muscular Dystrophy (DMD) [24]. DMD carriers are generally identified by elevated levels of certain serum enzymes rather than by physical symptoms. This data set contains measurements of three DMD biomarkers for 87 normal and 38 carrier females. The sample means of the three biomarkers in the normal and carrier groups are, respectively,

$$\hat\mu_0 = (3.393,\ 4.521,\ 2.486)^T, \qquad \hat\mu_1 = (4.762,\ 4.523,\ 3.011)^T;$$

and the sample covariance matrices are

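$$\hat\Sigma_0 = \begin{pmatrix} 0.032 & -0.004 & 0.002 \\ -0.004 & 0.007 & 0.001 \\ 0.002 & 0.001 & 0.011 \end{pmatrix}, \qquad \hat\Sigma_1 = \begin{pmatrix} 0.768 & -0.005 & 0.305 \\ -0.005 & 0.009 & -0.006 \\ 0.305 & -0.006 & 0.227 \end{pmatrix}.$$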
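As a quick plug-in illustration of how differently the three DMD biomarkers behave in the high-specificity region, the sketch below evaluates each biomarker's binormal ROC curve, ROC(u) = Φ(A + BΦ^{-1}(u)), at the boundary false positive rate t = 0.1, using only the diagonal entries of the sample moments above. Because standardization is a strictly increasing transformation of a single biomarker, its ROC curve, and hence this value, is the same for raw and standardized data. This is a descriptive check under the binormality assumption, not the paper's conditional test; the function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Sample moments of the DMD data as reported above (normal = 0, carrier = 1).
MU0 = [3.393, 4.521, 2.486]
MU1 = [4.762, 4.523, 3.011]
VAR0 = [0.032, 0.007, 0.011]  # diagonal of the normal-group covariance matrix
VAR1 = [0.768, 0.009, 0.227]  # diagonal of the carrier-group covariance matrix


def binormal_roc(u, mu0, mu1, var0, var1):
    """Binormal ROC curve of a single marker at false positive rate u:
    ROC(u) = Phi(A + B * Phi^{-1}(u)), with A = (mu1 - mu0) / sd1 and B = sd0 / sd1."""
    a = (mu1 - mu0) / math.sqrt(var1)
    b = math.sqrt(var0 / var1)
    std = NormalDist()
    return std.cdf(a + b * std.inv_cdf(u))


t = 0.1  # upper limit of 1 - specificity used in the analyses
sens = [binormal_roc(t, MU0[j], MU1[j], VAR0[j], VAR1[j]) for j in range(3)]
for j, s in enumerate(sens, start=1):
    print(f"biomarker {j}: sensitivity at 90% specificity = {s:.3f}")
```

Under these moments the first and third biomarkers reach far higher sensitivity at 90% specificity than the second, whose group means are nearly identical.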

Table 7 presents the results of biomarker selection. Both the Forward and Backward approaches select the first and third biomarkers. Removing the second biomarker causes only a slight decrease in pAUC. The stepwise details are provided in Table 8.

Table 7 The estimated best linear combination and the corresponding pAUC in DMD and heart disease examples
Table 8 The Forward and Backward selections in DMD and heart disease examples

In another real example, four biomarkers (lutein, TBARS, HDL cholesterol, and uric acid) are used to construct a classification tool for atherosclerotic coronary heart disease [10]. A cohort of 434 subjects, comprising 72 cases and 362 controls, was selected for the analysis. The test of the null hypothesis of normality is not significant. For the non-diseased and diseased groups, the estimated means of the four markers are

$$\hat\mu_0 = (0.128,\ 0.885,\ 4.077,\ 6.772)^T, \qquad \hat\mu_1 = (0.140,\ 0.934,\ 4.123,\ 6.911)^T$$

and the two sample covariance matrices are

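$$\hat\Sigma_0 = \begin{pmatrix} 0.003 & -0.000 & -0.000 & -0.005 \\ -0.000 & 0.029 & 0.004 & 0.042 \\ -0.000 & 0.004 & 0.049 & 0.027 \\ -0.005 & 0.042 & 0.027 & 0.285 \end{pmatrix}, \qquad \hat\Sigma_1 = \begin{pmatrix} 0.004 & 0.003 & 0.007 & 0.007 \\ 0.003 & 0.042 & 0.002 & 0.043 \\ 0.007 & 0.002 & 0.039 & 0.001 \\ 0.007 & 0.043 & 0.001 & 0.150 \end{pmatrix}.$$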

From Table 7, we obtain a different optimal linear combination for the full data set, in which the weight of the first biomarker, lutein, is diminished while those of the other three are increased. Before the biomarker selection, the first two biomarkers, lutein and TBARS, appear important to the disease, judging by the magnitudes of their coefficients. However, both stepwise selections reach the same conclusion: only lutein achieves statistical significance, as seen in Tables 7 and 8.

The third example consists of 106 breast tissue samples [25], of which 54 are classified as diseased and 52 as non-diseased. Nine biomarkers are available. The data can be downloaded from the additional files (see Additional file 2, [26]). Table 9 reports the results of the two biomarker selections on the standardized data. In terms of pAUC, the biomarker set selected by the Forward method surpasses the set selected by the Backward method, and the two methods select different sets of significant biomarkers. While the Backward approach discards the biomarkers that are more likely to be in the bottom group (in terms of the magnitude of the corresponding coefficient in the optimal linear combination of the full data set), the Forward approach does not select the four biomarkers with the largest coefficients in the full model. The latter implies an inconsistency between the coefficient of a biomarker in the optimal linear combination and its marginal discriminatory power. An in-depth investigation showed that for these top four biomarkers the non-diseased population is far more variable than the diseased population (see Additional file 1). This leads to low pAUC values and hence to insignificant results in the tests of marginal discriminatory power; in contrast, a biomarker with a more homogeneous non-diseased population is favored under the pAUC criterion. Since our proposed methods do not terminate after an insignificant finding, the impact of the variable ordering during selection is reduced.

Table 9 The estimated best linear combination and the corresponding pAUC in the breast tissue example

For comparison, we also report the optimal linear combinations of the reduced biomarker sets selected by the LASSO. Two values of λ are used: the one achieving the minimum mean cross-validation error, denoted λmin, and the largest value whose mean error is within one standard error of the minimum, denoted λ1SE. From Table 9, using λmin in the LASSO produces the most conservative selection, in which none of the biomarkers are discarded. Using λ1SE, the LASSO selects a biomarker set quite different from those selected by our two approaches. In terms of the sample maximal pAUC of the selected biomarker set, this set is better than the one selected by the Backward method but is surpassed by the one selected by the Forward method in this application. The analyses were performed using the function cv.glmnet of the R package glmnet with deviance loss and 10-fold cross-validation.
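The two LASSO tuning values can be read off a cross-validation path as follows. This is a minimal sketch of the rule described above, written from the definition in the text rather than from glmnet's source; the function name and the example path are hypothetical.

```python
def lambda_min_and_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation path.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is within one standard error (taken at lambda_min)
    of that minimum, i.e. a sparser model with comparable error.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lam_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lam_1se


# Hypothetical path: penalties, mean CV deviance and its standard error.
print(lambda_min_and_1se([0.5, 0.2, 0.1, 0.05, 0.01],
                         [1.30, 1.10, 1.02, 1.00, 1.01],
                         [0.05, 0.05, 0.04, 0.03, 0.03]))
```

Because λ1SE ≥ λmin, the one-standard-error rule applies at least as much shrinkage and therefore tends to discard more biomarkers, which matches λmin giving the most conservative selection in Table 9.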

Three biomarkers of the third example, I0, A/DA and MAX IP, were considered the most discriminatory in the original paper [25]. From Table 9, none of the biomarker sets selected by the discussed methods includes all three at the same time. One major reason is that the response, originally categorized in more detail into six classes, is condensed into a binary variable here. Further, the objective function in the original paper was the accuracy [25], whereas we consider the pAUC in this study. Thus, different statistical information is captured.


Discussion

In this study, we focus on disease diagnosis in the presence of multiple biomarkers. We consider the class of linear combinations as an effective and easy-to-interpret summary of the multiple biomarkers. The diagnostic power of a linear combination is evaluated by its pAUC over a clinically relevant range of false positive rates. More precisely, we require high specificity, as is appropriate for population screening.

Under the binormality assumption, the pAUC of a linear combination is estimated by plugging in the MLEs of the population parameters, and the strong consistency of the estimated optimal linear combination is proved. We also introduce a testing procedure to assess the overall diagnostic power of a set of biomarkers based on the greatest pAUC it can achieve in the class of linear combinations. Furthermore, we develop a testing procedure for the conditional contribution of a single biomarker given the other biomarkers. The parametric bootstrap method is applied to find the critical value(s) of the tests. These tests are then embedded in two biomarker selection approaches. The finite sample performance of the proposed methods is studied using both synthetic and real data sets. In addition, the robustness of our approaches to deviations from the binormality assumption is investigated via simulation, and our biomarker selection methods are compared with the LASSO in a real data analysis.
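The parametric bootstrap step can be sketched generically: simulate data sets from the fitted model under the null hypothesis, recompute the test statistic on each, and take the upper α quantile as the critical value. The statistics and null models of the paper's global and conditional tests are defined in its methods and are not reproduced here, so `statistic` and `simulate_null` below are hypothetical placeholders, and the order-statistic convention for the quantile is an assumption. The default of 500 replicates matches the bootstrap size used in the real-data analyses.

```python
import math
import random


def bootstrap_critical_value(statistic, simulate_null, n_boot=500, alpha=0.05, seed=None):
    """Parametric bootstrap critical value for a test that rejects for large values.

    statistic:     function mapping a data set to a real-valued test statistic
    simulate_null: function(rng) returning one data set drawn from the fitted
                   model under the null hypothesis (hypothetical placeholder)
    Returns the empirical (1 - alpha) quantile of the bootstrap statistics.
    """
    rng = random.Random(seed)
    stats = sorted(statistic(simulate_null(rng)) for _ in range(n_boot))
    # smallest order statistic with at least (1 - alpha) of the replicates at or below it
    k = min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)
    return stats[k]


# Toy usage (hypothetical): statistic = |sample mean| of 20 draws from a
# fitted N(0, 1) null model; reject when the observed value exceeds the cutoff.
def abs_mean(xs):
    return abs(sum(xs) / len(xs))


def null_sample(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(20)]


cutoff = bootstrap_critical_value(abs_mean, null_sample, n_boot=500, alpha=0.05, seed=1)
print(cutoff)
```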

Our methods differ from other algorithm-based marker-selection approaches in that we select or discard a biomarker based on evidence of statistical significance. As a trade-off, our methods require extensive computation to acquire this evidence, which limits their feasibility for larger data sets. Consequently, our methods are less appropriate for exploratory studies. We suggest applying adequate data filtering for dimension reduction before advanced confirmatory statistical analyses, such as the construction of a diagnostic rule.

One common issue in selecting biomarkers based on the observed data is over-fitting. Cross-validation can be used to prevent this problem, and it is easily integrated into our proposed procedure when prediction power is the primary goal and over-fitting is a concern in a real application. Although we do not discuss over-fitting further in this paper, the bootstrap resampling used in our procedure, which takes sampling variation into account, can guard against over-fitting to some extent.
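A cross-validation wrapper needs a fold assignment that keeps both groups in every fold. The sketch below assumes stratification by disease status, which the text does not prescribe; the function name is hypothetical. Each fold would then be held out in turn, the selection procedure run on the remaining folds, and the pAUC of the selected combination evaluated on the held-out fold.

```python
import random


def stratified_folds(labels, k=10, seed=None):
    """Assign each subject to one of k folds so that cases (label 1) and
    controls (label 0) are spread as evenly as possible across folds.
    Returns a list of fold indices, one per subject (hypothetical helper)."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for group in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == group]
        rng.shuffle(idx)  # random order within each group
        for pos, i in enumerate(idx):
            folds[i] = pos % k  # deal the group out round-robin over the folds
    return folds


# Example with the group sizes of the breast tissue data (54 diseased, 52 non-diseased).
labels = [1] * 54 + [0] * 52
folds = stratified_folds(labels, k=10, seed=0)
```

Stratifying matters for pAUC in particular, since a held-out fold with very few non-diseased subjects cannot resolve false positive rates as small as t = 0.1.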

This research is conducted under the assumption that the biomarkers follow a multivariate normal distribution, and a numerical study shows that the proposed statistical procedures are moderately sensitive to this distributional assumption. The proposed methods can be generalized by using a non-parametric estimate of the pAUC (for example, the empirical pAUC) instead. However, the theoretical properties of the resulting estimated optimizer would still need to be verified, and the non-smooth form of the objective function greatly increases the computational difficulty. Non-parametric approaches may be more challenging to develop, yet they can be more broadly applied; this topic is beyond the scope of our study.

Conventionally, a biomarker is often characterized by its mean and variance. However, our simulation shows that the correlation between biomarkers, although often less emphasized, can play a critical role. The pAUC of the linear combination of a set of biomarkers may be increased, sometimes substantially, by including another biomarker that is individually independent of the disease but highly correlated with other important biomarkers. Further, we observe that the correlation between biomarkers in the non-diseased group has a greater effect than that in the diseased group. On the other hand, the breast tissue example shows that a biomarker with a more homogeneous non-diseased population is more likely to have a greater pAUC.
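A worked special case makes the role of correlation concrete. Assume, for illustration only, two biomarkers with a common covariance matrix in both groups, $\Sigma_0 = \Sigma_1 = \Sigma$ with unit variances and correlation $\rho$, and mean difference $\Delta = (0, \delta)^T$ with $\delta > 0$, so that the first biomarker alone has no discriminatory power. With equal covariances, $B = 1$ for every $a$, the ROC curve of $a^T X$ is $\Phi(A(a) + \Phi^{-1}(u))$, and maximizing the pAUC over any region $(0, t]$ amounts to maximizing $A(a)$, which by the Cauchy–Schwarz inequality gives

$$
A(a) = \frac{a^T \Delta}{\sqrt{a^T \Sigma a}}, \qquad \max_a A(a) = \sqrt{\Delta^T \Sigma^{-1} \Delta} \quad \text{at } a^* \propto \Sigma^{-1} \Delta,
$$

$$
\Sigma^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}
\;\Longrightarrow\;
a^* \propto (-\rho,\ 1)^T, \qquad \max_a A(a) = \frac{\delta}{\sqrt{1 - \rho^2}}.
$$

The second biomarker alone gives $A = \delta$, so adding the marginally useless first biomarker multiplies $A$ by $1/\sqrt{1 - \rho^2}$ (about 1.15 for $\rho = 0.5$ and about 2.29 for $\rho = 0.9$). Since $\Phi(A + \Phi^{-1}(u))$ increases in $A$ at every $u$, the pAUC increases as well. This equal-covariance case does not capture the separate effects of the non-diseased and diseased correlations noted above.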

Before proceeding to the proposed test-based biomarker selection, suitable data standardization is recommended in order to obtain a fair ordering of the biomarkers by their coefficients in the best linear combination. Different standardizations can lead to different best linear combinations and hence to different orderings. However, in our methods the effect of standardization is minimized, because all biomarkers enter the evaluation process and are assessed with their sampling variation taken into account. Indeed, in the first two real examples of this study the same conclusions are obtained with or without standardization, which suggests that our test-based procedures are robust to the choice of standardization. The analysis of the raw data is provided in the additional files (see Additional file 1).

There are other options for ranking the biomarkers. For example, the biomarkers could be ranked by the association between each individual biomarker and the disease response, measured by the p-value of a univariate t-test under the normality assumption. Alternatively, because this article emphasizes the pAUC criterion, a ranking could be based on the estimated marginal pAUC of each biomarker together with its sampling error. However, these methods are more computationally intensive, and furthermore, they cannot recognize associations between a biomarker and the disease in the presence of other biomarkers. Here, we propose using the coefficients of the optimal linear combination of the complete biomarker set as the ranking criterion. This criterion is relatively simple and roughly orders the biomarkers by importance. Its limitation is that, to avoid computational difficulty, the sampling error is not taken into account, and the breast tissue example shows that the coefficient in the optimal linear combination and the marginal discriminatory power may be inconsistent. To minimize the effect of the ranking, there is no early-stopping criterion, and every biomarker is evaluated throughout the selection procedure.
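The Forward selection, as we read its description in this section and above (rank by the magnitude of the full-model coefficients, start from the top-ranked biomarker, and assess every biomarker with no early stop), can be sketched as a loop around a test callback. The function names are hypothetical, and `p_value` stands in for the paper's marginal and conditional pAUC tests, which are not reproduced here.

```python
def rank_by_coefficient(coefs):
    """Order biomarker indices by decreasing |coefficient| in the optimal
    linear combination of the full (standardized) biomarker set."""
    return sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)


def forward_select(coefs, p_value, alpha=0.05):
    """One pass over all biomarkers in ranked order (hypothetical sketch).

    p_value(j, selected) is a placeholder returning the p-value for the
    contribution of biomarker j given the already selected biomarkers; with
    selected == [] it plays the role of the marginal test. An insignificant
    result does not stop the procedure, so exactly p tests are performed.
    """
    selected = []
    for j in rank_by_coefficient(coefs):
        if p_value(j, list(selected)) < alpha:
            selected.append(j)
    return selected
```

Because the number of tests is fixed at p, this structure is also what makes the simple Bonferroni adjustment for the Forward selection straightforward.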

As in a conventional regression analysis, we do not apply any multiplicity adjustment to strictly control the familywise type I error rate in the selection procedures. However, if the investigators require a more confirmatory conclusion, a multiplicity adjustment may be necessary. The Forward selection has a fixed number of steps and hence involves a simple multiple comparison problem: the conventional Bonferroni adjustment, which uses the significance level α/p at each step, can be applied directly. The Backward selection may take 1, p or p + 1 step(s) to reach the final conclusion; the simplest and most conservative option is then to use the significance level α/(p + 1) at each step to control the familywise error rate. Of course, with a multiplicity adjustment, the comparison of the two biomarker selection approaches may yield different results.
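For concreteness, the two Bonferroni levels can be computed as below; the function name is hypothetical. With α = 5% and the nine breast tissue biomarkers, they give 0.05/9 ≈ 0.0056 per Forward step and 0.05/10 = 0.005 per Backward step.

```python
def per_step_alpha(alpha, p, approach):
    """Bonferroni per-step significance level for the stepwise selections:
    alpha / p for the Forward selection (p steps) and the conservative
    alpha / (p + 1) for the Backward selection (at most p + 1 steps)."""
    if approach == "forward":
        return alpha / p
    if approach == "backward":
        return alpha / (p + 1)
    raise ValueError("approach must be 'forward' or 'backward'")


print(per_step_alpha(0.05, 9, "forward"), per_step_alpha(0.05, 9, "backward"))
```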


Conclusions

Our proposed biomarker selection approaches can be used to find the significant biomarkers based on hypothesis testing.



Abbreviations

aSAH: Aneurysmal subarachnoid hemorrhage

PSA: Prostate-specific antigen

DMD: Duchenne muscular dystrophy


References

1. National Cancer Institute: PDQ® Prostate Cancer Screening. Bethesda, MD: National Cancer Institute, Date last modified 06/08/2012. Available at: Accessed 06/08/2012.

2. Etzioni R, Kooperberg C, Pepe M, Smith R, Gann PH: Combining biomarkers to detect disease with application to prostate cancer. Biostatistics. 2003, 4: 523-538. 10.1093/biostatistics/4.4.523.

3. Madu CO, Lu Y: Novel diagnostic biomarkers for prostate cancer. J Cancer Educ. 2010, 1: 150-177.

4. Weng CG, Poon J: A new evaluation measure for imbalanced datasets. Proceedings of the Seventh Australasian Data Mining Conference. Edited by: Roddick JF, Li J, Christen P, Kennedy PJ. 2008, Glenelg, South Australia: ACS, 27-32.

5. Pepe MS, Longton G, Anderson GL, Schummer M: Selecting differentially expressed genes from microarray experiments. Biometrics. 2003, 59: 133-142. 10.1111/1541-0420.00016.

6. Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L: The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform. 2005, 38: 404-415. 10.1016/j.jbi.2005.02.008.

7. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Muller M: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma. 2011, 12: 77-84. 10.1186/1471-2105-12-77.

8. Turck N, Vutskits L, Sanchez-Pena P, Robin X, Hainard A, Gex-Fabry M, Fouda C, Bassem H, Muller M, Lisacek F, Puybasset L, Sanchez J-C: A multiparameter panel method for outcome prediction following aneurysmal subarachnoid hemorrhage. Intensive Care Med. 2010, 36: 107-115. 10.1007/s00134-009-1641-y.

9. Su JQ, Liu JS: Linear combinations of multiple diagnostic markers. J Am Stat Assoc. 1993, 88: 1350-1355. 10.1080/01621459.1993.10476417.

10. Liu A, Schisterman EF, Zhu Y: On linear combinations of biomarkers to improve diagnostic accuracy. Stat Med. 2005, 24: 37-47. 10.1002/sim.1922.

11. Pepe MS, Thompson ML: Combining diagnostic test results to increase accuracy. Biostatistics. 2000, 1: 123-140. 10.1093/biostatistics/1.2.123.

12. Pepe MS, Cai T, Longton G: Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics. 2006, 62: 221-229. 10.1111/j.1541-0420.2005.00420.x.

13. Hsu M-J, Hsueh H-M: The linear combinations of biomarkers which maximize the partial area under the ROC curves. Comput Stat. 2013, 28: 647-666. 10.1007/s00180-012-0321-5.

14. Ma S, Huang J: Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics. 2005, 21: 4356-4362. 10.1093/bioinformatics/bti724.

15. Ma S, Huang J: Combining multiple markers for classification using ROC. Biometrics. 2007, 63: 751-757. 10.1111/j.1541-0420.2006.00731.x.

16. Zhou XH, Chen B, Xie YM, Tian F, Liu H, Liang X: Variable selection using the optimal ROC curve: An application to a traditional Chinese medicine study on osteoporosis disease. Stat Med. 2012, 31: 628-635.

17. Lin H, Zhou L, Peng H, Zhou X-H: Selection and combination of biomarkers using ROC method for disease classification and prediction. Can J Stat. 2011, 39: 324-343. 10.1002/cjs.10107.

18. Marrocco C, Duin RPW, Tortorella F: Maximizing the area under the ROC curve by pairwise feature combination. Pattern Recogn. 2008, 41: 1961-1974. 10.1016/j.patcog.2007.11.017.

19. Ricamato MT, Tortorella F: Partial AUC maximization in a linear combination of dichotomizers. Pattern Recogn. 2011, 44: 2669-2677. 10.1016/j.patcog.2011.03.022.

20. Komori O, Eguchi S: A boosting method for maximizing the partial area under the ROC curve. BMC Bioinforma. 2010, 11: 314-330. 10.1186/1471-2105-11-314.

21. Wang Z, Chang Y-CI: Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics. 2011, 12: 369-385. 10.1093/biostatistics/kxq052.

22. Marsaglia G: Choosing a point from the surface of a sphere. The Annals of Mathematical Statistics. 1972, 43: 645-646. 10.1214/aoms/1177692644.

23. Muller M: A note on a method for generating points uniformly on n-dimensional spheres. Commun ACM. 1959, 2: 19-20.

24. Tian L: Confidence interval estimation of partial area under curve based on combined biomarkers. Computational Statistics & Data Analysis. 2010, 54: 466-472. 10.1016/j.csda.2009.09.016.

25. Silva JE, Marques JP, Jossinet J: Classification of breast tissue by electrical impedance spectroscopy. Med Biol Eng Comput. 2000, 38: 26-30. 10.1007/BF02344684.

26. UCI Machine Learning Repository.



Acknowledgements

The authors sincerely thank the referee for the helpful suggestions that improved the manuscript. The authors would also like to thank Drew McNeil for carefully editing the manuscript. This work was supported by the National Science Council of Taiwan, R.O.C. under the grants (NSC 101-2118-M-004 -004) and (NSC 101-2118-M-001 -001 -MY2).

Author information



Corresponding authors

Correspondence to Man-Jen Hsu or Huey-Miin Hsueh.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors participated in the design and interpretation of the study. MH proved Theorem 1 and performed the simulation study. All authors contributed to the draft and have approved the final manuscript.

Electronic supplementary material

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Hsu, M., Chang, Y.I. & Hsueh, H. Biomarker selection for medical diagnosis using the partial area under the ROC curve. BMC Res Notes 7, 25 (2014).



  • Discriminatory power
  • Hypothesis testing
  • Optimal linear combination
  • Partial area under ROC curve
  • Stepwise biomarker selection