Evaluation of long noncoding RNA MALAT1 as a candidate blood-based biomarker for the diagnosis of non-small cell lung cancer

Background The long noncoding RNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is described as a potential biomarker for NSCLC (non-small cell lung cancer). Diagnostic biomarkers need to be detectable in easily accessible body fluids and should be characterized by high specificity, sufficient sensitivity, and robustness against influencing factors. The aim of this study was to evaluate the performance of MALAT1 as a blood-based biomarker for NSCLC. Results MALAT1 was shown to be detectable in the cellular fraction of peripheral human blood, showing different expression levels between cancer patients and cancer-free controls. For the discrimination of NSCLC patients from cancer-free controls a sensitivity of 56% was calculated conditional on a high specificity of 96%. No impact of tumor stage, age, gender, or smoking status on MALAT1 levels could be observed, although these results are based on small numbers. Conclusions The results of this study indicate that MALAT1 complies with key characteristics of diagnostic biomarkers, i.e., minimal invasiveness, high specificity, and robustness. Due to its relatively low sensitivity MALAT1 might not be feasible as a single biomarker for the diagnosis of NSCLC in the cellular fraction of blood. Alternatively, MALAT1 might be applicable as a complementary biomarker within a panel in order to improve the overall diagnostic performance.


Background
Lung cancer is the leading cause of cancer death worldwide [1] with NSCLC (non-small cell lung cancer) as the most prominent subgroup accounting for approximately 80% of all lung cancer cases. Commonly, the disease is detected in late stages resulting in short survival rates, whereas for patients with early-stage lung cancer longer survival rates could be observed [2]. Thus, the detection of lung cancer in early stages when clinical symptoms have not yet occurred appears to be a promising opportunity to decrease mortality, because in more cases a curative therapy might become possible.
In principle, biomarkers should be feasible for the detection of cancer in early stages. Thus, a major aim in cancer research is the identification of proper biomarkers. Key characteristics of diagnostic biomarkers, among others, are: (i) minimal invasiveness, i.e., the biomarker can be measured in easily accessible body fluids, (ii) high specificity to avoid false-positive results in cancer-free individuals, (iii) sufficient sensitivity to detect the tumors, and (iv) robustness against potential influencing factors.
In recent years biomarker research focused on noncoding RNAs (ncRNAs), in particular microRNAs (miRNAs). MiRNAs are small RNA molecules with a length of ~22 nucleotides (nt), playing a central role in the regulation of gene expression [3] and acting as tumor suppressors or oncogenes in cancer [4]. Several studies show the feasibility of using miRNAs as biomarkers in body fluids for the diagnosis of lung cancer [5][6][7][8]. However, there is a lack of consistent results between studies focused on the identification of miRNAs as biomarkers [9]. Thus, the discovery of alternative or complementing biomarkers is essential.
In addition to miRNAs, long noncoding RNAs (lncRNAs) are a promising alternative within the group of ncRNAs. LncRNAs are commonly described as RNA molecules with a length > 200 nt, playing regulatory and structural roles in biological processes. As lncRNAs are implicated as tumor suppressors and oncogenes [10], they might be feasible as diagnostic biomarkers [11]. Currently, only few lncRNAs have been described as candidate biomarkers in human body fluids [10]. HULC (highly upregulated in liver cancer) is highly expressed in hepatocellular carcinoma patients and detectable in human blood [12]. PCA3 (prostate cancer gene 3) is detectable in urine of prostate cancer patients, showing high accuracy [13]. In addition, MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) might be a candidate biomarker for NSCLC [14]. MALAT1 is a well-described lncRNA widely expressed in normal tissues [15]. In several human carcinomas MALAT1 was shown to be upregulated [16], particularly in early-stage metastasizing NSCLC.
The aim of this study was the evaluation of MALAT1 as a blood-based biomarker for NSCLC. The expression of MALAT1 was measured in the cellular fraction of peripheral human blood and the expression levels of NSCLC patients and cancer-free controls of the general population were compared.

Study population
The study was designed according to rules guarding patient privacy and with the approval from the ethics committee of the Ruhr-Universität Bochum (No. 3217-08). All participants provided written informed consent.
The cancer group of 45 NSCLC patients consisted of 21 patients with AdCa (adenocarcinoma) and 24 patients with SqCC (squamous cell carcinoma). Participants were recruited at the HELIOS Clinic Emil von Behring, Berlin, Germany. Tumor staging was performed according to the TNM classification of malignant tumors [17]. Cancer patients had not been treated by surgery, chemotherapy, or radiation therapy before blood collection. The control group of 25 cancer-free subjects was drawn from the Heinz Nixdorf Recall study, a population-based cohort of elderly subjects [18]. Characteristics of the study groups are summarized in Table 1. Detailed subject characteristics are listed in Additional file 1.

RNA isolation
Peripheral blood samples were collected from each participant in 9.0 ml S-Monovette EDTA gel tubes (Sarstedt, Nümbrecht, Germany) and centrifuged (2000 x g for 10 minutes) within 30 minutes after collection. The cellular fraction was separated from plasma and stored frozen until RNA isolation.
Samples were thawed at room temperature and RNA isolation including DNase I treatment was performed from 0.5 ml of the cellular fraction using the RiboPure Blood Kit according to the manufacturer's instructions (Life Technologies, Darmstadt, Germany).

Quantitative real-time PCR (qRT-PCR)
TaqMan assays (Life Technologies) were used for quantitative expression analyses of MALAT1 (Hs00273907_s1) as potential biomarker and of GAPDH (Hs99999905_m1), HPRT1 (Hs02800696_m1), and RPLP0 (Hs99999902_m1) as potential reference genes for normalization. Quantitative real-time PCR (qRT-PCR) was performed using a 7900 HT Fast Real-Time PCR System (Life Technologies). For the reverse transcription reaction 12 μl RNA and for the PCR reaction 5 μl cDNA were used as templates. Samples were analyzed in duplicate and non-template controls were included. For cycle threshold (Ct) estimation a fixed threshold of 0.2 was used. Ct values > 35 were considered to be under the detection limit [19] and marked as 35 for analysis [20]. Raw Ct values are presented in Additional file 1.
The performance of potential references was analyzed utilizing RefFinder [21], a web-based comprehensive tool (www.leonxie.com/referencegene.php), including the four commonly used algorithms geNorm [22], NormFinder [23], BestKeeper [24], and comparative ΔCt method [25], to evaluate the most stable reference across study groups. As the geometric mean (GM) of several reference genes is more reasonable than a single reference gene [22], the GM of potential references was calculated. Normalized MALAT1 levels were expressed as ΔCt, with ΔCt = Ct(MALAT1) − Ct(Reference).
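The normalization described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that the GM is taken directly over the Ct values of the reference genes and that the cap at Ct = 35 is applied to every assay before normalization; the function names (cap_ct, geometric_mean, delta_ct) are hypothetical and the Ct values in the usage note are invented for illustration.

```python
import math

# Detection limit stated in the text: Ct values > 35 are set to 35.
CT_DETECTION_LIMIT = 35.0


def cap_ct(ct: float) -> float:
    """Replace Ct values above the detection limit by the limit itself."""
    return min(ct, CT_DETECTION_LIMIT)


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def delta_ct(ct_target: float, ct_references) -> float:
    """Delta-Ct = Ct(target) - Ct(reference).

    Assumption: the reference Ct is the geometric mean of the (capped)
    Ct values of the reference genes, e.g. GAPDH and HPRT1.
    A higher Delta-Ct corresponds to a lower expression of the target.
    """
    reference = geometric_mean([cap_ct(c) for c in ct_references])
    return cap_ct(ct_target) - reference
```

For example, delta_ct(30.0, [20.0, 20.0]) evaluates to approximately 10.0, and a target Ct of 37.2 would enter the calculation as 35.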

Statistical analysis
Median and inter-quartile range (IQR) were used to describe the distribution of MALAT1 levels. Groups were compared using the non-parametric Kruskal-Wallis test for continuous variables. Sensitivity and specificity of MALAT1 were determined from receiver operating characteristic (ROC) curves illustrating the performance of MALAT1 to discriminate the studied groups. In brief, NSCLC vs. controls, AdCa vs. controls, SqCC vs. controls, and AdCa vs. SqCC were analyzed. The bootstrap procedure (1000 runs) was used for internal validation of the estimates in the ROC analyses. Potential factors influencing MALAT1 levels were evaluated using a linear regression model. Estimates were given as β with 95% confidence intervals (CI) and p values. Because MALAT1 levels were expressed as ΔCt and a higher ΔCt corresponds to a lower expression level, values of β > 0 indicate a negative association between the influencing factor and MALAT1 levels, and values of β < 0 a positive association. Logistic regression modeling was performed to estimate the odds ratio (OR) with 95% CI for detecting NSCLC as a function of normalized MALAT1 levels.
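The ROC-based quantities described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the SAS procedure used in the study: a test is called positive when the ΔCt value lies above the cutoff (higher ΔCt meaning lower MALAT1 expression), the fixed specificity is expressed as a maximum number of false-positive controls, and the bootstrap interval is a simple percentile interval. All function names are hypothetical.

```python
import random


def sensitivity_at_max_false_positives(cases, controls, max_fp):
    """Return (cutoff, sensitivity, specificity) for 'positive if value > cutoff'.

    The cutoff is the (max_fp + 1)-th largest control value, so that at most
    max_fp controls lie strictly above it.
    """
    ranked = sorted(controls, reverse=True)
    cutoff = ranked[max_fp]
    false_pos = sum(v > cutoff for v in controls)
    true_pos = sum(v > cutoff for v in cases)
    return cutoff, true_pos / len(cases), 1 - false_pos / len(controls)


def max_youden(cases, controls):
    """Return (YI, cutoff, sensitivity, specificity) maximizing
    YI = sensitivity + specificity - 1 over all observed values."""
    best = None
    for cutoff in sorted(set(cases) | set(controls)):
        sens = sum(v > cutoff for v in cases) / len(cases)
        spec = sum(v <= cutoff for v in controls) / len(controls)
        yi = sens + spec - 1
        if best is None or yi > best[0]:
            best = (yi, cutoff, sens, spec)
    return best


def bootstrap_sensitivity_interval(cases, controls, max_fp, runs=1000, seed=0):
    """Percentile 95% interval of the sensitivity over bootstrap resamples.

    Cases and controls are resampled separately with replacement.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        _, sens, _ = sensitivity_at_max_false_positives(
            boot_cases, boot_controls, max_fp
        )
        estimates.append(sens)
    estimates.sort()
    lower = estimates[int(0.025 * runs)]
    upper = estimates[min(runs - 1, int(0.975 * runs))]
    return lower, upper
```

With 25 controls, max_fp values of 0, 1, and 2 correspond to specificities of at least 100%, 96%, and 92%, respectively.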
Statistical analyses were performed using SAS/STAT and SAS/IML software, version 9.3 (SAS Institute Inc., Cary, NC).

Expression stability of candidate references
The potential reference genes GAPDH, HPRT1, and RPLP0 were measured in all samples from NSCLC patients and cancer-free controls. Using raw Ct values no significant differences between NSCLC patients and controls could be observed for GAPDH and HPRT1 in contrast to RPLP0 (p = 0.0002; Figure 1). Thus, RPLP0 was excluded from further evaluation as a reference gene.
In order to identify the most stable reference across the study groups RefFinder was used to rank the analyzed references. The lowest rank represents the most stable reference and the highest rank the least stable reference (Table 2). The GM of GAPDH and HPRT1 was identified as the most stable reference and used for normalization of MALAT1.

Distribution of MALAT1 in the study groups
Differences of normalized MALAT1 levels (Table 3) between cancer patients and cancer-free subjects were significant for NSCLC vs. controls (p < 0.0001), AdCa vs. controls (p = 0.0043), and SqCC vs. controls (p = 0.0001), whereas the difference between AdCa and SqCC was not significant (Figure 2). Sensitivity and specificity of normalized MALAT1 are shown in Table 4, calculated according to false-positive rates (FPR) of 0% (no false-positive test), 4% (one false-positive test), and 8% (two false-positive tests), and according to the maximum Youden's Index (YI = sensitivity + specificity − 1). An FPR of 4%, representing 96% specificity, resulted in 56% sensitivity for the discrimination of NSCLC from controls. The sensitivity to discriminate SqCC from controls was higher (63%) than the sensitivity to discriminate AdCa from controls (48%). An FPR of 8% (92% specificity) did not lead to any increase in sensitivities, whereas an FPR of 0% (100% specificity) resulted in the lowest sensitivities for the discrimination of controls from patients with NSCLC (47%), AdCa (38%), or SqCC (54%). Use of the maximum YI led to an increase in sensitivity to 81% only for AdCa vs. controls, but specificity decreased to 64%. For the discrimination of AdCa from SqCC an FPR of 5% (95% specificity, one false-positive test) resulted in 8% sensitivity and an FPR of 10% (90% specificity, two false-positive tests) resulted in 21% sensitivity, whereas using the maximum YI resulted in 33% sensitivity and 86% specificity.
Figure 1 Scatter dot plots of raw Ct values of candidate reference genes. GAPDH, HPRT1, and RPLP0 were measured in patients with NSCLC (non-small cell lung cancer; N = 45) and cancer-free controls (N = 25). Horizontal bars represent median and IQR. Groups were compared using the Kruskal-Wallis test.
Table 2 Results of reference analysis using RefFinder [21] to evaluate the most stable reference across the study groups
ROC analyses on 1000 bootstrap samples resulted in similar cutoffs, sensitivities, and specificities of MALAT1 in comparison to the original analyses. The calculated 95% CI for NSCLC vs. controls and SqCC vs. controls indicate a good precision of this assessment, whereas for AdCa vs. controls the 95% CI indicates a lower precision (Additional file 2).
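The false-positive rates used in this section follow directly from the group sizes, which the short Python sketch below makes explicit. The group sizes are taken from the study population (25 controls, 21 AdCa patients). That the AdCa group serves as the reference group in the AdCa vs. SqCC comparison is an assumption inferred from the reported rates of 5% and 10%, and the helper name fpr_percent is hypothetical.

```python
def fpr_percent(false_positives: int, n_reference: int) -> float:
    """False-positive rate in percent for a given number of false-positive tests."""
    return 100.0 * false_positives / n_reference


# Reference group sizes: 25 cancer-free controls; 21 AdCa patients
# (assumed reference group for the AdCa vs. SqCC comparison).
for label, n in (("controls", 25), ("AdCa", 21)):
    for k in (0, 1, 2):
        fpr = fpr_percent(k, n)
        print(f"{label}: {k} false-positive test(s) -> "
              f"FPR {fpr:.1f}%, specificity {100.0 - fpr:.1f}%")
```

Because the reference groups are small, a single additional false-positive test shifts the specificity by 4 to 5 percentage points, which is why only a few discrete specificity levels are reported.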

MALAT1 as biomarker of NSCLC
The application of logistic regression models revealed a two-fold increased risk of detecting NSCLC per normalized

Potential factors influencing MALAT1
The influence of tumor characteristics on MALAT1 levels is shown in Table 5. MALAT1 was not affected by tumor size, metastasis status, or lymph node status. The impact of potential influencing factors on the expression levels of MALAT1 is shown in Table 6. NSCLC showed a significant 1.63-fold (95% CI 0.75-2.51) decrease of MALAT1 (p = 0.0003), whereas the factors gender, age, and smoking status showed no impact on the MALAT1 levels in human blood.

Discussion
NSCLC is commonly detected in late stages of the disease. Biomarkers have the potential to detect cancer at early stages, facilitating an earlier and therefore more curative therapy that ideally results in decreased mortality. In NSCLC, Gutschner et al. showed that MALAT1 regulates the expression of several metastasis-associated genes, e.g. CDCP1 (CUB domain containing protein 1) and GPC6 (glypican 6), indicating a major role of MALAT1 in disease progression [26]. Additionally, it was suggested that MALAT1 might also regulate other important cellular processes in lung cancer [26]. Thus, MALAT1 is a candidate biomarker for NSCLC [14].
For quantitative expression analysis of messenger RNAs (mRNAs) and miRNAs qRT-PCR is considered to be the gold standard [27] and the same might be true for lncRNAs. However, to produce reliable data in qRT-PCR assays the use of appropriate reference genes for normalization is an important issue [28] and candidate reference genes need to be tested prior to application [29]. As no information regarding lncRNAs as references was accessible, mRNAs were selected as potential references. HPRT1 and RPLP0 are well-described reference genes for analyses in NSCLC tissues [30] and GAPDH was already applied for normalization of MALAT1 [15]. However, in this study RPLP0 did not appear to be a feasible reference, which is in agreement with Falkenberg et al., who showed that RPLP0 is not appropriate as a reference gene in human blood samples [31]. In this study, GAPDH and HPRT1 were suitable reference genes; in particular, the GM of GAPDH and HPRT1 showed the best reference performance. This is in accordance with Ulivi et al., who used GAPDH and HPRT1 for normalization of mRNAs in blood samples of NSCLC patients and controls [32].
One key characteristic of proper diagnostic biomarkers is the need to be detectable in easily accessible body fluids like peripheral blood. In this study MALAT1 was measured in the cellular fraction of human blood, showing that this matrix is in principle appropriate for the analysis of lncRNAs. Comparable results for the usability of the cellular blood fraction were shown for miRNAs [33,34]. Commonly, the cellular fraction obtained during plasma preparation is discarded, but it might be reasonable to collect this matrix in biobanks for subsequent biomarker discovery.
In this study, a significant downregulation of MALAT1 in NSCLC patients in comparison to cancer-free controls was shown. Comparable results were achieved by Zhang et al., showing a downregulation of MALAT1 in patients with hepatocellular carcinomas [35]. However, MALAT1 was implicated to play an oncogenic role [10] and upregulation of MALAT1 was observed in several other cancers, e.g. of the breast and prostate [16]. Such differences might be caused by the paradigm that in fact MALAT1 is expressed ubiquitously but fulfills tissue-specific functions depending on the cellular environment [36]. Commonly, MALAT1 is analyzed in tissues [14][15][16], whereas in this study and the study of Zhang et al. [35] MALAT1 was detected in blood. Because MALAT1 was detected in the cellular fraction of blood, it is unlikely to be directly produced by the tumor tissue. Its downregulation in blood cells may be an indirect effect of the tumor, e.g., on the immune system. The source of MALAT1 in the cellular fraction of blood remains unclear. Theoretically, it might originate from leucocytes altered by the tumor. However, further analyses are needed to evaluate the origin of MALAT1 in human blood. Very recently, it was shown that MALAT1 was detectable in plasma of patients with gastric or prostate cancer [37,38]. Thus, it would be reasonable to analyze MALAT1 in plasma of NSCLC patients instead of the cellular fraction, because the presence of MALAT1 in plasma might be a direct effect of the tumor, e.g., release of lncRNA-containing extracellular vesicles [39]. Tani et al. showed that the stability of MALAT1 varied in various cell types and indicated that the half-life of MALAT1 is shorter than the median half-life of mRNA [40]. Such decay of MALAT1 might also prevail in blood cells.
It is well known that systems like PAXgene, Tempus, and RNAlater stabilize mRNAs and miRNAs in whole blood samples [41][42][43]. Thus, the performance of the assay was additionally tested in a few available blood samples stabilized by PAXgene or RNAlater. In the stabilized blood samples MALAT1 was detectable at lower Ct values corresponding to larger quantities (data not shown). The results suggest that the use of stabilization systems might be meaningful for lncRNA analyses in blood. However, this assumption needs to be verified in more detail.
Regarding the key characteristics of an obligatory high specificity and a sufficiently high sensitivity of diagnostic biomarkers, the candidate biomarker MALAT1 fulfills the first but not the second criterion. Generally, in screening cohorts a high specificity is needed to avoid an unacceptably high number of false-positive tests that would result in psychological pressure and needless intervention for the patients. Thus, the sensitivity of candidate biomarkers should be calculated at a fixed high specificity level [44]. Considering the relatively small study group the specificity of 96% is quite high, particularly as this corresponds to only one single false-positively tested control. On the other hand, the calculated sensitivity is too low (56%) for the use of MALAT1 as a single biomarker for the diagnosis of NSCLC, particularly for the subtype AdCa (48%). However, lower sensitivity could be balanced by the use of several biomarkers in a panel. Theoretically, in an optimal panel every single biomarker is characterized by sufficiently high sensitivity and the necessary high specificity, and the biomarkers perfectly complement each other in order to obtain superior diagnostic performance [45]. Thus, it might be reasonable to verify MALAT1 in combination with other biomarkers in larger study groups to improve the overall diagnostic performance of the biomarker panel. However, for the discrimination of AdCa and SqCC, a sensitivity of only 8% precludes MALAT1 as a biomarker for the differential diagnosis of NSCLC subtypes.
Bootstrap analysis showed that the calculated cutoffs, sensitivities, and specificities remain stable, indicating that the calculated values are appropriate for the discrimination of patients and controls.
Regarding the fourth key characteristic of diagnostic biomarkers, the results indicate that MALAT1 values in blood are not correlated with tumor size, metastasis status, or lymph node status. However, more cases of early-stage metastasizing NSCLC need to be analyzed in subsequent studies because this study comprises only three cases with tumor stage T1 or T2. Additionally, MALAT1 seems to be relatively independent from common influencing factors like age, gender, and smoking status, indicating the robustness of the candidate biomarker. These observations are in agreement with MALAT1 expression in tissue [46]. However, it has to be clarified if other potential influencing factors from the multitude of biological, preanalytical, and analytical factors show an impact on MALAT1 levels in human blood.

Conclusions
MALAT1 could be detected in peripheral blood, showing different expression levels between NSCLC patients and cancer-free controls. It was shown that MALAT1 complies with key characteristics of diagnostic biomarkers, being minimally-invasive, exhibiting high specificity, and robustness. On the contrary, the observed sensitivity is too low for the use of MALAT1 as a single biomarker for the diagnosis of NSCLC using the cellular fraction of

Additional files
Additional file 1: Subject characteristics and raw data of MALAT1, GAPDH, and HPRT1 expression analysis.
Additional file 2: Marker cutoffs with 95% CI for NSCLC (non-small cell lung cancer) vs. controls, AdCa (adenocarcinoma) vs. controls, SqCC (squamous cell carcinoma) vs. controls, and AdCa vs. SqCC after bootstrap analysis with 1000 random samples, according to false positive rates (FPR) corresponding to none, one, and two false-positive tests and maximum Youden's Index (YI).

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
DGW conceived of the study, participated in its design and coordination, and drafted the manuscript. GJ participated in study design and coordination and helped to draft the manuscript. SC performed the statistical analyses and helped to draft the manuscript. OB performed the experiments and helped to draft the manuscript. BP and KHJ participated in the statistical analysis and helped to draft the manuscript. JK participated in study design and helped to draft the manuscript. TB participated in study coordination and helped to draft the manuscript. All authors read and approved the final manuscript.