Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples
© Mahoney et al.; licensee BioMed Central Ltd. 2013
Received: 10 August 2012
Accepted: 18 January 2013
Published: 30 January 2013
Formalin fixed, paraffin embedded tissues are most commonly used for routine pathology analysis and for long term tissue preservation in the clinical setting. Many institutions have large archives of formalin fixed, paraffin embedded tissues that provide a unique opportunity for understanding genomic signatures of disease. However, genome-wide expression profiling of formalin fixed, paraffin embedded samples has been challenging due to RNA degradation. Because of the significant heterogeneity in tissue quality, normalization and analysis of these data present particular challenges. The distribution of intensity values from archival tissues is inherently noisy and skewed due to differential sample degradation, raising two primary concerns: whether a highly skewed array will unduly influence initial normalization of the data, and whether outlier arrays can be reliably identified.
Two simple extensions of common regression diagnostic measures are introduced that measure the stress an array undergoes during normalization and how much a given array deviates from the remaining arrays post-normalization. These metrics are applied to a study involving 1618 formalin-fixed, paraffin-embedded HER2-positive breast cancer samples from the N9831 adjuvant trial processed with Illumina’s cDNA-mediated Annealing Selection extension and Ligation assay.
Proper assessment of array quality within a research study is crucial for controlling unwanted variability in the data. The metrics proposed in this paper have direct biological interpretations and can be used to identify arrays that should either be removed from analysis altogether or down-weighted to reduce their influence in downstream analyses.
Keywords: High-dimensional array quality; Formalin-fixed paraffin-embedded tissue; Outlier detection
Many institutions have large archives of formalin-fixed paraffin-embedded (FFPE) tissue. Compared to the general availability, sample collection protocols, and time-sensitive nature of fresh-frozen tissue, these large archives of FFPE tissues are easily accessible and provide a unique opportunity for understanding genomic signatures of disease on a large scale as well as the ability to evaluate long-term prognostic associations [1, 2]. These FFPE samples have been relatively untouched by high dimensional platforms because of RNA degradation and the cross-linking of nucleic acids caused by the formalin fixation process. However, Illumina introduced their cDNA-mediated Annealing Selection extension and Ligation (DASL) assay, which is specifically designed to enable whole genome expression profiling using degraded RNA and is used in conjunction with their BeadArray technology [4–7]. Similarly, the Ovation® FFPE WTA system is available from NuGEN for processing archival tissues to be analyzed by the Affymetrix platform. Although sequencing-based technologies are seen by many as a better alternative to microarray-based methods, sequencing is limited by difficult sample preparation protocols for FFPE samples and the cost of large-scale studies. In addition, several works have reported on the validity of microarray-based approaches to FFPE relative to fresh-frozen tissue, and use of this technology will most likely increase rapidly [8–10].
Although normalization will equalize the distribution of feature intensities across the arrays, there remains a need to assess the quality of the data. For example, of 7 FFPE experiments submitted to Gene Expression Omnibus (GSE20140, GSE19977, GSE23368, GSE20017, GSE25727, GSE28064, and GSE21921), only the latter two studies acknowledged that array quality assessments were even conducted, and neither of these two studies reported their findings [1, 2, 8, 9, 15–17]. Recently Chow et al. reported on their workflow for assessing array quality for FFPE samples using the lumi pipeline. Although this work is an important initial step towards assessing the quality of array data using FFPE samples, the metrics used are based on measures of multidimensional dissimilarity, a concept that may be unfamiliar to the average researcher. Furthermore, the thresholds for declaring a sample to be an outlier are study specific, which makes inter-study interrogation difficult.
In this work, we introduce two metrics that can easily be used to assess microarray quality regardless of the platform under consideration and that have direct clinical interpretations. The first (Stress) measures how much data from a single microarray needs to be “stretched” during the normalization process in order to make its marginal distribution match that of the remaining arrays. The second (dfArray) measures how much a single array deviates from the remaining arrays within the experiment post-normalization. We compare our findings to currently available metrics for FFPE samples using the DASL assay and show the benefit of removing arrays of questionable quality from an experiment where differential expression is the primary objective.
The case study consisted of patients with resected HER2-positive breast cancer who were enrolled in the adjuvant N9831 trial (NCT00005970), a Phase III trial in which patients were randomized to three arms: (Arm A) doxorubicin and cyclophosphamide followed by weekly paclitaxel, (Arm B) same as Arm A but followed by 1 year of sequential trastuzumab, or (Arm C) same as Arm A but with 1 year of concurrent trastuzumab started the same day as weekly paclitaxel. Patient consent was obtained for additional translational work related to the tumor specimens, and the institutional review boards of all participating institutions approved the study. A total of 1632 samples from 1460 unique patients were labeled using the Whole-Genome DASL HT Assay and hybridized on the HumanHT-12 v4 Expression BeadChip. Patient samples were randomized onto 96-well plates, stratified by treatment arm, year on N9831 study and nodal status. The final dataset used herein consists of 1618 arrays after removing subjects that had withdrawn consent post data acquisition.
Where $Y_{ij}$ denotes the intensity values after background correction, $\mu_{ij} = \log_2 \theta_{ij}$ represents the “true” relative amount of a feature hybridized to the array and is the primary parameter of interest in microarray experiments, $S_{ij} = \log_2 K_{ij}$ represents systematic biases, and $\varepsilon_{ij}$, the log2-transformed multiplicative error, represents random variation with mean 0 and variance $\sigma_i^2$, with the subscript indicating that the variance is feature specific.
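Read together, these definitions are consistent with a multiplicative error model that becomes additive on the log2 scale. The display below is a sketch of that form rather than a reproduction of the original equation; the symbol $\eta_{ij}$ for the multiplicative error is an assumption introduced here for illustration.

```latex
Y_{ij} = \theta_{ij}\, K_{ij}\, \eta_{ij}
\quad\Longrightarrow\quad
\log_2 Y_{ij} = \mu_{ij} + S_{ij} + \varepsilon_{ij},
\qquad
\varepsilon_{ij} = \log_2 \eta_{ij},\quad
E[\varepsilon_{ij}] = 0,\quad
\mathrm{Var}(\varepsilon_{ij}) = \sigma_i^2 .
```

Under this reading, normalization aims to remove $S_{ij}$ so that the remaining signal reflects $\mu_{ij}$ plus feature-specific noise.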
Review of other metrics
The Relative Log Expression (RLE) compares a given array’s feature intensity relative to the median level of intensity for that feature across all j arrays. The array-specific distribution of RLE is used to determine if a particular array has predominately low- or high-expressed features, as indicated by an overall shift. This metric is easily applicable to any microarray platform. However, for normalization routines that leverage probe-specific information, such as loess, RLE ≅ 0 by definition, so one does not expect to see large shifts. Moreover, the spread in the distribution of RLE is not independent of the feature variance $\sigma_i^2$. This makes distribution summaries difficult to interpret for the purpose of outlier detection, as an outlier for a particular feature can be masked by the other features with large variance.
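The RLE calculation described above can be sketched in a few lines. This is a minimal illustration, not code from any microarray package: the function names and the toy matrix are hypothetical, and the input is assumed to be a features-by-arrays table of log2 intensities.

```python
from statistics import median


def relative_log_expression(log_expr):
    """Return RLE values for a features x arrays matrix of log2 intensities.

    log_expr[i][j] is the log2 intensity of feature i on array j.
    """
    rle = []
    for row in log_expr:
        feature_median = median(row)  # median of feature i across all arrays
        rle.append([value - feature_median for value in row])
    return rle


def median_rle_per_array(rle):
    """Summarize each array (column) by the median of its RLE values."""
    n_arrays = len(rle[0])
    return [median(row[j] for row in rle) for j in range(n_arrays)]


# Toy data (hypothetical): 3 features on 3 arrays; the third array is shifted upward.
log_expr = [
    [8.0, 8.2, 10.0],
    [5.0, 5.1, 7.1],
    [11.0, 10.8, 12.9],
]
rle = relative_log_expression(log_expr)
shifts = median_rle_per_array(rle)
print([round(s, 2) for s in shifts])
```

An array whose median RLE sits well away from 0, like the third array in the toy data, shows the overall shift that the metric is meant to reveal.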
The two measures only differ in that the GNUSE metric uses distributional information on $Y'_{ij}$ from a large collection of stored arrays to estimate the denominator $\mathrm{median}_j(SE(Y'_{ij}))$, whereas NUSE re-estimates this for each new experiment. Regardless of which form is used, if the median NUSE or GNUSE for a particular array is high, this would be an indication that many of the features are behaving poorly and thus the array should be considered for removal. A value of 1.25 for the median NUSE or GNUSE has been suggested by McCall et al. as a guideline for identifying bad arrays, as this suggests that the variation for the array is 25% higher than an average array.
Table 1. Quality assessment strategies for formalin-fixed, paraffin-embedded tissues analyzed with Illumina’s DASL assay

| Mahoney et al. | Chow et al. |
|---|---|
| Calculate Stress and dfArray (plot Stress vs dfArray) | Calculate Outlier using un-normalized raw data |
| Stage 1: Remove arrays with Stress ≥ 1.5 | Stage 1: Remove arrays with Outlier ≥ Th*median(Outlier) (default Th = 2) |
| Renormalize data after removing bad arrays | Renormalize data after removing bad arrays |
| Calculate dfArray on renormalized data | Calculate Outlier on renormalized data |
| Stage 2: Investigate arrays with dfArray ≥ 2 | Stage 2: Remove arrays with Outlier ≥ Th*median(Outlier) (default Th = 2) |

Final step: final normalization after removing all outlying arrays.
Where $Z_{ij}$ represents the feature mean centered and scaled pre- or post-normalized expression data for the ith feature from the jth array, and $Target_i$ represents a robust estimate of the feature mean across all arrays and is correspondingly a pre- or post-normalized estimate (Table 1). The dissimilarity function used is either the Euclidean distance of the jth array from the Target or one minus the correlation between the jth array and the Target. The lumi package considers an array an outlier whenever $lumiOutlier_j > Th \times \mathrm{median}(lumiOutlier_j)$, where Th is a user-specified threshold (the default specified in the package is Th = 2). It is difficult to attribute a biologically meaningful interpretation to this metric in a way that makes it transparent to the average researcher. Another drawback is that the threshold is defined relative to the current sample of arrays. Thresholds that are sample dependent are problematic in practice, as they vary from batch to batch and provide no sense of the global quality of an array beyond the average array within the current batch. If, for example, the average array is also of poor quality, the researcher is left with an experiment containing many poor arrays, jeopardizing the validity of the study.
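The thresholding rule described above can be sketched as follows. This is an illustration of the rule as stated in the text, not the lumi package's own code: the function name is hypothetical, the input is assumed to be already centered and scaled, the Target is taken to be the per-feature median, and only the Euclidean distance option is shown.

```python
import math
from statistics import median


def flag_relative_outliers(z, th=2.0):
    """Flag arrays whose distance from the target exceeds th * median distance.

    z[i][j] is the centered and scaled expression of feature i on array j.
    """
    n_arrays = len(z[0])
    target = [median(row) for row in z]  # robust per-feature center
    dist = [
        math.sqrt(sum((row[j] - t) ** 2 for row, t in zip(z, target)))
        for j in range(n_arrays)
    ]
    cutoff = th * median(dist)  # cutoff depends on the arrays in this batch
    flags = [d > cutoff for d in dist]
    return dist, flags


# Toy data (hypothetical): the fourth array is far from the other three.
z = [
    [0.0, 0.1, -0.1, 3.0],
    [0.2, 0.0, -0.2, -2.5],
    [-0.1, 0.1, 0.0, 4.0],
]
dist, flags = flag_relative_outliers(z)
print(flags)
```

Because the cutoff is a multiple of the batch median, the same array could be flagged in one batch and retained in another, which is the sample dependence discussed in the text.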
To address the shortcomings of the metrics proposed thus far, we propose two metrics that combine the essence of RLE, NUSE/GNUSE, and lumiOutlier, yet are flexible enough to be implemented on a broad spectrum of microarray platforms with direct biological interpretation. Importantly for the analysis of archival tissues, the proposed metrics allow for the identification of poor arrays that have undue influence during the normalization process. Such arrays are fairly obvious when evaluating data from fresh-frozen samples; with archival samples, however, it is less obvious where to set a threshold for identifying poor samples.
Stress is calculated across all i features on a specific array. The $\log_2$ is used here to indicate that the index will need to be transformed to the fold-change scale. Also, by taking the absolute value, features that are up or down regulated by “x-fold” are considered equally stressed. Various distributional summaries and figures can be generated on $Stress_j$, but we found the median to be the most useful. Arrays can be rank ordered according to their Stress values, and the arrays with the highest or most disparate Stress values would be considered suspect for inclusion in the study. As an example, if the median Stress of an array is 2, this would indicate that half of the features had to be adjusted by 100% or more relative to their initial values. For many studies, a 2-fold change is the biological effect size of interest. Any final result becomes highly suspect when it is of the same order of magnitude as the biases that were removed from the data.
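A minimal sketch of a Stress-style summary follows. The exact formula is not reproduced in the text above, so the form used here is an assumption: the per-feature adjustment is taken as the absolute difference between post- and pre-normalization log2 intensities, expressed on the fold-change scale as 2 raised to that difference, and each array is summarized by its median. The function name and toy values are hypothetical; the 1.5 cutoff is the Stage 1 value listed in Table 1.

```python
from statistics import median


def median_stress(raw_log2, norm_log2):
    """Median fold-change adjustment per array (assumed form of Stress).

    raw_log2[i][j] and norm_log2[i][j] are the log2 intensities of feature i
    on array j before and after normalization.
    """
    n_arrays = len(raw_log2[0])
    result = []
    for j in range(n_arrays):
        # Up- and down-adjustments of the same size count equally.
        folds = [2 ** abs(n[j] - r[j]) for r, n in zip(raw_log2, norm_log2)]
        result.append(median(folds))
    return result


# Toy data (hypothetical): array 0 is barely adjusted, array 1 is shifted about 2-fold.
raw_log2 = [[8.0, 6.0], [10.0, 7.0], [12.0, 9.0]]
norm_log2 = [[8.2, 7.0], [9.9, 8.1], [12.1, 10.0]]

stress = median_stress(raw_log2, norm_log2)
suspect = [s >= 1.5 for s in stress]  # Stage 1 cutoff from Table 1
print([round(s, 3) for s in stress], suspect)
```

In the toy data, the second array has a median value of 2, meaning half of its features were adjusted by 100% or more, so it would be set aside before renormalization.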
as values that fall above or below are viewed as equivalent errors. For this work we consider any array with 25% of the features having expression levels larger than twice the standard deviation above the median expression as suspect. This threshold can certainly be modified by the user. Expressing the cutoff in terms of standard deviations above the median expression level provides a common point of reference for researchers with basic statistical training.
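The 25% rule above can be illustrated with a short sketch. The dfArray formula itself is not reproduced in the text, so several choices here are assumptions: each array is compared with the remaining arrays (leave-one-out), the center is their median, the scale is their sample standard deviation, and deviations in either direction count equally. The function name and toy data are hypothetical.

```python
from statistics import median, stdev


def fraction_deviant(expr, n_sd=2.0):
    """Fraction of features on each array lying more than n_sd SDs from the
    median of the remaining arrays (assumed leave-one-out form).

    expr[i][j] is the normalized log2 expression of feature i on array j;
    at least three arrays are needed so the SD of the others is defined.
    """
    n_arrays = len(expr[0])
    fractions = []
    for j in range(n_arrays):
        deviant = 0
        for row in expr:
            others = row[:j] + row[j + 1:]
            center = median(others)
            spread = stdev(others)
            diff = abs(row[j] - center)
            if spread > 0:
                deviant += diff / spread > n_sd
            else:
                deviant += diff > 0  # any departure from a constant feature
        fractions.append(deviant / len(expr))
    return fractions


# Toy data (hypothetical): the fourth array departs on two of four features.
expr = [
    [9.0, 10.0, 11.0, 13.0],
    [5.0, 6.0, 7.0, 6.0],
    [7.0, 8.0, 9.0, 8.0],
    [3.0, 4.0, 5.0, 9.0],
]
fractions = fraction_deviant(expr)
suspect = [f >= 0.25 for f in fractions]
print(fractions, suspect)
```

Flagging an array when at least 25% of its features exceed 2 standard deviations is equivalent to asking whether the 75th percentile of its standardized deviations reaches 2, which matches the Stage 2 cutoff in Table 1.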
As we show in the results, dfArray is highly correlated with the dissimilarity metric used in the lumi package. Since the dissimilarity metric is used in clustering procedures, this indicates that arrays with a large dfArray index may be associated with clinical subclasses not accounted for in the normalization process. Our proposed quality assessment strategy for FFPE samples is outlined in Table 1, and the R package Stress.dfArray is freely available at http://mayoresearch.mayo.edu/mayo/research/biostat/splusfunctions.cfm.
Distributional characteristics of arrays
As described above, the case study used throughout consists of 1618 HumanHT-12 v4 Expression BeadChip DASL assays that were generated as part of an ongoing breast cancer study that analyzed FFPE archival tissues. Boxplots of the log2-transformed intensity values showed that the quality of the data varied dramatically between the samples. Specifically, it was apparent that some of the samples failed completely, while there were other samples for which it appeared that some of the probes worked while other probes did not. Figure 2A displays box-plots of the pre-normalized expression values for 40 samples, representing various array qualities. For presentation purposes, samples were assigned to 4 array-quality groups based on the interquartile range (IQR = Q3 – Q1) and skewness (skew = (Q3 – Q2)/IQR, where Q2 is the median; a symmetric distribution will have skew = 0.50) in order to represent the extremes in array quality, and 10 representative samples are shown for each group. Approximately 15% of the 1618 FFPE samples examined exhibited large skewness (shown in quadrant R2), a small IQR (quadrant R3), or both (quadrant R4). Unlike data from fresh-frozen samples, where only a couple of arrays might be poor and are obvious to detect, the distribution of intensity values from archival samples varies dramatically and there is not a clear threshold for determining which arrays are of poor quality.
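The two summaries used to group arrays can be computed as in the sketch below. It assumes the skewness measure is (Q3 – median)/IQR, which is the quartile-based form that equals 0.50 for a symmetric distribution as stated above. The function name and toy intensities are hypothetical, and the quartile interpolation method is a choice made here, not one specified in the text.

```python
from statistics import quantiles


def iqr_and_skew(values):
    """Return (IQR, quartile skewness) for one array's log2 intensities.

    Skewness is (Q3 - Q2) / IQR, so a symmetric distribution gives 0.5.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    skew = (q3 - q2) / iqr if iqr > 0 else float("nan")
    return iqr, skew


# Toy arrays (hypothetical): one symmetric, one with a long upper tail.
symmetric = [6.0, 7.0, 8.0, 9.0, 10.0]
right_skewed = [6.0, 6.0, 6.0, 7.0, 12.0]

print(iqr_and_skew(symmetric))
print(iqr_and_skew(right_skewed))
```

Arrays can then be placed in quadrants by comparing each summary with a chosen cutoff, for example a small IQR with or without a skewness far from 0.5.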
Association of quality metrics with array characteristics
Concordance of quality metrics
Benefit of conducting quality assessment on array data
The use of microarrays in understanding disease pathogenesis has seen extraordinary growth over the last decade. Historically, data generated by this technology have been used for class comparisons (comparing gene expression profiles between known disease states), class prediction (prediction of disease state), and class discovery (identification of new subclasses of disease based on gene expression profiles). Recently, interest has moved from the bench to the bedside, where treatment decisions based on gene-expression profiles obtained from microarrays are being considered. In fact, this is the objective of the current case study: to define a molecular signature to predict response to trastuzumab for HER2-positive breast cancer patients.
As the use of microarrays has increased, so too have the concerns about the validity of this technology [30–33]. Some of these concerns broadly revolve around proper analytical methods, the concordance of results between publications, centers, or laboratories, and the concordance of results between different platforms, to name just a few. Several research initiatives have formed over the years to investigate these concerns, dating back to the early days of “Affycomp” to the more recent formation of the External RNA Control Consortium and the MicroArray Quality Control projects. These efforts have facilitated greater communication between researchers as well as the development of standard practices to increase the validity of microarray technologies. The overarching theme resulting from these efforts is that microarray technologies are reliably reproducible across many different settings with proper laboratory procedures, data handling, and scrutiny. Several investigations have reported on the gain in analytic efficiency when poor-quality microarrays are removed [20, 35]. However, most, if not all, of this work has centered on analysis of fresh-frozen samples.
Analysis of archival tissues presents a new challenge and is complicated by poor RNA quality and significant variation among FFPE samples that have been preserved over the course of many years and under different conditions. As we have shown here, this variation in sample quality for FFPE samples creates large variation in the expression profiles across arrays that is typically not seen when dealing with fresh-frozen samples. This has spurred many questions regarding the normalization, quality assessment, and analysis of array based studies using FFPE samples.
The choice of normalization routine may have an impact on downstream analyses when it comes to FFPE samples. Many of the FFPE samples in the present study exhibit a high prevalence of “dead probes” where little or no signal is generated beyond background. Many of the more popular normalization routines (e.g., quantile, loess) used in practice were developed on data where the prevalence of dead probes was very small. Therefore, we believe additional studies are required to determine the best normalization strategy for data that is generated from the FFPE samples.
It is important to note that normalization is not the end-all step in preprocessing microarray data and is certainly not a solution for poorly designed studies. Assessing the quality of microarray data is essential, and the two metrics proposed here, Stress and dfArray, are easily applicable to any microarray platform for this purpose. For studies using FFPE samples, removing arrays that are of poor quality from the normalization process reduces the bias in the estimated feature abundance and the noise level in the data, and thus increases the ability to detect biologically meaningful differences. Some have suggested that the information provided by the quality metrics could also be used to weight downstream analyses towards arrays with better quality. This is potentially a viable option for studies using FFPE samples, but more research is needed. We anticipate that the arrays identified by the Stress metric as outliers have the greatest influence on the normalization process and therefore will need to be excluded. However, the Stress metric could be recomputed after removing outliers, and either the newly computed Stress metric or dfArray could be used to down-weight arrays during differential-expression analyses.
As more high dimensional data become publicly available, there is an increasing interest to pool data across studies, or at the very least, mine these repositories for promising biomarker signatures prior to initiating a research project. At our institution, such an endeavor is being implemented through the creation of the Biologically Oriented Repository Architecture (BORA), which is an informatics warehouse of “-omics” data that is linked to the tissue pathology and clinical characteristics of the patient. These types of initiatives require robust quality metrics to accurately assess high dimensional data across multiple studies, especially when the data have been preprocessed and summarized prior to storage.
Two robust quality control metrics are presented that provide the end-users with valuable information regarding the quality of the arrays within their study. These metrics are directly applicable to any high-dimensional platform and can be easily implemented into preprocessing pipelines.
Availability and requirements
Package name: Stress.dfArray
Requirements: R-2.14.0 or later (http://www.r-project.org/)
Abbreviations
RLE: Relative Log Expression
NUSE: Normalized Unscaled Standard Error
GNUSE: Global Normalized Unscaled Standard Error
dfArray: Deviation of array
DASL: cDNA-mediated Annealing Selection extension and Ligation
Q1, Q3: First and third quartiles
We thank Giovanni Parmigiani, PhD, for helpful discussions regarding the content of the manuscript and the Mayo Clinic Gene Expression Core for generating all of the DASL data used in this study. This work was supported in part by the National Institutes of Health [Grant #s CA25224, CA114740, and CA129949], the Breast Cancer Research Foundation, Mayo Clinic Cancer Center, and the Mayo Clinic Center for Individualized Medicine.
- Waddell N, Cocciardi S, Johnson J, Healey S, Marsh A, Riley J, da Silva L, Vargas AC, Reid L, Simpson PT: Gene expression profiling of formalin-fixed, paraffin-embedded familial breast tumours using the whole genome-DASL assay. J Pathol. 2010, 221 (4): 452-461.
- Sadi AM, Wang DY, Youngson BJ, Miller N, Boerner S, Done SJ, Leong WL: Clinical relevance of DNA microarray analyses using archival formalin-fixed paraffin-embedded breast cancer specimens. BMC Cancer. 2011, 11: 253. 10.1186/1471-2407-11-253.
- Ton CC, Vartanian N, Chai X, Lin MG, Yuan X, Malone KE, Li CI, Dawson A, Sather C, Delrow J: Gene expression array testing of FFPE archival breast tumor samples: an optimized protocol for WG-DASL sample preparation. Breast Cancer Res Treat. 2011, 125 (3): 879-883. 10.1007/s10549-010-1159-6.
- Bibikova M, Talantov D, Chudin E, Yeakley JM, Chen J, Doucet D, Wickham E, Atkins D, Barker D, Chee M: Quantitative gene expression profiling in formalin-fixed, paraffin-embedded tissues using universal bead arrays. Am J Pathol. 2004, 165 (5): 1799-1807. 10.1016/S0002-9440(10)63435-9.
- Fan JB, Yeakley JM, Bibikova M, Chudin E, Wickham E, Chen J, Doucet D, Rigault P, Zhang B, Shen R: A versatile assay for high-throughput gene expression profiling on universal array matrices. Genome Res. 2004, 14 (5): 878-885. 10.1101/gr.2167504.
- Bibikova M, Yeakley JM, Chudin E, Chen J, Wickham E, Wang-Rodriguez J, Fan JB: Gene expression profiles in formalin-fixed, paraffin-embedded tissues obtained with a novel assay for microarray analysis. Clin Chem. 2004, 50 (12): 2384-2386. 10.1373/clinchem.2004.037432.
- April C, Klotzle B, Royce T, Wickham-Garcia E, Boyaniwsky T, Izzo J, Cox D, Jones W, Rubio R, Holton K: Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples. PLoS One. 2009, 4 (12): e8162. 10.1371/journal.pone.0008162.
- Fountzilas E, Markou K, Vlachtsis K, Nikolaou A, Arapantoni-Dadioti P, Ntoula E, Tassopoulos G, Bobos M, Konstantinopoulos P, Fountzilas G: Identification and validation of gene expression models that predict clinical outcome in patients with early-stage laryngeal cancer. Ann Oncol. 2012, 23 (8): 2146-2153. 10.1093/annonc/mdr576.
- Minguez B, Hoshida Y, Villanueva A, Toffanin S, Cabellos L, Thung S, Mandeli J, Sia D, April C, Fan JB: Gene-expression signature of vascular invasion in hepatocellular carcinoma. J Hepatol. 2011, 55 (6): 1325-1331. 10.1016/j.jhep.2011.02.034.
- Waldron L, Simpson P, Parmigiani G, Huttenhower C: Report on emerging technologies for translational bioinformatics: a symposium on gene expression profiling for archival tissues. BMC Cancer. 2012, 12: 124. 10.1186/1471-2407-12-124.
- Kerr MK, Churchill GA: Experimental design for gene expression microarrays. Biostatistics. 2001, 2 (2): 183-201. 10.1093/biostatistics/2.2.183.
- Kerr MK, Churchill GA: Statistical design and the analysis of gene expression microarray data. Genet Res. 2001, 77 (2): 123-128.
- Ballman KV, Grill DE, Oberg AL, Therneau TM: Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics. 2004, 20 (16): 2778-2786. 10.1093/bioinformatics/bth327.
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19 (2): 185-193. 10.1093/bioinformatics/19.2.185.
- Toffanin S, Hoshida Y, Lachenmayer A, Villanueva A, Cabellos L, Minguez B, Savic R, Ward SC, Thung S, Chiang DY: MicroRNA-based classification of hepatocellular carcinoma and oncogenic role of miR-517a. Gastroenterology. 2011, 140 (5): 1618-1628. 10.1053/j.gastro.2011.02.009. e1616
- Villanueva A, Hoshida Y, Battiston C, Tovar V, Sia D, Alsinet C, Cornella H, Liberzon A, Kobayashi M, Kumada H: Combining clinical, pathology, and gene expression data to predict recurrence of hepatocellular carcinoma. Gastroenterology. 2011, 140 (5): 1501-1512. 10.1053/j.gastro.2011.02.006. e1502
- Winn ME, Shaw M, April C, Klotzle B, Fan JB, Murray SS, Schork NJ: Gene expression profiling of human whole blood samples with the Illumina WG-DASL assay. BMC Genomics. 2011, 12: 412. 10.1186/1471-2164-12-412.
- Chow ML, Winn ME, Li HR, April C, Wynshaw-Boris A, Fan JB, Fu XD, Courchesne E, Schork NJ: Preprocessing and quality control strategies for illumina DASL assay-based brain gene expression studies with semi-degraded samples. Front Genet. 2012, 3: 11.
- Perez EA, Suman VJ, Davidson NE, Gralow JR, Kaufman PA, Visscher DW, Chen B, Ingle JN, Dakhil SR, Zujewski J: Sequential versus concurrent trastuzumab in adjuvant chemotherapy for breast cancer. J Clin Oncol. 2011, 29 (34): 4491-4497. 10.1200/JCO.2011.36.7045.
- McCall MN, Murakami PN, Lukk M, Huber W, Irizarry RA: Assessing affymetrix GeneChip microarray quality. BMC Bioinforma. 2011, 12: 137. 10.1186/1471-2105-12-137.
- Wu Z, Irizarry RA: A statistical framework for the analysis of microarray probe-level data. Ann Appl Stat. 2007, 1: 333-357. 10.1214/07-AOAS116.
- Huber W, von Heydebreck A, Sültmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 18 (suppl 1): S96-S104. 10.1093/bioinformatics/18.suppl_1.S96.
- Rocke DM, Durbin B: A model for measurement error for gene expression arrays. J Comput Biol. 2001, 8 (6): 557-569. 10.1089/106652701753307485.
- Bolstad BM, Collin F, Simpson KM, Irizarry RA, Speed TP: Experimental design and low-level analysis of microarray data. Int Rev Neurobiol. 2004, 60: 25-58.
- Ritchie ME, Dunning MJ, Smith ML, Shi W, Lynch AG: BeadArray expression analysis using bioconductor. PLoS Comput Biol. 2011, 7 (12): e1002276. 10.1371/journal.pcbi.1002276.
- Kim RS, Lin J: Multi-level mixed effects models for bead arrays. Bioinformatics. 2011, 27 (5): 633-640. 10.1093/bioinformatics/btq708.
- Du P, Kibbe WA, Lin SM: Lumi: a pipeline for processing illumina microarray. Bioinformatics. 2008, 24 (13): 1547-1548. 10.1093/bioinformatics/btn224.
- Myers RH: Classical and modern regression with applications. 1990, Boston: PWS-KENT, 2
- Making the most of microarrays. Nat Biotechnol. 2006, 24 (9): 1039.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY: The MicroArray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24 (9): 1151-1161. 10.1038/nbt1239.
- Tong W, Lucas AB, Shippy R, Fan X, Fang H, Hong H, Orr MS, Chu TM, Guo X, Collins PJ: Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotechnol. 2006, 24 (9): 1132-1139. 10.1038/nbt1237.
- Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY: Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol. 2006, 24 (9): 1115-1122. 10.1038/nbt1236.
- Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L: The MicroArray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010, 28 (8): 827-838. 10.1038/nbt.1665.
- Irizarry RA, Wu Z, Jaffee HA: Comparison of affymetrix GeneChip expression measures. Bioinformatics. 2006, 22 (7): 789-794. 10.1093/bioinformatics/btk046.
- Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E: A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clin Cancer Res. 2004, 10 (9): 2922-2927. 10.1158/1078-0432.CCR-03-0490.
- Ritchie ME, Diyagama D, Neilson J, van Laar R, Dobrovic A, Holloway A, Smyth GK: Empirical array quality weights in the analysis of microarray data. BMC Bioinforma. 2006, 7: 261. 10.1186/1471-2105-7-261.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.