Effect of RNA integrity on uniquely mapped reads in RNA-Seq
© Chen et al.; licensee BioMed Central Ltd. 2014
Received: 20 March 2014
Accepted: 14 October 2014
Published: 23 October 2014
We examined the performance of three RNA-Sequencing library preparation protocols as a function of RNA integrity, comparing gene expression in heat-degraded samples with that of their high-quality counterparts. This work is invaluable given the difficulty of obtaining high-quality RNA from tissues, particularly those from individuals with disease phenotypes.
Because the integrity of total RNA is a critical parameter for RNA-Sequencing analysis, degraded RNA can heavily influence gene expression profiles. We found that gene expression results are influenced by RNA quality when a common library construction protocol is used. These results are based on a single technical experiment using a pool of 4 neural progenitor cell lines.
The use of alternative protocols can allow samples with a wider range of RNA qualities to be used, facilitating the investigation of disease tissues.
Adiconis et al. examined the performance of five RNA-Seq sample preparation protocols when using RNA of low quality and/or quantity. This work is invaluable given the difficulty of obtaining high-quality RNA from tissues, particularly those from individuals with disease phenotypes. We have used a similar approach of evaluating the performance of RNA-Seq library preparation protocols, as a function of RNA integrity. We compared gene expression, as measured by RNA-Seq, of heat-degraded RNA samples to the expression profiles of the high-quality starting samples.
Methods and results
Specifically, 20 µg of high-quality total RNA (RIN 9.4; 2100 Bioanalyzer, Agilent Technologies Inc., Santa Clara, CA, USA) was prepared by pooling RNA extracted using a Direct-zol RNA MiniPrep kit (Zymo Research, Irvine, CA, USA) from neural progenitor cell lines made from 4 individuals. This pool was heat-degraded (60 minutes at 60°C, followed by 6, 20, and 30 minutes at 90°C) to RINs of 7.4, 5.3, and 4.5. RNA-Seq libraries were then made using three different protocols. 1) Poly-A RNA was purified from 1 µg of total RNA using oligo-dT beads, fragmented with divalent cations, and made into cDNA and then sequencing libraries using the TruSeq RNA Sample Preparation kit v2 (RS-122-2001, Illumina Inc., San Diego, CA, USA). 2) Ribosomal RNA was removed from 1 µg of total RNA using the Ribo-Zero rRNA Removal kit (MRZH116, Epicentre Biotechnologies, Madison, WI, USA), and the sample was then processed as in protocol #1 but without the poly-A selection. 3) cDNA was made from 200 ng of total RNA using the Ovation RNA-Seq FFPE System (7150, NuGEN Technologies Inc., San Carlos, CA, USA) and sheared to 300 bp using a Covaris S2 (500003, Covaris Inc., Woburn, MA, USA), followed by library construction using the TruSeq DNA Sample Preparation kit v2 (FC-121-2001).
[Table: R between degraded sample and intact sample for each protocol. Column: R to RIN 9.4 sample. Rows: NuGEN Ovation RNA-Seq FFPE System + Illumina TruSeq DNA Sample Preparation; Epicentre RiboZero rRNA Removal Kit + Illumina TruSeq DNA Sample Preparation; Illumina TruSeq RNA Sample Preparation. Values not recoverable.]
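The R values tabulated above compare each degraded sample with the intact RIN 9.4 sample. The text does not state how R was computed, so the following is only a minimal sketch under stated assumptions: that R is a Pearson correlation, that it is calculated on log2-transformed per-gene read counts with a pseudocount of 1, and that only genes present in both samples are compared. The function names and the toy counts are hypothetical and are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def expression_correlation(intact_counts, degraded_counts, pseudocount=1.0):
    """R between log2 counts for genes present in both samples.

    Both arguments map gene identifier -> uniquely mapped read count.
    The log2 transform and the pseudocount are assumptions of this sketch.
    """
    genes = sorted(set(intact_counts) & set(degraded_counts))
    x = [math.log2(intact_counts[g] + pseudocount) for g in genes]
    y = [math.log2(degraded_counts[g] + pseudocount) for g in genes]
    return pearson_r(x, y)


# Toy counts, invented for illustration only (not data from this study).
intact = {"GENE_A": 1200, "GENE_B": 300, "GENE_C": 45, "GENE_D": 0}
degraded = {"GENE_A": 900, "GENE_B": 410, "GENE_C": 20, "GENE_D": 3}
r = expression_correlation(intact, degraded)
print(f"R to intact sample: {r:.3f}")
```

Running the same comparison for each degraded RIN and each protocol would give one R per cell of such a table. A rank-based correlation could be substituted if the count distributions are heavily skewed.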
To confirm mapper accuracy, we mapped all of the samples to GENCODE v17 using TopHat v1.4.0. The resulting BAM files were run through HTSeq v0.6.1 to obtain uniquely mapped read counts. Essentially the same results were obtained as with PerM (data not shown). Additionally, to rule out any bias from differences in numbers of reads, we downsampled all of the samples to 4.5 million reads, and the results were essentially the same (data not shown).
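The downsampling step can be illustrated with a short sketch. The text does not say which tool or method was used to reduce each sample to 4.5 million reads, so the reservoir sampling below is an assumption chosen because it draws a uniform random subset in a single pass without holding the whole file in memory. The function name, the seed, and the toy read identifiers are hypothetical.

```python
import random


def downsample_reads(reads, target, seed=0):
    """Return `target` reads drawn uniformly at random from an iterable.

    Uses reservoir sampling, so the input is read once and only `target`
    items are kept in memory. If the input holds fewer than `target`
    reads, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    reservoir = []
    for i, read in enumerate(reads):
        if i < target:
            reservoir.append(read)
        else:
            # Keep the new read with probability target / (i + 1).
            j = rng.randint(0, i)
            if j < target:
                reservoir[j] = read
    return reservoir


# Toy example: 1,000 placeholder read identifiers reduced to 45.
toy_reads = [f"read_{i}" for i in range(1000)]
subset = downsample_reads(toy_reads, 45)
print(len(subset))
```

For real data, `reads` would be an iterator over sequencing records and `target` would be 4,500,000. Applying the same target to every sample keeps sequencing depth from confounding the comparison between RINs.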
It is likely that the poor performance of protocol #1 at lower RINs can be explained by the poly-A selection step. As RNA integrity decreases, less full-length poly-A RNA is recovered, leading to a cDNA library that is increasingly 3′ biased. This is supported by analysis of the 5′ to 3′ read distribution of each library. The distributions from protocols #2 and #3 are essentially unchanged at decreasing RIN, while the distribution for samples from protocol #1 is severely 3′ biased by RIN 4.5 (data not shown).
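One way to quantify a 5′ to 3′ read distribution is sketched here. The text does not describe how the distribution was calculated, so this is an illustration under assumptions: each read has already been assigned a 0-based start position measured from the 5′ end of its transcript, positions are scaled by transcript length into ten equal bins, and 3′ bias is summarized as the share of reads in the 3′-most half of the bins. All names and the toy inputs are hypothetical.

```python
def coverage_profile(read_positions, n_bins=10):
    """Fraction of reads in each bin from 5' (bin 0) to 3' (last bin).

    `read_positions` is an iterable of (position_from_5prime, transcript_length)
    pairs with 0-based positions. Pairs with an out-of-range position or a
    non-positive length are skipped.
    """
    counts = [0] * n_bins
    for pos, length in read_positions:
        if length <= 0 or not 0 <= pos < length:
            continue
        counts[int(pos * n_bins // length)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


def three_prime_fraction(profile):
    """Share of reads that fall in the 3'-most half of the bins."""
    half = len(profile) // 2
    return sum(profile[half:])


# Toy inputs, invented for illustration: even coverage versus reads
# confined to the last 10% of a 1,000-nt transcript.
even = [(p, 1000) for p in range(1000)]
biased = [(p, 1000) for p in range(900, 1000)]
print(three_prime_fraction(coverage_profile(even)))
print(three_prime_fraction(coverage_profile(biased)))
```

Under these assumptions, a library with little positional bias would give a value near 0.5, and a value approaching 1 would correspond to the severe 3′ bias described for protocol #1 at RIN 4.5.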
We recognize that our results are based on a single experiment using an RNA pool from 4 neural progenitor cell lines and are not broadly applicable. Hence, other investigators may want to use this method to determine the effect of RNA integrity on RNA-Seq from their tissue source of interest.
In summary, our data show that the results of RNA-Seq are influenced by RNA quality when a widely used cDNA/sequencing library construction protocol is employed. However, this problem can be avoided with alternative protocols, allowing samples with a wider range of RNA qualities to be used and facilitating the investigation of disease tissues.
This work was supported in part by research grants from the NIMH (MH090047 and MH086874), and the NHGRI (HG006531).
- Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, Sivachenko A, Thompson DA, Wysoker A, Fennell T, Gnirke A, Pochet N, Regev A, Levin JZ: Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods. 2013, 10: 623-629. 10.1038/nmeth.2483.
- Evgrafov OV, Wrobel BB, Kang X, Simpson G, Malaspina D, Knowles JA: Olfactory neuroepithelium-derived neural progenitor cells as a model system for investigating the molecular mechanisms of neuropsychiatric disorders. Psychiatr Genet. 2011, 21: 217-228. 10.1097/YPG.0b013e328341a2f0.
- Opitz L, Salinas-Riester G, Grade M, Jung K, Jo P, Emons G, Ghadimi BM, Beissbarth T, Gaedcke J: Impact of RNA degradation on gene expression profiling. BMC Med Genomics. 2010, 3: 36. 10.1186/1755-8794-3-36.
- Chen Y, Souaiaia T, Chen T: PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics. 2009, 25: 2514-2521. 10.1093/bioinformatics/btp486.
- Trapnell C, Pachter L, Salzberg S: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25: 1105-1111. 10.1093/bioinformatics/btp120.
- Anders S, Pyl PT, Huber W: HTSeq - A Python framework to work with high-throughput sequencing data. BioRxiv preprint. 2014, http://www.ncbi.nlm.nih.gov/pubmed/25260700.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.