An automated method for efficient, accurate and reproducible construction of RNA-seq libraries

Abstract

Background

Integration of RNA-seq expression data with knowledge of chromatin accessibility, histone modifications, DNA methylation, and transcription factor binding has been instrumental in unveiling cell-specific local and long-range regulatory patterns, facilitating further investigation of the underlying rules of transcription regulation at an individual and allele-specific level. However, full genome transcriptome characterization has been partially limited by the complexity and increased time requirements of available RNA-seq library construction protocols.

Findings

Use of the SX-8G IP-Star® Compact System significantly reduces the hands-on time for RNA-seq library synthesis, adenylation, and adaptor ligation, providing high-quality RNA-seq libraries tailored for Illumina high-throughput next-generation sequencing. Generated data exhibit high technical reproducibility compared to data from RNA-seq libraries synthesized manually for the same samples. Obtained results are consistent regardless of the researcher, day of the experiment, and experimental run.

Conclusions

Overall, the SX-8G IP-Star® Compact System proves an efficient, fast and reliable tool for the construction of next-generation RNA-seq libraries, especially for transcriptome-based annotation of larger genomes.

Findings

Background

Deciphering the underlying determinants of transcriptional regulation in relation to cell differentiation, functional diversification, environmental signaling, and disease development remains a central question in biology today. Integration of expression data with knowledge of chromatin accessibility, histone modifications, DNA methylation, and transcription factor binding has been instrumental in unveiling cell-specific local and long-range regulatory patterns, facilitating further investigation of the underlying rules of transcription regulation at an individual and allele-specific level. Large collaborative projects, such as ENCODE [1], the NIH Roadmap Epigenomics Mapping Consortium [2,3], and the C. elegans and D. melanogaster modENCODE [4], have focused on generating genome-wide gene expression maps to locate the gene expression changes that accompany important developmental and disease processes. The pairing of traditional expression assays with high-throughput sequencing (RNA-seq) has allowed the generation of genome-wide gene expression data with unparalleled specificity, throughput, and sensitivity, delivering a detailed representation of the transcriptome.

However, full genome transcriptome characterization has been partially limited by the complexity and increased time requirements of available RNA-seq library construction protocols. Here we report the successful application of the SX-8G IP-Star® Compact System (Diagenode) for the easy, rapid, and reproducible RNA-seq library construction of five Mus musculus (mouse) samples. Use of the SX-8G IP-Star® Compact System significantly reduced the hands-on time for RNA-seq library synthesis, adenylation, and adaptor ligation, providing high-quality RNA-seq libraries tailored for Illumina high-throughput next-generation sequencing. Generated data exhibited high technical reproducibility compared to data from RNA-seq libraries synthesized manually for the same samples. Obtained results are consistent regardless of the researcher, day of the experiment, and experimental run. Overall, the SX-8G IP-Star® Compact System proves an efficient and reliable tool for the construction of next-generation RNA-seq libraries, especially for transcriptome-based annotation of larger genomes.

Methods

A schematic step-wise representation of the two tested protocols is presented in Figure 1. Specifically, we tested application of the SX-8G IP-Star® Compact System for the construction of RNA-seq libraries of five mouse (Mm_1-5_Auto) samples in comparison to a manual protocol routinely used in our laboratory. The two protocols were compared using the same thermocycling machines and reagents. Total RNA integrity value following isolation was measured using the Agilent Technologies 2100 Bioanalyzer and was equal to eight for all tested samples. For the manual protocol, mRNA preparation, library construction, and purification were done according to the TruSeq™ RNA Sample Preparation v2 low sample (LS) protocol (Illumina). Briefly, mRNA was extracted from 0.2 μg of total RNA for each sample using a 5 min incubation with 50 μl of RNA Purification Beads (TruSeq™ RNA Sample Preparation Kit v2; Illumina) at 65°C, followed by a 5 min incubation at room temperature. Following washing and elution of the mRNA denaturation reaction, mRNA was fragmented using an 8 min incubation with 19.5 μl of the Elute, Prime, Fragment Mix (TruSeq™ RNA Sample Preparation Kit v2) at 94°C. First strand synthesis was performed using thermocycling with 8 μl of First Strand Master Mix (TruSeq™ RNA Sample Preparation Kit v2) and SuperScript II Reverse Transcriptase (Invitrogen) at 25°C for 10 min, 42°C for 50 min and 70°C for 15 min. For second strand synthesis, samples were incubated with 25 μl of Second Strand Master Mix (TruSeq™ RNA Sample Preparation Kit v2) at 16°C for 1 hour. Reactions were cleaned up with Agencourt AMPure XP beads (Beckman Coulter Genomics). Libraries were end-repaired, adenylated at the 3' end, ligated with adapters and amplified according to the TruSeq™ RNA Sample Preparation v2 LS protocol.
Constructed RNA-seq libraries were purified with Agencourt AMPure XP beads and quantified using the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen) and, by qPCR, the KAPA Library Quantification Kit (Kapa Biosystems). Library quality control was performed with the Agilent Technologies 2100 Bioanalyzer. Libraries were normalized and pooled using the TruSeq™ Cluster Kit v3 (Illumina) based on the qPCR values. Pooled samples were sequenced using the HiSeq 2500 v3 sequencer (Illumina). For the automated protocol, the assay was performed as above except that the most time-consuming stage of library preparation, synthesis, and adaptor ligation was performed using the SX-8G IP-Star® Compact System. The only required actions for this purpose were to select the appropriate Diagenode Library Preparation protocol (Illumina_TruSeq_DNA_SamplePrep_v2) for the corresponding sample number and to set up the necessary reagents and consumables following the robot's user-friendly and simple interface.
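
The qPCR-based normalization mentioned above amounts to a dilution calculation. The following is a minimal Python sketch of that arithmetic, not the laboratory's actual worksheet: the function name `dilution_volumes`, the library names, the target concentration, and the final volume are all hypothetical, and concentrations are assumed to be expressed in nM.

```python
def dilution_volumes(library_nM, target_nM, final_volume_ul):
    """Return (library_ul, diluent_ul) needed to dilute one library.

    Applies C1 * V1 = C2 * V2. All values used below are illustrative
    assumptions, not numbers taken from the study.
    """
    if library_nM <= 0 or target_nM <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nM > library_nM:
        raise ValueError("library is more dilute than the requested target")
    library_ul = target_nM * final_volume_ul / library_nM
    return library_ul, final_volume_ul - library_ul


# Hypothetical qPCR concentrations (nM) for two libraries,
# each diluted to an assumed 2 nM in an assumed 20 ul.
for name, conc in {"lib_A": 10.0, "lib_B": 4.0}.items():
    lib_ul, diluent_ul = dilution_volumes(conc, target_nM=2.0, final_volume_ul=20.0)
    print(f"{name}: {lib_ul:.1f} ul library + {diluent_ul:.1f} ul diluent")
```

Combining equal volumes of libraries diluted to the same concentration then gives an approximately equimolar pool.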

Figure 1

A schematic representation of the sample preparation workflow. The processes of the TruSeq™ RNA Sample Preparation v2 low sample (LS) protocol (Illumina) performed manually and adapted for automated use with the SX-8G IP-Star® Compact System are illustrated. The automated protocol minimizes the hands-on time required for the error-prone manual steps of RNA-seq library synthesis, adenylation, and adaptor ligation, including all related clean-up steps, and allows the researcher to multitask during the run.

RNA-seq data generated using the manual and automated protocols were aligned against the Mus musculus GRCm38/mm10 genome using TopHat 2.0.7 [5]. Following extraction of known transcripts, based on the most parsimonious transcriptome assembly, Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values for each sample processed with the automated (Mm_1-5_Auto) and manual (Mm_1-5_Man) protocols were generated using the open-source software package Cufflinks 2.1.1 [6,7] to estimate relative transcript abundance. Transcripts from unexpressed genes, with FPKM values equal to or less than 0.01, were excluded from subsequent analysis. Heat map plots and correlation coefficient values (r², linear regression model) based on FPKM values of each sample and corresponding technical replicate were generated using the statistical language R. Data visualization, density distribution of FPKM values and cluster analysis were performed using the CummeRbund 2.7.1 R package (http://compbio.mit.edu/cummeRbund/).
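
For orientation, the FPKM unit and the expression cut-off described above can be written out as follows. This is a minimal Python sketch, not the authors' pipeline: Cufflinks estimates abundances with a statistical model rather than this naive count-based formula, the analysis in the paper was done in R, and the function names and transcript identifiers here are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def drop_unexpressed(fpkm_by_transcript, cutoff=0.01):
    """Keep only transcripts whose FPKM is strictly greater than the cut-off.

    Values equal to or below the cut-off are treated as unexpressed,
    matching the 0.01 threshold stated in the text.
    """
    return {tid: value for tid, value in fpkm_by_transcript.items() if value > cutoff}


# Hypothetical example values, for illustration only.
example = {"tx_a": 0.01, "tx_b": 0.5, "tx_c": 0.0}
print(drop_unexpressed(example))
```

The text does not state whether the cut-off was applied per sample or across a sample pair; the helper above filters a single sample.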

Results

Application of the SX-8G IP-Star® Compact System for the RNA-seq library construction of five mouse samples significantly reduced the amount of hands-on time required for the most time-demanding stages of library synthesis, adenylation, and adaptor ligation, including all related clean-up steps. Specifically, manual library construction with the protocol routinely used in our laboratory typically takes an average of four hours of hands-on time, whereas Diagenode automated library construction with the same reagents and samples required only 30 minutes. This corresponds to an 8-fold decrease in the amount of time the researcher has to be directly involved with the procedure, offering substantial flexibility for experimental multitasking.

Notably, data generated with the automated protocol exhibited high technical reproducibility compared to data from RNA-seq libraries synthesized manually for the same samples, regardless of operator and experimental run. Specifically, density distributions of FPKM values demonstrated high data concordance among samples and technical replicates (Figure 2). Correlation coefficient values (r²) obtained using the linear regression model in R for the five mouse samples and corresponding technical replicates ranged from 0.97 to 0.98, confirming that the SX-8G IP-Star® Compact System can be reliably used for the efficient and accurate construction of RNA-seq libraries (Figure 3). Cluster analysis illustrated tight clustering between samples and technical replicates, further supporting high technical reproducibility between the two tested protocols (Figure 4).
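
For a simple linear regression with an intercept, r² equals the squared Pearson correlation between the two variables. The sketch below computes it in plain Python for one automated/manual pair. It is an illustration only: the authors used R, the function name `r_squared` and the input values are hypothetical, and because the text does not say whether FPKM values were log-transformed, untransformed values are assumed.

```python
def r_squared(x, y):
    """Coefficient of determination for the simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when either variable is constant")
    # With a single predictor and an intercept, r^2 = cov(x, y)^2 / (var(x) * var(y)).
    return sxy * sxy / (sxx * syy)


# Made-up FPKM values for transcripts shared by one sample pair.
auto_fpkm = [1.0, 2.0, 3.0]
manual_fpkm = [1.0, 3.0, 2.0]
print(r_squared(auto_fpkm, manual_fpkm))
```

Only transcripts retained in both members of a pair should be passed in, so that the two lists stay aligned by transcript.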

Figure 2

Comparison of distributions of FPKM values. Density distributions of FPKM values, created using the CummeRbund 2.7.1 R package, support high data concordance among samples and corresponding technical replicates. Mm_1-5_Auto and Mm_1-5_Man correspond to mouse samples processed with the automated and manual protocols, respectively.

Figure 3

Correlation analysis of FPKM values. Heat map plots were generated based on FPKM data from samples processed with the SX-8G IP-Star® Compact System (Mm_1-5_Auto) and their corresponding technical replicates (Mm_1-5_Man). Transcripts from unexpressed genes were excluded using a cut-off FPKM value equal to or less than 0.01. The correlation coefficient (r²) between each sample and technical replicate was estimated using the linear regression model in R and ranged from 0.97 to 0.98, confirming the high technical reproducibility between the two tested protocols.

Figure 4

Cluster analysis of FPKM values. Analysis exhibits tight clustering of the tested samples (Mm_1-5_Auto) with the corresponding technical replicates (Mm_1-5_Man) confirming high technical reproducibility between the two protocols under study.

Conclusions

Overall, the SX-8G IP-Star® Compact System proves an efficient, reliable and accurate tool for the construction of next-generation RNA-seq libraries, especially for transcriptome-based annotation of larger genomes. We foresee that this technology will prove indispensable for high-throughput RNA-seq library construction in Next-Generation Sequencing Cores and Genomics Laboratories, significantly reducing hands-on experimentation time, related costs and error-prone manual steps. Added benefits of the automated protocol include ease of operation and generation of consistent data regardless of human variability and experimental run. Adoption of this technology should support the unveiling of the mechanisms governing differential gene expression and transcription processing genome-wide, leading to a better understanding of genetic and epigenetic regulation and inheritance in a time-efficient manner.

Abbreviations

RNA-Seq:

Ribonucleic acid next-generation sequencing

mRNA:

Messenger ribonucleic acid

qPCR:

Quantitative polymerase chain reaction

FPKM:

Fragments per kilobase of transcript per million mapped reads

References

  1. Consortium EP. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306(5696):636–40.

  2. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol. 2010;28(10):1045–8.

  3. Chadwick LH. The NIH roadmap epigenomics program data resource. Epigenomics. 2012;4(3):317–24.

  4. Celniker SE, Dillon LA, Gerstein MB, Gunsalus KC, Henikoff S, Karpen GH, et al. Unlocking the secrets of the genome. Nature. 2009;459(7249):927–30.

  5. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.

  6. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.

  7. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78.

Acknowledgements

This research was supported by the NY State Department of Health (C026714 to MJB). Sequencing and bioinformatics were performed at the UB Genomics & Bioinformatics Core.

Author information

Corresponding author

Correspondence to Michael Joseph Buck.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MT participated in the conception of the study, its design and coordination, and wrote the manuscript. SV participated in the conception and design of the study and the execution of the RNA-seq experiments. JB and BM performed the data analysis. NN and MJB supervised all aspects of the project. All authors read and approved the final manuscript.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Tsompana, M., Valiyaparambil, S., Bard, J. et al. An automated method for efficient, accurate and reproducible construction of RNA-seq libraries. BMC Res Notes 8, 124 (2015). https://doi.org/10.1186/s13104-015-1089-9

  • DOI: https://doi.org/10.1186/s13104-015-1089-9
