RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis
© Chow et al.; licensee BioMed Central Ltd. 2014
Received: 11 September 2013
Accepted: 17 January 2014
Published: 1 February 2014
One of the concerns of assembling de novo transcriptomes is determining the amount of read sequences required to ensure a comprehensive coverage of genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for the estimation of the read amount needed for deep transcriptome coverage.
We optimized the assembly of a Hevea bark transcriptome based on 16 Gb Illumina PE RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality based on transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process where sub-assemblies from a series of incremental amounts of bark transcripts were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the degree of transcript mapping level, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As read amounts or datasize increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate sequencing depth requirement in relation to the degree of coverage of total sample transcripts.
We devised a procedure, the “transcript mapping saturation test”, to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating between 5–8 Gb reads, whereby around 90% transcript coverage could be achieved with optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or with reads from other second generation sequencing platforms.
KeywordsTranscriptome RNA-Seq Sequencing Hevea brasiliensis Rubber tree de novo assembly Gene transcript
Transcriptome analysis has become increasingly powerful through advances in second generation sequencing technologies from companies such as Illumina, Roche and Life Technologies. Improvements in sequencing chemistry and read length have enabled unprecedented depth of sequencing, limited only by cost and availability of biological material. In particular, RNA sequencing (RNA-Seq), also known as whole transcriptome shotgun sequencing, has emerged as a valuable tool for profiling expressed genes in plants and other organisms [1–3]. The depth of transcriptome sequencing provided by RNA-Seq has thus provided a cost-effective means of qualitative and quantitative analyses of gene transcripts in many non-model plant species including the rubber tree, Hevea brasiliensis, the subject of the present study.
Parallel with progress in sequencing technologies, numerous softwares have been developed to assemble de novo transcriptomes. Among the most commonly used sofwares are Velvet , Oases , SOAPdenovo , ABySS , Trinity , MIRA , Newbler (Roche) and CLC (CLC bio). In the absence of a reference genome, the assemble-then-align approach is used in place of the align-then-assemble approach [10–12]. Additional procedures are often integrated in order to improve the quality of the de novo transcriptome assembly. This includes weighing the relative merits of more than one assembler [13–21], optimizing transcript numbers and lengths across different k-mers and other assembly parameters [16, 17, 22–26], hybrid assembly of data from different sequencing platforms [21, 27–30] and alignment of transcripts to sequences from the same or related species [18, 30–32].
Challenges in de novo transcriptome assembly in higher plants lie in the immense number of gene transcripts, large variations in transcript expression levels, presence of alternatively spliced transcript variants and issues in strand directionality [11, 12]. Owing to such problems, de novo assembly requires significantly greater sequencing depth as compared with reference-based assembly. Especially in the case of non-model plants, few guidelines are available for determining the amount of reads to generate to enable deep coverage of transcripts expressed in a particular sample. The general practice commonly adopted, especially for new entrants in second generation sequencing, is to piggy-back on ballpark estimates adopted for the model species. From a survey of recent publications on de novo transcriptome analysis in non-model plants, the read generation per sample could fall below 100 Mb or it could be high as 7 Gb, with 2–5 Gb being the most common sequencing depths [13, 18, 19, 21, 23, 26–49]. Therefore, there is need for a practical procedure to estimate the reads needed for deep coverage of gene transcripts in de novo assembly where such information is unavailable.
Publications containing applications of second generation sequencing in rubber tree transcriptome analysis
Transcriptome type (length and number of reads)
Xia et al. 
Latex and leaf combined; clone RY7-33-97 (12 mil. reads or 1 Gb approx.)
Pootakham et al. 
454 pyrosequencing (Roche)
Information not available
Triwitayakorn et al. 
454 pyrosequencing (Roche)
Shoot apical meristem; clone RRIM 600 (2 mil. reads or 676.5 Mb approx.)
Chow et al. 
Latex; clone RRIM 600 (10 mil. reads or 350 Mb approx.)
Li et al. 
Bark; clone RY7-33-97 (30 mil. reads or 3 Gb approx.)
Duan et al. 
454 pyrosequencing (Roche)
Leaf, bark, latex, root, embryogenic tissues; clone PB 260(0.5 mil. reads or 200 Mb approx. per tissue)
Rahman et al. 
Leaf; clone RRIM 600 (4.89 Gb);
454 pyrosequencing (Roche)
Leaf; clone RRIM 600 (1,085 Mb)
Generation of Illumina PE RNA- Seq Hevea tissue libraries and de novo assembly
Quality processing of reads from three Hevea tissue libraries
Read number (forward + reverse)
Read size (forward + reverse)
Orphan reads (single end)
Read number (forward + reverse)
Read size (forward + reverse)
Orphan reads (single end)
Read number (forward + reverse)
Read size (forward + reverse)
Orphan reads (single end)
Statistics of incremental bark assemblies across k-mers
Validation of the bark transcriptome
Mapping of 255 Hevea ORF sequences to transcripts from the optimized 16 Gb bark assembly
Total queries (Hevea complete ORFs)
Queries with hits to bark transcripts
Queries with bark transcript hits where ORF coverage ≥ 70%
Secondly, completeness of gene representation in the 16 Gb transcriptome was assessed by mapping 87,612 bark transcripts to a set of 248 core eukaryotic genes (CEGs) that had been shown to be a reliable indicator of completeness of gene space in eukaryotic species . Although initially used to assess gene space in newly sequenced genomes, the approach was recently applied to transcriptomes and it also complements other transcript metrics such as N50 length. Using BlastX, 87,612 bark transcripts detected 247 out of 248 CEG proteins (98%) (e-value ≤ 1e-10). Thus, these results support the completeness or depth of bark gene representation in this transcriptome.
As a whole, results of the three mapping analyses carried out provided sufficient validation for the quality of 87,612 bark transcripts assembled from the 16 Gb read set. Fragmented or erroneous transcripts could still be present to some extent in any assembly but we think that the proportion of bark transcripts which did not show meaningful mapping or alignment with Hevea transcripts or genome scaffolds could also be explained by reasons such as inherent variations between sequences derived from different tree clonal varieties.
Mapping saturation test for bark transcript accumulation
However, although the colour matrix indicated generally high transcript coverage by assemblies of 3–13 Gb reads, it is important to select a datasize that would also produce the optimal transcript N50 length at the desired transcript coverage level. As determined previously, the optimized N50 increased with datasize (Table 3). In the optimized 3, 5, 8, 10 and 13 Gb assemblies, this corresponded with 87.21, 89.48, 91.46, 92.60 and 92.12% representation of the 16 Gb bark transcripts respectively (Additional file 2: Table S2). Therefore, based on the colour matrix, the optimized 3–5 Gb assemblies would fall within the 85-90% transcript coverage bracket and the optimized 8–13 Gb assemblies within the 90-95% coverage bracket (Figure 4 and Additional file 2: Table S2). This also indicated that based on the mapping saturation test in this study, a shift in bark transcript coverage bracket by the incremental assemblies occurred between 5–8 Gb.
In essence, the amount of reads for optimal coverage of a tissue transcriptome should take into consideration the requirements for transcript coverage level (reflected by percentage of mapping saturation) and for N50 length. In order to attain both high transcript representation and best N50 length, our general recommendation for Hevea is to generate between 5–8 Gb reads for de novo assembly. Firstly, as observed in the shift in transcript coverage bracket, a minimum of nearly 90% (i.e. 89.48%) transcript coverage could already be achieved by an optimized 5 Gb assembly (Figure 4 and Additional file 2: Table S2), a level that is within range of the following coverage bracket (90-95%). Secondly, it is noted from Table 3 that the improvement in optimized N50 length was more rapid from 3–8 Gb than from 8–16 Gb assemblies. Therefore, generating less than 5 Gb reads may lead to reduction in complete transcripts, and sequencing beyond 8 Gb reads may not yield significantly more new or complete transcripts other than the rarely expressed ones.
Analysis of latex and leaf transcriptome assembly
Statistics of 1, 3 and 5 Gb assemblies of latex and leaf reads
Latex 1 Gb
Latex 3 Gb
Latex 5 Gb
Leaf 1 Gb
Leaf 3 Gb
Leaf 5 Gb
Discussion and conclusions
Knowing whether transcriptome sequencing and assembly have substantially captured all the genes expressed in a sample is an important consideration for plants having limited genomic resources as reference. Generally, the amount of reads for comprehensive coverage of a de novo transcriptome is often determined by a balance of budget, capacity of sequencing platform and guesstimates or “best practices” based on other species. In this work, we report a systematic approach which we name the “transcript mapping saturation test” to assess the amount of reads required for optimal transcriptome coverage in the Hevea rubber tree. This was made possible by the availability of 16 Gb Illumina PE RNA-Seq reads from Hevea bark which enabled us to map transcripts from incremental sub-assemblies to transcripts from the entire assembly (or the full transcriptome) in order to detect the mapping saturation point. The workflow of this methodology is outlined in Figure 3, beginning with assembly optimization and validation of the full transcriptome, followed by the mapping saturation test.
Because sequencing has become increasingly affordable, obtaining as much as 16 Gb reads per sample as a starting point is not insurmountable. Using our approach, sequencing to this extent has to be done only once in the beginning, after which the user is equipped with a guide (the colour matrix) to estimate optimal coverage of expressed genes in the plant species of interest. In developing this approach, we used the Oases assembler because Velvet, for which Oases is an extension, had previously been found to be suitable for producing quality transcripts from Hevea short reads . Thus, we progressed to Oases, which additionally has the ability to resolve alternatively spliced transcripts . We would suggest that a de novo project intending to adopt our approach should first test if the assembler of choice is suited to their transcriptome. Even though our method development is based on Illumina PE RNA-Seq reads, the principle behind this approach should also be applicable to other plant species and to reads from other sequencing platforms.
Although the transcript mapping approach is based on mapping saturation, this does not reduce the need to validate the assembly quality of the full transcriptome. In this work, the N50 trend was used in the initial selection of best k-mer for assembly. Subsequently, the completeness and correctness of assembled transcripts were supported by results of mapping to rubber genome scaffolds  whereby a significant proportion showed transcript-to-scaffold coverage of 90% and above. This was also supported by detection of all but one of 248 core genes expressed in eukaryotes  and significant alignments with known Hevea protein coding frames. However, we should point out that what this paper proposes is essentially a methodology; we do not specifically assert that a 5–8 Gb read depth would be sufficient for optimal transcriptome coverage universally. The optimal read depth may differ in other species depending on factors such as genome size and transcript complexity.
Materials and methods
Plant material and RNA isolation
Latex and bark shavings were obtained from 15-year old RRIM 928 Hevea brasiliensis trees growing in the Rubber Research Institute of Malaysia Research Station, Sungai Buloh. Equal volumes of latex were tapped from three trees and collected directly into 2× RNA extraction buffer . The bark just below the tapping cut of the trees was scraped to remove surface matter before bark shavings (approximately 1 cm depth) were excised with a tapping knife. Young leaves of RRIM 928 trees were collected from the source bush nursery in the Rubber Research Institute of Malaysia Research Station, Sungai Buloh.
Total RNA was isolated from latex and leaf tissues using the phenol-chloroform method . Bark total RNA was isolated using a modified procedure of the Qiagen RNeasy Plant Mini Kit . RNA samples were assessed for quality and quantity using the Nanodrop spectrophotometer (Thermo Scientific).
Sequence generation and quality assessment
Bark, latex and leaf total RNAs (20 μg each) were sent to the Illumina Fast Track sequencing service in San Diego, USA where 200 bp fragment size libraries were produced for paired-end RNA sequencing (PE RNA-Seq). Each RNA-Seq sample was sequenced 100 nucleotides at each end (2 × 100 nt), resulting in about 50 million raw reads each from latex and leaf and nearly 170 million raw reads from the bark (Table 2). Raw reads from bark, latex and leaf are available from the NCBI Sequence Read Archive (accession nos. SRX278513-5).
Clean reads were obtained by trimming raw reads at a minimum phred score of Q = 20, followed by removal of reads below 30 bp and subsequently reads which contained ‘N’ nucleotides. Clean paired reads from the bark (163,316,702 reads; see Table 2) were referred to as the 16 Gb read set and clean paired read sets from the latex and leaves (48,650,932 and 46,062,766 reads respectively; see Table 2) as the 5 Gb read sets. These read sets were used for subsequent transcriptome assembly. Clean paired reads were classified into arbitrary nucleotide size categories to confirm good PE RNA-Seq data quality (Figure 1).
Transcriptome assembly and transcript mapping
Clean paired reads from the bark, latex and leaf (Table 2) were assembled with the Velvet (Version 1.1.05)  and Oases assembler (Version 0.1.22)  using default parameters and selection of a minimum transcript length of 100 bp. A range of hash lengths (k-mers 51–77) was used for assembly of a read set to determine the k-mer which produced the highest transcript N50 length. This best N50 value was termed as the “optimized N50” while the hash length which produced it was the “optimized k-mer”. Note: N50 length is the length of the shortest transcript whereby the sum of transcripts of equal length or longer is at least 50% of the total length of all transcripts.
Incremental quantums of bark reads (1, 3, 5, 8, 10, 13 and 16 Gb) were obtained by partitioning the subsets from the 16 Gb read set. Each subset was random as the 16 Gb read set was already fully randomized. (Similarly, 5 Gb read sets from latex and leaf were also fully randomized). Serial mapping of bark transcripts was performed using BlastN  and the top hit by any query transcript with e-value ≤ 1.0e-5 was counted as a match. The complete methodology for the transcript mapping saturation test is shown in Figure 3.
Bark transcript validation
For evaluation of rubber-specific ORF quality of 87,612 bark transcripts from the 16 Gb assembly (k-mer 73), 255 Hevea sequences which were confirmed to encode complete ORFs were selected from the NCBI GenBank non-redundant database (http://www.ncbi.nlm.nih.gov). These Hevea cDNAs, which were isolated by traditional gene cloning approaches such as cDNA library hybridization and PCR, were generally of high quality as they had mainly been obtained by Sanger sequencing (see Additional file 1: Table S1). Megablast  was used to map the 255 Hevea ORFs to 87,612 transcripts from the 16 Gb bark assembly. Top hits from this analysis (with 86-100% sequence identity match) were screened for high quality matches based on a minimum of 70% coverage of Hevea ORFs (or query coverage) in their alignments to bark transcripts (the subject) (see Table 4 and Additional file 1: Table S1).
For evaluation of completeness of assembled bark transcripts, 248 core eukaryotic genes (CEGs) of Arabidopsis thaliana were downloaded from the CEGMA resource at http://korflab.ucdavis.edu/Datasets/genome_completeness/. This approach was based on a list of 248 highly conserved but low copy number genes that had been shown to be a reliable indicator of completeness of gene space in eukaryotic species . Using BlastX , 87,612 bark transcripts were mapped to the CEGs with any hit of e-value ≤ 1.0e-10 counted as a match.
Rubber genome scaffolds from the BioProject ID: PRJNA80191 (http://www.ncbi.nlm.nih.gov/nuccore/448814761) were used for validating bark transcripts. Bark transcripts were mapped to genome scaffolds by Exonerate (Version 2.2.0)  using default settings with the exception of the following parameters: heuristic mode, est2genome model and alignment score of at least 10 percent of the maximal score for each query. The significance of mapped transcripts was evaluated by calculating query coverage which is expressed as percentage of the transcript sequence (query) that overlaps with the scaffold sequence (subject). This percentage reflects the extent of bark transcript coverage in alignments to genome scaffolds. Scaffold hits were classified according to transcript coverage whereby the higher the percentage, the greater the significance of transcript-to-scaffold alignment (see Figure 2).
Availability and requirements
This work was supported by the Malaysian Rubber Board. The authors thank Siti Zakiah Zailani and V. Mony Rajan for field and laboratory assistance, and Haizarudin Amin Nordin for graphic design.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.PubMedPubMed CentralView ArticleGoogle Scholar
- Ozsolak F, Milos PM: RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011, 12: 87-98. 10.1038/nrg2934.PubMedPubMed CentralView ArticleGoogle Scholar
- Van Verk MC, Hickman R, Pieterse CMJ, Van Wees SCM: RNA-Seq: revelation of the messengers. Trends Plant Sci. 2013, 18: 175-179. 10.1016/j.tplants.2013.02.001.PubMedView ArticleGoogle Scholar
- Zerbino DR, Birney E: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.PubMedPubMed CentralView ArticleGoogle Scholar
- Schulz MH, Zerbino DR, Vingron M, Birney E: Oases: robust de novo RNA-Seq assembly across the dynamic range of expression labels. Bioinformatics. 2012, 28: 1086-1092. 10.1093/bioinformatics/bts094.PubMedPubMed CentralView ArticleGoogle Scholar
- Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20: 265-272. 10.1101/gr.097261.109.PubMedPubMed CentralView ArticleGoogle Scholar
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol İ: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009, 19: 1117-1123. 10.1101/gr.089532.108.PubMedPubMed CentralView ArticleGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29: 644-654. 10.1038/nbt.1883.PubMedPubMed CentralView ArticleGoogle Scholar
- Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004, 14: 1147-1159. 10.1101/gr.1917404.PubMedPubMed CentralView ArticleGoogle Scholar
- Haas BJ, Zody MC: Advancing RNA-Seq analysis. Nat Biotechnol. 2010, 28: 421-423. 10.1038/nbt0510-421.PubMedView ArticleGoogle Scholar
- Martin JA, Wang Z: Next-generation transcriptome assembly. Nat Rev Genet. 2011, 12: 671-682. 10.1038/nrg3068.PubMedView ArticleGoogle Scholar
- Ward JA, Ponnala L, Weber CA: Strategies for transcriptome analysis in non-model plants. Am J Bot. 2012, 99: 267-276. 10.3732/ajb.1100334.PubMedView ArticleGoogle Scholar
- Barrero RA, Chapman B, Yang Y, Moolhuijzen P, Keeble-Gagnère G, Zhang N, Tang Q, Bellgard MI, Qiu D: De novo assembly of Euphorbia fischeriana root transcriptome identifies prostratin pathway related genes. BMC Genomics. 2011, 12: 600-10.1186/1471-2164-12-600.PubMedPubMed CentralView ArticleGoogle Scholar
- Bräutigam A, Mullick T, Schliesky S, Weber APM: Critical assessment of assembly strategies for non-model species mRNA-Seq data and application of next-generation sequencing to the comparison of C3 and C4 species. J Exp Bot. 2011, 62: 3093-3102. 10.1093/jxb/err029.PubMedView ArticleGoogle Scholar
- Garg R, Patel RK, Tyagi AK, Jain M: De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res. 2011, 18: 53-63. 10.1093/dnares/dsq028.PubMedPubMed CentralView ArticleGoogle Scholar
- Lin Y, Li J, Shen H, Zhang L, Papasian CJ, Deng HW: Comparative studies of de novo assembly tools for next-generation sequencing technologies. Bioinformatics. 2011, 27: 2031-2037. 10.1093/bioinformatics/btr319.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhao QY, Wang Y, Kong YM, Luo D, Li X, Hao P: Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics. 2011, 12: S2-View ArticleGoogle Scholar
- Van Bakel H, Stout JM, Cote AG, Tallon CM, Sharpe AG, Hughes TR, Page JE: The draft genome and transcriptome of Cannabis sativa. Genome Biol. 2011, 12: R102-10.1186/gb-2011-12-10-r102.PubMedPubMed CentralView ArticleGoogle Scholar
- Wong MML, Cannon CH, Wickneswari R: Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing. BMC Genomics. 2011, 12: 342-10.1186/1471-2164-12-342.PubMedPubMed CentralView ArticleGoogle Scholar
- Mundry M, Bornberg-Bauer E, Sammeth M, Feulner PGD: Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach. PLoS One. 2012, 7: e31410-10.1371/journal.pone.0031410.PubMedPubMed CentralView ArticleGoogle Scholar
- O’Rourke JA, Yang SS, Miller SS, Bucciarelli B, Liu J, Rydeen A, Bozsoki Z, Uhde-Stone C, Tu ZJ, Allan D, Gronwald JW, Vance CP: An RNA-Seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants. Plant Physiol. 2013, 161: 705-724. 10.1104/pp.112.209254.PubMedPubMed CentralView ArticleGoogle Scholar
- Collins LJ, Biggs PJ, Voelckel C, Joly S: An approach to transcriptome analysis of non-model organisms using short-read sequences. Genome Inform. 2008, 21: 3-14.PubMedGoogle Scholar
- Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation and marker discovery. BMC Genomics. 2010, 11: 180-10.1186/1471-2164-11-180.PubMedPubMed CentralView ArticleGoogle Scholar
- Surget-Groba Y, Montoya-Burgos JI: Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010, 20: 1432-1440. 10.1101/gr.103846.109.PubMedPubMed CentralView ArticleGoogle Scholar
- Gruenheit N, Deusch O, Esser C, Becker M, Voelckel C, Lockhart P: Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants. BMC Genomics. 2012, 13: 92-10.1186/1471-2164-13-92.PubMedPubMed CentralView ArticleGoogle Scholar
- Gordo SMC, Pinheiro DG, Moreira ECO, Rodrigues SM, Poltronieri MC, de Lemos OF, Da Silva IT, Ramos RTJ, Silva A, Schneider H, Silva WA, Sampaio I, Darnet S: High-throughput sequencing of black pepper root transcriptome. BMC Plant Biol. 2012, 12: 168-10.1186/1471-2229-12-168.PubMedPubMed CentralView ArticleGoogle Scholar
- Iorizzo M, Senalik DA, Grzebelus D, Bowman M, Cavagnaro PF, Matvienko M, Ashrafi H, Van Deynze A, Simon PW: De novo assembly and characterization of the carrot transcriptome reveals novel genes, new markers, and genetic diversity. BMC Genomics. 2011, 12: 389-10.1186/1471-2164-12-389.PubMedPubMed CentralView ArticleGoogle Scholar
- Su CI, Chao YT, Chang YCA, Chen WC, Chen CY, Lee AY, Hwa KT, Shih MC: De novo assembly of expressed transcripts and global analysis of the Phalaenopsis aphrodite transcriptome. Plant Cell Physiol. 2011, 52: 1501-1514. 10.1093/pcp/pcr097.PubMedView ArticleGoogle Scholar
- Chow K-S, Mat-Isa MN, Bahari A, Ghazali A-K, Alias H, Mohd-Zainuddin Z, Hoh C-C, Wan K-L: Metabolic routes affecting rubber biosynthesis in Hevea brasiliensis latex. J Exp Bot. 2012, 63: 1863-1871. 10.1093/jxb/err363.PubMedPubMed CentralView ArticleGoogle Scholar
- Kudapa H, Bharti AK, Cannon SB, Farmer AD, Mulaosmanovic B, Kramer R, Bohra A, Weeks NT, Crow JA, Tuteja R, Shah T, Dutta S, Gupta DK, Singh A, Gaikwad K, Sharma TK, May GD, Singh NK, Varshney RK: A comprehensive transcriptome assembly of pigeonpea (Cajanus cajan L.) using Sanger and second-generation sequencing platforms. Mol Plant. 2012, 5: 1020-1028. 10.1093/mp/ssr111.PubMedPubMed CentralView ArticleGoogle Scholar
- Hao DC, Ge GB, Xiao PG, Zhang YY, Yang L: The first insight into the tissue specific Taxus transcriptome via Illumina second generation sequencing. PLoS One. 2011, 6: e21220-10.1371/journal.pone.0021220.PubMed CentralView ArticleGoogle Scholar
- Logacheva MD, Kasianov AS, Vinogradov DV, Samigullin TH, Gelfand MS, Makeev VJ, Penin AA: De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum). BMC Genomics. 2011, 12: 30-10.1186/1471-2164-12-30.PubMedPubMed CentralView ArticleGoogle Scholar
- Mizrachi E, Hefer CA, Ranik M, Jourbert F, Myburg AA: De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq. BMC Genomics. 2010, 11: 681-10.1186/1471-2164-11-681.PubMedPubMed CentralView ArticleGoogle Scholar
- Franssen SU, Shrestha RP, Bräutigam A, Bornberg-Bauer E, Weber APM: Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing. BMC Genomics. 2011, 12: 227-10.1186/1471-2164-12-227.PubMedPubMed CentralView ArticleGoogle Scholar
- Krishnan NM, Pattnaik S, Deepak SA, Hariharan AK, Gaur P, Chaudhary R, Jain P, Vaidyanathan S, Krishna PGB, Panda B: De novo sequencing and assembly of Azadirachta indica fruit transcriptome. Curr Sci. 2011, 101: 1553-1561.Google Scholar
- Natarajan P, Parani M: De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing. BMC Genomics. 2011, 12: 191-10.1186/1471-2164-12-191.PubMedPubMed CentralView ArticleGoogle Scholar
- Sui C, Zhang J, Wei J, Chen S, Li Y, Xu J, Jin Y, Xie C, Gao Z, Chen H, Yang C, Zhang Z, Xu Y: Transcriptome analysis of Bupleurum chinense focusing on genes involved in the biosynthesis of saikosaponins. BMC Genomics. 2011, 12: 539-10.1186/1471-2164-12-539.PubMedPubMed CentralView ArticleGoogle Scholar
- Duan J, Xia C, Zhao G, Jia J, Kong X: Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data. BMC Genomics. 2012, 13: 392-10.1186/1471-2164-13-392.PubMedPubMed CentralView ArticleGoogle Scholar
- Hyun TK, Rim Y, Jang HJ, Kim CH, Park J, Kumar R, Lee S, Kim BC, Bhak J, Nguyen-Quoc B, Kim SW, Lee SY, Kim JY: De novo transcriptome sequencing of Momordica cochinchinensis to identify genes involved in the carotenoid biosynthesis. Plant Mol Biol. 2012, 79: 413-427. 10.1007/s11103-012-9919-9.PubMedView ArticleGoogle Scholar
- Lulin H, Xiao Y, Pei S, Wen T, Shangqin H: The first Illumina-based de novo transcriptome sequencing and analysis of safflower flowers. PLoS One. 2012, 7: e38653-10.1371/journal.pone.0038653.PubMedPubMed CentralView ArticleGoogle Scholar
- Mutasa-Göttgens ES, Joshi A, Holmes HF, Hedden P, Göttgens B: A new RNA Seq-based reference transcriptome for sugar beet and its application in transcriptome-scale analysis of vernalization and gibberellin responses. BMC Genomics. 2012, 13: 99-10.1186/1471-2164-13-99.PubMedPubMed CentralView ArticleGoogle Scholar
- Sloan DB, Keller SR, Berardi AE, Sanderson BJ, Karpovich JF, Taylor DR: De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol Ecol Resour. 2012, 12: 333-343. 10.1111/j.1755-0998.2011.03079.x.PubMedView ArticleGoogle Scholar
- Zhang XM, Zhao L, Larson-Rabin Z, Li DZ, Guo ZH: De novo sequencing and characterization of the floral transcriptome of Dendrocalamus latiflorus (Poaceae: Bambusoideae). PLoS One. 2012, 7: e42082-10.1371/journal.pone.0042082.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhao Z, Tan L, Dang C, Zhang H, Wu Q, An L: Deep-sequencing transcriptome analysis of chilling tolerance mechanisms of a subnival alpine plant, Chorispora bungeana. BMC Plant Biol. 2012, 12: 222-10.1186/1471-2229-12-222.PubMedPubMed CentralView ArticleGoogle Scholar
- Bai S, Saito T, Sakamoto D, Ito A, Fujii H, Moriguchi T: Transcriptome analysis of Japanese pear (Pyrus pyrifolia Nakai.) flower buds transitioning through endodormancy. Plant Cell Physiol. 2013, 54: 1132-1151. 10.1093/pcp/pct067.PubMedView ArticleGoogle Scholar
- Gil-Amado JA, Gomez-Jimenez MC: Transcriptome analysis of mature fruit abscission control in olive. Plant Cell Physiol. 2013, 54: 244-269. 10.1093/pcp/pcs179.PubMedView ArticleGoogle Scholar
- Ramilowski JA, Sawai S, Seki H, Mochida K, Yoshida T, Sakurai T, Muranaka T, Saito K, Daub CO: Glycyrrhiza uralensis transcriptome landscape and study of phytochemicals. Plant Cell Physiol. 2013, 54: 697-710. 10.1093/pcp/pct057.PubMedView ArticleGoogle Scholar
- Van Moerkercke A, Fabris M, Pollier J, Baart GJE, Rombauts S, Hasnain G, Rischer H, Memelink J, Oksman-Caldentey KM, Goossens A: CathaCyc, a metabolic pathway database built from Catharanthus roseus RNA-Seq data. Plant Cell Physiol. 2013, 54: 673-685. 10.1093/pcp/pct039.PubMedView ArticleGoogle Scholar
- Zhang J, Wu K, Zeng S, da Silva JAT, Zhao X, Tian CE, Xia H, Duan J: Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development. BMC Genomics. 2013, 14: 279-10.1186/1471-2164-14-279.PubMedPubMed CentralView ArticleGoogle Scholar
- Bennett MD, Leitch I: Nuclear DNA amounts in angiosperms-583 new estimates. Ann Bot. 1997, 80: 169-196. 10.1006/anbo.1997.0415.View ArticleGoogle Scholar
- Han KH, Shin DH, Yang J, Kim IJ, Oh SK, Chow KS: Genes expressed in the latex of Hevea brasiliensis. Tree Physiol. 2000, 20: 503-510. 10.1093/treephys/20.8.503.PubMedView ArticleGoogle Scholar
- Ko J-H, Chow K-S, Han K-H: Transcriptome analysis reveals novel features of the molecular events occurring in the laticifers of Hevea brasiliensis (para rubber tree). Plant Mol Biol. 2003, 53: 479-492.PubMedView ArticleGoogle Scholar
- Chow K-S, Wan K-L, Mat-Isa M-N, Bahari A, Tan S-H, Harikrishna K, Yeang H-Y: Insights into rubber biosynthesis from transcriptome analysis of Hevea brasiliensis latex. J Exp Bot. 2007, 58: 2429-2440. 10.1093/jxb/erm093.PubMedView ArticleGoogle Scholar
- Mat-Isa M-N, Chow K-S, Mohamad A-F-H, Shahrum M-Y, Hoh C-C, Mohd-Amin M-R, Zainal K-A, Yeang H-Y, Wan K-L: NRESTdb: access to the transcriptome of natural rubber latex. J Rubber Res. 2009, 12: 229-238.Google Scholar
- Xia Z, Xu H, Zhai J, Li D, Luo H, He C, Huang X: RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis. Plant Mol Biol. 2011, 77: 299-308. 10.1007/s11103-011-9811-z.PubMedView ArticleGoogle Scholar
- Pootakham W, Chanprasert J, Jomchai N, Sangsrakru D, Yoocha T, Therawattanasuk K, Tangphatsornruang S: Single nucleotide polymorphism marker development in the rubber tree, Hevea brasiliensis (Euphorbiaceae). Am J Bot. 2011, 98: e337-e338. 10.3732/ajb.1100228.PubMedView ArticleGoogle Scholar
- Triwitayakorn K, Chatkulkawin P, Kanjanawattanawong S, Sraphet S, Yoocha T, Sangsrakru D, Chanprasert J, Ngamphiw C, Jomchai N, Therawattanasuk K, Tangphatsornruang S: Transcriptome sequencing of Hevea brasiliensis for development of microsatellite markers and construction of a genetic linkage map. DNA Res. 2011, 18: 471-482. 10.1093/dnares/dsr034.PubMedPubMed CentralView ArticleGoogle Scholar
- Li D, Deng Z, Qin B, Liu X, Men Z: De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.). BMC Genomics. 2012, 13: 192-10.1186/1471-2164-13-192.PubMedPubMed CentralView ArticleGoogle Scholar
- Duan C, Argout X, Gébelin V, Summo M, Dufayard JF, Leclercq J, Hadi K, Piyatrakul P, Pirrello J, Rio M, Champion A, Montoro P: Identification of the Hevea brasiliensis AP2/ERF superfamily by RNA sequencing. BMC Genomics. 2013, 14: 30-10.1186/1471-2164-14-30.PubMedPubMed CentralView ArticleGoogle Scholar
- Rahman AY, Usharraj AO, Misra BB, Thottathil GP, Jayasekaran K, Feng Y, Hou S, Ong SY, Ng FL, Lee LS, Tan HS, Muhd Sakaff MKL, Teh BS, Khoo BF, Badai SS, Ab Aziz N, Yuryev A, Knudsen B, Dionne-Laporte A, Mchunu NP, Yu Q, Langston BJ, Freitas TAK, Young AG, Chen R, Wang L, Najimudin N, Saito JA, Alam M: Draft genome sequence of the rubber tree Hevea brasiliensis. BMC Genomics. 2013, 14: 75-10.1186/1471-2164-14-75.PubMedPubMed CentralView ArticleGoogle Scholar
- Qin Y, Shi F, Tang C: Molecular characterization and expression analysis of cDNAs encoding four Rab and two Arf GTPases in the latex of Hevea brasiliensis. Plant Physiol Biochem. 2011, 49: 729-737. 10.1016/j.plaphy.2011.03.006.PubMedView ArticleGoogle Scholar
- Ruderman S, Kongsawadworakul P, Viboonjun U, Mongkolporn O, Chrestin H: Mitochondrial/Cytosolic Acetyl CoA and rubber biosynthesis genes expression in Hevea brasiliensis latex and rubber yield. Kasetsart J (Nat Sci). 2012, 46: 346-362.Google Scholar
- Piyatrakul P, Putranto RA, Martin F, Rio M, Dessailly F, Leclercq J, Dufayard JF, Lardet L, Montoro P: Some ethylene biosynthesis and AP2/ERF genes reveal a specific pattern of expression during somatic embryogenesis in Hevea brasiliensis. BMC Plant Biol. 2012, 12: 244-10.1186/1471-2229-12-244.PubMedPubMed CentralView ArticleGoogle Scholar
- Putranto RA, Sanier C, Leclercq J, Duan C, Rio M, Jourdan C, Thaler P, Sabau X, Argout X, Montoro P: Differential gene expression in different types of Hevea brasiliensis roots. Plant Sci. 2012, 183: 149-158.PubMedView ArticleGoogle Scholar
- Wang Z, Fang B, Chen J, Zhang X, Luo Z, Huang L, Chen X, Li Y: De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweetpotato (Ipomoea batatas). BMC Genomics. 2010, 11: 726-10.1186/1471-2164-11-726.PubMedPubMed CentralView ArticleGoogle Scholar
- Parra G, Bradnam K, Ning Z, Keane T, Korf I: Assessing the gene space in draft genomes. Nucleic Acids Res. 2009, 37: 289-297. 10.1093/nar/gkn916.PubMedPubMed CentralView ArticleGoogle Scholar
- Kush A, Goyvaerts E, Chye ML, Chua NH: Laticifer-specific gene expression in Hevea brasiliensis (rubber tree). P Natl Acad Sci USA. 1990, 87: 1787-1790. 10.1073/pnas.87.5.1787.View ArticleGoogle Scholar
- MacKenzie DJ, McLean MA, Mukerji S, Green M: Improved RNA extraction from woody plants for the detection of viral pathogens by reverse transcription-polymerase chain reaction. Plant Dis. 1997, 81: 222-226. 10.1094/PDIS.1922.214.171.124.View ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.PubMedView ArticleGoogle Scholar
- Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005, 6: 31-10.1186/1471-2105-6-31.PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.