Transcriptomics data of 11 species of yeast identically grown in rich media and oxidative stress conditions

Objective



The objective of this experiment was to identify transcripts in baker’s yeast (Saccharomyces cerevisiae) that could have originated from previously non-coding genomic regions, or de novo. We generated this data to be able to compare the transcriptomes of different species of Ascomycota.

Data description

We generated high-depth RNA sequencing data for 11 species of yeast: Saccharomyces cerevisiae, Saccharomyces paradoxus, Saccharomyces mikatae, Saccharomyces kudriavzevii, Saccharomyces bayanus, Naumovia castelii, Kluyveromyces lactis, Lachancea waltii, Lachancea thermotolerans, Lachancea kluyveri, and Schizosaccharomyces pombe. Using RNA-Seq data from yeast grown in rich medium and under oxidative stress, we created genome-guided de novo assemblies of the transcriptome of each species. We included synthetic spike-in transcripts in each sample to determine the lower limit of detection of the sequencing platform as well as the reliability of our de novo transcriptome assembly pipeline. We subsequently compared the de novo transcript assemblies to the reference gene annotations and generated assemblies that comprised both annotated and novel transcripts.

Objective


Due to pervasive transcription and pervasive translation in these yeasts, new transcripts and ORFs can quickly appear in non-genic sequences and become exposed to selection. This process, known as de novo gene birth, can lead to the appearance of new genes with entirely novel functions. Our objective was to identify and characterize putative de novo genes in baker’s yeast to further understand this phenomenon. Classifying putative de novo genes by the taxonomic conservation of their sequences requires comparable data for a set of closely related species. Because their molecular pathways resemble those of more complex eukaryotes and they are easy to grow in the lab, budding yeasts are a popular group of organisms for experiments ranging from experimental evolution to genetic engineering. We selected these 11 species based on their sparse taxonomic distribution, their amenability to growth in a custom rich medium, the availability of genome assemblies, and their inclusion in previous studies of de novo genes in yeast. We used novel transcripts assembled from our RNA-Seq data, together with the reference annotations, to generate a more complete transcriptome for each of the 11 species surveyed. We estimated when each S. cerevisiae transcript originated in the yeast phylogeny using homology searches and genomic synteny [1]. Because organisms modify the expression and translation of genes in response to stress, we sequenced the transcriptomes of all 11 species in both rich media and oxidative stress conditions to capture potential transcriptome variability.

The availability of complete gene annotations is key for genome-wide studies. The transcript assemblies provided contain hundreds of transcripts that were not present in the available annotations, and thus provide a more complete view of the gene content of each organism than previous annotations. These transcriptomes can be used as a basis to discover new encoded proteins, to study the evolution of yeast gene families, and to investigate changes in gene expression across different Saccharomycotina species. The addition of the ERCC Spike-in to all samples also allows for the benchmarking of different de novo transcriptome assembly protocols.

Data description

We grew 11 species of yeast in two conditions:

  1. Rich medium. The yeast were grown in 20 mL of a custom rich medium [2], which was shown to accommodate various species of yeast, in 50 mL Erlenmeyer flasks at 30 °C. Cells were harvested in log growth phase at an OD600 of approximately 0.25.

  2. Oxidative stress. The same isogenic populations of yeast were grown in parallel, identical to the first condition. However, 30 min prior to harvesting the cells, hydrogen peroxide was added to a final concentration of 1.5 mM; we used a time period of 30 min to maximize the cellular response to stress [3], and a concentration of 1.5 mM H2O2 as we observed the yeast to grow approximately twice as slowly at this concentration.
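
The hydrogen peroxide addition described above is a simple dilution, and the arithmetic can be sketched as follows. This is only an illustration: the text gives the culture volume (20 mL) and the final concentration (1.5 mM), but not the stock concentration, so the 1000 mM stock used in the example and the function name `stock_volume_ul` are assumptions.

```python
def stock_volume_ul(final_mM: float, culture_ml: float, stock_mM: float) -> float:
    """Volume of stock (in microlitres) to add to a culture to reach final_mM.

    Solves C_stock * V_add = C_final * (V_culture + V_add) for V_add,
    so the small volume added is accounted for.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    added_ml = final_mM * culture_ml / (stock_mM - final_mM)
    return added_ml * 1000.0  # mL -> uL


# 20 mL culture and 1.5 mM final concentration are from the text;
# the 1000 mM (1 M) stock is a hypothetical value chosen for illustration.
volume = stock_volume_ul(final_mM=1.5, culture_ml=20.0, stock_mM=1000.0)
print(f"Add about {volume:.1f} uL of stock")
```

With a different stock concentration only the `stock_mM` argument changes; the function raises an error if the stock is not more concentrated than the target.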

After extraction, purification, and polyA selection of the RNA, synthetic spike-in transcripts from the ERCC RNA Spike-in kit [4] were added to each sample in order to assess the performance and limitations of our pipeline. After library preparation, the libraries were pooled into two batches (normal/stress) and sequenced in one lane on the Illumina HiSeq 2500 (paired-end, stranded, 50 bp long). This generated > 20 million high-quality strand-specific read pairs per sample (Table 1).

Table 1 Overview of data files

After quality control of the raw RNA-Seq reads, we mapped the reads to their respective genomes (Table 1) and assembled de novo transcriptomes using Trinity version 2.1.0 [5]. We then created a non-redundant set of features by combining the reference annotations with our de novo assembled transcripts. De novo assembled transcripts that corresponded to annotated features according to Cuffmerge version 2.2.0 [6] were discarded, while those that did not were considered novel; we identified an average of 700 novel transcripts per species [1] (Table 1). The majority of these novel transcripts were expressed in both conditions, but dozens were expressed in only one condition. Using the ERCC RNA Spike-in [4], we calculated that the lower limit of detection for annotated features in our pipeline was 2 TPM, and that the lower limit of expression necessary to reliably assemble novel transcripts was 15 TPM. Over half of the unannotated transcripts that we assembled were expressed above this conservative threshold of 15 TPM in at least one of the two conditions.
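
To make the two TPM thresholds concrete, the sketch below computes TPM from read counts and transcript lengths and then labels a transcript by its expression in the two conditions. It is a minimal illustration, not the pipeline used to produce the dataset: the function names, the input format, the use of 2 TPM as the cutoff for calling a transcript "expressed", and the treatment of values exactly at a threshold as passing are all assumptions. Only the threshold values (2 and 15 TPM) come from the text.

```python
from typing import Dict

DETECTION_TPM = 2.0   # lower limit of detection for annotated features (from the text)
ASSEMBLY_TPM = 15.0   # level needed to reliably assemble novel transcripts (from the text)


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert read counts to TPM: divide by length, then scale the rates to sum to 1e6."""
    rates = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rate / total * 1e6 for t, rate in rates.items()}


def classify(tpm_rich: float, tpm_stress: float) -> str:
    """Label a transcript by its highest expression across the two conditions."""
    best = max(tpm_rich, tpm_stress)
    if best >= ASSEMBLY_TPM:
        return "reliably_assembled"
    if best >= DETECTION_TPM:
        return "detected"
    return "below_detection"


def condition_specificity(tpm_rich: float, tpm_stress: float) -> str:
    """Report in which condition(s) a transcript passes the assumed detection cutoff."""
    in_rich = tpm_rich >= DETECTION_TPM
    in_stress = tpm_stress >= DETECTION_TPM
    if in_rich and in_stress:
        return "both"
    if in_rich:
        return "rich_only"
    if in_stress:
        return "stress_only"
    return "neither"


# Toy input with hypothetical transcript names, counts, and lengths.
values = tpm({"tx1": 100, "tx2": 300}, {"tx1": 1000, "tx2": 1000})
print(values)
print(classify(20.0, 0.5), condition_specificity(20.0, 0.5))
```

Taking the maximum over the two conditions mirrors the "in at least one of the two conditions" criterion in the paragraph above.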


A limitation of this dataset is that there are not multiple replicates for each species/condition, except for L. waltii, which has two replicates in each condition. We also would like to acknowledge that the concentration of hydrogen peroxide we used to induce an oxidative stress response (1.5 mM) was higher than the concentration used in other studies of oxidative stress response in yeast (0.1–1 mM).



Abbreviations

RNA-Seq: RNA sequencing

TPM: transcripts per million, a normalized measure of mRNA abundance

ERCC: External RNA Control Consortium

mM: millimolar, a measure of concentration

H2O2: hydrogen peroxide


References

  1. Blevins WR, Ruiz-Orera J, Messeguer X, Blasco-Moreno B, Villanueva-Canas JL, Espinar L, et al. Frequent birth of de novo genes in the compact yeast genome. bioRxiv. 2019.

  2. Tsankov AM, Thompson DA, Socha A, Regev A, Rando OJ. The role of nucleosome positioning in the evolution of gene regulation. PLoS Biol. 2010.

  3. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, et al. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000;11(12):4241–57.

  4. External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics. 2005;6:150.

  5. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.

  6. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78.

  7. Transcriptomic comparison of 11 species of yeast in rich media and oxidative stress conditions. Sequence Read Archive. 2019. Accessed 16 Mar 2019.

  8. Blevins WR. Combination of novel transcripts from de novo assembly and genes from reference annotations for 11 species. figshare. 2019.

Authors’ contributions

MMA and LBC designed and funded the experiments. WRB carried out the experiments and wrote the manuscript. All authors read and approved the manuscript.

Acknowledgements


We thank Dr. Ksenia Pugach and the Verstreppen lab for cultures of several species of yeast, and the Sequencing Facilities at the Center for Regulatory Genomics (CRG) for the library preparation and sequencing. We also thank Lorena Espinar, Bernat Blasco-Moreno, and Leire de Campos-Mata for their help in preparing the samples.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The data described in this Data Note can be freely and openly accessed on the Sequence Read Archive (SRA) under Project ID SRP187756 [7] and on Figshare [8] (Table 1). We have performed numerous additional analyses using these data, which are available from the corresponding author on reasonable request; more information can be found in the text and supplementary material of our preprint “Frequent birth of de novo genes in the compact yeast genome” [1].

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Funding


The work was funded by the following grants: (1) BFU2015-65235-P Ministerio de Economía e Innovación (Spanish Government)-FEDER (EU). (2) BFU2015-68351-P Ministerio de Economía e Innovación (Spanish Government)-FEDER (EU). (3) MDM-2014-0370 “Maria de Maeztu” Programme for Units of Excellence in R&D (Spanish Government). (4) 2017SGR1054 Agència de Gestió d’Ajuts Universitaris i de Recerca Generalitat de Catalunya. (5) 2017SGR01020 Agència de Gestió d’Ajuts Universitaris i de Recerca Generalitat de Catalunya. (6) Predoctoral fellowship (FI, Generalitat de Catalunya) to WRB. Grants 1, 3 and 4 were used to cover lab expenses and to obtain the RNA samples from yeast cultures. Grants 2 and 5 were used for RNA sequencing. Grant 6 covered the salary of WRB. These funding bodies had no role in the design of the study, collection of the data, analysis of the results, or writing of the manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information


Corresponding author

Correspondence to William R. Blevins.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Blevins, W.R., Carey, L.B. & Albà, M.M. Transcriptomics data of 11 species of yeast identically grown in rich media and oxidative stress conditions. BMC Res Notes 12, 250 (2019).
