
RNAseq based variant dataset in a black poplar association panel

Abstract

Objective

Black poplar (Populus nigra L.) is a species native to Eurasia with a wide distribution area. It is an ecologically important species of riparian ecosystems and is used as a parent of cultivated interspecific (P. deltoides x P. nigra) poplar hybrids. Here we describe variant detection from transcriptomic sequences of 241 P. nigra individuals sampled in natural populations from 11 river catchments in four European countries. These data provide valuable new resources for population structure analysis, population genomics and genome-wide association studies.

Data description

We generated transcriptomic data from a mixture of young differentiating xylem and cambium tissues of 480 Populus nigra trees sampled in a common garden experiment located at Orléans (France), corresponding to 241 genotypes (at most 2 clonal replicates per genotype), using RNAseq technology. Running an in-silico pipeline on the resulting sequences yielded 878,957 biallelic polymorphisms without missing data. More than 99% of these positions are annotated and 98.8% are located on the 19 chromosomes of the P. trichocarpa reference genome. The raw RNAseq sequences are available at the NCBI Sequence Read Archive under accession SRP188754 and the variant dataset at the Recherche Data Gouv repository under https://doi.org/10.15454/8DQXK5.


Objective

Of the twenty-nine species in the genus Populus, black poplar (P. nigra L.) is native to Eurasia, with a wide distribution area including Europe, southwest and central Asia, and northwest Africa [1]. It is regarded as a keystone species of riparian ecosystems in ecological and conservation studies [2], and it is of interest as a parental pool in interspecific (P. deltoides x P. nigra) poplar breeding programs, being the origin of cultivated hybrids [3].

In this study, we selected 241 genotypes from a P. nigra collection of 587 genotypes previously genotyped with an Illumina 12 K Infinium Bead-Chip array (8000 Single Nucleotide Variants (SNVs); [4]), which in turn belong to a larger collection of 1098 cloned genotypes sampled in natural populations from 11 river catchments [5] in four European countries. These 241 genotypes were previously studied for wood properties at 2 sites (Savigliano-2011 & Orléans-2012; [6]). They were selected according to criteria defined after a first analysis of population structure with the 8000 SNVs: (i) introgression < 10% from the worldwide-spread fastigiated form P. nigra var. italica, (ii) proportion of recruitment to their ancestral population > 50% and (iii) survival in the common garden located at Orléans, France.
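The three selection criteria can be read as a simple filter over candidate genotypes. The sketch below is only an illustration of that reading: the record fields, the genotype names and the example values are hypothetical and are not taken from the dataset, and proportions are assumed to be stored as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical; they only mirror criteria (i)-(iii).
    name: str
    italica_introgression: float   # fraction of ancestry from P. nigra var. italica
    ancestral_assignment: float    # fraction assigned to the genotype's ancestral population
    survived_orleans: bool         # alive in the Orléans common garden


def passes_selection(c: Candidate) -> bool:
    """Apply criteria (i) introgression < 10%, (ii) assignment > 50%, (iii) survival."""
    return (
        c.italica_introgression < 0.10
        and c.ancestral_assignment > 0.50
        and c.survived_orleans
    )


# Made-up records, one failing each criterion in turn.
candidates = [
    Candidate("G1", 0.02, 0.85, True),
    Candidate("G2", 0.15, 0.90, True),   # fails (i)
    Candidate("G3", 0.01, 0.45, True),   # fails (ii)
    Candidate("G4", 0.00, 0.70, False),  # fails (iii)
]
selected = [c.name for c in candidates if passes_selection(c)]
print(selected)
```

All three thresholds are strict inequalities, as written in the criteria, so a genotype with exactly 10% introgression would be excluded under this reading.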

We generated transcriptomic data from young differentiating xylem and cambium tissues of these selected P. nigra genotypes using RNAseq technology. Running an in-silico pipeline [7] on the resulting sequences yielded 878,957 polymorphisms. These data have already been used in Chateigner et al. [8] and in Wade et al. [9] for phenotype prediction. They provide valuable new resources for a wide variety of genome-based studies, ranging from population structure analysis across distribution ranges to genomic prediction and genome-wide association studies (GWAS), for example for traits related to wood properties and growth.

Data description

Young differentiating xylem and cambium tissues were harvested in June 2015 from 480 Populus nigra trees from a common garden located at Orléans (France) (241 genotypes, 2 clonal trees per genotype, Data file 1 [10], Data set 1 [11]). RNA from the xylem and cambium was extracted with the RNeasy Plant kit (Qiagen, France) according to the manufacturer's recommendations. Treatment with DNase I (Qiagen, France) was carried out to ensure elimination of genomic DNA. RNA was eluted in RNase- and DNase-free water and quantified with a Nanodrop spectrophotometer. RNA from the xylem and cambium of the same plant was pooled in an equimolar extract (250 ng/μL) and sent to the sequencing platform. The sequencing platform POPS (transcriptOmic Platform of Institute of Plant Sciences - Paris-Saclay) prepared the RNAseq libraries from polyA-RNA selection using the TruSeq_Stranded_mRNA_SamplePrep_Guide_15031047_D protocol (Illumina, California, U.S.A.).

Identity of each sample was checked with the following procedure: a first round of variant detection was performed with FreeBayes (v.1.0.0) [12], then IBS (Identity By State) was calculated between samples of the same individual, as well as between the samples of the present study and the 852 individuals that had been previously genotyped with the Illumina 12 K Infinium Bead-Chip array [4] (Data file 2 [13]). After removing sampling errors and correcting identities, 241 unique genotypes were considered corresponding to 461 FASTQ files (Data set 2 [14]).
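To illustrate the identity check, the sketch below computes a pairwise IBS score between two samples. It rests on assumptions that the text does not specify: genotypes are coded as alternate-allele counts (0, 1 or 2), missing calls are `None`, and IBS is taken as the mean proportion of alleles shared over sites called in both samples. The function name and the example vectors are hypothetical.

```python
from typing import Optional, Sequence


def ibs(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> Optional[float]:
    """Mean proportion of alleles shared identical-by-state between two samples.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    Sites missing in either sample are skipped. Returns None if no site is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    # At each site the two samples share 2 - |x - y| alleles out of 2.
    return sum(2 - abs(x - y) for x, y in shared) / (2 * len(shared))


# Made-up genotype vectors: two clonal replicates and an unrelated tree.
replicate_1 = [0, 1, 2, 1, None, 0]
replicate_2 = [0, 1, 2, 1, 2, 0]
unrelated = [2, 1, 0, 0, 1, 2]

print(ibs(replicate_1, replicate_2))  # replicates of one genotype should be close to 1
print(ibs(replicate_1, unrelated))    # unrelated samples should score lower
```

Under this scheme, two RNAseq samples of the same genotype, or an RNAseq sample and its array genotype, are expected to give an IBS close to 1, and a clearly lower value would flag a possible sampling error.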

The sequences for each sample were processed with the pipeline defined in Rogier et al. [7], with small modifications, to detect SNVs. All experiment steps (from growth conditions to bioinformatic analyses) are available in the CATdb database (Data set 3 [15]). Briefly, the reads were first trimmed with Trimmomatic (v.0.38) [16] to remove adapter and low-quality sequences. They were then aligned to the Populus trichocarpa reference genome v.3.0 [17] using the BWA-MEM algorithm (v.0.7.12) [18]. We followed the GATK Best Practices [19, 20] for RNAseq short variant discovery: we first marked duplicates with MarkDuplicates from the Picard tools (v.2.0.1) [21] and then used the SplitNCigarReads, Indel Realignment and Base Quality Recalibration tools from GATK (v.3.5) [22]. SNVs and short insertions and deletions were genotyped for all the sequenced trees (samples of the same genotype were pooled together) with 3 variant callers: (i) GATK, using the HaplotypeCaller tool in single-sample calling mode followed by joint genotyping of the samples with the GenotypeGVCFs tool; (ii) FreeBayes (v.1.0.0) [12] in multi-sample mode and (iii) the mpileup tool from SAMtools (v.1.3.1) [23] in multi-sample mode followed by bcftools (v.1.3.1) [24].

The resulting 3 files (one per caller) were filtered with VCFtools (v.0.1.15) [25] to retain only the intra-specific (Populus nigra) biallelic SNVs with a variant quality score (QUAL) over 30. These filtered files were then combined with the vcf-isec tool from VCFtools, and we only kept the genotype calls that were detected by at least 2 variant callers. Otherwise, the genotype call was set as a missing value for this particular individual. As there remained some missing data (Data file 3 [26]), genotype imputation was performed using the FImpute program (v.2.2) [27] for the SNVs located on the chromosomes/scaffolds that contain at least 2 SNVs. Thereby, 9.73% of missing values were imputed. This yielded genotypes at 878,957 biallelic sites for the 241 Populus nigra individuals (Data file 4 [28]) without missing data. Of these sites, 878,893 have an annotation (Data file 5 [29], Data file 6 [30]) from ANNOVAR (v.2017Jul16) [31] (Data file 7 [32]). Among them, 868,861 are located on the 19 chromosomes (Data file 7 [32]). A total of 26,909 genes harbored between 1 and 309 SNVs.
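The rule of keeping a genotype call only when at least 2 of the 3 callers support it can be sketched as follows. This is one possible reading, assuming that "detected by at least 2 variant callers" means that two callers report the same genotype for a given individual and site; the actual combination was done with vcf-isec, and the function name and genotype strings below are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_call(calls: Sequence[Optional[str]], min_support: int = 2) -> Optional[str]:
    """Return the genotype reported by at least `min_support` callers, else None.

    `calls` holds one genotype string per caller (e.g. "0/1"); None means
    that the caller produced no call for this individual at this site.
    """
    counts = Counter(c for c in calls if c is not None)
    if not counts:
        return None
    genotype, support = counts.most_common(1)[0]
    return genotype if support >= min_support else None


# One individual at four made-up sites; caller order is GATK, FreeBayes, SAMtools.
print(consensus_call(["0/1", "0/1", "1/1"]))   # two callers agree: call kept
print(consensus_call(["0/0", None, "0/0"]))    # two callers agree: call kept
print(consensus_call(["0/1", None, "1/1"]))    # no agreement: set to missing
print(consensus_call([None, None, None]))      # nothing called: missing
```

Calls set to missing by such a rule are the ones that the imputation step subsequently fills in.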

Limitations

  • The SNVs are limited to expressed genes: 26,909 vs a maximum of 41,335 P. trichocarpa protein-coding genes. We found a correlation between expression level and SNV density at gene level, which is likely due to an increase in coverage for highly expressed genes (Data file 7 [32]).

  • RNA was extracted from young differentiating xylem and cambium tissues, which might not be representative of the transcriptional activity of other tissues of the tree. The SNVs detected are therefore specific to the genes expressed in these two tissues. Nevertheless, this makes the SNV dataset well suited to studying the genetics of wood formation.

  • The SNVs are limited to the gene space: 92.55% of the detected SNVs are exonic, intronic, in the 3’UTR (UnTranslated Region) or in the 5’UTR (Data file 7 [32]).
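The proportions quoted in this Data note can be rechecked from the reported counts. The short calculation below uses only numbers given in the text; the variable and function names are illustrative.

```python
# Counts reported in the Data description and Limitations sections.
total_snvs = 878_957            # biallelic sites after imputation
annotated_snvs = 878_893        # sites with an ANNOVAR annotation
chromosome_snvs = 868_861       # sites located on the 19 chromosomes
genes_with_snvs = 26_909        # genes harboring at least one SNV
protein_coding_genes = 41_335   # P. trichocarpa protein-coding genes


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


print(f"annotated:      {percent(annotated_snvs, total_snvs):.2f}%")
print(f"on chromosomes: {percent(chromosome_snvs, total_snvs):.2f}%")
print(f"genes covered:  {percent(genes_with_snvs, protein_coding_genes):.1f}%")
```

The first two values agree with the "more than 99%" and "98.8%" given in the abstract. The third shows that roughly 65% of the P. trichocarpa protein-coding genes carry at least one SNV in this dataset.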

Availability of data and materials

The raw RNAseq sequences are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive with the SRP188754 identifier [14]. The 241 associated accessions can be found in the GnpIS Information System under the collection “POPULUS_NIGRA_RNASEQ_PANEL” [11] or with the DOIs in the table of the studied genotypes [10]. The steps of the experiment (from growth conditions to bioinformatic analyses) are described in the CATdb database [15]. The other data described in this Data note can be freely and openly accessed on the Recherche Data Gouv repository under https://doi.org/10.15454/8DQXK5 [33]. Please see Table 1 and references [13, 26, 28, 29, 30, 32] for details and links to the data.

Table 1 Overview of data files/data sets

Abbreviations

GATK: Genome Analysis ToolKit

GWAS: Genome-wide association studies

IBS: Identity by state

mRNA: Messenger ribonucleic acid

NCBI: National Center for Biotechnology Information

RNA: Ribonucleic acid

RNAseq: Ribonucleic acid sequencing

SNV: Single nucleotide variant

UTR: Untranslated region

References

  1. Dickmann DI, Kuzovkina J. Poplars and willows of the world, with emphasis on silviculturally important species. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014.


  2. Imbert E, Lefèvre F. Dispersal and gene flow of Populus nigra (Salicaceae) along a dynamic river system. J Ecol. 2003;91(3):447–56. https://doi.org/10.1046/j.1365-2745.2003.00772.x.


  3. Stanton BJ, Serapiglia MJ, Smart LB. The domestication and conservation of Populus and Salix genetic resources. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014.


  4. Faivre-Rampant P, Zaina G, Jorge V, Giacomello S, Segura V, Scalabrin S, et al. New resources for genetic studies in Populus nigra: genome-wide SNP discovery and development of a 12k infinium array. Mol Ecol Resour. 2016;16(4):1023–36. https://doi.org/10.1111/1755-0998.12513.


  5. Guet J, Fabbrini F, Fichot R, Sabatti M, Bastien C, Brignolas F. Genetic variation for leaf morphology, leaf structure and leaf carbon isotope discrimination in European populations of black poplar (Populus nigra L.). Tree Physiol. 2015;35(8):850–63. https://doi.org/10.1093/treephys/tpv056.


  6. Gebreselassie MN, Ader K, Boizot N, Millier F, Charpentier JP, Alves A, et al. Near-infrared spectroscopy enables the genetic analysis of chemical properties in a large set of wood samples from Populus nigra (L.) natural populations. Ind Crops Prod. 2017. https://doi.org/10.1016/j.indcrop.2017.05.013.


  7. Rogier O, Chateigner A, Amanzougarene S, Lesage-Descauses MC, Balzergue S, Brunaud V, et al. Accuracy of RNAseq based SNP discovery and genotyping in Populus nigra. BMC Genomics. 2018;19(1):909. https://doi.org/10.1186/s12864-018-5239-z.


  8. Chateigner A, Lesage-Descauses MC, Rogier O, Jorge V, Leplé JC, Brunaud V, et al. Gene expression predictions and networks in natural populations supports the omnigenic theory. BMC Genomics. 2020;21(1):416. https://doi.org/10.1186/s12864-020-06809-2.


  9. Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics. 2022;23(1):476. https://doi.org/10.1186/s12864-022-08690-7.


  10. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "Collection_POPULUS_NIGRA_RNASEQ_PANEL.tab", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/GKXDSQ.

  11. GnpIS: Genetic and Genomic Information System. GnpIS; Collection: POPULUS_NIGRA_RNASEQ_PANEL. https://urgi.versailles.inrae.fr/faidare/search?gl=POPULUS_NIGRA_RNASEQ_PANEL.

  12. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv. 2012. https://doi.org/10.48550/arXiv.1207.3907.

  13. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL_quality_control.pdf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/SSDFV2

  14. NCBI Sequence Read Archive. 2020; https://identifiers.org/ncbi/insdc.sra:SRP188754.

  15. CATdb: a Plant Transcriptome Database. Available from: http://tools.ips2.u-psud.fr/CATdb/ficheexperiment.html?experiment=640.

  16. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics. 2014;30(15):2114–20. https://doi.org/10.1093/bioinformatics/btu170.


  17. Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313(5793):1596–604. https://doi.org/10.1126/science.1128691.


  18. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013. https://doi.org/10.48550/arXiv.1303.3997.

  19. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–8. https://doi.org/10.1038/ng.806.


  20. Van der Auwera GA, O’Connor BD. Genomics in the cloud: using docker, GATK, and WDL in terra. 1st ed. Sebastopol: O’Reilly Media, Inc.; 2020.


  21. Broad Institute. Picard Tools. Broad Institute, GitHub repository; 2018; https://broadinstitute.github.io/picard/.

  22. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. https://doi.org/10.1101/gr.107524.110.


  23. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. https://doi.org/10.1093/gigascience/giab008.

  24. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–93. https://doi.org/10.1093/bioinformatics/btr509.


  25. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. https://doi.org/10.1093/bioinformatics/btr330.


  26. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_raw_POPULUS_NIGRA_RNASEQ_PANEL.vcf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/RBR6X0.

  27. Sargolzaei M, Chesnais JP, Schenkel FS. A new approach for efficient genotype imputation using information from relatives. BMC Genomics. 2014;15(1):478. https://doi.org/10.1186/1471-2164-15-478.


  28. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL.vcf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/5IQLI9.

  29. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL.variant_function", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/PAEKL7.

  30. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL.exonic_variant_function", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, 2022; https://doi.org/10.5774/EG9HOE.

  31. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. https://doi.org/10.1093/nar/gkq603.


  32. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL_analysis_figures.pdf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project). Recherche Data Gouv, 2022; https://doi.org/10.5774/BQQTBR.

  33. Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project). Recherche Data Gouv. 2022. https://doi.org/10.15454/8DQXK5.


Acknowledgements

We thank GBFOR (INRAE, Forest Genetics and Biomass Facility), https://doi.org/10.15454/1.5572308287502317E12 for management of the common garden experiment. We also thank C. Michotey and the INRAE URGI platform for their help and the maintenance of the GnpIS repository.

Funding

This work was done within the SYBIOPOP project (ANR-13-JSV6-0001) funded by the French National Research Agency (ANR). The platform POPS benefits from the support of the LabEx Saclay Plant Sciences-SPS (ANR-10-LABX-0040-SPS).

Author information


Contributions

VS designed the study; JA, CBa, VBe, GB, NB, CBu, J-PC, ADej, ADel, RF, VJ, VL-P, FL, M-CL-D, IL-J, A-LL, SM, MNG, PP, CR and VS contributed to tissue sampling; M-CL-D, CM, JC and LS-T carried out the experiments and the sequencing; OR, AC, VBr, LS, VJ and VS contributed to the data analysis; OR, VJ and VS wrote the paper with input from all co-authors. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Vincent Segura.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Rogier, O., Chateigner, A., Lesage-Descauses, MC. et al. RNAseq based variant dataset in a black poplar association panel. BMC Res Notes 16, 248 (2023). https://doi.org/10.1186/s13104-023-06521-w



  • DOI: https://doi.org/10.1186/s13104-023-06521-w
