- Data Note
- Open access
RNAseq based variant dataset in a black poplar association panel
BMC Research Notes volume 16, Article number: 248 (2023)
Abstract
Objective
Black poplar (Populus nigra L.) is a species native to Eurasia with a wide distribution area. It is an ecologically important species of riparian ecosystems and is used as a parent of cultivated interspecific (P. deltoides × P. nigra) poplar hybrids. Here we describe variant detection from transcriptome sequences of 241 P. nigra individuals sampled in natural populations from 11 river catchments in four European countries. These data provide valuable new resources for population structure analysis, population genomics and genome-wide association studies.
Data description
We generated transcriptomics data by RNAseq from a mixture of young differentiating xylem and cambium tissues of 480 Populus nigra trees sampled in a common garden experiment located at Orléans (France), corresponding to 241 genotypes (at most 2 clonal replicates per genotype). We ran an in-silico pipeline on the resulting sequences and obtained 878,957 biallelic polymorphisms without missing data. More than 99% of these positions are annotated and 98.8% are located on the 19 chromosomes of the P. trichocarpa reference genome. The raw RNAseq sequences are available at the NCBI Sequence Read Archive under accession SRP188754 and the variant dataset at the Recherche Data Gouv repository under https://doi.org/10.15454/8DQXK5.
Objective
Of the twenty-nine species in the genus Populus, black poplar (P. nigra L.) is native to Eurasia, with a wide distribution area that includes Europe, southwest and central Asia, and northwest Africa [1]. It is regarded as a keystone species for riparian ecosystems in ecological and conservation studies [2], and it is of interest as a parental pool in interspecific (P. deltoides × P. nigra) poplar breeding programs, as the origin of cultivated hybrids [3].
In this study, we selected 241 genotypes from a P. nigra collection of 587 genotypes previously genotyped with an Illumina 12 K Infinium Bead-Chip array (8000 Single Nucleotide Variants (SNVs); [4]), which in turn belong to a larger collection of 1098 cloned genotypes sampled in natural populations from 11 river catchments [5] in four European countries. These 241 genotypes were previously studied for wood properties at 2 sites (Savigliano-2011 and Orléans-2012; [6]). They were selected using three criteria defined after a first analysis of population structure with the 8000 SNVs: (i) introgression < 10% from the worldwide-spread fastigiate form P. nigra var. italica, (ii) proportion of recruitment to their ancestral population > 50% and (iii) survival in the common garden located at Orléans, France.
We generated transcriptomics data by RNAseq from young differentiating xylem and cambium tissues of these selected P. nigra genotypes. We ran an in-silico pipeline [7] on the resulting sequences and obtained 878,957 polymorphisms. These data have already been used for phenotype prediction in Chateigner et al. [8] and in Wade et al. [9]. They provide valuable new resources for a wide variety of genome-based studies, ranging from population structure analysis over distribution ranges to genomic prediction and genome-wide association studies (GWAS), for example for traits related to wood properties and growth.
Data description
Young differentiating xylem and cambium tissues were harvested in June 2015 from 480 Populus nigra trees in a common garden located at Orléans (France) (241 genotypes, 2 clonal trees per genotype, Data file 1 [10], Data set 1 [11]). RNA from the xylem and cambium was extracted with the RNeasy Plant kit (Qiagen, France) according to the manufacturer’s recommendations. Treatment with DNase I (Qiagen, France) was carried out to ensure elimination of genomic DNA. RNA was eluted in RNAse-DNAse free water and quantified with a Nanodrop spectrophotometer. RNA from the xylem and cambium of the same plant was pooled in an equimolar extract (250 ng/μL) and sent to the sequencing platform. The sequencing platform POPS (transcriptOmic Platform of the Institute of Plant Sciences Paris-Saclay) prepared the RNAseq libraries from polyA-RNA selection using the TruSeq_Stranded_mRNA_SamplePrep_Guide_15031047_D protocol (Illumina, California, U.S.A.).
The identity of each sample was checked with the following procedure: a first round of variant detection was performed with FreeBayes (v.1.0.0) [12], then IBS (Identity By State) was calculated between samples of the same individual, as well as between the samples of the present study and the 852 individuals that had been previously genotyped with the Illumina 12 K Infinium Bead-Chip array [4] (Data file 2 [13]). After removing sampling errors and correcting identities, 241 unique genotypes were retained, corresponding to 461 FASTQ files (Data set 2 [14]).
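The identity check above rests on pairwise IBS. The source does not state which software or exact formula was used, so the following Python sketch only illustrates one common definition: the mean proportion of alleles shared at sites genotyped in both samples. The function name, the 0/1/2 genotype coding and the toy data are assumptions made for illustration.

```python
from typing import Optional, Sequence


def ibs_similarity(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Mean identity by state between two samples.

    Genotypes are coded as alternate-allele counts (0, 1 or 2); None marks a
    missing call. Each site genotyped in both samples contributes
    1 - |a - b| / 2, the proportion of alleles shared at that site.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no site genotyped in both samples")
    return sum(1 - abs(x - y) / 2 for x, y in shared) / len(shared)


# Toy data (hypothetical): two clonal replicates and one unrelated tree.
rep1 = [0, 1, 2, 1, None, 0]
rep2 = [0, 1, 2, 1, 2, 0]
other = [2, 1, 0, 0, 2, 1]

print("replicate vs replicate:", ibs_similarity(rep1, rep2))
print("replicate vs other tree:", ibs_similarity(rep1, other))
```

Under this definition, clonal replicates of the same genotype should score close to 1, whereas a mislabelled sample would show a markedly lower value against its supposed replicate.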
The sequences for each sample were processed with the pipeline defined in Rogier et al. [7], with small modifications, to detect SNVs. All experiment steps (from growth conditions to bioinformatic analyses) are available in the CATdb database (Data set 3 [15]). Briefly, the reads were first trimmed with Trimmomatic (v.0.38) [16] to remove adapter and low-quality sequences. Then, they were aligned to the Populus trichocarpa reference genome v.3.0 [17] using the BWA-MEM algorithm (v.0.7.12) [18]. We followed the GATK Best Practices [19, 20] for RNAseq short variant discovery: we first marked the duplicates with MarkDuplicates from Picard tools (v.2.0.1) [21] and then used the SplitNCigarReads, Indel Realignment and Base Quality Recalibration tools from GATK (v.3.5) [22]. SNVs and short insertions and deletions were genotyped for all the sequenced trees (trees of the same genotype were pooled together) with 3 variant callers: (i) GATK, using the HaplotypeCaller tool in single-sample calling mode followed by joint genotyping of the samples with the GenotypeGVCFs tool; (ii) FreeBayes (v.1.0.0) [12] in multi-sample mode; and (iii) the mpileup tool from SAMtools (v.1.3.1) [23] in multi-sample mode followed by bcftools (v.1.3.1) [24].
The resulting 3 files (one per caller) were filtered with VCFtools (v.0.1.15) [25] to keep only the intra-specific (Populus nigra) biallelic SNVs with a variant quality score (QUAL) above 30. These filtered files were then combined with the vcf-isec tool from VCFtools, and we only kept the genotype calls that were detected by at least 2 variant callers. Otherwise, the genotype call was set as a missing value for that particular individual. As some missing data remained (Data file 3 [26]), genotype imputation was performed using the FImpute program (v.2.2) [27] for the SNVs located on the chromosomes/scaffolds that contain at least 2 SNVs. Thereby, 9.73% of missing values were imputed. This yielded genotypes at 878,957 biallelic sites for the 241 Populus nigra individuals (Data file 4 [28]) without missing data. Of these sites, 878,893 have an annotation (Data file 5 [29], Data file 6 [30]) from ANNOVAR (v.2017Jul16) [31] (Data file 7 [32]). Among them, 868,861 are located on the 19 chromosomes (Data file 7 [32]). A total of 26,909 genes harbored between 1 and 309 SNVs.
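The rule of keeping a genotype call only when at least 2 of the 3 callers support it can be sketched as follows. This is one possible reading of the rule, written in Python for illustration; it is not the vcf-isec based procedure used for the dataset. The function names, the caller labels and the genotype string format (VCF-style "0/1") are assumptions.

```python
from collections import Counter
from typing import Dict, Optional


def normalize(genotype: Optional[str]) -> Optional[str]:
    """Return an unphased, allele-sorted genotype, or None for a missing call."""
    if genotype is None or "." in genotype:
        return None
    return "/".join(sorted(genotype.replace("|", "/").split("/")))


def consensus_call(calls: Dict[str, Optional[str]], min_callers: int = 2) -> Optional[str]:
    """Return the genotype reported by at least `min_callers` callers.

    `calls` maps a caller label to its genotype for one individual at one site.
    When no genotype reaches the required support, None (missing) is returned.
    """
    counts = Counter(g for g in map(normalize, calls.values()) if g is not None)
    if not counts:
        return None
    genotype, support = counts.most_common(1)[0]
    return genotype if support >= min_callers else None


# Toy calls (hypothetical) for one individual at three sites.
sites = [
    {"gatk": "0/1", "freebayes": "1/0", "samtools": "1/1"},  # two callers agree
    {"gatk": "0/1", "freebayes": "./.", "samtools": "1/1"},  # no agreement
    {"gatk": "0/0", "freebayes": None, "samtools": "0/0"},   # two callers agree
]
for calls in sites:
    print(consensus_call(calls))
```

Calls that come back as missing are the ones that a later imputation step would have to fill in.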
Limitations
- The SNVs are limited to expressed genes: 26,909 vs a maximum of 41,335 P. trichocarpa protein-coding genes. We found a correlation between expression level and SNV density at the gene level, which is likely due to an increase in coverage for highly expressed genes (Data file 7 [32]).
- RNA was extracted from young differentiating xylem and cambium tissues, which might not be representative of the transcriptional activity of other tissues of the tree. The SNVs found are therefore specific to the genes expressed in these two tissues. Nevertheless, this makes the SNV dataset clearly appropriate for studying the genetics of wood formation.
- The SNVs are limited to the gene space: 92.55% of the SNVs found are exonic, intronic, in the 3’UTR (UnTranslated Region) or in the 5’UTR (Data file 7 [32]).
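The proportions quoted in this Data note can be recomputed from the reported counts. The short Python sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Counts reported in the Data description and Limitations sections.
total_snvs = 878_957        # biallelic sites after imputation
annotated_snvs = 878_893    # sites with an ANNOVAR annotation
chromosome_snvs = 868_861   # annotated sites located on the 19 chromosomes
genes_with_snvs = 26_909    # genes harboring at least 1 SNV
reference_genes = 41_335    # P. trichocarpa protein-coding genes

annotated_share = annotated_snvs / total_snvs
chromosome_share = chromosome_snvs / total_snvs
gene_share = genes_with_snvs / reference_genes

print(f"annotated sites:        {annotated_share:.2%}")
print(f"sites on chromosomes:   {chromosome_share:.2%}")
print(f"reference genes tagged: {gene_share:.2%}")
```

The last ratio shows that roughly two thirds of the reference protein-coding genes carry at least one SNV in this dataset, which gives a sense of how much of the gene space remains uncovered.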
Availability of data and materials
The raw RNAseq sequences are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive under the SRP188754 identifier [14]. The 241 associated accessions can be found in the GnpIS Information System under the collection “POPULUS_NIGRA_RNASEQ_PANEL” [11] or with the DOIs in the table of the studied genotypes [10]. The steps of the experiment (from growth conditions to bioinformatic analyses) are described in the CATdb database [15]. The other data described in this Data note can be freely and openly accessed on the Recherche Data Gouv repository under https://doi.org/10.15454/8DQXK5 [33]. Please see Table 1 and references [13, 26, 28, 29, 30, 32] for details and links to the data.
Abbreviations
- GATK: Genome Analysis ToolKit
- GWAS: Genome-wide association studies
- IBS: Identity by state
- mRNA: Messenger ribonucleic acid
- NCBI: National Center for Biotechnology Information
- RNA: Ribonucleic acid
- RNAseq: Ribonucleic acid sequencing
- SNV: Single nucleotide variant
- UTR: Untranslated region
References
Dickmann DI, Kuzovkina J. Poplars and willows of the world, with emphasis on silviculturally important species. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014.
Imbert E, Lefèvre F. Dispersal and gene flow of Populus nigra (Salicaceae) along a dynamic river system. J Ecol. 2003;91(3):447–56. https://doi.org/10.1046/j.1365-2745.2003.00772.x.
Stanton BJ, Serapiglia MJ, Smart LB. The domestication and conservation of Populus and Salix genetic resources. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014.
Faivre-Rampant P, Zaina G, Jorge V, Giacomello S, Segura V, Scalabrin S, et al. New resources for genetic studies in Populus nigra: genome-wide SNP discovery and development of a 12k infinium array. Mol Ecol Resour. 2016;16(4):1023–36. https://doi.org/10.1111/1755-0998.12513.
Guet J, Fabbrini F, Fichot R, Sabatti M, Bastien C, Brignolas F. Genetic variation for leaf morphology, leaf structure and leaf carbon isotope discrimination in European populations of black poplar (Populus nigra L.). Tree Physiol. 2015;35(8):850–63. https://doi.org/10.1093/treephys/tpv056.
Gebreselassie MN, Ader K, Boizot N, Millier F, Charpentier JP, Alves A, et al. Near-infrared spectroscopy enables the genetic analysis of chemical properties in a large set of wood samples from Populus nigra (L.) natural populations. Ind Crops Prod. 2017. https://doi.org/10.1016/j.indcrop.2017.05.013.
Rogier O, Chateigner A, Amanzougarene S, Lesage-Descauses MC, Balzergue S, Brunaud V, et al. Accuracy of RNAseq based SNP discovery and genotyping in Populus nigra. BMC Genomics. 2018;19(1):909. https://doi.org/10.1186/s12864-018-5239-z.
Chateigner A, Lesage-Descauses MC, Rogier O, Jorge V, Leplé JC, Brunaud V, et al. Gene expression predictions and networks in natural populations supports the omnigenic theory. BMC Genomics. 2020;21(1):416. https://doi.org/10.1186/s12864-020-06809-2.
Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics. 2022;23(1):476. https://doi.org/10.1186/s12864-022-08690-7.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "Collection_POPULUS_NIGRA_RNASEQ_PANEL.tab", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/GKXDSQ.
GnpIS: Genetic and Genomic Information System. GnpIS; Collection: POPULUS_NIGRA_RNASEQ_PANEL. https://urgi.versailles.inrae.fr/faidare/search?gl=POPULUS_NIGRA_RNASEQ_PANEL
Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv. 2012. https://doi.org/10.48550/arXiv.1207.3907.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL_quality_control.pdf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/SSDFV2
NCBI Sequence Read Archive. 2020; https://identifiers.org/ncbi/insdc.sra:SRP188754.
CATdb: a Plant Transcriptome Database. Available from: http://tools.ips2.u-psud.fr/CATdb/ficheexperiment.html?experiment=640
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics. 2014;30(15):2114–20. https://doi.org/10.1093/bioinformatics/btu170.
Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313(5793):1596–604. https://doi.org/10.1126/science.1128691.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013. https://doi.org/10.48550/arXiv.1303.3997.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–8. https://doi.org/10.1038/ng.806.
Van der Auwera GA, O’Connor BD. Genomics in the cloud: using docker, GATK, and WDL in terra. 1st ed. Sebastopol: O’Reilly Media, Inc.; 2020.
Broad Institute. Picard Tools. Broad Institute, GitHub repository; 2018; https://broadinstitute.github.io/picard/.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. https://doi.org/10.1101/gr.107524.110.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. https://doi.org/10.1093/gigascience/giab008.
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–93. https://doi.org/10.1093/bioinformatics/btr509.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. https://doi.org/10.1093/bioinformatics/btr330.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_raw_POPULUS_NIGRA_RNASEQ_PANEL.vcf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/RBR6X0.
Sargolzaei M, Chesnais JP, Schenkel FS. A new approach for efficient genotype imputation using information from relatives. BMC Genomics. 2014;15(1):478. https://doi.org/10.1186/1471-2164-15-478.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL.vcf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/5IQLI9.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL.variant_function", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, V2; 2022; https://doi.org/10.5774/PAEKL7.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL.exonic_variant_function", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project), Recherche Data Gouv, 2022; https://doi.org/10.5774/EG9HOE.
Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. https://doi.org/10.1093/nar/gkq603.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. "SNV_imputated_POPULUS_NIGRA_RNASEQ_PANEL_analysis_figures.pdf", Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project). Recherche Data Gouv, 2022; https://doi.org/10.5774/BQQTBR.
Rogier O, Chateigner A, Wade AR, Lesage-Descauses MC, Brunaud V, Caius J, Soubigou-Taconnat L, Duruflé H, Sanchez L, Jorge V, Segura V. Phenotypic, genotypic and transcriptomic data of 241 Populus nigra (from the Sybiopop project). Recherche Data Gouv. 2022. https://doi.org/10.15454/8DQXK5.
Acknowledgements
We thank GBFOR (INRAE, Forest Genetics and Biomass Facility), https://doi.org/10.15454/1.5572308287502317E12 for management of the common garden experiment. We also thank C. Michotey and the INRAE URGI platform for their help and the maintenance of the GnpIS repository.
Funding
This work was done within the SYBIOPOP project (ANR-13-JSV6-0001) funded by the French National Research Agency (ANR). The platform POPS benefits from the support of the LabEx Saclay Plant Sciences-SPS (ANR-10-LABX-0040-SPS).
Author information
Contributions
VS designed the study; JA, CBa, VBe, GB, NB, CBu, J-PC, ADej, ADel, RF, VJ, VL-P, FL, M-CL-D, IL-J, A-LL, SM, MNG, PP, CR and VS contributed to tissue sampling; M-CL-D, CM, JC and LS-T carried out the experiments and the sequencing; OR, AC, VBr, LS, VJ and VS contributed to the data analysis; OR, VJ and VS wrote the paper with input from all co-authors. All authors have read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Rogier, O., Chateigner, A., Lesage-Descauses, MC. et al. RNAseq based variant dataset in a black poplar association panel. BMC Res Notes 16, 248 (2023). https://doi.org/10.1186/s13104-023-06521-w
DOI: https://doi.org/10.1186/s13104-023-06521-w