- Data note
- Open Access
Whole genome resequencing data sets of different species from Pistacia genus
BMC Research Notes volume 14, Article number: 290 (2021)
Abstract
Objectives
The genus Pistacia belongs to the flowering plants of the cashew family and contains at least 11 species. The whole-genome resequencing data of different Pistacia species are described herein. These data will be useful for better understanding the adaptive evolution, demographic history, genetic diversity, population structure, and domestication of pistachio.
Data description
Genomic DNA was isolated from fresh leaves and used to construct libraries with an insert size of 350 bp. The libraries were sequenced on the Illumina HiSeq 4000 platform to produce 150 bp paired-end reads. A total of 4,851,118,730 reads (ranging from 33,305,900 to 34,990,618 reads per sample) were generated across all samples. This corresponds to 727.67 Gbp of data, which have been deposited in the Genome Sequence Archive (GSA) database under accession CRA000978. All of the data are also available in sequence read archive (SRA) format from the National Center for Biotechnology Information (NCBI) under identifier SRP189222, mirroring the data deposited in GSA.
Objective
The genus Pistacia belongs to the flowering plants of the Anacardiaceae family. Other plants in the Anacardiaceae, or cashew family, include poison oak, mango, poison ivy, sumac, and pepper tree [1]. Pistacia comprises at least eleven species and is estimated to be approximately 80 million years old [2]. Pistachio has a long history of cultivation (3000–4000 years) in Iran and is native to the arid zones of Central Asia [3]. The Romans introduced the plant into Mediterranean Europe at the beginning of the Christian era [3], and its cultivation extended westward from its center of origin to Italy, Spain, and other Mediterranean regions of Southern Europe, North Africa, and the Middle East, as well as to China, the United States, and Australia [4, 5]. Worldwide production of pistachios was about 1.4 million tonnes in 2018, with Iran and the United States, the leading producers, together accounting for 72% of the total [6]. Pistachio plants have a juvenile period of about 5–10 years. The most economically important species is P. vera, which is the only cultivated species of the genus [7]. The other species of the genus are forest trees that have edible seeds and can be used as rootstock seed sources for cultivated P. vera [1, 8]. In addition, plant materials such as leaves, seeds, flowers, and stem-derived resins of some Pistacia species have pharmacological properties, including antioxidant, anti-inflammatory, and antimicrobial activities [9,10,11].
This study provides whole-genome resequencing data of different species from the genus Pistacia (Table 1). These genome sequence data will be useful for comparative population genomics and for better understanding the demographic history and adaptive evolution of pistachio. We used these data to provide insights into pistachio genetic diversity, population structure, and domestication [12].
Data description
The materials used for DNA extraction were fresh leaves collected from the germplasm collections of the Pistachio Research Institute in Rafsanjan, Iran, and the pistachio germplasm of Ardakan, Iran. Leaf tissues were harvested during the 2015–2017 period and stored at −80 °C at the Shahid Bahonar University of Kerman, Iran, until DNA extraction. Total genomic DNA was extracted from the leaves using a modified hexadecyl trimethyl ammonium bromide (CTAB) protocol. A NanoDrop spectrophotometer and 1% agarose gel electrophoresis were used to assess the quantity and quality of the extracted DNA; the acceptance criteria were a 260/280 absorbance ratio of 1.8–2.0, a single absorbance peak at 260 nm, and no evidence of significant band shearing or contamination. The isolated DNA was dissolved in 20 μl TE buffer and kept at −20 °C for subsequent analyses. A total of 10 μg of the extracted DNA was used to construct libraries with an average insert size of 350 bp. The Illumina library preparation pipeline was used as a guideline for constructing the sequence libraries. The libraries were sequenced on the Illumina HiSeq 4000 platform to generate 150 bp paired-end reads.
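The 260/280 purity criterion mentioned above can be expressed as a simple range check. The following is a minimal sketch, not the authors' procedure: the function name and the example absorbance readings are hypothetical, and only the 1.8–2.0 window comes from the text.

```python
def passes_purity_check(a260: float, a280: float,
                        low: float = 1.8, high: float = 2.0) -> bool:
    """Return True when the A260/A280 ratio lies within [low, high].

    The default window (1.8-2.0) is the one stated in the text; the
    function itself is an illustrative helper, not part of the study.
    """
    if a280 <= 0:
        raise ValueError("A280 must be a positive absorbance reading")
    ratio = a260 / a280
    return low <= ratio <= high


# Hypothetical readings for two DNA preparations.
print(passes_purity_check(a260=0.95, a280=0.50))  # ratio 1.90
print(passes_purity_check(a260=0.75, a280=0.50))  # ratio 1.50
```

A check like this covers only the absorbance ratio; the single 260 nm peak and the gel-based integrity assessment still need to be inspected separately.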
The pistachio descriptor [13] was used as a guideline to measure the pistachio fruit size-related traits. The following phenotypes were recorded: fresh fruit weight with green skin (g), dried pistachio fruit weight (g), dried pistachio fruit length (mm), dried pistachio fruit diameter (mm), dried pistachio fruit width (mm), dried pistachio fruit and kernel shape, dried kernel weight (g), kernel diameter (mm), kernel width (mm), kernel length (mm).
We resequenced a total of 107 genomes from P. vera (93 cultivars and 14 genomes of wild pistachio) to an average depth of 6–8X. In addition, we resequenced 35 genomes from different closely related species, including P. palaestina (n = 5), P. mutica (n = 13), P. khinjuk (n = 14), and P. integerrima (n = 4) (Table 1). A total of 4,851,118,730 reads (ranging from 33,305,900 to 34,990,618 reads per sample) were generated across all samples, corresponding to 727.67 Gbp of data (SRA data size: 303.14 GB).
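The reported totals can be cross-checked with a little arithmetic. The sketch below uses only figures stated in the text (total read count, 150 bp read length, and 107 + 35 resequenced genomes); the variable names are illustrative.

```python
# Figures taken from the text.
TOTAL_READS = 4_851_118_730
READ_LENGTH_BP = 150
N_SAMPLES = 107 + 35  # P. vera genomes plus genomes from related species

# Total sequenced bases, expressed in gigabase pairs (1 Gbp = 1e9 bp).
total_gbp = TOTAL_READS * READ_LENGTH_BP / 1e9

# Average number of reads per resequenced genome.
mean_reads_per_sample = TOTAL_READS / N_SAMPLES

print(f"{total_gbp:.2f} Gbp")
print(f"{mean_reads_per_sample:,.0f} reads per sample on average")
```

The product of read count and read length reproduces the 727.67 Gbp figure, and the per-sample average falls inside the reported range of 33,305,900 to 34,990,618 reads.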
We processed the data and conducted several analyses [12]. The quality of the raw sequence reads was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and the reads were mapped to the pistachio reference genome (version 1) using BWA-MEM (http://bio-bwa.sourceforge.net/). Sorting and duplicate marking of the BAM files were performed with Picard tools 1.56 (http://picard.sourceforge.net), and SNP calling was performed with the Genome Analysis Toolkit (GATK) (https://gatk.broadinstitute.org/hc/en-us). A total of 14,767,700 single-nucleotide polymorphisms (SNPs) were called [12]. The five species, i.e., P. vera, P. palaestina, P. mutica, P. khinjuk, and P. integerrima, were clearly separated in phylogenetic analyses using the maximum likelihood and neighbor-joining methods [12].
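The processing steps named above (FastQC, BWA-MEM, Picard sorting and duplicate marking, GATK variant calling) could be chained roughly as follows. This is a hedged sketch rather than the authors' exact commands: the text does not give parameters, so the file names, thread count, read-group string, the `picard`/`gatk` wrapper-script syntax, and the choice of `HaplotypeCaller` are all assumptions. The function only builds command lines and does not run them.

```python
import shlex


def build_commands(sample: str, ref: str = "pistachio_v1.fa",
                   threads: int = 8) -> list[list[str]]:
    """Build per-sample command lines for QC, mapping, and SNP calling.

    All file names and options are illustrative assumptions. The reference
    is assumed to be indexed already (bwa index, samtools faidx, sequence
    dictionary), and the "qc" output directory is assumed to exist.
    """
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    # bwa interprets the literal "\t" sequences in the read-group header.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # 1. Read quality assessment.
        ["fastqc", r1, r2, "-o", "qc"],
        # 2. Paired-end mapping to the reference genome.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", f"{sample}.sam", ref, r1, r2],
        # 3. Coordinate sorting.
        ["picard", "SortSam", f"I={sample}.sam",
         f"O={sample}.sorted.bam", "SORT_ORDER=coordinate"],
        # 4. Duplicate marking, with an index for downstream tools.
        ["picard", "MarkDuplicates", f"I={sample}.sorted.bam",
         f"O={sample}.dedup.bam", f"M={sample}.dup_metrics.txt",
         "CREATE_INDEX=true"],
        # 5. Variant calling (assumed tool; the text only says "GATK").
        ["gatk", "HaplotypeCaller", "-R", ref,
         "-I", f"{sample}.dedup.bam", "-O", f"{sample}.vcf.gz"],
    ]


for cmd in build_commands("sample01"):
    print(shlex.join(cmd))
```

Picard 1.56, the version cited in the text, shipped each tool as a separate jar, so the invocation style would differ from the wrapper syntax shown here. Joint genotyping across all samples and variant filtering are not shown.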
Limitations
No genome sequences from male pistachio plants were generated in our study, which may limit some analyses of sex-specific traits. The geographical coverage of P. vera was limited to the main center of pistachio production, Iran, and the data may not be sufficient for studies of gene flow, migration, and the domestication origin of pistachio. In addition, we produced short reads at a mean depth of 6–8X, which is a medium depth and might not be suitable for some genomic analyses.
Availability of data and materials
All sequence data reported here have been deposited in the Genome Sequence Archive (GSA) of the National Genomics Data Center (NGDC) (https://bigd.big.ac.cn/gsa/) under accession number CRA000978. Please see Table 1 and references [14,15,16,17,18,19,20,21,22,23,24] for details and links to the data. In addition, the sequence data described in this Data note can be freely and openly accessed in sequence read archive (SRA) format from the NCBI database (https://www.ncbi.nlm.nih.gov/sra/SRP189222). The SRA data mirror our deposited data in GSA. However, the Pistachio Research Center (Rafsanjan, Iran) maintains confidentiality of the phenotypic data. Information on these phenotypic data is available upon request from the corresponding author.
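For scripted access, the record links can be assembled from the accessions given above. This is a small illustrative sketch: the helper names are hypothetical, the URL patterns are copied from the links in this note, and whether those links still resolve has not been verified here.

```python
GSA_PROJECT = "CRA000978"  # GSA accession from this note
SRA_STUDY = "SRP189222"    # NCBI SRA identifier from this note


def gsa_run_url(run_id: str, project: str = GSA_PROJECT) -> str:
    """Build a GSA browse link following the pattern used in the reference list."""
    return f"https://bigd.big.ac.cn/gsa/browse/{project}/{run_id}"


def sra_study_url(study: str = SRA_STUDY) -> str:
    """Build the NCBI SRA study link given in the text."""
    return f"https://www.ncbi.nlm.nih.gov/sra/{study}"


# Two of the run identifiers that appear in the reference list.
for run in ("CRR030744", "CRR030745"):
    print(gsa_run_url(run))
print(sra_study_url())
```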
Change history
21 October 2021
The original online version of this article was revised to correct the first author’s surname
20 October 2021
A Correction to this paper has been published: https://doi.org/10.1186/s13104-021-05802-6
Abbreviations
- GSA: Genome Sequence Archive
- NCBI: National Center for Biotechnology Information
- NGDC: National Genomics Data Center
- SIFT: Sorting intolerant from tolerant
- masl: Meters above sea level
- CTAB: Cetyl trimethylammonium bromide
- DNA: Deoxyribonucleic acid
- TE: Tris, Ethylenediaminetetraacetic acid
- SNPs: Single nucleotide polymorphisms
- NGS: Next-generation sequencing
- bp: Base pair
References
Kafkas S. Phylogenetic analysis of the genus Pistacia by AFLP markers. Plant Syst Evol. 2006;262(1–2):113–24.
Parfitt DA, Badenes ML. Phylogeny of the genus Pistacia as determined from analysis of the chloroplast genome. P Natl Acad Sci USA. 1997;94:7987–92.
Motalebipour EZ, Kafkas S, Khodaeiaminjan M, et al. Genome survey of pistachio (Pistacia vera L.) by next generation sequencing: development of novel SSR markers and genetic diversity in Pistacia species. BMC Genomics. 2016;17:998.
Hormaza JI, Dollo L, Polito VS. Determination of relatedness and geographic movements of Pistacia vera (Pistachio; Anacardiaceae) germplasm by RAPD analysis. Econ Bot. 1994;48(4):349–58.
Hormaza JI, Pinney K, Polito VS. Genetic diversity of pistachio (Pistacia vera, Anacardiaceae) germplasm based on randomly amplified polymorphic DNA (RAPD) markers. Econ Bot. 1998;52:78–87.
Faostat. FAO web page. 2019. http://www.fao.org/faostat. Accessed 24 Feb 2020.
Zohary M. A monographical study of the genus Pistacia. Palestine J Bot. 1952;5(4):187–228.
Kafkas S, Kafkas E, Perl-Treves R. Morphological diversity and a germplasm survey of three wild Pistacia species in Turkey. Genet Resour Crop Evol. 2002;49(3):261–70.
Bozorgi M, Memariani Z, Mobli M, Salehi Surmaghi MH, Shams-Ardekani MR, Rahimi R. Five Pistacia species (P. vera, P. atlantica, P. terebinthus, P. khinjuk, and P. lentiscus): a review of their traditional uses, phytochemistry, and pharmacology. Sci World J. 2013;15:1–33.
Tsokou A, Georgopoulou K, Melliou E, Magiatis P, Tsitsa E. Composition and enantiomeric analysis of the essential oil of the fruits and the leaves of Pistacia vera from Greece. Molecules. 2007;12(6):1233–9.
Jazi MM, Seyedi SM, Ebrahimie E, et al. A genome-wide transcriptome map of pistachio (Pistacia vera L.) provides novel insights into salinity-related genes and marker discovery. BMC Genomics. 2017;18:627.
Zeng L, Tu XL, Dai H, et al. Whole genomes and transcriptomes reveal adaptation and domestication of pistachio. Genome Biol. 2019;20:79.
IPGRI. Descriptors for pistachio (Pistacia vera L.). Rome: International Plant Genetic Resources Institute; 1997. p. 1997.
Sequence Read Archive. 2019. https://www.ncbi.nlm.nih.gov/sra/SRP189222.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030744.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030745.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030764.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030765.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030752.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030840.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030854.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030871.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030866.
Genome Sequence Archive. 2019. https://bigd.big.ac.cn/gsa/browse/CRA000978/CRR030873.
Acknowledgements
The authors gratefully acknowledge the support from the personnel of the Pistachio Research Center, Horticultural Sciences Research Institute, Agricultural Research, Education and Extension Organization (AREEO), Rafsanjan, Iran. Also, we greatly appreciate Dr. Hojjat Asadollahpour Nanaei who helped in DNA extraction, Dr. Hasan Moradian and Dr. Saeed S. Sohrabi for their help in collecting samples. Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman provided the laboratory facilities for some parts of this work.
Funding
Data collection for this study was funded by the Chinese Academy of Sciences President’s International Fellowship Initiative (No. 2016VBA050), the Youth Innovation Promotion Association, Chinese Academy of Sciences, the International Cooperation Program of Bureau of International Cooperation of Chinese Academy of Sciences (No. GJHZ1559), the National Natural Science Foundation of China (No. 91531303), and the Animal Branch of the Germplasm Bank of Wild Species, Chinese Academy of Sciences (the Large Research Infrastructure Funding).
Author information
Contributions
AE designed the study. Sampling was done by AT. The genome resequencing data were created and assessed by AE. AT prepared the manuscript. Both authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
No approvals were required for the study, which complied with all relevant regulations. Consent to participate is not applicable to this study.
Consent to publish
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Tajabadipour, A., Esmailizadeh, A. Whole genome resequencing data sets of different species from Pistacia genus. BMC Res Notes 14, 290 (2021). https://doi.org/10.1186/s13104-021-05702-9
DOI: https://doi.org/10.1186/s13104-021-05702-9
Keywords
- Cultivars
- Pistachio
- Genomes
- Whole-genome resequencing