
Whole genome resequencing data sets of different species from Pistacia genus

Objectives

The genus Pistacia belongs to the cashew family (Anacardiaceae) of flowering plants and contains at least 11 species. Here we describe whole-genome resequencing data from different species of the genus Pistacia. These data will be useful for better understanding the adaptive evolution, demographic history, genetic diversity, population structure, and domestication of pistachio.

Data description

Genomic DNA was isolated from fresh leaves and used to construct libraries with an insert size of 350 bp. The libraries were sequenced on the Illumina HiSeq 4000 platform to produce 150 bp paired-end reads. A total of 4,851,118,730 reads (about 4.85 billion; 33,305,900 to 34,990,618 reads per sample) were generated across all samples, amounting to 727.67 Gbp of data. The data have been deposited in the Genome Sequence Archive (GSA) under accession number CRA000978 and are mirrored in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under identifier SRP189222.
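The reported totals are internally consistent if each 150 bp mate of a pair is counted as one read: multiplying the read count by the read length reproduces the 727.67 Gbp figure. A minimal check, assuming that counting convention (the helper name is hypothetical):

```python
def total_gbp(n_reads: int, read_length_bp: int) -> float:
    """Total sequenced bases in gigabase pairs (Gbp), counting each mate as one read."""
    return n_reads * read_length_bp / 1e9


if __name__ == "__main__":
    reads = 4_851_118_730  # total reads reported across all samples
    print(round(total_gbp(reads, 150), 2))  # 727.67
```

If the count instead referred to read pairs, the total would be twice as large, so the match to 727.67 Gbp indicates that mates were counted individually.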

Objective

The genus Pistacia belongs to the Anacardiaceae, the cashew family of flowering plants, which also includes poison oak, mango, poison ivy, sumac, and pepper tree [1]. The genus comprises at least eleven species and is estimated to be approximately 80 million years old [2]. Pistachio is native to the arid zones of Central Asia and has been cultivated in Iran for 3000–4000 years [3]. The Romans introduced it into Mediterranean Europe at the beginning of the Christian era [3], and its cultivation extended westward from its center of origin to Italy, Spain, and other Mediterranean regions of Southern Europe, North Africa, and the Middle East, as well as to China, the United States, and Australia [4, 5]. Worldwide pistachio production was about 1.4 million tonnes in 2018, with Iran and the United States, the leading producers, together accounting for 72% of the total [6]. Pistachio trees have a juvenile period of about 5–10 years. P. vera, the only cultivated species of the genus, is the most economically important [7]. The other species are forest trees that have edible seeds and can serve as seed sources for rootstocks of cultivated P. vera [1, 8]. In addition, plant materials from some Pistacia species, such as leaves, seeds, flowers, and stem-derived resins, have pharmacological properties, including antioxidant, anti-inflammatory, and antimicrobial activities [9,10,11].

This study provides whole-genome resequencing data from different species of the genus Pistacia (Table 1). These data will be useful for comparative population genomics and for better understanding the demographic history and adaptive evolution of pistachio. We have used these data to provide insights into pistachio genetic diversity, population structure, and domestication [12].

Table 1 Overview of data files/data sets

Data description

Fresh leaves for DNA extraction were collected from the germplasm collections of the Pistachio Research Institute in Rafsanjan, Iran, and from the pistachio germplasm collection in Ardakan, Iran. Leaf tissues were harvested during 2015–2017 and stored at −80 °C at Shahid Bahonar University of Kerman, Iran, until DNA extraction. Total genomic DNA was extracted from the leaves using a modified hexadecyltrimethylammonium bromide (CTAB) protocol. DNA quantity and quality were assessed with a NanoDrop spectrophotometer and 1% agarose gel electrophoresis; the acceptance criteria were an A260/A280 ratio of 1.8–2.0, a single absorbance peak at 260 nm, and no evidence of significant DNA shearing or contamination on the gel. The isolated DNA was dissolved in 20 μl TE buffer and stored at −20 °C for subsequent analyses. A total of 10 μg of DNA was used to construct libraries with an average insert size of 350 bp, following the Illumina library preparation protocol. The libraries were sequenced on the Illumina HiSeq 4000 platform to generate 150 bp paired-end reads.
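The acceptance criteria above can be expressed as a small check. This is a minimal sketch: the function names and example readings are hypothetical, the 1.8–2.0 purity window comes from the text, and the conversion of 50 ng/μl per A260 unit is the standard factor for double-stranded DNA, which the note itself does not state.

```python
# Hypothetical QC helpers; only the 1.8-2.0 A260/A280 window is taken from the text.
DSDNA_NG_PER_UL_PER_A260 = 50.0  # standard conversion for double-stranded DNA


def a260_a280_ok(a260: float, a280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Return True if the A260/A280 purity ratio lies within [low, high]."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return low <= a260 / a280 <= high


def concentration_ng_per_ul(a260: float, dilution: float = 1.0) -> float:
    """dsDNA concentration from absorbance: 1 A260 unit corresponds to 50 ng/ul."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution


def volume_for_mass_ul(mass_ug: float, conc_ng_per_ul: float) -> float:
    """Volume (ul) of DNA stock needed to supply mass_ug micrograms."""
    return mass_ug * 1000.0 / conc_ng_per_ul


if __name__ == "__main__":
    # Made-up reading of a 10x diluted sample
    ok = a260_a280_ok(1.0, 0.53)              # ratio of about 1.89
    conc = concentration_ng_per_ul(1.0, 10)   # ng/ul in the undiluted stock
    vol = volume_for_mass_ul(10, conc)        # ul needed for 10 ug of DNA
    print(ok, conc, vol)
```

The gel-based criteria (band integrity, absence of smearing) are visual and are not captured by this check.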

The IPGRI descriptors for pistachio [13] were used as a guideline for measuring fruit size-related traits. The following phenotypes were recorded: fresh fruit weight with green skin (g); dried fruit weight (g), length (mm), diameter (mm), and width (mm); dried fruit and kernel shape; dried kernel weight (g); and kernel diameter (mm), width (mm), and length (mm).

We resequenced a total of 107 P. vera genomes (93 cultivars and 14 wild accessions) to an average depth of 6–8X. In addition, we resequenced 35 genomes from closely related species, including P. palaestina (n = 5), P. mutica (n = 13), P. khinjuk (n = 14), and P. integerrima (n = 4) (Table 1). In total, 4,851,118,730 reads (33,305,900 to 34,990,618 reads per sample) were generated across all samples, amounting to 727.67 Gbp of data (303.14 GB in SRA format).
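Per-sample yield and expected raw depth follow from simple arithmetic: bases = reads × read length, and depth = bases / genome size. A minimal sketch under stated assumptions: the function names are hypothetical, the genome size must be supplied by the user because the assembly size of the reference is not given in this note, and raw depth overstates usable depth because unmapped and duplicate reads are removed later.

```python
def sample_yield_gbp(reads_per_sample: int, read_length_bp: int = 150) -> float:
    """Sequenced bases for one sample, in Gbp."""
    return reads_per_sample * read_length_bp / 1e9


def mean_depth(reads_per_sample: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Expected raw sequencing depth (X) = sequenced bases / genome size."""
    return reads_per_sample * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Yield implied by the smallest and largest per-sample read counts reported
    for reads in (33_305_900, 34_990_618):
        print(f"{reads:,} reads -> {sample_yield_gbp(reads):.2f} Gbp")
```

Dividing these yields by the size of the reference assembly gives the expected raw depth per sample.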

We processed these data and conducted several analyses [12]. Raw read quality was assessed with FastQC, and reads were mapped to the pistachio reference genome (version 1) with BWA-MEM. The resulting BAM files were sorted and duplicates were marked with Picard tools 1.56, and SNPs were called with the Genome Analysis Toolkit (GATK). In total, 14,767,700 single-nucleotide polymorphisms (SNPs) were identified [12]. Phylogenetic analyses using the maximum likelihood and neighbor-joining methods clearly separated the five species, P. vera, P. palaestina, P. mutica, P. khinjuk, and P. integerrima [12].
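The per-sample processing steps named above can be written out as a command list. This is a minimal sketch, not the authors' exact pipeline: sample and file names are hypothetical, reference indexing (bwa index, samtools faidx, sequence dictionary) and joint genotyping across samples are omitted, and the variant-calling step assumes GATK4 HaplotypeCaller syntax because the note does not state which GATK version was used.

```python
from __future__ import annotations

import shlex


def per_sample_commands(sample: str, ref: str, r1: str, r2: str,
                        threads: int = 8) -> list[list[str]]:
    """Build, but do not run, one sample's processing commands.

    Picard 1.x shipped one jar per tool with KEY=VALUE arguments; the GATK
    step uses GATK4 syntax, which may differ from the version in the study.
    """
    # bwa expects a literal backslash-t between read-group fields
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Raw read quality report (the output directory must already exist)
        ["fastqc", "-o", "qc", r1, r2],
        # Paired-end mapping; bwa mem writes SAM to stdout (redirect to {sample}.sam)
        ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, r1, r2],
        # Coordinate sorting with Picard
        ["java", "-jar", "SortSam.jar", f"INPUT={sample}.sam",
         f"OUTPUT={sample}.sorted.bam", "SORT_ORDER=coordinate"],
        # Duplicate marking with Picard; also writes the BAM index needed downstream
        ["java", "-jar", "MarkDuplicates.jar", f"INPUT={sample}.sorted.bam",
         f"OUTPUT={sample}.dedup.bam", f"METRICS_FILE={sample}.dup_metrics.txt",
         "CREATE_INDEX=true"],
        # Per-sample variant calling in GVCF mode (GATK4 syntax, an assumption)
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", f"{sample}.dedup.bam",
         "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"],
    ]


if __name__ == "__main__":
    # Hypothetical sample and file names
    for cmd in per_sample_commands("PV001", "pistachio_v1.fa",
                                   "PV001_1.fq.gz", "PV001_2.fq.gz"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list avoids shell-quoting problems when the commands are later passed to subprocess; shlex.join is used here only for display.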

Limitations

No genomes from male pistachio plants were sequenced in this study, which may limit analyses of sex-specific traits. The geographical coverage of P. vera was limited to Iran, the main center of pistachio production, so the data may be insufficient for studying gene flow, migration, and the geographic origin of pistachio domestication. In addition, the short reads were sequenced to a mean depth of 6–8X, a medium depth that may not be suitable for some genomic analyses.
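One way to see why a 6–8X mean depth limits some analyses is the idealized Poisson (Lander–Waterman) model, in which reads land uniformly at random and the depth at a site follows a Poisson distribution with the mean depth as its parameter. This is a sketch under that assumption only: real coverage is more uneven because of mapping and GC biases, and the five-read threshold below is an illustrative choice, not one used in the study.

```python
import math


def frac_sites_at_least(k: int, mean_depth: float) -> float:
    """P(depth >= k) for a site when per-site depth ~ Poisson(mean_depth)."""
    # Sum P(depth = i) for i = 0 .. k-1, then take the complement
    below = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
                for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    for depth in (6, 8):
        uncovered = 1.0 - frac_sites_at_least(1, depth)
        at_least_5 = frac_sites_at_least(5, depth)  # 5 reads: illustrative threshold
        print(f"{depth}X: uncovered {uncovered:.4%}, >=5 reads {at_least_5:.1%}")
```

Under this model almost every site is covered at least once, but only about 71% of sites reach five reads at 6X and about 90% at 8X, which matters for analyses that need confident heterozygous genotype calls.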

Availability of data and materials

All sequence data reported here have been deposited in the GSA database of the National Genomics Data Center (NGDC) under accession number CRA000978. See Table 1 and references [14,15,16,17,18,19,20,21,22,23,24] for details and links to the data. The sequence data described in this Data note can also be freely and openly accessed in Sequence Read Archive (SRA) format from the NCBI database; the SRA data mirror those deposited in GSA. However, the Pistachio Research Center (Rafsanjan, Iran) maintains the confidentiality of the phenotypic data; information on these data is available from the corresponding author upon request.
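For programmatic access to the NCBI mirror, the SRA records under SRP189222 can be listed through NCBI's E-utilities ESearch endpoint. A minimal sketch: the helper name and the retmax value are illustrative choices, and the code only builds the query URL without contacting NCBI. The GSA copy (CRA000978) is browsed through the NGDC website instead.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(accession: str, retmax: int = 500) -> str:
    """Build an E-utilities ESearch URL that lists SRA record IDs for an accession."""
    params = {"db": "sra", "term": accession, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(sra_search_url("SRP189222"))
```

The returned record IDs can then be passed to EFetch, or the individual run accessions can be downloaded with the SRA Toolkit's prefetch and fasterq-dump.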

Abbreviations

GSA: Genome Sequence Archive
NCBI: National Center for Biotechnology Information
NGDC: National Genomics Data Center
SIFT: Sorting intolerant from tolerant
m.a.s.l.: Meters above sea level
CTAB: Cetyl trimethylammonium bromide
DNA: Deoxyribonucleic acid
TE: Tris, Ethylenediaminetetraacetic acid
SNPs: Single nucleotide polymorphisms
NGS: Next-generation sequencing
bp: Base pair

References

1. Kafkas S. Phylogenetic analysis of the genus Pistacia by AFLP markers. Plant Syst Evol. 2006;262(1–2):113–24.

2. Parfitt DA, Badenes ML. Phylogeny of the genus Pistacia as determined from analysis of the chloroplast genome. Proc Natl Acad Sci USA. 1997;94:7987–92.

3. Motalebipour EZ, Kafkas S, Khodaeiaminjan M, et al. Genome survey of pistachio (Pistacia vera L.) by next generation sequencing: development of novel SSR markers and genetic diversity in Pistacia species. BMC Genomics. 2016;17:998.

4. Hormaza JI, Dollo L, Polito VS. Determination of relatedness and geographic movements of Pistacia vera (Pistachio; Anacardiaceae) germplasm by RAPD analysis. Econ Bot. 1994;48(4):349–58.

5. Hormaza JI, Pinney K, Polito VS. Genetic diversity of pistachio (Pistacia vera, Anacardiaceae) germplasm based on randomly amplified polymorphic DNA (RAPD) markers. Econ Bot. 1998;52:78–87.

6. FAOSTAT. FAO web page. 2019. Accessed 24 Feb 2020.

7. Zohary M. A monographical study of the genus Pistacia. Palestine J Bot. 1952;5(4):187–228.

8. Kafkas S, Kafkas E, Perl-Treves R. Morphological diversity and a germplasm survey of three wild Pistacia species in Turkey. Genet Resour Crop Evol. 2002;49(3):261–70.

9. Bozorgi M, Memariani Z, Mobli M, Salehi Surmaghi MH, Shams-Ardekani MR, Rahimi R. Five Pistacia species (P. vera, P. atlantica, P. terebinthus, P. khinjuk, and P. lentiscus): a review of their traditional uses, phytochemistry, and pharmacology. Sci World J. 2013;15:1–33.

10. Tsokou A, Georgopoulou K, Melliou E, Magiatis P, Tsitsa E. Composition and enantiomeric analysis of the essential oil of the fruits and the leaves of Pistacia vera from Greece. Molecules. 2007;12(6):1233–9.

11. Jazi MM, Seyedi SM, Ebrahimie E, et al. A genome-wide transcriptome map of pistachio (Pistacia vera L.) provides novel insights into salinity-related genes and marker discovery. BMC Genomics. 2017;18:627.

12. Zeng L, Tu XL, Dai H, et al. Whole genomes and transcriptomes reveal adaptation and domestication of pistachio. Genome Biol. 2019;20:79.

13. IPGRI. Descriptors for pistachio (Pistacia vera L.). Rome: International Plant Genetic Resources Institute; 1997.

14. Sequence Read Archive. 2019.

15. Genome Sequence Archive. 2019.

16. Genome Sequence Archive. 2019.

17. Genome Sequence Archive. 2019.

18. Genome Sequence Archive. 2019.

19. Genome Sequence Archive. 2019.

20. Genome Sequence Archive. 2019.

21. Genome Sequence Archive. 2019.

22. Genome Sequence Archive. 2019.

23. Genome Sequence Archive. 2019.

24. Genome Sequence Archive. 2019.


Acknowledgements

The authors gratefully acknowledge the support of the personnel of the Pistachio Research Center, Horticultural Sciences Research Institute, Agricultural Research, Education and Extension Organization (AREEO), Rafsanjan, Iran. We also thank Dr. Hojjat Asadollahpour Nanaei for help with DNA extraction, and Dr. Hasan Moradian and Dr. Saeed S. Sohrabi for help with sample collection. The Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, provided laboratory facilities for parts of this work.

Funding

Data collection for this study was funded by the Chinese Academy of Sciences President’s International Fellowship Initiative (No. 2016VBA050), the Youth Innovation Promotion Association, Chinese Academy of Sciences, the International Cooperation Program of Bureau of International Cooperation of Chinese Academy of Sciences (No. GJHZ1559), the National Natural Science Foundation of China (No. 91531303), and the Animal Branch of the Germplasm Bank of Wild Species, Chinese Academy of Sciences (the Large Research Infrastructure Funding).

Author information



Contributions

AE designed the study. AT collected the samples. AE generated and assessed the genome resequencing data. AT prepared the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Ali Esmailizadeh.

Ethics declarations

Ethics approval and consent to participate

No approvals were required for the study, which complied with all relevant regulations. Consent to participate is not applicable to this study.

Consent to publish

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Tajabadi, A., Esmailizadeh, A. Whole genome resequencing data sets of different species from Pistacia genus. BMC Res Notes 14, 290 (2021).


Keywords

  • Cultivars
  • Pistachio
  • Genomes
  • Whole-genome resequencing