Whole genome resequencing data sets of different species from Pistacia genus

The genus Pistacia belongs to the cashew family of flowering plants and contains at least 11 species. Whole-genome resequencing data for different Pistacia species are described herein. The data reported here will be useful for better understanding the adaptive evolution, demographic history, genetic diversity, population structure, and domestication of pistachio. Genomic DNA was isolated from fresh leaves and used to construct libraries with an insert size of 350 bp. The libraries were sequenced on the Illumina HiSeq 4000 platform to produce 150 bp paired-end reads. A total of 4,851,118,730 reads (ranging from 33,305,900 to 34,990,618 reads per sample) were generated across all samples. We produced a total of 727.67 Gbp of data, which have been deposited in the Genome Sequence Archive (GSA) database under accession CRA000978. All of the data are also available in Sequence Read Archive (SRA) format at the National Center for Biotechnology Information (NCBI) under identifier SRP189222, mirroring our deposited data in GSA.


Objective
The genus Pistacia belongs to the Anacardiaceae, or cashew, family of flowering plants. Other members of this family include poison oak, mango, poison ivy, sumac, and pepper tree [1]. Pistacia comprises at least eleven species and is estimated to be approximately 80 million years old [2]. Pistachio is native to the arid zones of Central Asia and has a long history of cultivation (3000-4000 years) in Iran [3]. The Romans introduced the plant into Mediterranean Europe at the beginning of the Christian era [3], and its cultivation extended westward from its center of origin to Italy, Spain, and other Mediterranean regions of Southern Europe, North Africa, and the Middle East, as well as to China, the United States, and Australia [4,5]. Worldwide pistachio production was about 1.4 million tonnes in 2018, with Iran and the United States, the leading producers, together accounting for 72% of the total [6]. Pistachio plants have a juvenile period of about 5-10 years. The most economically important species is P. vera, the only cultivated species in the genus [7]. The other species are forest trees with edible seeds that can be used as rootstock seed sources for cultivated P. vera [1,8]. In addition, plant materials such as leaves, seeds, flowers, and stem-derived resins from some Pistacia species have pharmacological properties, including antioxidant, anti-inflammatory, and antimicrobial activities [9-11].
This study provides whole-genome resequencing data for different species of the genus Pistacia (Table 1). These genome sequence data will be useful for comparative population genomics and for better understanding the demographic history and adaptive evolution of pistachio. We used these data for providing insights into pistachio

Data description
The materials used for DNA extraction were fresh leaves collected from the germplasm collections of the Pistachio Research Institute in Rafsanjan, Iran, and the pistachio germplasm of Ardakan, Iran. Leaf tissues were harvested during the 2015-2017 period and stored at −80 °C at the Shahid Bahonar University of Kerman, Iran, until DNA extraction. Total genomic DNA was extracted from the leaves using a hexadecyl trimethyl ammonium bromide (CTAB) protocol with some modifications. A NanoDrop spectrophotometer and 1% agarose gel electrophoresis were used to assess the quantity and quality of the extracted DNA, looking for a 260/280 absorbance ratio of 1.8-2.0, a single absorbance peak at 260 nm, and no evidence of significant band shearing or contamination. The isolated DNA was dissolved in 20 μl TE buffer and kept at −20 °C for subsequent analyses. A total of 10 μg of the extracted DNA was used to construct libraries with an average insert size of 350 bp. The Illumina library preparation pipeline was used as a guideline for constructing the sequence libraries. The libraries were sequenced on the Illumina HiSeq 4000 platform to generate 150 bp paired-end reads.
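The 260/280 purity criterion described above can be expressed as a simple filter. The following is a minimal sketch, not the procedure used in this study: the function name `passes_purity_check` and the absorbance readings are hypothetical, and only the 1.8-2.0 acceptance window is taken from the text.

```python
# Illustrative only: the sample names and absorbance values below are
# hypothetical; the 1.8-2.0 window is the criterion stated in the text.

def passes_purity_check(a260: float, a280: float,
                        low: float = 1.8, high: float = 2.0) -> bool:
    """Return True if the A260/A280 ratio lies within [low, high]."""
    if a280 <= 0:
        # A non-positive A280 reading cannot yield a meaningful ratio.
        return False
    ratio = a260 / a280
    return low <= ratio <= high


# Hypothetical spectrophotometer readings as (A260, A280) pairs.
readings = {
    "sample_A": (0.95, 0.50),  # ratio 1.90, inside the window
    "sample_B": (0.80, 0.55),  # ratio about 1.45, below the window
}

for name, (a260, a280) in readings.items():
    status = "pass" if passes_purity_check(a260, a280) else "fail"
    print(f"{name}: A260/A280 = {a260 / a280:.2f} ({status})")
```

A ratio check of this kind covers only the absorbance criterion; band integrity on the agarose gel still has to be judged separately.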
The pistachio descriptor [13] was used as a guideline for measuring pistachio fruit size-related traits. The following phenotypes were recorded: fresh fruit weight with green skin (g), dried pistachio fruit weight (g), dried pistachio fruit length (mm), dried pistachio fruit diameter (mm), dried pistachio fruit width (mm), dried pistachio fruit and kernel shape, dried kernel weight (g), kernel diameter (mm), kernel width (mm), and kernel length (mm).
We resequenced a total of 107 genomes from P. vera (93 cultivars and 14 genomes of wild pistachio) to an average depth of 6-8X. In addition, we resequenced 35 genomes from closely related species, including P. palaestina (n = 5), P. mutica (n = 13), P. khinjuk (n = 14), and P. integerrima (n = 4) (Table 1). A total of 4,851,118,730 reads (ranging from 33,305,900 to 34,990,618 reads per sample) were generated across all samples. We produced a total of 727.67 Gbp of data (SRA data size: 303.14 GB).
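The reported yield follows directly from the read count and read length. The short sketch below reproduces that arithmetic, assuming that each counted read is a single 150 bp read (rather than a read pair) and that 1 Gbp equals 10^9 bp; the function name `yield_gbp` is illustrative.

```python
# Read counts and read length are taken from the text; treating every
# counted read as one 150 bp read is an assumption of this sketch.

READ_LENGTH_BP = 150
TOTAL_READS = 4_851_118_730
MIN_READS_PER_SAMPLE = 33_305_900
MAX_READS_PER_SAMPLE = 34_990_618


def yield_gbp(reads: int, read_length_bp: int = READ_LENGTH_BP) -> float:
    """Convert a read count to sequence yield in Gbp (1 Gbp = 1e9 bp)."""
    return reads * read_length_bp / 1e9


total_gbp = yield_gbp(TOTAL_READS)
min_gbp = yield_gbp(MIN_READS_PER_SAMPLE)
max_gbp = yield_gbp(MAX_READS_PER_SAMPLE)

print(f"Total yield: {total_gbp:.2f} Gbp")
print(f"Per-sample yield: {min_gbp:.2f} to {max_gbp:.2f} Gbp")
```

Under these assumptions the total comes to 727.67 Gbp, matching the reported figure, and each sample contributes roughly 5.0 to 5.25 Gbp.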

Limitations
No genomes from male pistachio plants were sequenced in our study, which may limit some analyses of sex-specific traits. The geographical coverage of P. vera was limited to Iran, the main center of pistachio production, so the data may not be sufficient for studying gene flow, migration, and the domestication origin of pistachio. In addition, we produced short reads at a mean depth of 6-8X, a moderate depth that might not be suitable for some genomic analyses.