Development of novel genic microsatellite markers from transcriptome sequencing in sugar maple (Acer saccharum Marsh.)

Background Sugar maple (Acer saccharum Marsh.) is a hardwood tree species native to northeastern North America and economically valued for its wood and sap. Yet, few molecular genetic resources have been developed for this species to date. Microsatellite markers have been a useful tool in population genetics, e.g., to monitor genetic variation and to analyze gene flow patterns. The objective of this study is to develop a reference transcriptome and microsatellite markers in sugar maple. Findings A set of 117,861 putative unique transcripts were assembled using 29.2 Gb of RNA sequencing data derived from different tissues and stress treatments. From this set of sequences a total of 1068 microsatellite motifs were identified. Out of 58 genic microsatellite markers tested on a population of 47 sugar maple trees in upper Michigan, 22 amplified well, of which 16 were polymorphic and 6 were monomorphic. Values for expected heterozygosity varied from 0.224 to 0.726 for individual loci. Of the 16 polymorphic markers, 15 exhibited transferability to other Acer L. species. Conclusions Genic microsatellite markers can be applied to analyze genetic variation in potentially adaptive genes relative to genomic reference markers as a basis for the management of sugar maple genetic resources in the face of climate change. Electronic supplementary material The online version of this article (doi:10.1186/s13104-017-2653-2) contains supplementary material, which is available to authorized users.

along environmental gradients. Heat and drought stress and population fragmentation are especially expected at the southern distribution edge of the species [7], as the result of a warming climate. The new genic microsatellites will be useful to analyze patterns of genetic variation and provide the basis for the management and conservation of the species' genetic resources in a changing climate.

Plant materials for SSR analysis
Leaf samples were collected from 47 georeferenced adult trees of one natural population close to Houghton, MI (47°07′07.46″N, 88°35′16.41″W) in a region which is dominated by sugar maple forests [4]. Total genomic DNA was isolated for the 47 samples of Acer saccharum and for three samples of each Acer rubrum L. (section Rubra), Acer saccharinum L. (section Rubra), Acer platanoides L. (section Platanoidea) and Acer ginnala Maxim. (section Ginnala) from ~1 cm 2 of silica gel dried leaf samples using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany).

Library preparation, transcriptome sequencing and assembly
A total of 51 transcriptome libraries were derived from tissues (leaves, petioles, roots) of ozone, cold, heat, drought, wound stressed and unstressed half-sib seedlings. Tissue types and treatments are described in Additional file 1. Details on ozone and wound stress treatments are given in [8]. Tissues were flash frozen and stored at −80 °C until RNA was extracted. Approximately 1 g of frozen tissue was used for isolation of total RNA with a modified CTAB method that uses lithium chloride precipitation [9]. After RNA quality assessment with an Agilent Bioanalyzer (Agilent Technologies), RNA was subjected to TruSeq Stranded mRNA library preparation (San Diego, CA), following the manufacturer's protocol. All libraries were labeled with dual barcodes. Paired-end sequencing of 46 sugar maple libraries was performed on an Illumina HiSeq 2500 instrument (San Diego, CA) at a read length of 101 bp. A further set of 5 libraries were sequenced on an Illumina MiSeq (San Diego, CA) (Additional file 1). Raw reads can be found in the NCBI Short Read Archive (SRA) with the bioproject accession PRJNA273272. Read statistics for each library are provided in Additional file 1.
Trimmomatic version 0.35 was used to trim raw sequencing reads of adapter sequences and low quality bases [10]. Kmer-based error correction of RNASeq reads was performed in Rcorrector (kmer length of 31) [11]. Trimmed and corrected reads were assembled with the Trinity pipeline version r20131110 [12]. Isoforms from the Trinity pipeline were collapsed and further assembled with cd-hit version 4.6.1 [13]. Open reading frame (ORF) prediction was executed with TransDecoder version 2.0.1 [14]. TransDecoder predictions were further improved by performing searches using BLAST version 2.2.29 versus the Uniprot SwissProt protein databases (cut-off of e-value <1e−4) and by HMMER version 3.1b2 against the pfam database [15][16][17][18]. Two methods were employed to determine assembly completeness of the sugar maple transcriptome. First, transcripts were assessed by BUSCO (Benchmarking Universal Single-Copy Orthologs) version 1.1b1 to determine the presence of universal single-copy orthologs using the early release plant database [19]. Second, TransRate v1.0.3 was used to calculate quality scores for contigs based on the read alignment evidence [20].
Primer pairs (Sigma Aldrich Inc., St. Louis, MO) were tested for amplification initially in four randomly selected samples. Markers that amplified polymorphic products in the expected size range in these four samples were amplified in all 47 samples. Genetic variation assessment was conducted using GenAlEx 6.502 [24]. Specifically, observed and expected heterozygosity (H o , H e , respectively) [25] and number of alleles (N a ) were calculated. Further, the inbreeding coefficient (F) was estimated for each locus in the population using GENEPOP version 4.2 [26,27]. Significant deviations from Hardy-Weinberg proportions were tested using Fischer's exact test in GENEPOP.
Additionally pairwise linkage disequilibrium was tested for all marker pairs in GENEPOP. Bonferroni corrections were applied to adjust for multiple testing [28].

Transcriptome assembly and validation
Illumina sequencing of 51 transcriptome sugar maple libraries of varying tissue types and treatments produced 282 million reads (29.9 Gb), available at NCBI SRA bioproject accession PRJNA273272 (Additional file 1). De novo assembly of reads yielded 117,861 putative unique transcripts (PUTs) with an average length of 945 bases and an N50 length of 1667 bases. TransDecoder version 2.0.1 was used to identify 67,537 possible open read frames (ORFs) derived from 51,390 PUTs (43.6%). Multiple ORFs per PUT can occur for example due to chimeras in the assembly or sequencing errors (insertions/ deletions) that shift the reading frame incorrectly. The peptide sequences calculated from the ORFs average 284 amino acids in length. Assembly accuracy was assessed as quality metrics based on examination of read mapping using Transrate [20] and identified an overall mapping rate of 72.8%, and over 85% of those mapped reads supported the assembly (i.e. both pairs aligned to the same transcript in the correct orientation without overlapping either transcript end). Out of a total of 956 BUSCO ortho-groups searched, 914 were found in the BUSCO database. These were further classified as 434 complete single-copy genes, 446 complete but duplicated genes, and 34 fragmented copies of genes. This represents the first transcriptome reference for sugar maple and will be a valuable genomic resource for future studies, for example, to generate markers, to use as a reference for gene expression sequencing or to identify candidate stress-response genes. Raw reads are stored at NCBI Short Read Archive (SRA) with the bioproject accession PRJNA273272.

Functional annotation and SSR discovery
Sugar maple PUTs were used as queries for BLAST searches against the proteins in the Swiss-Prot and plant TrEMBL [16] databases yielding 64,458 (54.7%) PUTs with matches to TrEMBL plant proteins and 49,760 (42.2%) PUTs with matches to Swiss-Prot. Simple Sequence Repeat (SSR) extraction found 6279 SSRs in 5965 PUTs (5.1%). SSRs were identified for 2, 3, and 4 bp motifs. The 4 bp motifs were the rarest, making up 0.97% of total SSRs found. The shorter 2 bp and 3 bp motifs were much more common, making up 78.7 and 20.4% of total SSRs found, respectively. Primers were designed for 1068 SSRs using the flanking regions of the repeats (Additional file 2). Of these 812 were 2 bp repeats ranging from 8 to 15 repeats, 234 were 3 bp repeats ranging from 7 to 14 repeats, and 22 were 4 bp repeats with 6 repeats. Additional file 2 contains a report of the SSR analysis along with primers designed for SSRs meeting the appropriate requirements and functional annotations for SSR markers.

Microsatellite marker characterization
We selected 58 SSRs with a wide range of expected amplicon sizes (100-487 bp) to develop sets of markers with non-overlapping size ranges that could be used for multiplex PCRs in subsequent studies. Out of the 58 primers tested, 16 amplified polymorphic loci and 6 amplified monomorphic loci (Table 1; Additional file 2). The expected size for the polymorphic markers varied from 113 and 425 bp. For polymorphic loci H e ranged from 0.224 to 0.726, H o from 0.143 to 0.722 and N a from 2 to 12. A total of 11 out of the 16 markers showed no significant deviation from Hardy-Weinberg proportions after Bonferroni correction, while markers As_di12 and As_ di38 (p < 0.05) and markers As_di34, As_di48 and As_di9 (p < 0.01) showed significant and positive F values. No significant linkage disequilibrium was found between marker pairs after Bonferroni correction (p < 0.05).
A total of 16 markers were tested for transferability in three DNA samples from each of the four Acer species which represent three taxonomic sections: A. saccharinum (section Rubra), A. rubrum (section Rubra), A. platanoides (section Platanoidea) and A. ginnala (section Ginnala) using the same PCR protocol as described above. The results showed that most primers amplified a multi-banding pattern across species. However, 15 markers amplified single loci in the expected size range in at least one other Acer species (Table 2). Ten markers were monomorphic in some or all species (Table 2). Two markers, As_tetra1 and As_di41, were polymorphic in more than one species. The level of genetic variation for individual polymorphic markers varied considerably. Mean variation at polymorphic markers was much lower for genic (H o :0.424; H e :0.543) than for non-genic markers in the same samples (H o :0.708; H e :0.822 [4]). Similarly, lower genetic variation was observed at genic EST-SSRs than at genomic SSRs in several Quercus rubra L. and Quercus ellipsoidalis E.J. Hill populations [29]. The generally lower genetic variation at genic EST-SSRs suggests that these markers are not selectively neutral, but subject to selection [6]. Genetic variation observed at genic EST-SSRs in sugar maple was lower than estimates obtained in other wind-pollinated tree species such as North American oak species Q. rubra and Q. ellipsoidalis from the same geographic region (H e = 0.700, range 0.690-0.740, [29]). As for non-genic markers which were characterized in the same population [4] most F values are not significantly different from zero. High F values at individual markers could be due to the presence of null alleles. Markers As_di34 and As_di9 with high F values and significant deviation from Hardy-Weinberg proportions (p < 0.001) showed no amplification in nearly half of the samples suggesting high null allele frequencies. All other markers amplified in most samples making them applicable for population genetic analyses. Only two out of the 47 samples showed variable amplification success potentially as result of low DNA quality. The other marker, As_di48, with a very high F value showed high amplification success comparable to the other markers. Overall high variation was observed in F values ranging from −0.081 for As_di7 to 0.705 for As_di48.

Table 1 Primer sequences and descriptions of 22 microsatellite markers developed in Acer saccharum
The overall high level of genetic variation, the absence of linkage disequilibrium and low to moderate deviations from HWE for most markers suggest the suitability of these markers for population genetic studies in sugar maple. In addition, markers showed high transferability to other Acer species providing at least two new polymorphic markers for the four other Acer species, and even six new polymorphic markers for one of the species, A. platanoides. High transferability of markers across Acer sections suggests their potential usefulness for population genetic analyses in other maple species. In contrast, only two out of 13 genomic SSRs amplified polymorphic loci in one other species, Acer ginnala [4]. This high transferability is typical of EST-SSRs and has been shown repeatedly in crop plants and is more prevalent than in non-EST-SSR markers [6]. Genetic variation at SSR markers was only assessed in a single natural population in Michigan. Allelic diversity will certainly vary among populations within the large distribution range of the species. However, in outcrossing, wind-pollinated tree species with wide and continuous distribution ranges such as sugar maple most of the genetic variation (i.e. H e ) is generally distributed within populations [30] and similar levels of within population variation are expected within the main distribution range of the species.

Conclusions
Especially in the trailing (southern) edges of the species' distribution range lower genetic variation is expected as result of differential mortality due to increased abiotic and biotic stressors. With the availability of both neutral genomic SSRs and genic SSRs with potential role in adaptation, patterns of genetic variation and their relation to environmental and climatic variables can be studied. For example, the presented marker and transcriptome resources can be used to identify genes involved in the adaptation to these stressful conditions, for example by analyzing associations of alleles with environmental gradients [31].