Development of novel genic microsatellite markers from transcriptome sequencing in sugar maple (Acer saccharum Marsh.)
© The Author(s) 2017
Received: 21 November 2016
Accepted: 21 July 2017
Published: 8 August 2017
Sugar maple (Acer saccharum Marsh.) is a hardwood tree species native to northeastern North America and economically valued for its wood and sap. Yet, few molecular genetic resources have been developed for this species to date. Microsatellite markers have been a useful tool in population genetics, e.g., to monitor genetic variation and to analyze gene flow patterns. The objective of this study is to develop a reference transcriptome and microsatellite markers in sugar maple.
A set of 117,861 putative unique transcripts were assembled using 29.2 Gb of RNA sequencing data derived from different tissues and stress treatments. From this set of sequences a total of 1068 microsatellite motifs were identified. Out of 58 genic microsatellite markers tested on a population of 47 sugar maple trees in upper Michigan, 22 amplified well, of which 16 were polymorphic and 6 were monomorphic. Values for expected heterozygosity varied from 0.224 to 0.726 for individual loci. Of the 16 polymorphic markers, 15 exhibited transferability to other Acer L. species.
Genic microsatellite markers can be applied to analyze genetic variation in potentially adaptive genes relative to genomic reference markers as a basis for the management of sugar maple genetic resources in the face of climate change.
Sugar maple (Acer saccharum Marsh.) is a hardwood species native to the Northeastern United States and Southern Canada. It is valued for its wood and syrup/sugar, supports a large industry  and is an important ecosystem component . While microsatellite markers have many applications in population genetic analyses, few resources for sugar maple have been developed to date [3, 4].
Microsatellite markers are short sequence repeats (SSRs) which are often highly variable and prone to mutations . Genic microsatellite markers can be derived from EST-libraries and can be located in the coding regions or in the 5′- and 3′ untranslated regions of mRNAs . Due to their location in expressed genes, they often show a lower genetic variation than genomic (non-genic) microsatellites and a higher transferability between related species . Especially, stress-responsive genes might be under selection and show clinal variation along environmental gradients. Heat and drought stress and population fragmentation are especially expected at the southern distribution edge of the species , as the result of a warming climate. The new genic microsatellites will be useful to analyze patterns of genetic variation and provide the basis for the management and conservation of the species’ genetic resources in a changing climate.
Plant materials for SSR analysis
Leaf samples were collected from 47 georeferenced adult trees of one natural population close to Houghton, MI (47°07′07.46″N, 88°35′16.41″W) in a region which is dominated by sugar maple forests . Total genomic DNA was isolated for the 47 samples of Acer saccharum and for three samples of each Acer rubrum L. (section Rubra), Acer saccharinum L. (section Rubra), Acer platanoides L. (section Platanoidea) and Acer ginnala Maxim. (section Ginnala) from ~1 cm2 of silica gel dried leaf samples using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany).
Library preparation, transcriptome sequencing and assembly
A total of 51 transcriptome libraries were derived from tissues (leaves, petioles, roots) of ozone, cold, heat, drought, wound stressed and unstressed half-sib seedlings. Tissue types and treatments are described in Additional file 1. Details on ozone and wound stress treatments are given in . Tissues were flash frozen and stored at −80 °C until RNA was extracted. Approximately 1 g of frozen tissue was used for isolation of total RNA with a modified CTAB method that uses lithium chloride precipitation . After RNA quality assessment with an Agilent Bioanalyzer (Agilent Technologies), RNA was subjected to TruSeq Stranded mRNA library preparation (San Diego, CA), following the manufacturer’s protocol. All libraries were labeled with dual barcodes. Paired-end sequencing of 46 sugar maple libraries was performed on an Illumina HiSeq 2500 instrument (San Diego, CA) at a read length of 101 bp. A further set of 5 libraries were sequenced on an Illumina MiSeq (San Diego, CA) (Additional file 1). Raw reads can be found in the NCBI Short Read Archive (SRA) with the bioproject accession PRJNA273272. Read statistics for each library are provided in Additional file 1.
Trimmomatic version 0.35 was used to trim raw sequencing reads of adapter sequences and low quality bases . Kmer-based error correction of RNASeq reads was performed in Rcorrector (kmer length of 31) . Trimmed and corrected reads were assembled with the Trinity pipeline version r20131110 . Isoforms from the Trinity pipeline were collapsed and further assembled with cd-hit version 4.6.1 . Open reading frame (ORF) prediction was executed with TransDecoder version 2.0.1 . TransDecoder predictions were further improved by performing searches using BLAST version 2.2.29 versus the Uniprot SwissProt protein databases (cut-off of e-value <1e−4) and by HMMER version 3.1b2 against the pfam database [15–18]. Two methods were employed to determine assembly completeness of the sugar maple transcriptome. First, transcripts were assessed by BUSCO (Benchmarking Universal Single-Copy Orthologs) version 1.1b1 to determine the presence of universal single-copy orthologs using the early release plant database . Second, TransRate v1.0.3 was used to calculate quality scores for contigs based on the read alignment evidence .
Functional annotation and SSR discovery
BLAST version 2.2.29 queries were performed against the plant taxonomic division of the TrEMBL protein database and Swiss-Prot protein database [15, 16]. Simple sequence repeats (SSRs) were found by searching putative unique transcripts (PUTs) using a perl script (https://github.com/mestato/lab_code/tree/master/hwg_gssr_scripts). SSRs needed to meet specific requirements for inclusion in the final output, requiring: 2 bp repeats to have 8–200 copies, 3 bp repeats to have 7–133 copies, and 4 bp repeats to have 6–100 copies. Compound SSRs, defined as SSRs occurring with less than 15 bases of separation, were removed. Dustmasker was used to mask low complexity regions to aid primer design in Primer3 v2.3.6 [15, 21]. The following Primer3 parameters were used: primer_product_size_range = 100–450, primer_min_tm = 55.0, primer_max_tm = 65.0, primer_min_gc = 40, primer_max_gc = 60, primer_max_poly_x = 3, primer_gc_clamp = 2.
The DNA was diluted to ~1 ng/µl with double deionized water (Ultra Pure Water, Molecular Grade, Phenix Research Products). A 15 µl PCR reaction was prepared consisting of 6 µl double deionized water (Ultra Pure Water, Molecular Grade from Phenix Research Products), 5 µl HotFIREPol (Solis Biodyne, containing 2 mM dNTPs, 1U Taq polymerase, 10 mM MgCl2), 0.2 µl of 5 µM forward primer with the M13 tail (5′-CACGACGTTGTAAACGAC-3′), 1.5 µl of 5 µM 5′ dye labelled (6-FAM) M13 primer, 0.5 µl of 5 µM pig-tailed (5′-GTTTCTT-3′) reverse primer [22, 23] (Sigma Aldrich Inc., St. Louis, MO) and 2 µl DNA (~2 ng). The PCR touchdown reaction was performed in an Applied Biosystems 2720 Thermal Cycler (Foster City, CA) consisting of an initial denaturation at 95 °C for 15 min, 10 touch-down cycles of 1 min at 94 °C, 1 min at 60 °C (decreasing 1 °C each cycle), and 1 min at 72 °C, followed by 25 cycles of 94 °C for 1 min, 50 °C for 1 min and 72 °C for 1 min. The final extension was at 72 °C for 20 min. PCR products were checked on 1.5% agarose gels and stained with 2 µl GelRed (10,000X in water; Biotium, Hayward, CA) and exact fragment sizes were determined after electrophoretic separation on an ABI Prism® Genetic Analyzer 3730 with Gene-ScanTM LIZ-500 as internal size standard. Alleles were assigned to bins after careful visual inspection using GeneMarker® V2.6.7 (SoftGenetics).
Primer pairs (Sigma Aldrich Inc., St. Louis, MO) were tested for amplification initially in four randomly selected samples. Markers that amplified polymorphic products in the expected size range in these four samples were amplified in all 47 samples. Genetic variation assessment was conducted using GenAlEx 6.502 . Specifically, observed and expected heterozygosity (Ho, He, respectively)  and number of alleles (Na) were calculated. Further, the inbreeding coefficient (F) was estimated for each locus in the population using GENEPOP version 4.2 [26, 27]. Significant deviations from Hardy–Weinberg proportions were tested using Fischer’s exact test in GENEPOP. Additionally pairwise linkage disequilibrium was tested for all marker pairs in GENEPOP. Bonferroni corrections were applied to adjust for multiple testing .
Results and discussion
Transcriptome assembly and validation
Illumina sequencing of 51 transcriptome sugar maple libraries of varying tissue types and treatments produced 282 million reads (29.9 Gb), available at NCBI SRA bioproject accession PRJNA273272 (Additional file 1). De novo assembly of reads yielded 117,861 putative unique transcripts (PUTs) with an average length of 945 bases and an N50 length of 1667 bases. TransDecoder version 2.0.1 was used to identify 67,537 possible open read frames (ORFs) derived from 51,390 PUTs (43.6%). Multiple ORFs per PUT can occur for example due to chimeras in the assembly or sequencing errors (insertions/deletions) that shift the reading frame incorrectly. The peptide sequences calculated from the ORFs average 284 amino acids in length. Assembly accuracy was assessed as quality metrics based on examination of read mapping using Transrate  and identified an overall mapping rate of 72.8%, and over 85% of those mapped reads supported the assembly (i.e. both pairs aligned to the same transcript in the correct orientation without overlapping either transcript end). Out of a total of 956 BUSCO ortho-groups searched, 914 were found in the BUSCO database. These were further classified as 434 complete single-copy genes, 446 complete but duplicated genes, and 34 fragmented copies of genes. This represents the first transcriptome reference for sugar maple and will be a valuable genomic resource for future studies, for example, to generate markers, to use as a reference for gene expression sequencing or to identify candidate stress-response genes. Raw reads are stored at NCBI Short Read Archive (SRA) with the bioproject accession PRJNA273272.
Functional annotation and SSR discovery
Sugar maple PUTs were used as queries for BLAST searches against the proteins in the Swiss-Prot and plant TrEMBL  databases yielding 64,458 (54.7%) PUTs with matches to TrEMBL plant proteins and 49,760 (42.2%) PUTs with matches to Swiss-Prot. Simple Sequence Repeat (SSR) extraction found 6279 SSRs in 5965 PUTs (5.1%). SSRs were identified for 2, 3, and 4 bp motifs. The 4 bp motifs were the rarest, making up 0.97% of total SSRs found. The shorter 2 bp and 3 bp motifs were much more common, making up 78.7 and 20.4% of total SSRs found, respectively. Primers were designed for 1068 SSRs using the flanking regions of the repeats (Additional file 2). Of these 812 were 2 bp repeats ranging from 8 to 15 repeats, 234 were 3 bp repeats ranging from 7 to 14 repeats, and 22 were 4 bp repeats with 6 repeats. Additional file 2 contains a report of the SSR analysis along with primers designed for SSRs meeting the appropriate requirements and functional annotations for SSR markers.
Microsatellite marker characterization
Primer sequences and descriptions of 22 microsatellite markers developed in Acer saccharum
Primer sequences (5′–3′)
Size range (bp)
**Expected fragment size
Transferability of Acer saccharum microsatellite loci to other Acer species
The level of genetic variation for individual polymorphic markers varied considerably. Mean variation at polymorphic markers was much lower for genic (Ho:0.424; He:0.543) than for non-genic markers in the same samples (Ho:0.708; He:0.822 ). Similarly, lower genetic variation was observed at genic EST-SSRs than at genomic SSRs in several Quercus rubra L. and Quercus ellipsoidalis E.J. Hill populations . The generally lower genetic variation at genic EST-SSRs suggests that these markers are not selectively neutral, but subject to selection . Genetic variation observed at genic EST-SSRs in sugar maple was lower than estimates obtained in other wind-pollinated tree species such as North American oak species Q. rubra and Q. ellipsoidalis from the same geographic region (He = 0.700, range 0.690–0.740, ). As for non-genic markers which were characterized in the same population  most F values are not significantly different from zero. High F values at individual markers could be due to the presence of null alleles. Markers As_di34 and As_di9 with high F values and significant deviation from Hardy–Weinberg proportions (p < 0.001) showed no amplification in nearly half of the samples suggesting high null allele frequencies. All other markers amplified in most samples making them applicable for population genetic analyses. Only two out of the 47 samples showed variable amplification success potentially as result of low DNA quality. The other marker, As_di48, with a very high F value showed high amplification success comparable to the other markers. Overall high variation was observed in F values ranging from −0.081 for As_di7 to 0.705 for As_di48.
The overall high level of genetic variation, the absence of linkage disequilibrium and low to moderate deviations from HWE for most markers suggest the suitability of these markers for population genetic studies in sugar maple. In addition, markers showed high transferability to other Acer species providing at least two new polymorphic markers for the four other Acer species, and even six new polymorphic markers for one of the species, A. platanoides. High transferability of markers across Acer sections suggests their potential usefulness for population genetic analyses in other maple species. In contrast, only two out of 13 genomic SSRs amplified polymorphic loci in one other species, Acer ginnala . This high transferability is typical of EST-SSRs and has been shown repeatedly in crop plants and is more prevalent than in non-EST-SSR markers . Genetic variation at SSR markers was only assessed in a single natural population in Michigan. Allelic diversity will certainly vary among populations within the large distribution range of the species. However, in outcrossing, wind-pollinated tree species with wide and continuous distribution ranges such as sugar maple most of the genetic variation (i.e. He) is generally distributed within populations  and similar levels of within population variation are expected within the main distribution range of the species.
Especially in the trailing (southern) edges of the species’ distribution range lower genetic variation is expected as result of differential mortality due to increased abiotic and biotic stressors. With the availability of both neutral genomic SSRs and genic SSRs with potential role in adaptation, patterns of genetic variation and their relation to environmental and climatic variables can be studied. For example, the presented marker and transcriptome resources can be used to identify genes involved in the adaptation to these stressful conditions, for example by analyzing associations of alleles with environmental gradients .
OG, HL, MS, MC, SES, SS, JEC conceived and designed the experiment. MC and SS provided plant materials. MH, TL, TB, CC, HL, NZ, YH and DM performed experiments. OG, MH, TL, and MS analyzed the data. OG, MH, MS and TL wrote the first draft of the paper. All authors were involved in the revision of the manuscript and read, edited, and approved the final manuscript.
This project was supported by the National Science Foundation Plant Genome Research Program (TRPGRA2 IOS-1025974) as part of the Hardwood Genomics Project. Additional funding was provided by the Ecosystem Science Center and the Life Science and Technology Institute (LSTI) at Michigan Technological University. We acknowledge support by the German Research Foundation (DFG) and the Open Access Publication Funds of Göttingen University. This work would not have been possible without the contributions of Sudhir Khodwekar at Michigan Technological University who isolated DNA and helped with interpreting the results. Monica Harmon was partly supported by the OSM/VISTA Master of Science degree program. We also acknowledge the Genomics Core Facility of the Huck Institutes of the Life Sciences at Penn State University for MiSeq data production.
The authors declare that they have no competing interests.
Availability of data and materials
Raw reads can be found in the NCBI Short Read Archive (SRA) with the bioproject accession PRJNA273272.
The study was funded the National Science Foundation Plant Genome Research Program (TRPGRA2 IOS-1025974) as part of the Hardwood Genomics Project.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- USDA/NASS. QuickStats Ad hoc Query Tool. In. Quickstats.nass.usda.gov; 2015.Google Scholar
- Emerman SH, Dawson TE. Hydraulic lift and its influence on the water content of the rhizosphere: an example from sugar maple, Acer saccharum. Oecologia. 1996;108(2):273–8.View ArticlePubMedGoogle Scholar
- Graignic N, Tremblay F, Bergeron Y. Development of polymorphic nuclear microsatellite markers in sugar maple (Acer saccharum Marsh.) using cross-species transfer and SSR-enriched shotgun pyrosequencing. Conserv Genet Resour. 2013;5(3):845–8.View ArticleGoogle Scholar
- Khodwekar S, Staton M, Coggeshall MV, Carlson JE, Gailing O. Nuclear microsatellite markers for population genetic studies in sugar maple (Acer saccharum Marsh.). Ann For Res. 2015;58:193–204.View ArticleGoogle Scholar
- Tautz D. Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 1989;17:6463–71.View ArticlePubMedPubMed CentralGoogle Scholar
- Ellis JR, Burke JM. EST-SSRs as a resource for population genetic analyses. Heredity. 2007;99(2):125–32.View ArticlePubMedGoogle Scholar
- Iverson LR, Prasad AM, Matthews SN, Peters M. Estimating potential habitat for 134 eastern US tree species under six climate scenarios. For Ecol Manag. 2008;254(3):390–406.View ArticleGoogle Scholar
- Lane T, Best T, Zembower N, Davitt J, Henry N, Xu Y, Koch J, Liang H, McGraw J, Schuster S, et al. The green ash transcriptome and identification of genes responding to abiotic and biotic stresses. BMC genomics. 2016;17:702.View ArticlePubMedPubMed CentralGoogle Scholar
- Chang S, Puryear J, Cairney J. A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Rep. 1993;11(2):113–6.View ArticleGoogle Scholar
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.View ArticlePubMedPubMed CentralGoogle Scholar
- Song L, Florea L. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. GigaScience. 2015;4:48.View ArticlePubMedPubMed CentralGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644.View ArticlePubMedPubMed CentralGoogle Scholar
- Fu LM, Niu BF, Zhu ZW, Wu ST, Li WZ. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–2.View ArticlePubMedPubMed CentralGoogle Scholar
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512.View ArticlePubMedGoogle Scholar
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST plus : architecture and applications. BMC Bioinform. 2009;10:421.View ArticleGoogle Scholar
- Magrane M, UniProt C. UniProt Knowledgebase: a hub of integrated protein data. Database. 2011.Google Scholar
- Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–37.View ArticlePubMedPubMed CentralGoogle Scholar
- Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44(D1):D279–85.View ArticlePubMedGoogle Scholar
- Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.View ArticlePubMedGoogle Scholar
- Smith-Unna R, Boursnell C, Patro R, Hibberd JM, Kelly S. TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res. 2016;26(8):1134–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG. Primer3-new capabilities and interfaces. Nucleic Acids Res. 2012;40(15):e115.View ArticlePubMedPubMed CentralGoogle Scholar
- Kubisiak TL, Nelson CD, Staton ME, Zhebentyayeva T, Smith C, Olukolu BA, Fang GC, Hebard FV, Anagnostakis S, Wheeler N, et al. A transcriptome-based genetic map of Chinese chestnut (Castanea mollissima) and identification of regions of segmental homology with peach (Prunus persica). Tree Genet Genomes. 2013;9(2):557–71.View ArticleGoogle Scholar
- Schuelke M. An economic method for the fluorescent labeling of PCR fragments. Nat Biotechnol. 2000;18(2):233–4.View ArticlePubMedGoogle Scholar
- Peakall R, Smouse PE. GENEALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes. 2006;6:288–95.View ArticleGoogle Scholar
- Nei M. Analysis of gene diversity in subdivided populations. Proc Nat Acad Sci USA. 1973;70(12):3321–3.View ArticlePubMedPubMed CentralGoogle Scholar
- Raymond M, Rousset F. GENEPOP (Version 1.2): population genetics software for exact tests and ecumenicism. J Hered. 1995;86(3):248–9.View ArticleGoogle Scholar
- Rousset F. GENEPOP ‘ 007: a complete re-implementation of the GENEPOP software for Windows and Linux. Mol Ecol Resour. 2008;8(1):103–6.View ArticlePubMedGoogle Scholar
- Rice WR. Analyzing tables of statistical tests. Evolution. 1989;43(1):223–5.View ArticlePubMedGoogle Scholar
- Lind J, Gailing O. Genetic structure of Quercus rubra L. and Q. ellipsoidalis E. J. Hill populations at gene-based EST-SSR and nuclear SSR markers. Tree Genet Genomes. 2013;9:707–22.View ArticleGoogle Scholar
- Hamrick JL, Godt MJW, Sherman-Broyles SL. Factors influencing levels of genetic diversity in woody plant species. New For. 1992;6:95–124.View ArticleGoogle Scholar
- Eckert A, van Heerwaarden J, Wegrzyn J, Nelson C, Ross-Ibarra J, Gonzalez-Martinez S, Neale D. Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L. Pinaceae). Genetics. 2010;185:969–82.View ArticlePubMedPubMed CentralGoogle Scholar