- Research article
- Open Access
Characterization of variable EST SSR markers for Norway spruce (Picea abies L.)
BMC Research Notes volume 4, Article number: 401 (2011)
Norway spruce is widely distributed across Europe and the predominant tree of the Alpine region. Fast growth and the fact that timber can be harvested cost-effectively in relatively young populations define its status as one of the economically most important tree species of Northern Europe. In this study, EST derived simple sequence repeat (SSR) markers were developed for the assessment of putative functional diversity in Austrian Norway spruce stands.
SSR sequences were identified by analyzing 14,022 publicly available EST sequences. Tri-nucleotide repeat motifs were most abundant in the data set followed by penta- and hexa-nucleotide repeats. Specific primer pairs were designed for sixty loci. Among these, 27 displayed polymorphism in a testing population of 16 P. abies individuals sampled across Austria and in an additional screening population of 96 P. abies individuals from two geographically distinct Austrian populations. Allele numbers per locus ranged from two to 17 with observed heterozygosity ranging from 0.075 to 0.99.
We have characterized variable EST SSR markers for Norway spruce detected in expressed genes. Due to their moderate to high degree of variability in the two tested screening populations, these newly developed SSR markers are well suited for the analysis of stress related functional variation present in Norway spruce populations.
Natural populations of Picea abies L. (Norway spruce) are found from north-western Europe outside permafrost areas down to northern Greece, westwards to the Massif Central (France) and east to the Ural Mountains. Picea abies is growing above 400-500 m and ascends close to 2000 m in the Alps. Studies on genetic variation based on allozymes have shown that Picea abies genetic differentiation among populations is rather low over its whole distribution range [1, 2]. Previous studies on the genetic structure of P. abies using organelle markers showed pronounced differentiation between north-east boreal origins and areas in the central European mountains [3, 4], supporting the hypothesis of two distinct main glacial refugia as postulated from pollen data .
Initial reports on the occurrence of SSRs in conifers such as Pinus radiata, Pinus sylvestris or Picea abies have shown that marker development for such complex genomes is difficult. Frequently several DNA fragments in addition to the expected ones are amplified when using primers flanking putative SSR regions. This can be attributed to the very large size of the conifer genomes, with Norway spruce having 39.140 Mb in the diploid genome (2n = 24) as well as the high proportion of repetitive DNA in the conifer genomes . These repetitive elements as well as pseudogenes frequently produce complex amplification products from multiple loci [10, 11]. To overcome this problem, Scotti et al.  showed that by isolating di-nucleotide SSRs from cDNA libraries of conifer species this problem can be overcome relying on the fact that expressed genes are less likely to be of repetitive nature in the genome.
Following this line of exploiting the advantages of publicly available data on expressed sequences (ESTs) as source for marker development  as described also by Rungis et al.  for conifers, we used Norway spruce EST sequences from NCBI for in silico identification of SSR regions which could serve for marker development.
Screening of 14,022 Norway Spruce EST sequences from the NCBI dbEST revealed 158 sequences containing various repeat motifs. Clustering of these sequences produced 92 SSR containing 'unigenes' which fell into 36 clusters and 56 singletons. In these 92 unigenes, 48 different repeat motifs were identified, with tri-nucleotide (50%), in particular (CGG)n and CAA)n, repeats being the most abundant, followed by penta- (23%) and hexanucleotide repeats (10.4%) respectively. Di- and tetranucleotide repeats were the least abundant with 8.3% each. Twenty-seven out of 60 planned primer pairs amplified polymorphic products in the testing population of 16 individuals, while two did not generate a fragment and 31 proved to be monomorphic [see Additional file 1]. Four of the 27 polymorphic SSR regions showing more than 2 alleles per locus (Pa_16, Pa_24, Pa_34 and Pa_40) in the testing population were excluded from further analysis. In a total of 96 individuals 135 alleles were detected in the 23 remaining loci (Table 1) ranging from 2 to 17 alleles per locus (Table 2) with a mean value of 5.6. Total number of effective alleles was and 50.645 with a minimum of 1.123, a maximum of 4.115 and a mean value of 2.202. 87% of the variable regions were based on trinucleotide repeats, the rest were pentanucleotide repeats. Di-, tetra and hexanucleotide repeats proofed to be not polymorphic. Loci Pa_41 and Pa_53 were not polymorphic in the Mayrhofen population. Values for expected heterozygosity ranged from 0.021 to 0.780 in the Gusswerk population (mean 0.367) and from 0.123 to 0.769 in the Mayrhofen population (mean 0.385). Observed heterozygosity was found to be between 0.021 and 1.000 in the Gusswerk population (mean 0.461) and between 0.042 and 1.000 in the Mayrhofen population (mean 0.515). Differences in observed heterozygosities between the two populations ranged from zero (Pa_44) to 0.894 (Pa_49). The minimum difference of expected heterozygosities between the two populations was found at locus Pa_12 (0.002) and the maximum at locus Pa_49 (0.382). Regarding the Gusswerk population, in eleven loci the observed heterozygosity was significantly higher and in three loci significantly lower than the expected value. In the Mayrhofen population we found thirteen loci with the observed heterozygosity significantly higher than the expected and in four loci it was significantly lower. Eleven loci showed significant departure from Hardy-Weinberg expectations (HWE, Guo and Thompson's exact test [P < 0.05]) within the 48 investigated individuals from the Gusswerk population. The number of loci with significant departure from HWE was higher in the Mayerhofen population (16 loci). Frequency of null alleles was variable from zero to 99.0% in the Gusswerk population and from zero to 85.4% in the Mayerhofen population. In both populations, null alleles were present with a high frequency (above 5%) in more than half of the loci deviating from HWE. Tests for linkage disequilibrium (P < 0.01) revealed no disequilibrium among pair wise compared loci. Fst ranged between 0 and 0.288 with a mean value of 0.033 (after application of the ENA correction described in Chapuis and Estoup (2007), Table 3). Repeat numbers of all polymorphic di- and pentanucleotide repeats added up to a multiple of three and therefore did not cause a frameshift. All polymorphic SSRs are lying within open reading frame sequences (ORF) as detected by GetORF . Fourteen sequences hit functionally annotated genes when compared to the NCBI nucleotide database [see Additional file 2]. The test for outlier loci revealed that Pa_51 (P = 1.000) and Pa_42 (P = 0.981) are under positive selection while the other loci are supposed to behave neutral (P < 0.95).
Several studies describe the development of SSR markers for Norway spruce [8, 12]. In the present work we used EST mining of sequences available in databases instead of classical approaches like screening of genomic libraries or fragment enrichment and subsequent cloning. The newly developed markers can be used to complete unsaturated maps, and might be useful in marker assisted breeding and population genetics.
The high number of loci deviating from Hardy-Weinberg equilibrium could partly be explained by the presence of null alleles (Table 3), partly be a result of sampling or selective pressure on the coding regions which is supported by the fact that loci deviating from HWE are not consistent between the two populations. Null alleles can occur due to mutations in primer binding sites and lead to the overestimation of homozygosity as shown by Callen . The presence of such null alleles as estimated in this study may further be confirmed by synthesis of alternative oligonucleotide primer pairs. Increased appearance of homozygotes in some loci supports the hypothesis of selective pressure which might be caused by advantages of recessive or dominant homozygotes known as directional selection. Positive selection was confirmed by outlier detection for two loci which show a high frequency of homozygotes. Both of them are located in coding regions - Pa_42 shows homology to a cysteine protease and Pa_51 is related to a cadmium transporting ATPase. These genes play important roles in various physiological processes, including response to biotic and abiotic stress [17, 18]. Therefore, additional data of both populations (e.g. environmental growth) conditions would be of interest. Especially locus Pa_51 may be an interesting target for diversity studies because departure from HWE was only detected in the Mayrhofen population. But although these two loci show considerable degrees of population differentiation, the low Fst values present in the other loci support previous findings [3, 4] that there is no major degree of differentiation among Alpine Norway spruce populations.
The new SSR markers characterized in this study are a good resource for assessing diversity in Alpine spruce populations and might be of further use for applications in forestry. Additionally, 31 monomorphic SSR loci were identified in this study. It is still possible to derive polymorphic data from them when screening Norway spruce populations from different origin, as it has already been shown that flanking regions of monomorphic SSRs show a moderate to high degree of variability . Possible variation within these flanking regions in Norway spruce could serve as alternative source for diversity studies applying different techniques for the identification of variability within these regions.
All variable SSRs are located within ORF sequences and 60% of them gave significant hits to functional genes present in GenBank. A frameshift caused by length variation within an ORF might lead to distortion in the coding region and thus to the abortion of the coded protein. In all of the newly identified EST-SSRs no frameshift is occurring because either, as in 50% of the cases, the SSRs are trinucleotide repeats or the repeat number of polymorphic di- and pentanucleotide repeats alters in such a way that the sum of the basepair variation is a multiple of three.
We have identified 92 new SSR regions with 48 different repeat motifs in the expressed part of the Norway spruce genome. In two screening populations of 48 individuals each, 23 SSR regions showed a moderate to high degree of polymorphism. Although a high number of SSR markers were already developed for Norway spruce, most of the available SSRs consist of dinucleotide repeats and do not include our tri- and pentanucleotide repeats. We have demonstrated that on the one hand genetic differentiation between distinct stands is low but on the other hand selective forces are likely to have been acting on several genes because deviations from HWE could not only be explained due to null alleles and test for outliers revealed two loci under positive selection. The results gave an insight into the abundant repeat classes and will be of use in analysis of variation linked to expressed genes in Alpine Norway spruce populations as well as evolutionary forces acting at these loci.
14,022 Norway Spruce EST sequences were extracted from the NCBI dbEST and screened for SSR sequences with the SciRoKo software program . Only perfect SSRs were searched in two iterations, one for mono- and dinucleotides (at least 4 repeats), and another one for tri-, tetra-, penta- and hexanucleotides (at least 3 repeats). SSR motif containing sequences were subjected to clustering and annotation using the EST2uni pipeline . For clustering, the default settings (30 bp minimum overlap with at least 94% identity for pairwise alignment using BLAST, 93% overlap identity cutoff for assembly using CAP3) were applied. The resulting Unigene Set (contigs and singletons) was compared against three protein databases with BLASTX, in particular NCBI NR, Arabidopsis TAIR7, and Uniprot/Swissprot, using an e-value of 1e-10 as cutoff. The descriptions of the BLAST hits obtained with the different BLAST runs were parsed and merged to yield a descriptive annotation for each unigene. The annotations were attributed with modifiers like "Similar to" or "Highly similar to", depending on the e-value of the alignment with the corresponding BLAST hit.
Plant material used: 16 samples from all over Austria (testing population) and two times 48 individuals from two distinct populations (Gusswerk, Styria [N 47° 44|, E 15° 18|]; Mayrhofen, Tyrol [N 47° 11', E 11° 52']; screening population). DNA was kindly provided by the Federal Research and Training Centre for Forests, Natural Hazards and Landscape (BFW).
60 primer pairs were designed using Primer3 web interface  with default settings. PCR reactions were performed in 25 μl volumes containing 50 ng genomic DNA, 1X PCR Buffer (QIAGEN), 1 mM MgCl2, 0.2 mM each dNTP, 0.3 μM FAM-labelled forward primer, 0.3 μM reverse primer and 0.625 U HotStar Taq polymerase (QIAGEN). The PCR thermal profile consisted of 15 min. initial denaturation at 95 °C followed by 35 cycles denaturation at 95 °C for 50 sec., a primer-specific annealing-temperature (Table 1) for 50 sec., extension at 72 °C for 105 sec., and a final extension at 72 °C for 10 minutes. Fragments amplified in 16 individuals from all over Austria (testing population) and 96 individuals from 2 distinct locations in Austria (screening population) were analyzed on an ABI 3100 Genetic Analyzer. Fragment sizes were extracted using PeakScanner 1.0 (Applied Biosystems, Foster City, California, USA) and allele calls were manually performed in Excel. The testing population was only used to test for polymorphic loci and was not included in further calculations. Number of alleles, effective number of alleles, observed and expected heterozygosities and deviation from Hardy-Weinberg equilibrium were calculated using Genepop 4.0 . Fixation indices and fixation indices excluding null alleles (ENA), as well as estimates of null alleles were calculated with FreeNA  and SSR presence in coding regions was analyzed with the GetORF software using default settings. Test for outlier loci were performed with LOSITAN  using an infinite allele model with 100.000 simulations, a confidence interval of 0.95 and a subsample size of 75.
Lagercrantz U, Ryman N: Genetic structure of Norway spruce (Picea abies): concordance of morphological and allozymic variation. Evolution. 1990, 44: 38-53. 10.2307/2409523.
Bergmann F, Ruetz W: Isozyme genetic variation and heterozygosity in random tree samples and selected orchard clones from the same Norway spruce populations. For Ecol Manage. 1991, 46 (1-2): 39-48. 10.1016/0378-1127(91)90243-O.
Vendramin GG, Anzidei M, Madaghiele A, Sperisen C, Bucci G: Chloroplast microsatellite analysis reveals the presence of population subdivision in Norway spruce (Picea abies K.). Genome. 2000, 43 (1): 68-78.
Sperisen C, Büchler U, Gugerli F, Mátyás G, Geburek T, Vendramin GG: Tandem repeats in plant mitochondrial genomes: application to the analysis of population differentiation in the conifer Norway spruce. Mol Ecol. 2001, 10 (1): 257-263. 10.1046/j.1365-294X.2001.01180.x.
Huntley B, Birks HJB: An Atlas of Past and Present Pollen Maps for Europe: 0-13000 years ago. 1983, Cambridge: Cambridge University Press
Smith DN, Devey ME: Occurrence and inheritance of microsatellites in Pinus radiata. Genome. 1994, 37: 977-983. 10.1139/g94-138.
Kostia S, Varvio SL, Vakkari P, Pulkkinen P: Microsatellite sequences in a conifer, Pinus sylvestris. Genome. 1995, 38: 1244-1248. 10.1139/g95-163.
Pfeiffer A, Olivieri AM, Morgante M: Identification and characterization of microsatellites in Norway spruce (Picea abies K.). Genome. 1994, 40 (4): 411-419.
Murray BG, Leitch IJ, Bennet MD: Gymnosperm DNA C-values database (release 4.0, Dec. 2010). 2010, [http://www.kew.org/cvalues/]
Echt CS, May-Marquardt P, Hseih M, Zahorchak R: Characterization of microsatellite markers in eastern white pine. Genome. 1996, 39: 1102-1108. 10.1139/g96-138.
Soranzo N, Provan J, Powell W: Characterisation of microsatellite loci in Pinus sylvestris L. Mol Ecol. 1998, 7: 1260-1261.
Scotti I, Magni F, Fink R, Powell W, Binelli G, Hedley PE: Microsatellite repeats are not randomly distributed within Norway spruce (Picea abies K.) expressed sequences. Genome. 2000, 43: 41-46.
Varshney RK, Thiel T, Stein N, Langridge P, Graner A: In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell Mol Biol Lett. 2002, 7: 537-546.
Rungis SD, Bérubé Y, Zhang J, Ralph S, Ritland CE, Ellis BE, Douglas C: Robust simple sequence markers for spruce (Picea spp.) from expressed sequence tags. Theor Appl Gen. 2004, 109: 1283-1294. 10.1007/s00122-004-1742-5.
Callen DF, Thompson AD, Shen Y, Pillips HA, Richards RI, Mulley JC, Sutherland GR: Incidence and Origin of "Null" Alleles in the (AC)n Microsatellite Markers. Am J Hum Genet. 1993, 52: 922-927.
Solomon M, Belenghi B, Delledonne M, Menachem E, Levine A: The involvement of cysteine proteases and protease inhibitor genes in the regulation of programmed cell death in plants. Plant Cell. 1999, 11: 431-444.
Courbot M, Willems G, Motte P, Arvidsson S, Roosens N, Saumitou-Laprade P, Verbruggen N: A major quantitative trait locus for cadmium tolerance in Arabidopsis halleri colocalizes with HMA4, a gene encoding a heavy metal ATPase. Plant Physiol. 2007, 144: 1052-106. 10.1104/pp.106.095133.
Chatrou LW, Escribano MB, Viruel MA, Maas JW, Richardson JE, Hormaza JI: Flanking regions of monomorphic microsatellite loci provide a new source of data for plant species-level phylogenetics. Mol Phylogenet Evol. 2009, 53: 726-733. 10.1016/j.ympev.2009.07.024.
Kofler RC, Schlötterer C, Lelley T: SciRoKo: A new tool for whole genome microsatellite search and investigation (version 3.4). [http://kofler.or.at/bioinformatics/SciRoKo/index.html]
Forment J, Gilabert F, Robles A, Conjero V, Nuez F, Blanca JM: EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration. [http://cichlid.umd.edu/est2uni/]
Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers (version 0.0.4). [http://frodo.wi.mit.edu/primer3/]
Raymond M, Rousset F: GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J Hered. 1995, 86: 248-249.
Chapuis M-P, Estoup A: Microsatellite null alleles and estimation of population differentiation. Mol Biol Evol. 2007, 24: 621-631.
Beaumont MA, Nichols RA: Evaluating loci for use in the genetic analysis of population structure. P Roy Soc Lond B Bio. 1996, 263: 1619-1626. 10.1098/rspb.1996.0237.
The work was supported by the Austrian Federal Ministry of Agriculture, Forestry, Environment and Water Management and conducted in a cooperation project with the Federal Research and Training Centre for Forests, Natural Hazards and Landscape, Vienna, Austria.
The authors declare that they have no competing interests.
SF participated in the design of the study, carried out the primer design, SSR data analysis and allele calling and drafted the manuscript. AB carried out the PCR and the fragment analysis. AH carried out the population genetic analysis and helped to draft the manuscript. NS participated in the analysis of null alleles and FST values. DK carried out the computational detection of SSRs. All authors read and approved the final manuscript. GGV contributed part of the EST sequences and commented on the manuscript.
Electronic supplementary material
About this article
Cite this article
Fluch, S., Burg, A., Kopecky, D. et al. Characterization of variable EST SSR markers for Norway spruce (Picea abies L.). BMC Res Notes 4, 401 (2011) doi:10.1186/1756-0500-4-401
- Simple Sequence Repeat Marker
- Null Allele
- Trinucleotide Repeat
- Testing Population
- Hexanucleotide Repeat