Characterization and compilation of polymorphic simple sequence repeat (SSR) markers of peanut from public database

Background: There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L.) genome. There is a need to integrate the various research reports of peanut DNA polymorphism into a single platform. Further, because of a lack of uniformity in the labeling of these markers across publications, there is some confusion about the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut.
Findings: We compiled 1,343 SSR markers that detect polymorphism (14.5%) from a total of 9,274 markers. Among all polymorphic SSRs examined, the AG motif (36.5%) was the most abundant, followed by AAG (12.1%), AAT (10.9%), and AT (10.3%). The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed a higher polymorphism frequency than trinucleotide SSRs among genomic SSRs, whereas among EST-SSRs the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. Analysis of the correlation between SSR length and polymorphism frequency revealed that the frequency of polymorphism decreased as the motif repeat number increased.
Conclusions: The assembled polymorphic SSRs should enhance the density of the existing genetic maps of peanut and could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement, and thus would be of value to breeders.


Findings
Background
Cultivated peanut (Arachis hypogaea L.) is among the most important legume crops and a valuable source of oil and protein. Grown on six continents, it is economically the second most important legume in the U.S. Peanuts are planted annually on about 22 million ha worldwide, with a production of 35 million tons (source: http://www.agrostats.com/world-statistic/world-peanut.html).
Peanut is a self-pollinated allotetraploid (2n = 4x = 40) crop with a large genome (2.8 Gbp). Unlike many other polyploid crop species, cultivated peanut is generally believed to be monophyletic in origin [1]. Thus, peanut germplasm exhibits far less molecular genetic variation than most other cultivated crops resulting in the detection of fewer DNA markers in this crop. Consequently, marker-assisted selection, an important tool now in the improvement of many crops, is yet to play a significant role in peanut breeding. Paucity of DNA markers has also resulted in inadequate understanding of the nature and evolution of the peanut genome.
During the past two decades, much effort has been made to develop genetic and genomic tools in cultivated peanut, such as construction of BAC libraries [2,3], cDNA libraries [4-7], genetic linkage maps [8-18], and development of DNA markers [19-36]. Among the various molecular markers investigated so far, simple sequence repeats (SSRs) have emerged as the preferred DNA marker system for conducting genetic and genomic studies in cultivated peanut [10,11,18,23,26-28,32,33]. To date, nearly 10,000 SSRs have been identified by various research groups around the world. Initial development of SSR markers in peanut employed DNA fragments containing SSRs enriched from genomic libraries by using various SSR probes. Currently, SSRs are increasingly developed through data mining of EST and BAC-end sequences. While there are 32 publications on peanut DNA markers so far, there is a need to analyze all existing SSR markers in peanut to develop a central database of polymorphic SSRs with unambiguous labels gleaned from the published literature and the public genome database. Such a comprehensive review of polymorphic SSRs would help to advance peanut research and improvement, as it would provide an overall snapshot of all existing DNA markers as well as those that are polymorphic. Further, there is considerable interest among peanut breeders in introducing useful genes from wild species to improve genetic diversity through marker-assisted selection with polymorphic markers.

Methods
Information on publicly available peanut SSRs was collected by scanning scientific publications. Redundant primer sequences were detected by a BLAST sequence similarity search against legacy Arachis SSR primer sequences with an E-value cutoff of 1e-20. DNA sequences containing polymorphic SSRs were re-searched for motif and repeat number using SSRIT software. Polymorphic SSRs as well as their polymorphism information content (PIC) values were collected from original and cited publications, or determined by the authors through laboratory testing for polymorphism using a panel of eight cultivated peanut genotypes. These eight cultivated varieties, viz. Tifrunner, GT-C20, SunOleic 97R, NC94022, Yue you 92, Xin Hui Xiao Li, D99, and H22, are also the parental genotypes of four mapping populations. Genomic DNA was extracted from these genotypes using the MasterPure Plant Leaf DNA Purification Kit (Epicentre, Madison, WI). The PCR program consisted of initial denaturation at 94°C for 3 min, followed by 35 cycles of 94°C for 30 sec, 55°C for 30 sec, and 72°C for 30 sec, and a final extension at 72°C for 5 min. PCR products were resolved on polyacrylamide gels in a LI-COR 4300 DNA Analyzer (LI-COR, Lincoln, NE). All polymorphic SSRs were listed in a Microsoft Excel file as a reference, and GenBank accession numbers were included wherever available so that the original flanking sequences can be tracked by hyperlink. SSRs mapped in published genetic linkage maps were highlighted by authors' names. The source species and the DNA domains from which SSRs were identified were also shown, to distinguish genomic SSRs from EST-SSRs and cultivated from wild species SSRs.
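The motif and repeat-number search described above can be illustrated with a short regular-expression sketch. This is not the SSRIT algorithm itself, only a minimal stand-in that finds perfect dinucleotide and trinucleotide repeats. The function name `find_ssrs` and the minimum repeat thresholds (5 for dinucleotides, 4 for trinucleotides) are assumptions chosen for illustration, not settings taken from this study.

```python
import re


def find_ssrs(sequence, min_repeats=None):
    """Return (motif, repeat_number, start, end) for perfect di-/trinucleotide SSRs.

    min_repeats maps motif length -> minimum repeat number. The defaults are
    illustrative assumptions, not the thresholds used by SSRIT or by the authors.
    """
    if min_repeats is None:
        min_repeats = {2: 5, 3: 4}
    seq = sequence.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # Group 2 captures one motif unit; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if len(set(motif)) == 1:
                # Skip homopolymer runs such as AAAAAA, which are not di-/tri-SSRs.
                continue
            repeats = len(match.group(1)) // motif_len
            hits.append((motif, repeats, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Made-up sequence containing (AG)8 and (AAT)5, for demonstration only.
    demo = "GGC" + "AG" * 8 + "TTCA" + "AAT" * 5 + "CG"
    for motif, repeats, start, end in find_ssrs(demo):
        print(motif, repeats, start, end)
```

Start and end are 0-based, end-exclusive coordinates. Compound or imperfect repeats are not handled in this sketch.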

Findings
Redundancy among SSRs developed by different research groups, along with the use of non-uniform marker names, has resulted in duplicate genotyping of peanut germplasm and inefficient use of resources in peanut genomics. Therefore, there is a need for a central repository of informative SSR markers for peanut that includes all published markers without redundancy and employs unique and unambiguous marker names. We have attempted to develop such a set of polymorphic SSR markers in peanut. The total number of SSRs reported to date in both cultivated and wild peanut species in the published literature was 9,274 (Table 1). From these, we identified 1,343 SSR markers (14.5% of the total) that detected variation within peanut germplasm. We further analyzed these polymorphic SSRs to gain insights into their nature and frequency. All published SSRs are summarized in Table 1, which shows the source, name, and numbers of developed, polymorphic and mapped SSRs. The length of most sequences ranged from 100 to 500 bp. Assuming that the average length of SSR-containing sequences is 250 bp, these SSRs would cover 2.3 Mbp, which corresponds to 0.083% of the peanut genome (2,800 Mbp). Among these SSRs, 5,946 were EST-SSRs and 3,328 were genomic SSRs, of which 603 (10.1%) and 740 (22.2%), respectively, were confirmed to be polymorphic.
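The percentages and the genome-coverage estimate in this paragraph follow from simple arithmetic, which the sketch below reproduces. All counts are taken from the text; the 250 bp mean sequence length is the assumption stated above, and the variable names are illustrative.

```python
# Counts reported in the text.
TOTAL_SSRS = 9274
POLYMORPHIC_SSRS = 1343
EST_TOTAL, EST_POLYMORPHIC = 5946, 603
GENOMIC_TOTAL, GENOMIC_POLYMORPHIC = 3328, 740

# Assumption stated in the text: mean length of SSR-containing sequences.
ASSUMED_MEAN_LENGTH_BP = 250
GENOME_SIZE_MBP = 2800


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


overall_pct = percent(POLYMORPHIC_SSRS, TOTAL_SSRS)
est_pct = percent(EST_POLYMORPHIC, EST_TOTAL)
genomic_pct = percent(GENOMIC_POLYMORPHIC, GENOMIC_TOTAL)

# Total sequence represented by the SSR-containing fragments, in Mbp.
coverage_mbp = TOTAL_SSRS * ASSUMED_MEAN_LENGTH_BP / 1e6
coverage_pct = percent(coverage_mbp, GENOME_SIZE_MBP)

if __name__ == "__main__":
    print(f"overall polymorphic: {overall_pct:.1f}%")
    print(f"EST-SSR polymorphic: {est_pct:.1f}%")
    print(f"genomic SSR polymorphic: {genomic_pct:.1f}%")
    print(f"coverage: {coverage_mbp:.1f} Mbp ({coverage_pct:.3f}% of genome)")
```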
Additional file 1 provides descriptive information on the polymorphic SSR markers. This file contains further information such as marker name, primer name, alternative name, and GenBank accession number where available. These polymorphic SSRs were identified by various research groups around the world, which often employed different names to denote the same SSR marker. In some instances, two different markers have very similar names, adding to the confusion; for example, Ah-xx developed by [29] and Ahxx by [37] sound similar but come from different publications. Some markers with distinct names, such as IPAHMxx and XIPxx, are in fact the same markers but can easily be mistaken for different markers. Further, a marker name and its primer name are often referred to as if they were different markers, such as marker name Ah1TC3A12 with primer name TC3A12, both of which could be mapped on the same genetic linkage map. In Additional file 1, we present a list of such redundant markers in an effort to eliminate duplicate naming of markers. All polymorphic SSR markers listed in Additional file 1 are accompanied by clear information on their source, origin and nature. We believe that such a snapshot of information on all the available polymorphic SSRs in peanut will serve as a useful resource for high-throughput genotyping by array-based platforms in QTL mapping and marker-assisted selection in peanut breeding.
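One simple way to flag markers that carry different names but share the same primers is to group marker names by their primer pair. The sketch below uses exact matching of primer sequences, which is a simplification of the BLAST-based similarity search described in the Methods. The function name, the placeholder marker names and the primer sequences are all hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def group_by_primer_pair(markers):
    """Group marker names that share an identical forward/reverse primer pair.

    markers: iterable of (name, forward_primer, reverse_primer) tuples.
    Returns only the primer pairs that are used by more than one marker name.
    """
    groups = defaultdict(list)
    for name, forward, reverse in markers:
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        key = (forward.strip().upper(), reverse.strip().upper())
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical records: names and sequences are invented for this example.
    records = [
        ("MARKER_A", "ACGTACGTACGTACGTAC", "TTGCAATTGCAATTGCAA"),
        ("MARKER_A_ALIAS", "acgtacgtacgtacgtac", "ttgcaattgcaattgcaa "),
        ("MARKER_B", "GGATCCGGATCCGGATCC", "CCTAGGCCTAGGCCTAGG"),
    ]
    for primers, names in group_by_primer_pair(records).items():
        print(names, primers)
```

Exact matching would miss primers that differ by a few bases or are offset along the same flanking sequence, which is why a similarity search is the more thorough choice for a real compilation.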
Among the 1,343 polymorphic SSR markers, dinucleotide and trinucleotide motifs were predominant, and only a few markers carried other motif types. A total of 1,508 di- and trinucleotide motifs were identified and sorted as EST-SSRs or genomic SSRs (Table 2). EST sequences harbored 597 SSR motifs, among which motifs AAG (21.1%) and AG (20.9%) were the most abundant. Genomic SSRs had 911 motifs, among which motif AG was the most abundant (comprising 46.7%), followed by motifs AT (13.6%), AC (12.3%), and AAT (12.0%). The higher percentage of motif AG in genomic sequences might reflect a bias stemming from the use of dinucleotide SSRs, such as (AG)n, as probes in the enrichment approach for identification of SSRs in the peanut genome [26]. Interestingly, with one exception, no EST-SSR or genomic SSR with motif CG was found to be polymorphic. A similar result was reported by Moretzsohn et al. 2005 [10]. In total, motif AG (43.9%) was the most polymorphic and frequent SSR marker type derived from both EST-SSRs and genomic SSRs, followed by AT (12.4%), AAT (11.1%), AC (10.6%), and AAG (9.1%). These are also the motifs that are generally most abundant in the peanut genome [18,28], whereas in the soybean genome, motifs AT, AAT, and AAAT were the most abundant in a search of whole genome sequences [39].
Table 2 Distribution of various types of motifs in polymorphic EST-SSRs and genomic-SSRs
Comparison of SSR lengths revealed that the mean length of dinucleotide SSRs was significantly longer than that of trinucleotide SSRs for both EST-SSRs and genomic SSRs (t = 12.48 and t = 8.79, respectively; p < 0.0001) (Table 3). This finding is consistent with observations in barley [40], sugarcane [41] and soybean [42]. When polymorphism frequencies were compared, genomic SSRs showed a significantly higher frequency than EST-SSRs among dinucleotide SSRs, but a lower frequency among trinucleotide SSRs (Fisher's exact test, P < 0.0001).
Many studies have reported that SSRs with longer repeat lengths are more polymorphic in plant species [10,18,43,44]. In this study, dinucleotide SSRs had the longer mean repeat length, yet among EST-SSRs they exhibited lower polymorphism frequencies than trinucleotide SSRs. This may be because changes in dinucleotide repeat length within exons are likely to be suppressed, as they would frequently cause deleterious frameshift mutations in translated regions [42,45]. Expansion or contraction of SSR repeat length can occur through replication slippage, which is considered one of the main causes of SSR mutations. SSR instability also depends on motif size, nucleotide content and SSR length [46].
The relationship between the length of an SSR and the frequency of polymorphism was also analyzed by comparing the repeat number of SSRs with the number of polymorphic SSRs (Figure 1). In general, as the repeat number increased, the number of polymorphic SSRs decreased for both dinucleotide and trinucleotide SSRs. The correlation coefficient between the number of polymorphic SSRs and the repeat number was −0.945 and −0.661 in dinucleotide and trinucleotide SSRs, respectively. In dinucleotide SSRs, repeat numbers from 5 to 23 (lengths of 10 to 46 bp) displayed higher frequencies of polymorphism, i.e. more than 25 polymorphic SSRs for each repeat number within this range. At higher repeat numbers, the frequency dropped to fewer than 20 polymorphic SSRs. In trinucleotide SSRs, however, repeat numbers from 4 to 9 (12-27 bp) each exhibited more than 40 polymorphic SSRs. At repeat numbers of more than 10 (30 bp), the number of polymorphic SSRs dropped steeply to fewer than 25. The peak of polymorphism frequency did not always fall at the same repeat numbers for different motifs. For instance, among trinucleotide SSRs, the higher frequency of polymorphism for motif AAG occurred only at repeat numbers from 4 to 7 (12-21 bp), while the highest frequency of polymorphism for motif AAT occurred at repeat numbers 5-6 (15-18 bp) and 14-20 (42-60 bp). Nevertheless, the distribution of polymorphic SSRs among the different repeat numbers was generally skewed toward smaller numbers of repeats. This might simply be because SSRs with fewer repeats have been identified at higher frequencies than those with larger repeat numbers. A similar result was reported in common bean by Blair et al. [47], where the number of SSRs decreased as the repeat number increased.
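Two calculations underlie this paragraph: converting a repeat number to an SSR length in bp, and computing a correlation coefficient between repeat number and the count of polymorphic SSRs. The sketch below shows both under the assumption that a Pearson coefficient was used, which the text does not state explicitly. The per-repeat counts in the demonstration are invented toy values, not the data behind Figure 1.

```python
import math


def ssr_length_bp(motif_length, repeat_number):
    """Length of a perfect SSR in bp, e.g. a dinucleotide with 23 repeats is 46 bp."""
    return motif_length * repeat_number


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Repeat-number ranges quoted in the text, converted to bp.
    print(ssr_length_bp(2, 5), ssr_length_bp(2, 23))  # dinucleotide range
    print(ssr_length_bp(3, 4), ssr_length_bp(3, 9))   # trinucleotide range

    # Toy data (hypothetical): counts of polymorphic SSRs falling with repeat number.
    repeat_numbers = [5, 10, 15, 20, 25, 30]
    toy_counts = [60, 48, 40, 27, 18, 9]
    print(round(pearson_r(repeat_numbers, toy_counts), 3))
```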
Temnykh et al. [48] proposed a threshold for distinguishing short and long SSRs: SSRs longer than 20 bp are considered long SSRs ("class I"), while those shorter than 20 bp are considered short SSRs ("class II"). Using this criterion, among dinucleotide SSRs we found 534 SSRs longer than 20 bp (class I) and 302 SSRs in the short length range (class II). From this point of view, longer SSRs are more polymorphic than short SSRs, although the length of SSRs is highly negatively correlated with the frequency of polymorphism (correlation coefficient of −0.945). In trinucleotide SSRs, however, the number of long SSRs (333) was similar to that of short SSRs (339). When dinucleotide and trinucleotide SSRs are considered together, the number of longer SSRs (867) is indeed greater than that of short SSRs (641), which is consistent with many previous reports [10,18,43,44].
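The class I / class II distinction can be expressed as a small classification routine. The sketch below is illustrative only: the text does not say how an SSR of exactly 20 bp is treated, so assigning it to class I is an assumption made here, and the function names and example SSRs are hypothetical.

```python
from collections import Counter

CLASS_THRESHOLD_BP = 20  # threshold attributed to Temnykh et al. [48] in the text


def ssr_class(motif_length, repeat_number, threshold_bp=CLASS_THRESHOLD_BP):
    """Classify an SSR as 'class I' (long) or 'class II' (short) by total length.

    Assumption: an SSR of exactly threshold_bp is counted as class I.
    """
    length_bp = motif_length * repeat_number
    return "class I" if length_bp >= threshold_bp else "class II"


def tally_classes(ssrs):
    """Count class I and class II SSRs in an iterable of (motif, repeat_number)."""
    return Counter(ssr_class(len(motif), repeats) for motif, repeats in ssrs)


if __name__ == "__main__":
    # Hypothetical example SSRs, given as (motif, repeat number).
    examples = [("AG", 18), ("AG", 7), ("AAT", 6), ("AAG", 9), ("AT", 10)]
    print(tally_classes(examples))
```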
The increasing availability, affordability and accessibility of molecular markers are facilitating the development of genetic linkage maps in all major crops. Although the first peanut genetic linkage map was reported by [8] using RFLP markers in a wild species x wild species population, no genetic linkage map was developed for cultivated x cultivated peanut until 15 years later, when considerable numbers of SSR markers had become available. While SSRs have become increasingly important tools for molecular genetic analysis, another potentially useful and widely used marker type, the single nucleotide polymorphism (SNP), has not yet been developed in peanut.
To date, seven genetic linkage maps have been published for cultivated x cultivated populations using SSR markers [12-14,16-18,49]. Among the 1,343 polymorphic SSRs that we assembled, 593 were mapped in these seven maps (Table 1; Additional file 1). When these maps were constructed, only about six hundred polymorphic SSR markers were available in total. Therefore, the number of mapped SSR loci in these genetic maps ranged only from 131 to 324, and the maps still need to be saturated with more markers for further molecular research, such as QTL mapping, map-based cloning, and marker-assisted selection in peanut breeding. With a total of 1,343 polymorphic markers now available, including recently generated BAC-end sequence SSRs [18], EST-SSRs [7], and genomic SSRs [36], we presume that construction of a higher-density genetic linkage map with 500 SSR loci in cultivated peanut is feasible. Molecular markers are frequently polymorphic in one population but monomorphic in another. Among the seven genetic linkage maps in cultivated peanut, two were constructed using mapping populations from China, three from India, and two from the USA. Some of these informative SSR markers detected polymorphism in only one of the three regional populations, indicating that there is genetic variation between regional populations, presumably due to differences in their lineages. However, 45 SSR markers consistently detected polymorphism across all regional populations of peanut from China, India and the USA. These SSR markers may thus represent the most variable markers so far detected within the peanut genome and may correspond to loci with frequent mutation in this crop.

Conclusions
From an analysis of the published literature, which revealed a total of 9,274 SSR DNA markers in peanut, we identified 1,343 markers that detect polymorphism. The information in such a comprehensive database of polymorphic SSR markers not only facilitates a better understanding of the nature of SSRs in the peanut genome, but also provides a useful resource for conducting additional genetic and genomic studies to improve this crop.