An efficient method for developing SNP markers based on EST data combined with high resolution melting (HRM) analysis
© Ujino-Ihara et al; licensee BioMed Central Ltd. 2010
Received: 19 November 2009
Accepted: 2 March 2010
Published: 2 March 2010
To identify single nucleotide polymorphisms (SNPs) efficiently in a species with a large genome, we mined putative SNPs from an expressed sequence tag (EST) database and screened them by high resolution melting (HRM) analysis.
A total of 574 sequence tagged sites (STSs) were generated from Cryptomeria japonica, and HRM analysis was used to screen these STS markers for polymorphisms. STSs were designed in two ways: 1) putative SNP sites were identified by comparing ESTs within specific contigs, and 226 primer pairs were designed to amplify these sites; 2) 348 primer pairs were designed at random from reads of the 3' end of cDNA. HRM analysis revealed that 325 markers were polymorphic among eight individuals, and that STSs containing putative SNP sites exhibited higher levels of polymorphism.
Our results indicate that combining SNP screening of an EST database with HRM analysis is a highly efficient way to develop SNP markers for expressed genes. This method will contribute to both genetic mapping and the identification of SNPs in non-model organisms.
Sugi, Cryptomeria japonica, is one of the most important commercial tree species in Japan. Recently, linkage maps have been constructed for this species, based on co-dominant markers including RFLP (Restriction Fragment Length Polymorphism), CAPS (Cleaved Amplified Polymorphic Sequences), and SSR (Simple Sequence Repeat) markers [1–3]. Although these linkage maps include hundreds of markers, more DNA markers are required to generate a denser linkage map. Furthermore, some markers do not segregate in all crosses; increasing the number of markers can therefore greatly enhance the scope and resolution of QTL analyses using the progenies of controlled crosses.
C. japonica has a large genome (~10100 Mbp); therefore, a full genomic sequence has not been obtained. On the other hand, because of the importance of this species to Japanese forestry, expressed sequence tags (ESTs) have been generated using several types of tissues from several individuals [5–7]. Redundant sequences found in EST databases can be a useful resource for mining SNPs or developing DNA markers, since mapping expressed genes to a linkage map makes the map more useful for QTL analysis or Marker-Assisted Selection (MAS). SNPs have been discovered in ESTs of several species using programs such as PolyBayes and QualitySNP. We used QualitySNP in our study because it allowed us to account for both paralogous genes and sequence errors without requiring a sequence quality file or reference sequences.
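The idea of mining putative SNPs from redundant ESTs can be illustrated with a minimal sketch. This is not the QualitySNP algorithm, which uses more elaborate haplotype-based filtering; it is only a toy column-wise scan over already-aligned reads. The function name, the example reads, and the `min_minor_count` threshold are assumptions made for illustration; `min_depth=4` echoes the requirement of at least four ESTs per contig reported in the Results.

```python
from collections import Counter

def putative_snps(aligned_reads, min_depth=4, min_minor_count=2):
    """Toy scan: report alignment columns where at least two alleles
    are each supported by min_minor_count or more reads.

    aligned_reads: equal-length strings from one contig; '-' marks a gap.
    Thresholds are illustrative assumptions, not QualitySNP defaults.
    """
    snps = []
    length = min(len(read) for read in aligned_reads)
    for col in range(length):
        # Ignore gaps and ambiguous bases at this column.
        bases = [read[col] for read in aligned_reads if read[col] in "ACGT"]
        if len(bases) < min_depth:
            continue
        counts = Counter(bases)
        supported = [b for b, n in counts.items() if n >= min_minor_count]
        if len(supported) >= 2:
            snps.append((col, dict(counts)))
    return snps

# Hypothetical aligned ESTs from a single contig.
reads = [
    "ACGTTGCA",
    "ACGTTGCA",
    "ACGATGCA",
    "ACGATGCA",
    "ACGTTGCT",  # lone mismatch in the last column: treated as a likely error
]

print(putative_snps(reads))
```

Requiring every allele to appear in more than one read is a simple way to discount isolated sequencing errors, at the cost of missing rare alleles in shallow contigs.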
After mining putative SNPs from a database, a method is required to confirm polymorphism in the resulting STSs. HRM analysis is an efficient SNP detection method that identifies differences in the shapes of melting curves between genotypes, using an intercalating dye that binds to double-stranded DNA. In short, the melting curve changes its shape because of mismatches in heteroduplexes, or nucleotide variation in homoduplexes, present in the post-PCR mixture. HRM analysis has considerable advantages over other SNP detection methods because it requires only PCR products stained with specific dyes. As a result, HRM analysis has gained popularity and has been applied not only to mutation screening of specific genes but also to genetic mapping in plant species [11–13]. Using HRM, SNPs have been developed in almond based on alignments of peach/almond ESTs obtained from a public database. In this study, we report an efficient way to develop a large number of HRM markers for genetic mapping in non-model organisms by combining SNP mining from an EST database with HRM analysis, using publicly available software.
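The principle of comparing melting-curve shapes can be sketched with idealized curves. The code below is a toy model under stated assumptions, not the instrument software: curves are simple sigmoids, the melting temperatures, curve width, and the 0.05 decision threshold are all hypothetical, and a heterozygote is crudely modelled as an equal mix of a homoduplex curve and a lower-melting heteroduplex curve.

```python
import math

def melt_curve(temps, tm, width=0.6):
    """Idealized fluorescence (fraction of double-stranded DNA) per temperature."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]

def normalize(curve):
    """Rescale a curve to the 0..1 range so only its shape is compared."""
    lo, hi = min(curve), max(curve)
    return [(v - lo) / (hi - lo) for v in curve]

def max_difference(sample, reference):
    """Largest absolute gap between two normalized curves (a 'difference plot' peak)."""
    return max(abs(s - r) for s, r in zip(normalize(sample), normalize(reference)))

def differs_from_reference(sample, reference, threshold=0.05):
    """Flag a sample whose curve shape departs from the reference (threshold is assumed)."""
    return max_difference(sample, reference) > threshold

temps = [75 + 0.1 * i for i in range(151)]      # 75.0 to 90.0 degrees C
reference = melt_curve(temps, tm=83.0)          # reference homozygote
same_genotype = melt_curve(temps, tm=83.0)      # identical genotype
other_homozygote = melt_curve(temps, tm=82.5)   # homoduplex with shifted Tm
heterozygote = [0.5 * a + 0.5 * b               # homoduplex + heteroduplex mix
                for a, b in zip(melt_curve(temps, tm=83.0),
                                melt_curve(temps, tm=81.5))]

for name, curve in [("same", same_genotype),
                    ("other homozygote", other_homozygote),
                    ("heterozygote", heterozygote)]:
    print(name, differs_from_reference(curve, reference))
```

In real data the curves carry noise and well-to-well variation, so a fixed threshold like this one would need to be calibrated against replicate samples.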
Polymorphisms of the STS markers were screened in eight C. japonica individuals. Six of the individuals were from natural populations representing the species' natural distribution in Japan: Yakushima, Shimowada, Ishinomaki, Oki, Ajigasawa, and Ashu. The other two, YI38 and YI96, were parental material for mapping populations and were derived from a local cultivar on Kyushu Island. SNP markers detected by QualitySNP may include spurious markers, because paralogous sequences may be assembled into a single contig, whereas markers that segregate in the expected manner are likely to be true SNPs. The validity of the HRM markers was therefore tested by segregation analysis using the progeny of YI38 and YI96. DNA was extracted from all plant material using a modified CTAB method.
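A segregation check of this kind can be sketched as a chi-square goodness-of-fit test against Mendelian ratios. The text does not state which test was applied, so the chi-square test is an assumption here, as are the function names and the progeny counts, which are invented for illustration. The critical values are the standard 5% points of the chi-square distribution for one and two degrees of freedom.

```python
# Standard 5% critical values of the chi-square distribution.
CRITICAL_5PCT = {1: 3.841, 2: 5.991}

def chi_square(observed, ratio):
    """Goodness-of-fit statistic for observed genotype counts vs. an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def fits_mendelian(observed, ratio):
    """True if the counts do not deviate significantly (5% level) from the ratio."""
    df = len(observed) - 1
    return chi_square(observed, ratio) <= CRITICAL_5PCT[df]

# Hypothetical progeny counts for three markers.
# One parent heterozygous, the other homozygous: 1:1 expected.
print(fits_mendelian((48, 52), (1, 1)))         # consistent with 1:1
print(fits_mendelian((70, 30), (1, 1)))         # distorted: possible paralog or artefact
# Both parents heterozygous: 1:2:1 expected.
print(fits_mendelian((24, 52, 24), (1, 2, 1)))  # consistent with 1:2:1
```

A marker that fails such a test is not necessarily spurious, since segregation distortion also has biological causes, but it is a reasonable flag for closer inspection.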
High Resolution Melting Analysis
PCR amplification was performed in 10 μl reaction volumes containing 200 μM of each dNTP, 0.8 units of GXL DNA polymerase (Takara Co. Ltd.), 1 × buffer provided for the GXL DNA polymerase that contained 1.0 mM MgCl2, 0.2 μM of each primer, 0.5 μl DMSO, 0.5 μl of EvaGreen, and 5 ng of template DNA (Fukuoka et al., personal communication). The PCR amplification was carried out for 10 min at 95°C, followed by 45 cycles of 40 sec at 94°C, 30 sec at 60°C, and 15 sec at 72°C. High resolution melting analysis was carried out using a LightCycler 480 (Roche) according to the manufacturer's instructions. If the amplification product yield was low, we decreased the PCR annealing temperature to 56°C.
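As a rough planning aid, the cycling programme above can be written down as data and its total hold time computed. This is only a sketch: the variable and function names are assumptions, and the figure excludes temperature ramping and the melting step itself, so the real run time on the instrument will be longer.

```python
# Cycling programme as described in the text (hold times in seconds).
INITIAL_DENATURATION_S = 10 * 60          # 10 min at 95 degrees C
CYCLES = 45
STEPS_S = {
    "denature_94C": 40,
    "anneal_60C": 30,                     # lowered to 56 degrees C if yield was low
    "extend_72C": 15,
}

def total_hold_seconds(initial_s, cycles, steps):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial_s + cycles * sum(steps.values())

total = total_hold_seconds(INITIAL_DENATURATION_S, CYCLES, STEPS_S)
print(f"{total} s = {total / 60:.2f} min")  # 4425 s = 73.75 min
```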
Results and Discussion
SNP discovery and the development of STS markers
A total of 55634 ESTs of Cryptomeria japonica were available in the NCBI EST database. After removing possible organelle and bacterial genes, 55530 ESTs were assembled into 10368 contigs and 13783 singlets. In the QualitySNP analysis, contigs with at least four ESTs were considered for SNP screening. Of the 10368 contigs, 3809 (3310919 bp) were screened, and 1246 high-confidence SNPs were found in 314 contigs (8.2% of the contigs screened). The overall SNP frequency was one SNP per 2657.2 bp for the 3809 screened contigs.
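The summary figures in this paragraph follow directly from the reported counts, as the short check below shows. The variable names are chosen here for illustration; all numbers are taken from the text.

```python
# Counts reported in the text.
total_ests = 55634
assembled_ests = 55530
total_contigs = 10368
screened_contigs = 3809
screened_bp = 3310919
snps = 1246
contigs_with_snps = 314

removed_ests = total_ests - assembled_ests                      # organelle/bacterial
bp_per_snp = screened_bp / snps                                 # SNP frequency
pct_contigs_with_snp = 100 * contigs_with_snps / screened_contigs
pct_contigs_screened = 100 * screened_contigs / total_contigs

print(f"ESTs removed before assembly: {removed_ests}")          # 104
print(f"One SNP per {bp_per_snp:.1f} bp")                       # 2657.2
print(f"Screened contigs with SNPs: {pct_contigs_with_snp:.1f}%")   # 8.2%
print(f"Contigs deep enough to screen: {pct_contigs_screened:.1f}%")  # 36.7%
```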
In maritime pine, SNPs were also surveyed from EST data using Phred [20, 21], Phrap (http://www.phrap.org), and PolyBayes. SNP abundance was estimated as one SNP per 660 bp in a set of 940 contigs representing 942216 bp. A similar study was carried out for white spruce, in which 12264 SNPs were found when 6459 contigs of at least two cDNA clones were surveyed with PolyBayes. These authors estimated an overall frequency of one SNP per 700 nucleotide sites. Compared with these studies, both the proportion of contigs with a putative SNP site and the SNP frequency in C. japonica appear to be much lower. This may be the result of the different software used, the number of ESTs screened, or the number of individuals (or varieties) represented in the EST database. In our study, the parameters used in EST clustering and SNP detection were stringent in order to avoid assembling paralogous sequences into contigs; this may also have affected the SNP discovery rate.
For comparison, 412 primer pairs were also randomly designed from 3' ESTs (singlet-STS). These primer pairs were designed from 3' ESTs because previous studies have suggested that STSs derived from 3' ESTs are more polymorphic than STSs derived from 5' ESTs.
Summary of the development of HRM markers
Detection of nucleotide differences for STSs by HRM analysis
When SNPs were screened by CAPS analysis, 72.3% of STSs were polymorphic. The detection efficiency of CAPS is comparable to that of HRM analysis of contig-STSs; however, 15 individuals were screened using 24 or 36 endonucleases in the CAPS analysis, so screening CAPS markers takes more time and labor. In the SSCP analysis, 37.3% of tested STSs were polymorphic among 10 individuals under 12 different electrophoresis conditions. Therefore, if the SNP frequency was similar in both cases, HRM analysis is likely to be more sensitive than SSCP analysis. As described above, HRM analysis showed higher sensitivity even though samples were analyzed under a single experimental condition. HRM analysis is therefore a more efficient way to detect SNPs in terms of sensitivity, cost, time, and labor for genomic mapping of non-model organisms. Changing the experimental conditions should further improve the detection efficiency. It is important to note that the panel individuals and the individuals used in cDNA library construction were not the same. The absence of polymorphism in the HRM analysis was therefore sometimes due not to the sensitivity of the technique but to the absence of the predicted SNPs in the panel DNA.
We have demonstrated that HRM analysis is very useful for converting expressed sequences into DNA markers for genetic mapping. Although some SNP typing methods have higher throughput, HRM analysis can be an alternative for moderate-scale genome projects in terms of time and cost. The method described here relies mainly on publicly available resources; it is therefore easily applicable to non-model organisms with large genomes, such as C. japonica.
This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 20380096 and 21380102, Grant-in-Aid (Development of Technologies for Control of Pollen Production by Genetic Engineering) from the Forest Agency of Japan, and Program for Promotion of Basic and Applied Researches for Innovations in Bio-oriented Industry (BRAIN). The authors would like to thank Dr Hiroyuki Fukuoka for providing the protocol for the HRM analysis and two anonymous reviewers for their valuable comments.
- Mukai Y, Suyama Y, Tsumura Y, Kawahara T, Yoshimaru H, Kondo T, Tomaru N, Kuramoto N, Murai M: A linkage map for sugi (Cryptomeria japonica) based on RFLP, RAPD, and isozyme loci. Theor Appl Genet. 1995, 90: 835-840. 10.1007/BF00222019.
- Iwata H, Ujino-Ihara T, Yoshimaru H, Nagasaka K, Mukai Y, Tsumura Y: Cleaved amplified polymorphic sequence markers in sugi, Cryptomeria japonica D. Don, and their location on a linkage map. Theor Appl Genet. 2001, 103: 881-895. 10.1007/s001220100732.
- Tani N, Takahashi T, Iwata H, Mukai Y, Ujino-Ihara T, Matsumoto A, Yoshimura K, Yoshimaru H, Murai M, Nagasaka K, Tsumura Y: A consensus linkage map for sugi (Cryptomeria japonica) from two pedigrees, based on microsatellites and expressed sequence tags. Genetics. 2003, 165: 1551-1568.
- Hizume M, Kondo T, Shibata F, Ishizuka R: Flow cytometric determination of genome size in the Taxodiaceae, Cupressaceae sensu stricto, and Sciadopityaceae. Cytologia. 2001, 66: 307-311.
- Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, Mukai Y, Yoshimura K, Tsumura Y: Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers. Plant Mol Biol. 2005, 59: 895-907. 10.1007/s11103-005-2080-y.
- Yoshida K, Nishiguchi M, Futamura N, Nanjo T: Expressed sequence tags from Cryptomeria japonica sapwood during the drying process. Tree Physiol. 2007, 27: 1-9.
- Futamura N, Totoki Y, Toyoda A, Igasaki T, Nanjo T, Seki M, Sakaki Y, Mari A, Shinozaki K, Shinohara K: Characterization of expressed sequence tags from a full-length enriched cDNA library of Cryptomeria japonica male strobili. BMC Genomics. 2008, 9: 383. 10.1186/1471-2164-9-383.
- Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR: A general approach to single-nucleotide polymorphism discovery. Nat Genet. 1999, 23: 452-456. 10.1038/70570.
- Tang J, Vosman B, Voorrips RE, Linden van der CG, Leunissen JA: QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics. 2006, 7: 438. 10.1186/1471-2105-7-438.
- Reed GH, Wittwer CT: Sensitivity and specificity of single-nucleotide polymorphism scanning by high-resolution melting analysis. Clin Chem. 2004, 50: 1748-54. 10.1373/clinchem.2003.029751.
- Croxford AE, Rogers T, Caligari PD, Wilkinson MJ: High-resolution melt analysis to identify and map sequence-tagged site anchor points onto linkage maps: a white lupin (Lupinus albus) map as an exemplar. New Phytol. 2008, 180: 594-607. 10.1111/j.1469-8137.2008.02588.x.
- Chagné D, Gasic K, Crowhurst RN, Han Y, Bassett HC, Bowatte DR, Lawrence TJ, Rikkerink EH, Gardiner SE, Korban SS: Development of a set of SNP markers present in expressed genes of the apple. Genomics. 2008, 22: 353-358. 10.1016/j.ygeno.2008.07.008.
- Wu SB, Tavassolian I, Rabiei G, Hunt P, Wirthensohn M, Gibson JP, Ford CM, Sedgley M: Mapping SNP-anchored genes using high-resolution melting analysis in almond. Mol Genet Genomics. 2009, 282: 273-81. 10.1007/s00438-009-0464-4.
- Wu SB, Wirthensohn M, Hunt P, Gibson J, Sedgley M: High resolution melting curve (HRM) analysis of almond SNPs derived from EST database. Theor Appl Genet. 2008, 118: 1-14. 10.1007/s00122-008-0870-8.
- Huang X, Madan A: CAP3: A DNA Sequence Assembly Program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.
- Wheelan SJ, Church DM, Ostell JM: Spidey: a tool for mRNA-to-genomic alignments. Genome Res. 2001, 11: 1952-1957.
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- Tsumura Y, Yoshimura K, Tomaru N, Ohba K: Molecular phylogeny of conifers using RFLP analysis of PCR-amplified specific chloroplast genes. Theor Appl Genet. 1995, 91: 1222-1236. 10.1007/BF00220933.
- Dantec LL, Chagné D, Pot D, Cantin O, Garnier-Géré P, Bedon F, Frigerio JM, Chaumeil P, Léger P, Garcia V, Laigret F, De Daruvar A, Plomion C: Automated SNP detection in expressed sequence tags: statistical considerations and application to maritime pine sequences. Plant Mol Biol. 2004, 54: 461-470. 10.1023/B:PLAN.0000036376.11710.6f.
- Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-85.
- Pavy N, Parsons LS, Paule C, MacKay J, Bousquet J: Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs. BMC Genomics. 2006, 7: 174. 10.1186/1471-2164-7-174.
- Kado T, Yoshimaru H, Tsumura Y, Tachida H: DNA variation in a conifer, Cryptomeria japonica (Cupressaceae sensu lato). Genetics. 2003, 164: 1547-1559.
- Ujino-Ihara T, Matsumuto A, Iwata H, Yoshimura K, Tsumura Y: Single-strand conformation polymorphism of sequence-tagged site markers based on partial sequences of cDNA clones in Cryptomeria japonica. Genes Genet Syst. 2002, 77: 251-257. 10.1266/ggs.77.251.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.