Improved haplotype-based detection of ongoing selective sweeps towards an application in Arabidopsis thaliana
© Günther et al; licensee BioMed Central Ltd. 2011
Received: 11 February 2011
Accepted: 5 July 2011
Published: 5 July 2011
The increasing amount of genome information allows us to address various questions regarding the molecular evolution and population genetics of different species. Such genome-wide data sets including thousands of individuals genotyped at hundreds of thousands of markers require time-efficient and powerful analysis methods. Demography and sampling introduce a bias into present population genetic tests of natural selection, which may confound results. Thus, a modification of test statistics is necessary to introduce time-efficient and unbiased analysis methods.
We present an improved haplotype-based test of selective sweeps in samples of unequally related individuals. For this purpose, we modified existing tests by weighting the contribution of each individual based on its uniqueness in the entire sample. In contrast to previous tests, this modified test is feasible even for large genome-wide data sets of multiple individuals. We utilize coalescent simulations to estimate the sensitivity of such haplotype-based test statistics to complex demographic scenarios, such as population structure, population growth and bottlenecks. The analysis of empirical data from humans reveals different results compared to previous tests. Additionally, we show that our statistic is applicable to empirical data from Arabidopsis thaliana. Overall, the modified test leads to a slight but significant increase of power to detect selective sweeps among all demographic scenarios.
The concept of this modification might be applied to other statistics in population genetics to reduce the intrinsic bias of demography and sampling. Additionally, the combination of different test statistics may further improve the performance of tests for natural selection.
The recent advent of genome-wide surveys of genetic variation provides the opportunity to study genome-wide patterns of selection in model species. Such genome-wide scans detected new candidate regions for positive selection as well as previously identified target genes for selection, which included the lactase gene in European humans  or FRIGIDA in Arabidopsis thaliana.
Based on the assumption that the frequency of a new advantageous allele increases rapidly and that extended linkage disequilibrium (LD) around the selected site is expected [3, 4], several tests for selective sweeps have been designed in recent years [1, 2, 5–9]. The power of detecting selection with these haplotype-based tests was estimated to be higher than with frequency-based statistics such as Tajima's D. Although it is known that demographic history may cause a departure from the neutral model similar to that caused by selective sweeps and that test statistics are highly sensitive to these scenarios [10–14], only the pairwise haplotype sharing score (PHS, ) corrects for demographic history and relatedness. Unfortunately, because of pairwise comparisons between individuals for each allele, calculating the PHS has a complexity of O(n^2) and is infeasible for large present and future data sets. However, since demography and unequally related individuals introduce a bias and potentially cause flawed results in sweep detection, a correction is required. Population structure also confounds genome-wide association studies and several approaches were developed to circumvent these problems . The ideal sample for an association study as well as for scans for selective sweeps consists of equally related individuals with a star-like phylogeny. For samples from natural populations this assumption is unrealistic.
In order to correct for demographic effects in haplotype-based detection of ongoing selective sweeps, we modified the integrated haplotype score (iHS) statistic introduced by Voight et al.  by weighting the contribution of each individual according to its genetic similarity to all other individuals in the sample. Closely related individuals generally share more alleles and haplotypes because of common ancestry. The concept of weighting to account for an unequally related sample is already established in other fields of evolutionary analysis. It was introduced as branch-proportional sequence weighting in the construction of sequence profiles from homologous proteins  and also has been shown to improve the accuracy of multiple sequence alignments in CLUSTALW . Here, we describe the weighted iHS (WiHS) method as an improved test statistic to detect ongoing selective sweeps. We utilize coalescent simulations of different complex demographic scenarios to estimate the detection power and the false discovery rate of the new method and compare it to existing methods. Finally, we apply the modified test statistic to empirical data from Arabidopsis thaliana and humans.
Materials and methods
Test statistic to detect selective sweeps
The new test statistic is based on the integrated haplotype score (iHS, ). The iHS is derived from the extended haplotype homozygosity (EHH, ) and assumes that selected haplotypes will be longer than the haplotypes around non-selected alleles in the same region because of hitchhiking of linked variation with the selected mutation. The EHH is defined as the probability that two haplotypes with the same core allele at position x are identical over the complete interval between the core site and a position y. The original EHH considers all individuals as equally weighted in the computation of the score.
where h is a set of individuals carrying the same haplotype between x and y, H is the set of all haplotypes, m is the number of individuals carrying the same core allele at position x and n is the total sample size. For the classical EHH calculation, ΣU(I_i) is replaced by the constant 1.
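The verbal definition of the classical, equally weighted EHH can be illustrated with a minimal sketch. It assumes phased haplotypes stored as equal-length strings of allele characters and counts identical pairs among carriers of the core allele; the function name `classical_ehh` and the data layout are hypothetical and are not taken from the published scripts. The weighted variant is not reproduced here, because it depends on the individual weights U(I_i).

```python
from collections import Counter
from math import comb


def classical_ehh(haplotypes, core, end, allele):
    """Classical (unweighted) EHH between site `core` and site `end`.

    haplotypes: list of equal-length strings, one character per site
    core, end:  0-based site indices (the interval is inclusive)
    allele:     core allele whose carriers are compared
    """
    lo, hi = min(core, end), max(core, end)
    # Keep only the haplotypes that carry the requested core allele.
    carriers = [h for h in haplotypes if h[core] == allele]
    m = len(carriers)
    if m < 2:
        return 0.0  # no pair of haplotypes can be formed
    # Group carriers by their sequence over the whole interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(size, 2) for size in groups.values()) / comb(m, 2)
```

With four carriers of which three remain identical up to `end`, the function returns 3/6 = 0.5, i.e. half of all carrier pairs are still identical over the interval.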
where mean_f is the mean score of all sites with the frequency f and SD_f is the associated standard deviation.
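The frequency-dependent standardization described here can be sketched as follows. This is only an illustration under stated assumptions: sites are grouped by their exact derived allele count (equivalent to the frequency f for a fixed sample size), the population standard deviation is used, and the function name `standardize_by_frequency` is hypothetical. The published scripts may bin and normalize differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency(scores, derived_counts):
    """Standardize each score by the mean and SD of its frequency class.

    scores:         unstandardized per-site scores
    derived_counts: derived allele count of each site (same order)
    Returns a list with one standardized score per site; sites whose
    frequency class has zero variance are returned as None.
    """
    # Collect the scores of all sites sharing the same derived allele count.
    classes = defaultdict(list)
    for score, count in zip(scores, derived_counts):
        classes[count].append(score)
    # Mean and (population) standard deviation per frequency class.
    stats = {c: (mean(v), pstdev(v)) for c, v in classes.items()}
    standardized = []
    for score, count in zip(scores, derived_counts):
        mu, sd = stats[count]
        standardized.append((score - mu) / sd if sd > 0 else None)
    return standardized
```

Because every frequency class is centered and scaled separately, the standardized scores are comparable across sites with different allele frequencies.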
Python scripts used for the tests are available from http://evoplant.uni-hohenheim.de
Simulation of selective sweeps
Table: Parameters for the msms simulations
Population scaled mutation rate (per site): 6 · 10^-3
Population scaled recombination rate (per site): 8 · 10^-4
Effective population size
Number of sampled SNPs
To assess whether high scoring SNPs cluster around the selected site, the absolute values of the scores were averaged in a window of ±25 SNPs around the selected site. These values were then used as final test statistic and compared to a null distribution estimated from neutral simulations of the panmictic model.
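The window statistic and its comparison to a neutral null distribution can be sketched as below. The sketch assumes per-SNP scores in chromosomal order and a list of window statistics obtained from neutral simulations. The function names are hypothetical, and the add-one correction in the empirical p-value is a common convention chosen here, not necessarily the one used in this study.

```python
def window_statistic(scores, center, half_width=25):
    """Mean absolute score in a window of +/- half_width SNPs around `center`."""
    # Truncate the window at the ends of the simulated region.
    lo = max(0, center - half_width)
    hi = min(len(scores), center + half_width + 1)
    window = [abs(s) for s in scores[lo:hi]]
    return sum(window) / len(window)


def empirical_p_value(observed, null_values):
    """One-sided empirical p-value of `observed` against neutral statistics."""
    # Count neutral replicates that are at least as extreme as the observation.
    exceed = sum(1 for v in null_values if v >= observed)
    # Add-one correction (an assumption) avoids p-values of exactly zero.
    return (exceed + 1) / (len(null_values) + 1)
```

A window would then be called significant if its empirical p-value falls below the chosen significance level, which corresponds to the single threshold on which the power estimates are based.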
Application to empirical data sets
We applied our new test statistic to two empirical data sets. The first data set was HapMap 2  of the East Asian (JPT+CHB), European (CEU) and Yoruba (YRI) populations consisting of 120 chromosomes from each population. We included all SNPs for which an ancestral state was available from dbSNP 130 . The estimated recombination rates were downloaded from the HapMap project and a polynomial curve was fitted to the markers for conversion between physical and genetic distances. Additionally, we analyzed SNP data from 199 A. thaliana accessions genotyped at approximately 220,000 SNP sites . The alleles were polarized using the genome of the related species Arabidopsis lyrata. For conversion from physical to genetic distances, we fitted a polynomial curve to 253 markers, for which physical and genetic positions are known . All gene annotations were obtained from TAIR version 8 .
Comparison of sweep statistics
A comparison of the iHS and WiHS tests shows that WiHS performs better than iHS for allele frequencies > 40% even in panmictic populations. As the power itself is based on a single stringent threshold for the test score based on a significance level, we compared the normalized test scores between iHS and WiHS directly and found that WiHS assigns higher absolute scores to the SNPs surrounding the selected site (pairwise Wilcoxon-test, p < 10^-15). The scores around neutral sites are essentially identical for both tests (Additional File 1 Figure s1), which is expected for normalized scores. Thus, this difference demonstrates a better performance of WiHS in the detection of selective sweeps. While the absolute power decreased for selection weaker than 2N_e s = 200 (Figure 4), a difference between iHS and WiHS was still observed and significant (pairwise Wilcoxon-test; p < 10^-6, p < 10^-10 and p < 10^-15 for 2N_e s = 50, 2N_e s = 100 and 2N_e s = 150, respectively). This difference is a consequence of the sampling process, because it is impossible to sample genetically equidistant individuals and therefore even random samples of a panmictic population exhibit a certain degree of structure. The weighting corrects for this bias and improves the power of selection tests.
Performance under different demographic scenarios
Additionally, a model of exponential population growth followed by a constant population size was simulated (Figure 1B). The model by  resembles the population history of European A. thaliana accessions. Therefore, we regard these simulations as a test case for the analysis of empirical data from A. thaliana. Compared to the panmictic model, the detection power was decreased by more than 20% (Figure 4). Nevertheless, WiHS had a power 5.5% higher than the power of iHS and the scores around the selected site were significantly higher for WiHS (pairwise Wilcoxon-test; p < 10^-9). For the bottleneck model, a previously panmictic population was reduced to one fifth of its size with a later recovery to the original population size (Figure 1C). The bottleneck led to the strongest decrease in detection power (Figure 4), but WiHS still performed better and scored the SNPs in the sweep region higher (pairwise Wilcoxon-test; p < 10^-8). For models with a non-constant population size, which is the case in the growth and bottleneck models, msms requires a defined start time of the selective sweep. For the simulations above, the sweeps were initiated directly before the bottleneck or the start of population growth. Simulating different start times for the sweep showed no trend in the relation between time and detection power in either scenario (Additional File 1 Figures s3, s4).
Selective sweeps in the HapMap data
Table: Ranking of previously reported candidate genes in Human HapMap2 data (gene entries: ITGB4BP, CEP2, SPAG4; CHST5, ADAT1, KARS)
Selective sweeps in the A. thaliana data
In addition, a more detailed look was taken at the genes among the top 6 windows (Figure 6). The top ranked window overlaps with a region on chromosome 3 that was previously suggested as a sweep candidate . This window includes ARR5, a gene involved in the cytokinin signaling pathway, whose mutant shows a reduced rosette size and an increased sensitivity to red light. The windows ranked second and third contain FKF1, an F-box protein which is involved in the regulation of flower development and response to blue light, and ANN5, which contributes to the response to heat, cold, salt stress, red light and water deprivation. Finally, the fourth and sixth ranked regions on chromosome 4 comprise LUG1, a regulator of AGAMOUS involved in flower development.
Detection of selective sweeps
Even in unstructured populations, sampling and relatedness introduce a bias into the sample. We improved the accuracy of detecting selective sweeps with haplotype-based methods by weighting the contribution of each individual to the statistic according to its uniqueness in the sample. The improvement was observed in all simulated demographic scenarios including a panmictic population, a model of two subpopulations, exponential population growth and a population bottleneck. The increase in detection power of WiHS compared to iHS was less than expected but significant, reaching a maximum of 6.5%, 1%, 5.5% and 1.5% in the panmictic, island, population growth and bottleneck models, respectively. Simulation of different models and different model parameters, such as more severe bottlenecks, may give results that differ from the simulations in this study. The highest improvement was achieved in panmictic and growing populations. As the latter scenario was previously fitted to European accessions of A. thaliana, our improvement can result in additional sweep candidates for this species. While the detection power decreased in the more complex models, there was no significant increase of FDR if the sample was incorrectly assumed to arise from a panmictic population. As iHS and WiHS are genome-wide normalized scores, an excess of extreme scores and false positives under different demographic models is avoided.
The presented approach corrects for genome-wide IBD by upweighting more unique individuals in the sample. Since selective sweeps generate locally elevated IBD, which was suggested as a test for selection , one could also think of an opposite weighting based on local IBD. Local weighting would require the calculation of an IBD matrix for every single region, causing numerous pairwise comparisons between individuals and inflating the running time, which is beyond the scope of this paper.
Our simulation results extend the findings from previous studies for other test statistics [13, 14, 36–38] and show that haplotype-based tests are sensitive to demographic scenarios such as population structure and exponential growth. To identify candidates for selective sweeps, the search for outlier regions is commonly used, although they may represent the outliers of a neutral distribution . Therefore, additional validation using tests based on characteristics other than haplotype length, such as the site frequency spectrum [28, 39–44] or population differentiation [44–46], will increase the reliability of sweep detections. Recently, composites of different statistics have been shown to perform better in the detection of causal variants than each statistic separately [47–50]. The WiHS statistic might be included in such composite approaches as well and lead to a further improvement of these methods.
Recent selection in empirical data sets
The analysis of empirical data sets provides an insight into the effect of the modification under real conditions. Among the top scoring windows of the HapMap data, some prominent candidate regions were found, such as LCT for lactose metabolism, TYRP1 for skin pigmentation and SPAG4 for sperm motility. Most but not all of these genes were ranked higher by WiHS, so the difference was only weakly significant. We are aware of the fact that some of these genes represent only candidates for positive selection that have not been validated. The trend suggests that the general long-haplotype pattern in these regions is better detected by WiHS, and it is still possible that the ranking generated by WiHS is more accurate in the identification of selective sweeps.
The A. thaliana results revealed some promising candidates for selective sweeps. As the windows are still quite big, looking for particular candidate genes in these regions remains some kind of fishing in murky waters. Therefore, we leave the identification of sweep candidates to further studies, which employ a combination of different tests and use a more precise estimation of the genetic map. However, the simulations and the detection of some interesting genes in our preliminary scan suggest that WiHS is useful for the detection of selective sweeps in A. thaliana.
Next-generation sequencing projects will provide sufficiently large data sets for the genome-wide detection of natural selection in many species (e.g. 1000genomes.org, 1001genomes.org, The Drosophila Genetic Reference Panel). The upcoming flood of data demands time-efficient and accurate analysis methods. Several methods operate with an equal contribution of individuals, which means that all individuals in the sample are assumed to be statistically independent. As it is very likely that not all pairs of individuals share the same most recent common ancestor, the assumption of independence is probably violated in most biological samples. Thereby, unequally related individuals introduce a minor but significant bias into analyses, because the contribution of closely related individuals is overestimated while the contribution of others is underestimated. Such bias may be increased by demographic history and population structure. Genome-wide marker data allow the relationship between individuals to be assessed. This information can be used to cope with the dependency and to reduce the bias in estimates by differentially weighting the contribution of each individual. This concept could be extended to other unweighted statistics in population genetics. The consistent improvement across all simulated scenarios shows the general positive effect of differential weighting. Nevertheless, the slight increase of power leaves room for further improvement in the calculation of weights for each individual and the incorporation of these weights in test statistics, and for the detection of selective sweeps in general.
We thank the High Performance Computing Center Stuttgart (HLRS) for assistance with the bwGRID, and Sariel Hübner and Inka Gawenda as well as two anonymous reviewers for discussion and comments on the manuscript. This work was funded by a Volkswagen Foundation Evolutionary Biology scholarship to TG (I/84 225).
- Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4: e72. 10.1371/journal.pbio.0040072.
- Toomajian C, Hu TT, Aranzana MJ, Lister C, Tang C, Zheng H, Zhao K, Calabrese P, Dean C, Nordborg M: A nonparametric test reveals selection for rapid flowering in the Arabidopsis genome. PLoS Biol. 2006, 4: e137. 10.1371/journal.pbio.0040137.
- Hudson RR, Bailey K, Skarecky D, Kwiatowski J, Ayala FJ: Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics. 1994, 136: 1329-1340.
- Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.
- Wang ET, Kodama G, Baldi P, Moyzis RK: Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA. 2006, 103: 135-140. 10.1073/pnas.0509691102.
- Zhang C, Bailey DK, Awad T, Liu G, Xing G, Cao M, Valmeekam V, Retief J, Matsuzaki H, Taub M, Seielstad M, Kennedy GC: A whole genome long-range haplotype (WGLRH) test for detecting imprints of positive selection in human populations. Bioinformatics. 2006, 22: 2122-2128. 10.1093/bioinformatics/btl365.
- Kimura R, Fujimoto A, Tokunaga K, Ohashi J: A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS ONE. 2007, 2: e286. 10.1371/journal.pone.0000286.
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, International HapMap Consortium: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449: 913-918. 10.1038/nature06250.
- Tang K, Thornton KR, Stoneking M: A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome. PLoS Biol. 2007, 5: e171. 10.1371/journal.pbio.0050171.
- Wakeley J, Aliacar N: Gene genealogies in a metapopulation. Genetics. 2001, 159: 893-905.
- Przeworski M: The signature of positive selection at randomly chosen loci. Genetics. 2002, 160: 1179-1189.
- Schmid KJ, Ramos-Onsins S, Ringys-Beckstein H, Weisshaar B, Mitchell-Olds T: A multilocus sequence survey in Arabidopsis thaliana reveals a genome-wide departure from a neutral model of DNA sequence polymorphism. Genetics. 2005, 169 (3): 1601-15.
- Teshima KM, Coop G, Przeworski M: How reliable are empirical genomic scans for selective sweeps?. Genome Res. 2006, 16: 702-712. 10.1101/gr.5105206.
- Zeng K, Mano S, Shi S, Wu CI: Comparisons of site- and haplotype-frequency methods for detecting positive selection. Molecular Biology and Evolution. 2007, 24 (7): 1562-74. 10.1093/molbev/msm078.
- Price AL, Zaitlen NA, Reich D, Patterson N: New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics. 2010, 11 (7): 459-463.
- Thompson JD, Higgins DG, Gibson TJ: Improved sensitivity of profile searches through the use of sequence weights and gap excision. Comput Appl Biosci. 1994, 10: 19-29.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E: Efficient control of population structure in model organism association mapping. Genetics. 2008, 178 (3): 1709-23. 10.1534/genetics.107.080101.
- Ewing G, Hermisson J: MSMS: A Coalescent Simulation Program Including Recombination, Demographic Structure, and Selection at a Single Locus. Bioinformatics. 2010, 26 (16): 2064-2065. 10.1093/bioinformatics/btq322.
- Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, Jiang R, Muliyati NW, Zhang X, Amer MA, Baxter I, Brachi B, Chory J, Dean C, Debieu M, de Meaux J, Ecker JR, Faure N, Kniskern JM, Jones JDG, Michael T, Nemri A, Roux F, Salt DE, Tang C, Todesco M, Traw MB, Weigel D, Marjoram P, Borevitz JO, Bergelson J, Nordborg M: Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010, 465 (7298): 627-31. 10.1038/nature08800.
- Nordborg M, Hu TT, Ishino Y, Jhaveri J, Toomajian C, Zheng H, Bakker E, Calabrese P, Gladstone J, Goyal R, Jakobsson M, Kim S, Morozov Y, Padhukasahasram B, Plagnol V, Rosenberg NA, Shah C, Wall JD, Wang J, Zhao K, Kalbfleisch T, Schulz V, Kreitman M, Bergelson J: The Pattern of Polymorphism in Arabidopsis thaliana. PLoS Biology. 2005, 3: e196. 10.1371/journal.pbio.0030196.
- Kim S, Plagnol V, Hu TT, Toomajian C, Clark RM, Ossowski S, Ecker JR, Weigel D, Nordborg M: Recombination and linkage disequilibrium in Arabidopsis thaliana. Nature genetics. 2007, 39 (9): 1151-5. 10.1038/ng2115.
- François O, Blum MGB, Jakobsson M, Rosenberg NA: Demographic history of european populations of Arabidopsis thaliana. PLoS genetics. 2008, 4 (5): e1000075. 10.1371/journal.pgen.1000075.
- International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449: 851-861. 10.1038/nature06258.
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Research. 2001, 29: 308-311. 10.1093/nar/29.1.308.
- Hu TT, Pattyn P, Bakker EG, Cao J, Cheng JF, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, Haberer G, Hollister JD, Ossowski S, Ottilar RP, Salamov AA, Schneeberger K, Spannagl M, Wang X, Yang L, Nasrallah ME, Bergelson J, Carrington JC, Gaut BS, Schmutz J, Mayer KFX, Van de Peer Y, Grigoriev IV, Nordborg M, Weigel D, Guo YL: The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nature genetics. 2011, 43 (5).
- Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic acids research. 2008, 36 (Database issue): D1009-14.
- Hussin J, Nadeau P, Lefebvre JF, Labuda D: Haplotype allelic classes for detecting ongoing positive selection. BMC Bioinformatics. 2010, 11: 65. 10.1186/1471-2105-11-65.
- Wright S: Isolation by distance. Genetics. 1943.
- Akey JM: Constructing genomic maps of positive selection in humans: where do we go from here?. Genome research. 2009, 19 (5): 711-22. 10.1101/gr.086652.108.
- Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK: Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009, 19: 826-837. 10.1101/gr.087577.108.
- Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A: GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome biology. 2007, 8: R3. 10.1186/gb-2007-8-1-r3.
- Verburg JG, Huynh QK: Purification and Characterization of an Antifungal Chitinase from Arabidopsis thaliana. Plant Physiology. 1991, 95 (2): 450-5. 10.1104/pp.95.2.450.
- Childs LH, Witucka-Wall H, Günther T, Sulpice R, V Korff M, Stitt M, Walther D, Schmid KJ, Altmann T: Single feature polymorphism (SFP)-based selective sweep identification and association mapping of growth-related metabolic traits in Arabidopsis thaliana. BMC Genomics. 2010, 11: 188. 10.1186/1471-2164-11-188.
- Albrechtsen A, Moltke I, Nielsen R: Natural selection and the distribution of identity-by-descent in the human genome. Genetics. 2010, 186: 295-308. 10.1534/genetics.110.113977.
- Santiago E, Caballero A: Variation after a selective sweep in a subdivided population. Genetics. 2005, 169: 475-483.
- Chevin LM, Billiard S, Hospital F: Hitchhiking both ways: effect of two interfering selective sweeps on linked neutral variation. Genetics. 2008, 180: 301-316. 10.1534/genetics.108.089706.
- Huff CD, Harpending HC, Rogers AR: Detecting positive selection from genome scans of linkage disequilibrium. BMC Genomics. 2010, 11: 8. 10.1186/1471-2164-11-8.
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123 (3): 585-95.
- Kim Y, Stephan W: Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics. 2002, 160 (2): 765-77.
- Jensen JD, Kim Y, DuMont VB, Aquadro CF, Bustamante CD: Distinguishing between selective sweeps and demography using DNA polymorphism data. Genetics. 2005, 170 (3): 1401-10. 10.1534/genetics.104.038224.
- Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C: Genomic scans for selective sweeps using SNP data. Genome Research. 2005, 15 (11): 1566-75. 10.1101/gr.4252305.
- Zhu L, Bustamante CD: A composite-likelihood approach for detecting directional selection from DNA sequence data. Genetics. 2005, 170 (3): 1411-21. 10.1534/genetics.104.035097.
- Chen H, Patterson N, Reich D: Population differentiation as a test for selective sweeps. Genome research. 2010, 20 (3): 393-402. 10.1101/gr.100545.109.
- Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG: Measures of human population structure show heterogeneity among genomic regions. Genome Research. 2005, 15: 1468-1476. 10.1101/gr.4398405.
- Akey JM, Ruhe AL, Akey DT, Wong AK, Connelly CF, Madeoy J, Nicholas TJ, Neff MW: Tracking footprints of artificial selection in the dog genome. Proceedings of the National Academy of Sciences of the United States of America. 2010, 107 (3): 1160-5. 10.1073/pnas.0909918107.
- Zeng K, Shi S, Wu CI: Compound tests for the detection of hitchhiking under positive selection. Molecular Biology and Evolution. 2007, 24 (8): 1898-908. 10.1093/molbev/msm119.
- Grossman SR, Shylakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, Hostetter E, Angelino E, Garber M, Zuk O, Lander ES, Schaffner SF, Sabeti PC: A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection. Science. 2010, 166 (February): 2008-2011.
- Pavlidis P, Jensen JD, Stephan W: Searching for Footprints of Positive Selection in Whole-genome SNP Data from Non-equilibrium Populations. Genetics. 2010, 185 (3): 907-922.
- Lin K, Li H, Schlötterer C, Futschik A: Distinguishing Positive Selection from Neutral Evolution: Boosting the Performance of Summary Statistics. Genetics. 2011, 187 (1): 229-244.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.