Markers typed in genome-wide analysis identify regions showing deviation from Hardy-Weinberg equilibrium

Background Deviations from Hardy-Weinberg equilibrium (HWE) are commonly thought of as indicating genotyping errors, population stratification or some other artefact. However, they could also arise through important biological mechanisms. In particular, genetic variants having a recessive effect on the successful fertilisation and/or development of an embryo might be manifest through such deviations in an unselected sample of "control" subjects.
Findings We investigated genotypes from 463842 autosomal markers from 1504 British subjects. We identified regions in which several neighbouring markers exhibited deviation from HWE in the same direction by considering "heterozygosity scores" in windows of 10 markers. The heterozygosity score for each marker was defined as -log10(p) or log10(p) according to whether the marker demonstrated increased heterozygosity or increased homozygosity. In each window the marker with the highest absolute score was ignored and the positive and negative scores were summed separately for the other nine markers. Windows were selected on the basis of the absolute value of either sum exceeding a given threshold, for which we used values of 50 or 15. For the threshold of 50, we identified 7 regions with increased heterozygosity, and for the threshold of 15 we identified 22 regions with increased heterozygosity, 23 with increased homozygosity and 2 containing both kinds of window. The most impressive of these results came from a group of 6 markers at 17q21, each of which showed increased heterozygosity significant at p < 10^-190.
Conclusion The human genome contains regions which deviate markedly from HWE and these might harbour genes influencing embryonic survival.


Findings
When marker genotype frequencies in controls deviate markedly from Hardy-Weinberg equilibrium (HWE), this is commonly taken to indicate that the genotyping is unreliable or that there is marked population stratification, and the marker is discarded [1]. However, if common polymorphisms influence embryonic survival then these would also be expected to produce such deviations. The existence of such loci is supported by a genome-wide tendency for siblings to share alleles more than would be expected by chance [2].
As previously suggested, we reasoned that if groups of nearby markers all showed deviation from HWE then this could not result purely from genotyping errors, since there would be no reason for the same kind of error to be replicated in each marker [3]. We therefore used the control data from the 1958 British Birth Cohort, which we obtained online from the Wellcome Trust Case-Control Consortium (WTCCC) after their approval was granted [4]. We used the genotypes called by the Chiamo algorithm and excluded markers having either a studywise missing data proportion of more than 0.05, or a studywise minor allele frequency of less than 0.05 along with a studywise missing data proportion of more than 0.01. Naturally, for the purposes of this study, we did not exclude markers on grounds of deviation from HWE. Genotypes for 463842 autosomal markers, typed in 1504 subjects, were investigated. We used sliding windows of ten markers across the marker set, and for each of the ten markers in each window we tested for deviation from HWE using a chi-squared test and recorded the resultant p-value. We assigned each marker a "heterozygosity score", defined as -log10(p) for markers showing increased heterozygosity (i.e. a positive number) and as log10(p) for those showing increased homozygosity (i.e. a negative number). We then excluded the marker having the highest absolute value for this score and considered only the scores from the other nine markers. The aim of the approach was to ignore regions where only a single marker produced a marked deviation from HWE but to identify those in which a group of markers all supported deviation in the same direction. We then summed all the positive heterozygosity scores and, separately, all the negative heterozygosity scores from the nine markers and tested whether the absolute value of either sum exceeded a predetermined threshold. For the current study, we used threshold values of 15 and of 50.
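The scoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the study: the function names (`hwe_heterozygosity_score`, `window_sums`, `flag_windows`) are hypothetical, the chi-squared statistic is assumed to be the standard one degree of freedom goodness-of-fit test on the three genotype counts, and the floor on p is an arbitrary guard against numerical underflow.

```python
import math

def hwe_heterozygosity_score(n_aa, n_ab, n_bb):
    """Signed -log10(p) from a 1-df chi-squared HWE test on genotype counts.

    Positive for excess heterozygosity, negative for excess homozygosity.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p_a = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    p_b = 1.0 - p_a
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    if min(expected) == 0:
        return 0.0                              # monomorphic marker: no test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared variable with 1 df: erfc(sqrt(x / 2)).
    p = max(math.erfc(math.sqrt(chi2 / 2)), 1e-300)  # assumed underflow guard
    score = -math.log10(p)
    return score if n_ab > expected[1] else -score

def window_sums(scores, size=10):
    """Yield (start, positive_sum, negative_sum) for each sliding window,
    after dropping the marker with the largest absolute score."""
    for start in range(len(scores) - size + 1):
        window = list(scores[start:start + size])
        window.remove(max(window, key=abs))
        pos = sum(s for s in window if s > 0)
        neg = sum(s for s in window if s < 0)
        yield start, pos, neg

def flag_windows(scores, threshold, size=10):
    """Return windows whose positive or negative sum exceeds the threshold."""
    return [(start, pos, neg)
            for start, pos, neg in window_sums(scores, size)
            if pos > threshold or -neg > threshold]
```

With this sketch, a window containing one marker scoring 200 and nine markers scoring 0.5 is not flagged at a threshold of 15, because the top-scoring marker is dropped before summing. That is the behaviour the text describes for isolated deviating markers.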
For each set of ten markers reaching the specified threshold using this process, we went on to investigate departure from HWE of two-marker and three-marker haplotypes using a method we have described elsewhere [5] to produce a one degree of freedom chi-squared test for departure from HWE, summarised by a "heterozygosity score" defined as -log10(p) or log10(p).
When overlapping sets of ten markers exceeded the threshold they were amalgamated, building up regions in which there was evidence for deviation from HWE. We obtained lists of genes within 200 kb either side of these regions by interrogating the UCSC genome browser [6]. Using the threshold of 50, 7 regions were identified as showing increased heterozygosity. In addition, there were 68 markers which individually produced results significant at p < 10^-50 but which were not supported by other markers nearby and hence might represent genotyping errors; of these, 10 demonstrated increased heterozygosity and 58 increased homozygosity. Using a threshold of 15 implicated 22 regions and 37 isolated markers as showing increased heterozygosity and 23 regions and 285 isolated markers as showing increased homozygosity. There were 2 regions containing a mixture of 10-marker windows meeting the criterion of 15 for both increased heterozygosity and increased homozygosity.
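The amalgamation of overlapping windows into regions can be illustrated as follows. This is a minimal sketch rather than the authors' implementation: `merge_windows` is a hypothetical name, windows are identified by the index of their first marker, and two windows are assumed to overlap when they share at least one marker. A real analysis would also need to keep chromosomes separate, which is omitted here.

```python
def merge_windows(starts, size=10):
    """Merge overlapping windows into regions.

    `starts` holds the index of the first marker of each flagged window.
    Returns a list of (first_marker_index, last_marker_index) tuples.
    """
    regions = []
    for start in sorted(starts):
        end = start + size - 1
        if regions and start <= regions[-1][1]:
            # Shares at least one marker with the current region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

For example, flagged windows starting at markers 0, 3 and 9 form a single region covering markers 0 to 18, while a window starting at marker 25 forms a separate region.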
The most convincing evidence for a real departure from HWE occurs at 17q21 in the region around rs2693363, as shown in Table 4. This marker and five others flanking it are each individually significant at p < 10^-190. It does not seem plausible that this result could occur through a set of genotyping errors or through some other artefact, and so we can only conclude that there really is a marked excess of heterozygosity in this region. Using the threshold of 50, it seems unlikely that any of the results could have occurred by chance. Perhaps the least convincing result is at 6p25.3 (Table 2), where rs815593 has a heterozygosity score of 100.6 and rs11757245 has a score of 86.9. No other markers nearby support deviation from HWE, and one could argue that the result for each marker might be due to genotyping error and that it is mere coincidence that the two happen to lie close to each other. When the threshold is set as low as 15, we expect that a number of the results might have occurred by chance. Given that results from nearby markers are not independent, it is possible that a region might happen to show deviation from HWE at p < 10^-3 or p < 10^-4 and that several markers in this region might be significant at this level and hence produce a combined score exceeding the threshold of 15. On the other hand, many regions produced a score far in excess of this, and a substantial proportion of regions identified using this lower threshold are likely to represent a real biological effect.
With regard to the comparison of single marker and haplotype-based analyses, there were no regions in which a haplotype analysis provided stronger evidence for increased heterozygosity than the most significant single marker analysis. We would take this to indicate that the information supporting departure from HWE was captured by the single marker. For example, given the allele frequencies of rs11757245 at 6p25.3 (Table 2), one would expect 204.4 subjects to have genotype BB. In fact, this genotype occurs in only 14 subjects, a finding consistent with this polymorphism itself, or one in close LD with it, having a marked effect on survival. However, when we considered regions in which there was a deviation towards excess homozygosity rather than heterozygosity, identified using the threshold of 15, there were a few for which a haplotype analysis was more significant than any single marker analysis. One interpretation of this might be that there is an untyped polymorphism in LD with one or more of the haplotypes. For example, the frequencies at rs649022 at 4q26 of both the AA and BB genotypes are somewhat increased relative to HWE, with p < 10^-12 (Additional File 3 Table S7). When the haplotypes of this marker are considered along with the next two markers, rs594125 and rs11726138, the deviation in favour of increased homozygosity is significant at p < 10^-20. [...] (Additional File 3 Table S7), yet these markers span PKHD1, the gene for polycystic kidney and hepatic disease, mutations in which are a known cause of autosomal recessive polycystic kidney disease (ARPKD), which can result in stillbirth or death in infancy or childhood. By contrast, as we have already noted, there is extremely strong statistical evidence to support increased heterozygosity around rs2693363 at 17q21.31 (Table 4).
[Table: Markers and genes in a region of 1q31-41 showing increased heterozygosity, identified using a threshold of 50 for the summed heterozygosity scores (ignoring the highest-scoring marker). Observed counts are shown for each marker genotype with the expected counts in the row below. Heterozygosity scores, defined as -log10(p) for increased heterozygosity and log10(p) for increased homozygosity, are shown for individual markers and for two- and three-marker haplotypes.]
[...] basis of a formal statistical test. The nature of our approach means that we have only sought to identify regions in which the effect is apparent in more than one marker. It is quite likely that at least some of the single markers showing deviation from HWE which we have ignored do so because of a real effect rather than through genotyping error, although we note that they more often showed increased homozygosity, whereas the regions implicated by groups of markers showed more marked deviations towards heterozygosity. This may suggest that a substantial proportion of these isolated markers do represent genotyping errors. Likewise, the marker set we have used does not provide 100% coverage of the genome. Hence there may be many more regions deviating from HWE than those highlighted by the present study.
Although it seems clear that deviations from HWE exist, the mechanisms driving this are not clear. One proposal we made is that a recessive lethal polymorphism could lead to decreased homozygosity in surviving subjects. Such a polymorphism might cause death antenatally or in childhood or might prevent successful fertilisation. We argue that the effect on reproductive fitness of the parent would be minimal if it produced very early termination or prevented fertilisation. Nevertheless, such polymorphisms would need to be very common indeed if they were to be detectable in a sample size of only 1504, as we have used.
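To give a sense of how common such a polymorphism would need to be, the following sketch computes the HWE chi-squared statistic expected among survivors under a deliberately simplified model. The model is an assumption made for illustration and is not taken from the study: zygotes are formed in HWE proportions with recessive allele frequency q, every homozygote for the recessive allele is lost, and no other force acts. The function name `recessive_lethal_hwe` is hypothetical.

```python
import math

def recessive_lethal_hwe(q, n=1504):
    """Expected HWE chi-squared and -log10(p) among n survivors when all
    zygotes homozygous for the recessive allele (frequency q) are lost."""
    p = 1.0 - q
    surviving = 1.0 - q * q                     # fraction of zygotes surviving
    # Expected genotype counts (AA, Aa, aa) among the survivors.
    obs = (n * p * p / surviving, n * 2 * p * q / surviving, 0.0)
    q_s = (obs[1] / 2 + obs[2]) / n             # allele frequency in survivors
    exp = (n * (1 - q_s) ** 2, n * 2 * q_s * (1 - q_s), n * q_s ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    # Upper tail of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, -math.log10(p_value)
```

Under these assumptions the statistic simplifies to n times q squared, so with 1504 subjects a zygotic allele frequency of 0.1 gives a chi-squared of about 15, whereas a frequency of 0.3 gives about 135. This is consistent with the point made above that only very common polymorphisms would produce a strong signal in a sample of this size.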
Although the most strongly implicated regions demonstrate increased heterozygosity, we also find regions with increased homozygosity, some with p values less than 10^-10 or 10^-20 or even smaller. Theoretically, increased homozygosity could occur through the presence of deletions or population stratification, but it seems hard to conceive that these mechanisms could produce an effect of such magnitude.
To conclude, we have obtained good evidence that some regions of the human genome demonstrate deviation from HWE in an unselected sample of adults from the UK population. We believe that these preliminary findings warrant further exploration.