Markers typed in genome-wide analysis identify regions showing deviation from Hardy-Weinberg equilibrium
BMC Research Notes volume 2, Article number: 29 (2009)
Deviations from Hardy-Weinberg equilibrium (HWE) are commonly thought of as indicating genotyping errors, population stratification or some other artefact. However, they could also arise through important biological mechanisms. In particular, genetic variants having a recessive effect on the successful fertilisation and/or development of an embryo might manifest through such deviations in an unselected sample of "control" subjects.
We investigated genotypes from 463842 autosomal markers from 1504 British subjects. We identified regions in which several neighbouring markers exhibited deviation from HWE in the same direction by considering "heterozygosity scores" in windows of 10 markers. The heterozygosity score for each marker was defined as -log10(p) or log10(p) according to whether the marker demonstrated increased heterozygosity or increased homozygosity. In each window the marker with the highest absolute score was ignored, and the positive and negative scores of the other nine markers were summed separately. Windows were selected if the absolute value of either sum exceeded a given threshold, for which we used values of 50 or 15.
For the threshold of 50, we identified 7 regions with increased heterozygosity and for the threshold of 15 we identified 22 regions with increased heterozygosity, 23 with increased homozygosity and 2 containing both kinds of window. The most impressive of these results came from a group of 6 markers at 17q21, each of which showed increased heterozygosity significant at p < 10^-190.
The human genome contains regions which deviate markedly from HWE and these might harbour genes influencing embryonic survival.
When marker genotype frequencies in controls deviate markedly from Hardy-Weinberg equilibrium (HWE), this is commonly taken as an indicator that the genotyping is unreliable or that there is marked population stratification, and the marker is discarded. However, if common polymorphisms influence embryonic survival then these would also be expected to lead to such deviations. The existence of such loci is supported by a genome-wide tendency for siblings to share alleles more than would be expected by chance.
As previously suggested, we reasoned that if groups of nearby markers all showed deviation from HWE then this could not result purely from genotyping errors, since there would be no reason for the same kind of error to be replicated in each marker. Hence we used the control data from the 1958 British Birth Cohort, which we obtained online from the Wellcome Trust Case-Control Consortium (WTCCC) after their approval was granted. We used the genotypes called by the Chiamo algorithm and excluded markers having either a studywise missing data proportion of more than 0.05 or a studywise minor allele frequency of less than 0.05 along with a studywise missing data proportion of more than 0.01. Naturally, for the purposes of this study, we did not exclude markers on grounds of deviation from HWE. Genotypes for 463842 autosomal markers, typed in 1504 subjects, were investigated.
We used sliding windows of ten markers across the marker set. For each of the ten markers in each window we tested for deviation from HWE using a chi-squared test and recorded the resulting p-value. We assigned a "heterozygosity score", defined as -log10(p) for markers showing increased heterozygosity (i.e. a positive number) and as log10(p) for those showing increased homozygosity (i.e. a negative number). We then excluded the marker having the highest absolute value for this score and considered only the scores from the other nine markers. The aim of this approach was to ignore regions where only a single marker produced a marked deviation from HWE but to identify those in which a group of markers all supported deviation in the same direction. We then summed all the positive heterozygosity scores and, separately, all the negative heterozygosity scores from the nine markers and tested whether the absolute value of either sum exceeded a predetermined threshold. For the current study, we used threshold values of 15 and of 50.
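The scoring and window procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the HWE test is assumed to be a one degree of freedom Pearson chi-squared test on the three genotype counts, ties for the largest absolute score are broken by taking the first marker, and "exceeding" the threshold is read as a strict inequality.

```python
import math
import sys

def heterozygosity_score(n_aa, n_ab, n_bb):
    """Signed -log10(p) from a 1-df Pearson chi-squared HWE test (assumed form)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0                       # no genotypes: nothing to test
    p_a = (2 * n_aa + n_ab) / (2 * n)    # frequency of allele A
    p_b = 1.0 - p_a
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    if min(expected) == 0:
        return 0.0                       # monomorphic marker: no test possible
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    # Floor p at the smallest normal float so log10 never sees zero.
    p = max(math.erfc(math.sqrt(chi2 / 2)), sys.float_info.min)
    score = -math.log10(p)
    # Positive for excess heterozygotes, negative otherwise.
    return score if n_ab > expected[1] else -score

def window_sums(scores, size=10):
    """For each window, drop the marker with the largest |score| and
    return (start_index, sum_of_positive_scores, sum_of_negative_scores)."""
    results = []
    for start in range(len(scores) - size + 1):
        window = list(scores[start:start + size])
        drop = max(range(size), key=lambda i: abs(window[i]))
        del window[drop]
        pos = sum(s for s in window if s > 0)
        neg = sum(s for s in window if s < 0)
        results.append((start, pos, neg))
    return results

def flag_windows(scores, threshold, size=10):
    """Keep windows where either summed score exceeds the threshold in absolute value."""
    return [w for w in window_sums(scores, size)
            if w[1] > threshold or -w[2] > threshold]
```

With this sketch, a window holding one marker scored 100 and nine markers scored 1 is not flagged at a threshold of 15, because the extreme marker is dropped and the remaining scores sum to 9. A window of ten markers each scoring 2 is flagged, since nine of them sum to 18.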
For each set of ten markers reaching the specified threshold using this process, we went on to investigate departure from HWE of two-marker and three-marker haplotypes using a method we have described elsewhere to produce a one degree of freedom chi-squared test for departure from HWE, summarised by a "heterozygosity score" defined as -log10(p) or log10(p).
When overlapping sets of ten markers exceeded the threshold they were amalgamated, building up regions in which there was evidence for deviation from HWE. We obtained lists of genes within 200 kb either side of these regions by interrogating the UCSC genome browser.
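The amalgamation step amounts to merging overlapping intervals of marker indices. A minimal sketch follows, assuming that windows are identified by the index of their first marker, that "overlapping" means sharing at least one marker, and that windows are merged regardless of the direction of deviation. The function name is hypothetical.

```python
def merge_windows(starts, size=10):
    """Merge overlapping windows, given by the index of their first marker,
    into regions expressed as (first_marker_index, last_marker_index)."""
    regions = []
    for start in sorted(starts):
        end = start + size - 1
        if regions and start <= regions[-1][1]:
            # Shares at least one marker with the current region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]
```

For example, windows starting at markers 0, 1 and 2 merge into a single region covering markers 0 to 11, while a window starting at marker 30 forms a separate region. Extending each region by 200 kb on either side would then require the physical positions of its first and last markers.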
Tables 1 to 6 show the results when we applied a threshold of 50 to identify sets of markers demonstrating deviation from HWE. (Additional File 1 is Table S5: HWETable5.doc and Additional File 2 is Table S6: HWETable6.doc. Results using a threshold of 15 are presented in Additional File 3 Table S7: HWETable7.doc.) Using the threshold of 50, 7 regions were identified as showing increased heterozygosity. In addition, 68 markers individually produced results significant at p < 10^-50 but were not supported by other markers nearby and hence might represent genotyping errors; of these, 10 demonstrated increased heterozygosity and 58 increased homozygosity. Using a threshold of 15 implicated 22 regions and 37 isolated markers as showing increased heterozygosity and 23 regions and 285 isolated markers as showing increased homozygosity. There were 2 regions containing a mixture of 10-marker windows meeting the criterion of 15 for both increased heterozygosity and homozygosity.
The most convincing evidence for a real departure from HWE occurs at 17q21 in the region around rs2693363, as shown in Table 4. This marker and five others flanking it are each individually significant at p < 10^-190. It does not seem plausible that this result could occur through a set of genotyping errors or through some other artefact, and so we can only conclude that there really is a marked excess of heterozygosity in this region. Using the threshold of 50, it seems unlikely that any of the results could have occurred by chance. Perhaps the least convincing result is at 6p25.3 (Table 2), where rs815593 has a heterozygosity score of 100.6 and rs11757245 has a score of 86.9. No other markers nearby support deviation from HWE, and one could argue that the result for each marker is due to genotyping error and that it is mere coincidence that the two happen to lie close to each other. When the threshold is set as low as 15, we expect that a number of the results might have occurred by chance. Given that results from nearby markers are not independent, it is possible that a region might happen to show deviation from HWE at p < 10^-3 or p < 10^-4 and that several markers in this region might be significant at this level and hence produce a combined score exceeding the threshold of 15. On the other hand, many regions produced a score far in excess of this, and a substantial proportion of regions identified using this lower threshold are likely to represent a real biological effect.
With regard to the comparison of single marker and haplotype-based analyses, there were no regions in which a haplotype analysis provided stronger evidence for increased heterozygosity than the most significant single marker analysis. We would take this to indicate that the information supporting departure from HWE was captured by the single marker. For example, given the allele frequencies of rs11757245 at 6p25.3 (Table 2), one would expect 204.4 subjects to have genotype BB. In fact, this genotype occurs in only 14 subjects, a finding consistent with this polymorphism itself, or one in close LD with it, having a marked effect on survival. However, when we considered regions in which there was a deviation towards excess homozygosity rather than heterozygosity, identified using the threshold of 15, there were a few for which a haplotype analysis was more significant than any single marker analysis. One interpretation of this might be that there is an untyped polymorphism in LD with one or more of the haplotypes. For example, the frequencies at rs649022 at 4q26 of both the AA and BB genotypes are somewhat increased relative to HWE expectation, with p < 10^-12 (Additional File 3 Table S7). When the haplotypes of this marker are considered along with the next two markers, rs594125 and rs11726138, the deviation in favour of increased homozygosity is significant at p < 10^-20. Inspection of the counts of haplotype combinations revealed that the haplotypes BAB and AAB were homozygous approximately twice as often as would be expected under HWE, with expected counts of 48.6 and 27.5 and observed counts of 83 and 59, respectively.
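The expected BB count quoted for rs11757245 follows from the standard HWE expectation for a homozygote. The worked check below assumes that all 1504 subjects were successfully genotyped at this marker. The allele frequency is not given in the text, so the value of q shown is back-calculated from the quoted expectation rather than taken from Table 2.

```latex
\[
\hat{q} = \frac{2\,n_{BB} + n_{AB}}{2N},
\qquad
E[n_{BB}] = N\,\hat{q}^{2}
\]
\[
E[n_{BB}] = 204.4,\; N = 1504
\;\Longrightarrow\;
\hat{q} = \sqrt{204.4 / 1504} \approx 0.369
\]
```

Under these assumptions the 14 observed BB subjects amount to roughly 7% of the number expected.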
For most of the regions implicated there were a number of different genes within 200 kb, making it impossible to draw firm conclusions about which might harbour biologically meaningful polymorphisms. It would be difficult to avoid making subjective judgements about the relative weight given to statistical evidence and to biological plausibility. For example, a number of markers around rs1326581 at 6p12.2 combined to provide relatively weak statistical evidence for increased homozygosity (only just exceeding the threshold of 15, Additional File 3 Table S7), yet these markers span PKHD1, the gene for polycystic kidney and hepatic disease, mutations in which are a known cause of autosomal recessive polycystic kidney disease (ARPKD), which can result in stillbirth or death in infancy or childhood. By contrast, as we have already noted, there is extremely strong statistical evidence to support increased heterozygosity around rs2693363 at 17q21.31 (Table 4), but none of the identified genes in the region is an obvious candidate to have a recessive lethal effect.
One indication that genes have a biologically significant role in influencing departures from HWE might be that similar genes are found in different implicated regions. There were several possible examples of this phenomenon which were apparent when the threshold of 15 was considered, as shown in Additional File 3 Table S7. Two cytogenetically distinct implicated regions contain CSMD1 and CSMD3, the genes for CUB and Sushi multiple domains 1 and 3, although the third gene of the family, CSMD2, did not occur in an implicated region. Three loci related to ribosomal protein S26 were in separate implicated regions: LOC728937 (similar to 40S ribosomal protein S26), RPS26P3 (ribosomal protein S26 pseudogene 3) and LOC644191 (40S ribosomal protein S26). However, the gene for ribosomal protein S26 itself, RPS26, was not in an implicated region and nor was RPS26L1 (ribosomal protein S26-like 1). Two loci related to FMR1 were in separate implicated regions: NUFIP1P (nuclear fragile X mental retardation protein interacting protein 1 pseudogene) and CYFIP1 (cytoplasmic FMR1 interacting protein 1), although NUFIP1 and CYFIP2 were not. Three loci related to golgin subfamily a were in different implicated regions: LOC643707 (golgi autoantigen, golgin subfamily a, 6 pseudogene), LOC192130 (golgi autoantigen, golgin subfamily a, 4 pseudogene) and LOC729786 (similar to golgi autoantigen, golgin subfamily a, 8A). However, the UCSC browser lists loci containing the phrase "golgin subfamily a" in 11 other regions which did not show departure from HWE. Finally, olfactory receptor genes and/or pseudogenes were found in four different implicated regions, but there are over 400 of these distributed in a number of genomic regions.
This simple exploratory analysis clearly demonstrates that there are regions of the human genome which deviate markedly from HWE in a sample of unselected British adults. The evidence is stronger for some regions than for others and we have not attempted to quantify this on the basis of a formal statistical test. The nature of our approach means that we have only sought to identify regions in which the effect is apparent in more than one marker. It is quite likely that at least some of the single markers showing deviation from HWE which we have ignored do so because of a real effect rather than through genotyping error, although we note that they more often showed increased homozygosity whereas the regions implicated by groups of markers showed more marked deviations towards heterozygosity. This may suggest that a substantial proportion of these isolated markers do represent genotyping errors. In addition, the marker set we have used does not provide 100% coverage of the genome. Hence there may be many more regions deviating from HWE than those highlighted by the present study.
Although it seems clear that deviations from HWE exist, the mechanisms driving this are not clear. One proposal we made is that a recessive lethal polymorphism could lead to decreased homozygosity in surviving subjects. Such a polymorphism might cause death antenatally or in childhood or might prevent successful fertilisation. We argue that the effect on reproductive fitness of the parent would be minimal if it produced very early termination or prevented fertilisation. Nevertheless, such polymorphisms would need to be very common indeed if they were to be detectable in a sample size of only 1504, as we have used.
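The size of the heterozygote excess produced by such a polymorphism can be illustrated with a short derivation. This is a textbook-style sketch rather than an analysis from the study. It assumes random mating among parents, a single generation of selection, complete lethality of the BB genotype and no effect on the other genotypes, where B is the lethal allele at frequency q among zygotes and p = 1 - q.

```latex
% Genotype frequencies among survivors (zygote frequencies p^2, 2pq, q^2;
% BB removed; remaining frequencies renormalised by 1 - q^2 = p(1 + q)):
\[
f_{AA} = \frac{p^{2}}{1 - q^{2}} = \frac{p}{1 + q},\qquad
f_{AB} = \frac{2pq}{1 - q^{2}} = \frac{2q}{1 + q},\qquad
f_{BB} = 0
\]
% Frequency of B among survivors, and the heterozygote frequency
% that HWE would predict from it:
\[
q' = \tfrac{1}{2} f_{AB} = \frac{q}{1 + q},\qquad
2q'(1 - q') = \frac{2q}{(1 + q)^{2}}
\]
% Ratio of observed to HWE-expected heterozygotes among survivors:
\[
\frac{f_{AB}}{2q'(1 - q')} = 1 + q
\]
```

Under these assumptions the proportional excess of heterozygotes equals q, which is small unless the lethal allele is common. This agrees with the point that only very common polymorphisms could be detected in a sample of this size.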
Although the most strongly implicated regions demonstrate increased heterozygosity, we also find regions with increased homozygosity, some with p values less than 10^-10 or 10^-20 or even smaller. Theoretically, increased homozygosity could occur through the presence of deletions or population stratification, but it seems hard to conceive that these mechanisms could produce an effect of such magnitude.
To conclude, we have obtained good evidence that some regions of the human genome demonstrate deviation from HWE in an unselected sample of adults from the UK population. We believe that these preliminary findings warrant further exploration.
Leal SM: Detection of genotyping errors and pseudo-SNPs via deviations from Hardy-Weinberg equilibrium. Genet Epidemiol. 2005, 29 (3): 204-214. 10.1002/gepi.20086.
Zollner S, Wen X, Hanchard NA, Herbert MA, Ober C, Pritchard JK: Evidence for extensive transmission distortion in the human genome. Am J Hum Genet. 2004, 74 (1): 62-72. 10.1086/381131.
Xu J, Turner A, Little J, Bleecker ER, Meyers DA: Positive results in association studies are associated with departure from Hardy-Weinberg equilibrium: hint for genotyping error?. Hum Genet. 2002, 111 (6): 573-574. 10.1007/s00439-002-0819-y.
WTCCC: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007, 447 (7145): 661-678. 10.1038/nature05911.
Curtis D, Vine AE, Knight J: Study of regions of extended homozygosity provides a powerful method to explore haplotype structure of human populations. Ann Hum Genet. 2008, 72 (Pt 2): 261-278. 10.1111/j.1469-1809.2007.00411.x.
UCSC browser: genome.ucsc.edu. [http://genome.ucsc.edu/cgi-bin/hgGateway]
AEV is supported by Wellcome Trust Project Grant 076392.
The authors declare that they have no competing interests.
DC conceived the project. AEV carried out the analyses. Both contributed to the preparation of the manuscript. Both authors have read and approved the final manuscript.
Electronic supplementary material
Additional file 1: HWETable5.doc. Table 5. Region of 14q11.1-11.2 with summed heterozygosity score exceeding 50. (DOC 218 KB)
Additional file 2: HWETable6.doc. Table 6. Region of 15q11-14 with summed heterozygosity score exceeding 50. (DOC 208 KB)
Cite this article
Vine, A.E., Curtis, D. Markers typed in genome-wide analysis identify regions showing deviation from Hardy-Weinberg equilibrium. BMC Res Notes 2, 29 (2009). https://doi.org/10.1186/1756-0500-2-29