Sampling strategies for accurate computational inferences of gametic phase across highly polymorphic major histocompatibility complex loci
© Alcaide et al; licensee BioMed Central Ltd. 2011
Received: 18 November 2010
Accepted: 26 May 2011
Published: 26 May 2011
Genes of the Major Histocompatibility Complex (MHC) are very popular genetic markers among evolutionary biologists because of their potential role in pathogen confrontation and sexual selection. However, MHC genotyping remains challenging and time-consuming in spite of substantial methodological advances. Although computational haplotype inference has brought interesting alternatives into focus, high heterozygosity, extensive genetic variation and population admixture are known to cause inaccuracies. We investigated the role of sample size, genetic polymorphism and genetic structuring on the performance of the popular Bayesian PHASE algorithm. To this end, we took advantage of a large database of known genotypes (determined using traditional laboratory-based techniques) at single MHC class I (N = 56 individuals and 50 alleles) and MHC class II B (N = 103 individuals and 62 alleles) loci in the lesser kestrel Falco naumanni.
Analyses carried out over real MHC genotypes showed that the accuracy of gametic phase reconstruction improved with sample size as a result of the reduction in the allele to individual ratio. We then simulated different data sets introducing variations in this parameter to define an optimal ratio.
Our results demonstrate a critical influence of the allele to individual ratio on PHASE performance. We found that a minimum allele to individual ratio (1:2) yielded 100% accuracy for both MHC loci. Sampling effort is therefore a crucial step in obtaining reliable MHC haplotype reconstructions and must be planned according to the degree of MHC polymorphism. We expect our findings to provide a foothold for the design of straightforward and cost-effective genotyping strategies for those MHC loci for which locus-specific primers are available.
Highly polymorphic genes of the Major Histocompatibility Complex (MHC) have become very popular molecular markers among evolutionary biologists because of their traditional consideration as 'good genes' involved in pathogen resistance and sexual selection (reviewed by [1, 2]). Despite a plethora of new methods and technical advances (reviewed by ), MHC genotyping remains challenging and time-consuming. Recently, Bayesian computational inference of gametic phase coupled to Sanger sequencing of PCR amplicons has emerged as a promising alternative [4–7]. These in-silico methods permit researchers to infer how multiple segregating sites are distributed within the same chromosome and are believed to provide haplotype information in a more straightforward and cost-effective way than laboratory-based methods such as cloning, non-denaturing gel electrophoresis and others (reviewed in ). Even though extremely variable MHC loci subjected to the effects of natural selection violate several assumptions of the underlying neutral coalescent theory , computer packages such as PHASE have been shown to perform admirably in many cases [7–10]. The current version of PHASE, which provides a biologically realistic prior for the distribution of haplotypic frequencies , has become one of the preferred options among evolutionary biologists because of its good performance and its ability to deal with gaps and polymorphic sites with up to four segregating sites. The accuracy of gametic phase inference has, however, been shown to be very sensitive to high heterozygosity, large numbers of alleles and population admixture [e.g. ]. The first two factors are particularly common among MHC genes, a fact that can explain low success rates for particular data sets . In spite of the cost and sample manipulation advantages offered by these approaches [reviewed in ], only a few studies (e.g. [8, 10]) have addressed in detail the relative role of different parameters on PHASE performance when working with highly polymorphic and recombining MHC loci, which usually exhibit the genetic hallmarks of balancing and positive selection (i.e. excess of heterozygous sites and non-synonymous substitutions).
In this study, we have taken advantage of a large database of MHC class I and class II genotypes built from traditional molecular cloning in the lesser kestrel Falco naumanni. Our main goals were to i) test the performance of analytical approaches to haplotype inference in the kestrel MHC, and ii) evaluate the influence of sample size, genetic polymorphism and genetic structure on the accuracy of computational approaches dealing with phase-unknown diploid genotypes.
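To illustrate why phase-unknown diploid genotypes are hard to resolve, the following minimal sketch enumerates every haplotype pair compatible with a directly sequenced genotype. It is not part of PHASE or of this study's workflow: the function name `candidate_haplotype_pairs` and the toy sequence are hypothetical, and heterozygous positions are assumed to be written with two-base IUPAC ambiguity codes. A genotype with k heterozygous sites is compatible with 2^(k-1) unordered haplotype pairs, which is the search space that statistical phasing has to narrow down.

```python
from itertools import product

# Two-base IUPAC ambiguity codes, as read at heterozygous positions
# (double peaks) in a direct-sequencing chromatogram.
IUPAC_PAIRS = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def candidate_haplotype_pairs(genotype):
    """Enumerate each unordered haplotype pair compatible with a genotype.

    Hypothetical helper for illustration only; `genotype` is a string of
    A/C/G/T plus two-base IUPAC codes at heterozygous sites.
    """
    het_sites = [i for i, base in enumerate(genotype) if base in IUPAC_PAIRS]
    if not het_sites:
        # A homozygote has a single, unambiguous haplotype pair.
        return [(genotype, genotype)]

    first, rest = het_sites[0], het_sites[1:]
    pairs = []
    # The orientation of the first heterozygous site is fixed so that each
    # unordered pair is produced exactly once.
    for choice in product((0, 1), repeat=len(rest)):
        hap_a, hap_b = list(genotype), list(genotype)
        hap_a[first], hap_b[first] = IUPAC_PAIRS[genotype[first]]
        for site, pick in zip(rest, choice):
            bases = IUPAC_PAIRS[genotype[site]]
            hap_a[site], hap_b[site] = bases[pick], bases[1 - pick]
        pairs.append(("".join(hap_a), "".join(hap_b)))
    return pairs


# Toy genotype with three heterozygous sites (R, Y, W).
pairs = candidate_haplotype_pairs("ACRTYGGW")
print(len(pairs))  # 2 ** (3 - 1) = 4 candidate pairs
```

Because the number of candidate pairs doubles with every additional heterozygous site, highly heterozygous MHC genotypes leave many alternatives for the algorithm to choose among.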
The MHC of the lesser kestrel is well suited for this study because single, highly polymorphic and positively selected MHC class I (exon 3) and MHC class II B (exon 2) loci can be specifically amplified via the polymerase chain reaction (PCR) [11, 12]. Both loci are 270 base pairs in length and encode part of the antigen-binding region of MHC class I and MHC class II molecules, respectively. Heterozygosity has been shown to be extremely high in natural populations at both loci (> 90%, [13, 14]). A large proportion of the MHC alleles used in this study were isolated during previous studies and many others are derived from ongoing research [[11–14], authors' unpublished data, see additional file 1]. The handling and sampling of the birds was done in accordance with Spanish laws concerning animal welfare, and under permission of the different National Governments.
Table: Polymorphism statistics for the kestrel MHC class I and MHC class II data sets used in this study (columns: MHC class I, MHC class II).
Prior knowledge of the real genotypes allowed us to generate the ambiguous DNA sequences that result from overlapping the two alleles isolated per individual at each MHC locus (see additional file 2). These consensus DNA sequences were generated using the software BioEdit . With this information, we took a reverse approach in which analytical methods relying on ambiguous diploid data were validated against the genotypes inferred using traditional laboratory-based techniques. Bayesian computational inference of MHC gametic phase was performed using the popular, user-friendly PHASE module implemented in the software DnaSP ver 5.0 . Calculations were run for 1,000 iterations with a thinning interval of 10 and 1,000 burn-in iterations, under a model that accounted for recombination. All advanced options available for the algorithm were left at their default values. PHASE accuracy was measured as the percentage of correctly assigned alleles. We considered the two alleles at each locus to be correctly inferred when all nucleotide positions matched perfectly those previously revealed by laboratory-based methods. To verify the identity of each allele, we used the output window provided by default by DnaSP 5.0 and exported the alignment as a FASTA file, which was subsequently handled in BioEdit.
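The validation logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure run in BioEdit or DnaSP: the helper names `diploid_consensus` and `phase_accuracy` and the toy four-base alleles are hypothetical, alleles are assumed to be aligned, of equal length and free of gaps, and the accuracy measure counts a known allele as correctly assigned only when an identical sequence appears among the inferred alleles of the same individual.

```python
# Map each unordered pair of bases to its IUPAC ambiguity code.
IUPAC_CODE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def diploid_consensus(allele_a, allele_b):
    """Overlap two known alleles into one ambiguous diploid sequence.

    Hypothetical helper: handles aligned A/C/G/T sequences only (no gaps).
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(
        a if a == b else IUPAC_CODE[frozenset((a, b))]
        for a, b in zip(allele_a.upper(), allele_b.upper())
    )


def phase_accuracy(known, inferred):
    """Percentage of known alleles recovered exactly by the inference.

    Both arguments map an individual ID to a pair of allele sequences.
    An allele counts as correct only if every position matches.
    """
    correct = total = 0
    for individual, true_pair in known.items():
        remaining = list(inferred[individual])
        for allele in true_pair:
            total += 1
            if allele in remaining:
                remaining.remove(allele)  # each inferred allele is used once
                correct += 1
    return 100.0 * correct / total


# Toy genotypes (hypothetical sequences, not kestrel alleles).
known = {"ind1": ("ACGT", "ACAT"), "ind2": ("TCGT", "ACAC")}
ambiguous = {ind: diploid_consensus(*pair) for ind, pair in known.items()}
print(ambiguous)  # {'ind1': 'ACRT', 'ind2': 'WCRY'}

# ind2 is phased incorrectly: the pair is compatible with 'WCRY'
# but neither allele matches the known ones.
inferred = {"ind1": ("ACGT", "ACAT"), "ind2": ("ACGT", "TCAC")}
print(phase_accuracy(known, inferred))  # 50.0
```

The second individual shows why accuracy is scored per allele against the known genotype: an inferred pair can reproduce the ambiguous sequence perfectly and still be the wrong phase.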
The main objective of this study was to provide useful information regarding the number of individuals to be sampled, given a particular degree of genetic polymorphism, to computationally infer the gametic phase of MHC genes with reliability. Starting from a "worst-case" scenario similar to that used in our simulations (i.e. no homozygous individuals and homogeneous distributions of allele frequencies), we recommend an initial exploratory survey of 25-30 individuals. Although PHASE can miscall nucleotides during the reconstruction of haplotypes, our experience suggests that the overall number of alleles inferred is not very different from the actual number. Depending on the number of alleles inferred by PHASE, researchers might add more individuals until the allele to individual ratio reaches at least the 1:2 threshold. Sampling strategies must therefore be designed according to the extent of MHC polymorphism found within a particular study population. With luck, researchers may find homozygous genotypes, or genotypes composed of alleles differing in only one or a few nucleotides, during sampling. Such genotypes can be very useful for verifying the set of inferred alleles. It is also advisable to ground-truth the data set by performing molecular cloning in a selected number of individuals. Molecular cloning, however, is extremely prone to reporting false polymorphisms, and it is therefore important to check cloned alleles against direct sequencing chromatograms. Special care should be taken in the case of synonymous diploid genotypes (i.e. different combinations of alleles that generate the same direct sequencing chromatogram). However, careful examination of our allele repertoire suggests that these cases are rare in kestrels (< 1% of possible genotypes). The additional aid of technologies such as conformational polymorphism analyses (e.g. ) may nonetheless become very useful to resolve these particular cases.
Researchers must pay special attention to generating high-quality direct sequencing chromatograms to minimize the risk of miscalling double peaks. In this respect, the performance and location of sequencing primers as well as bi-directional sequencing must be carefully addressed. Finally, it is important to bear in mind that these approaches can only be applied when locus-specific primers are available [[19, 20], this study]. That said, improving genomic knowledge of the MHC in both model and non-model species (e.g. [21, 22]) forecasts an encouraging future in this respect.
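The sampling recommendation above reduces to simple arithmetic, sketched here in Python. The function names and the `individuals_per_allele` parameter are hypothetical; the only inputs taken from the text are the 1:2 allele to individual threshold and the sample sizes and allele counts reported for the two kestrel loci. The calculation assumes, as suggested above, that the number of alleles inferred by PHASE is close to the true number.

```python
def allele_to_individual_ratio(n_alleles, n_individuals):
    """Alleles per sampled individual (1:2 corresponds to 0.5)."""
    return n_alleles / n_individuals


def additional_individuals_needed(n_alleles, n_individuals,
                                  individuals_per_allele=2):
    """Extra individuals required to reach the target ratio.

    Hypothetical helper: the default of two individuals per allele
    encodes the 1:2 allele to individual threshold reported in the text.
    It ignores any new alleles that the extra individuals may contribute.
    """
    required = n_alleles * individuals_per_allele
    return max(0, required - n_individuals)


# Data set sizes reported in this study.
for locus, n_alleles, n_individuals in [
    ("MHC class I", 50, 56),
    ("MHC class II B", 62, 103),
]:
    ratio = allele_to_individual_ratio(n_alleles, n_individuals)
    extra = additional_individuals_needed(n_alleles, n_individuals)
    print(f"{locus}: ratio = {ratio:.2f}, additional individuals = {extra}")
```

Since newly sampled individuals may carry previously unseen alleles, the calculation would in practice be repeated after each round of sampling until the ratio stabilises at or below the threshold.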
Acknowledgements and Funding
During the development of the present study, MA and AR were supported by post-doctoral and I3P pre-doctoral fellowships from the MICINN of the Spanish Government and the CSIC, respectively. This study was supported by several research projects (Projects CGL2004-04120/BOS, CGL2006-07481/BOS and CGL2009-10652/BOS, and HORUS Project P06-RNM-01712).
- Sommer S: The importance of immune gene variability (MHC) in evolutionary ecology and conservation. Front Zool. 2005, 2: 16. 10.1186/1742-9994-2-16.
- Piertney S, Oliver M: The evolutionary ecology of the major histocompatibility complex. Heredity. 2006, 96: 7-21.
- Babik W: Methods for MHC genotyping in non model vertebrates. Mol Ecol Res. 2010, 10: 237-251. 10.1111/j.1755-0998.2009.02788.x.
- Niu TH, Qin ZHS, Xu XP, Liu JS: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet. 2002, 70: 157-169. 10.1086/338446.
- Stephens M, Smith NJ, Donnelly P: A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001, 68: 978-989. 10.1086/319501.
- Stephens M, Donnelly P: A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet. 2003, 73: 1162-1169. 10.1086/379378.
- Harrigan RJ, Mazza ME, Sorenson MD: Computation versus cloning: evaluation of two methods for haplotype determination. Mol Ecol Res. 2008, 8: 1239-1248. 10.1111/j.1755-0998.2008.02241.x.
- Bos DH, Turner SM, Dewoody JA: Haplotype inference from diploid sequence data: evaluating performance using non-neutral MHC sequences. Hereditas. 2007, 144: 228-234. 10.1111/j.2007.0018-0661.01994.x.
- Garrick RC, Sunnucks P, Dyer RJ: Nuclear gene phylogeography using PHASE: dealing with unresolved genotypes, lost alleles, and systematic bias in parameter estimation. BMC Evol Biol. 2010, 10: 118. 10.1186/1471-2148-10-118.
- Bettencourt BF, Santos MR, Fialho RN, Couto AR, Peixoto MJ, Pinheiro JP: Evaluation of two methods for computational HLA haplotype inference using a real data set. BMC Bioinformatics. 2008, 9: 68. 10.1186/1471-2105-9-68.
- Alcaide M, Edwards SV, Negro JJ: Characterization, polymorphism, and evolution of MHC class II B genes in birds of prey. J Mol Evol. 2007, 65: 541-554. 10.1007/s00239-007-9033-9.
- Alcaide M, Edwards SV, Cadahia L, Negro JJ: MHC class I genes of birds of prey: isolation, polymorphism and diversifying selection. Conserv Genet. 2009, 10: 1349-1355. 10.1007/s10592-008-9653-7.
- Alcaide M, Edwards SV, Negro JJ, Serrano D, Tella JL: Extensive polymorphism and geographical variation at a positively selected MHC class II B gene of the lesser kestrel (Falco naumanni). Mol Ecol. 2008, 17: 2652-2665. 10.1111/j.1365-294X.2008.03791.x.
- Alcaide M, Lemus JA, Blanco G, Tella JL, Serrano D, Negro JJ, Rodríguez A, García-Montijano M: MHC diversity and differential exposure to pathogens in kestrels (Aves: Falconidae). Mol Ecol. 2010, 19: 691-705. 10.1111/j.1365-294X.2009.04507.x.
- Alcaide M, Serrano D, Tella JL, Negro JJ: Strong philopatry derived from capture-recapture records does not lead to fine-scale genetic differentiation in lesser kestrels. J Anim Ecol. 2009, 78: 468-475. 10.1111/j.1365-2656.2008.01493.x.
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 1999, 41: 95-98.
- Librado P, Rozas J: DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009, 25: 1451-1452. 10.1093/bioinformatics/btp187.
- Alcaide M, Lopez L, Tanferna A, Blas J, Sergio F, Hiraldo F: Simultaneous analysis of multiple PCR amplicons enhances capillary SSCP discrimination of MHC alleles. Electrophoresis. 2010, 31: 1353-1356. 10.1002/elps.200900709.
- Bettinotti MP, Hadzikadic L, Ruppe E, Dhillon G, Stroncek DS, Marincola FM: New HLA-A, -B, and -C locus-specific primers for PCR amplification from cDNA: application in clinical immunology. J Immunol Methods. 2003, 279: 143-148. 10.1016/S0022-1759(03)00233-3.
- Hughes CR, Miles S, Walbroehl JM: Support for the minimal essential MHC hypothesis: a parrot with a single, highly polymorphic MHC class II B gene. Immunogenetics. 2008, 60: 219-231. 10.1007/s00251-008-0287-1.
- Worley K, Gillingham M, Jensen P, Kennedy LJ, Pizzari T, Kaufman J, Richardson D: Single locus typing of MHC class I and class II B loci in a population of red jungle fowl. Immunogenetics. 2008, 60: 233-247. 10.1007/s00251-008-0288-0.
- Cloutier A, Mills JA, Baker AJ: Characterization and locus-specific typing of MHC class I genes in the red-billed gull (Larus scopulinus) provides evidence for major, minor, and nonclassical loci. Immunogenetics. 2011, 63: 377-394. 10.1007/s00251-011-0516-x.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.