- Short Report
- Open Access

# A note on the use of the generalized odds ratio in meta-analysis of association studies involving bi- and tri-allelic polymorphisms

*BMC Research Notes*
**volume 4**, Article number: 172 (2011)

## Abstract

### Background

The generalized odds ratio (GOR) was recently suggested as a genetic model-free measure for association studies. However, its properties were not extensively investigated. We used Monte Carlo simulations to investigate type-I error rates, power and bias in both effect size and between-study variance estimates of meta-analyses using the GOR as a summary effect, and compared these results to those obtained by usual approaches of model specification. We further applied the GOR in a real meta-analysis of three genome-wide association studies in Alzheimer's disease.

### Findings

For bi-allelic polymorphisms, the GOR performs virtually identically to a standard multiplicative model of analysis (e.g. per-allele odds ratio) for variants acting multiplicatively, but slightly increases the power to detect variants with a dominant mode of action, while reducing the probability of detecting recessive variants. Although there were differences between the GOR and usual approaches in terms of bias and type-I error rates, both simulation- and real data-based results provided little indication that these differences will be substantial in practice for meta-analyses involving bi-allelic polymorphisms. However, the use of the GOR may be slightly more powerful for the synthesis of data from tri-allelic variants, particularly when susceptibility alleles are less common in the populations (≤10%). This gain in power may depend on knowledge of the direction of the effects.

### Conclusions

For the synthesis of data from bi-allelic variants, the GOR may be regarded as a multiplicative-like model of analysis. The use of the GOR may be slightly more powerful in the tri-allelic case, particularly when susceptibility alleles are less common in the populations.

## Findings

The generalized odds ratio (GOR) was recently suggested as a model-free measure of effect that might overcome the problem of genetic model misspecification in meta-analyses of association studies [1]. In the context of case-control genetic association studies for a binary trait and under the assumption of random sampling, the GOR measures the probability that a case has a higher mutation load (i.e. a larger number of high-risk alleles) than a control divided by the probability that a control has a higher mutation load than a case.

In this note, we highlight advantages and limitations of the use of the GOR as a measure of effect in meta-analyses of bi- and tri-allelic polymorphisms through simulation. Our results are further complemented by a re-analysis of a real meta-analysis of three genome-wide association studies covering >311,000 bi-allelic markers in Alzheimer's disease.

## Results

### Performance of the GOR in the bi-allelic model

#### Type-I error rates

Type-I error rates obtained from meta-analyses employing the GOR as a summary effect size are comparable to those of the multiplicative and dominant models of analysis (Table 1).

#### Power

Compared to the use of multiplicative approaches, the power to detect variants with a dominant model of action was typically only slightly higher for meta-analyses using the GOR as summary estimate. For variants following a multiplicative pattern of action, all non-recessive models of analysis were highly comparable. Interestingly, the largest differences observed among the per-allele, log-additive trend (LAT) and GOR approaches were found in true recessive and over-dominant models, where the performance of the GOR is slightly inferior for the former, but reasonably better for the latter (Figure 1).

#### Bias in the estimated statistical heterogeneity (τ^{2})

Compared to both per-allele and LAT approaches, the median bias in τ^{2} obtained by the GOR is typically lower in scenarios where the genetic variant is less common in the populations (e.g. minor allele frequency [MAF] = 10%) and acts either dominantly or multiplicatively. For the latter model of action, bias is slightly positive. In addition, for common markers (MAF = 40%) following a dominant model of action, the GOR provides less biased τ^{2} estimates compared to the specification of a multiplicative model. Importantly, for a common variant (MAF = 40%) acting multiplicatively, meta-analyses using the GOR as an effect size provide upwardly biased estimates of τ^{2} compared to the true underlying average increment in the between-study variance per additional copy of the risk allele (Figure 2). This upward bias in the estimated statistical heterogeneity is also found in both dominant and recessive models of analysis.

#### Bias in the estimated genetic effect size

The GOR provides nearly unbiased summary effects for less common variants (MAF = 10%) acting dominantly, regardless of the meta-analytical model and τ^{2}. Conversely, when the variant follows a multiplicative model of action and is common (MAF = 40%), GOR-based meta-analyses overestimate the true underlying increase in the effect size per additional copy of the risk allele (on average 20%) [Additional file 1: Supplementary tables S1-S2].

### Performance of the GOR in the tri-allelic model

#### Type-I error rates

The performance of each model of analysis depends on the underlying between-study variability, allele frequencies and meta-analytical model, but type-I error rates for LAT- and GOR-based meta-analyses are comparable, whereas false discoveries tend to be higher for the per-allele approach when statistical heterogeneity is present (i.e. τ^{2} >0). However, the extent of these differences is smaller in random-effects calculations [Additional file 1: Supplementary tables S3-S4].

#### Power: two alleles acting in the same direction

When at least one of the risk-alleles is less common in the populations (*f* = 10%), and both exhibit either a dominant or multiplicative mode of action, power obtained by using the GOR as a summary effect is higher than that provided by either the per-allele or LAT approaches (Figure 3).

#### Power: two alleles acting in opposite directions

When prior evidence on the direction of the effects of the susceptibility alleles is available, similar power is achieved with the use of the per-allele, LAT and GOR approaches, regardless of the meta-analytical model, *f* and statistical heterogeneity [Additional file 1: Supplementary tables S5-S7].

On the other hand, when no prior evidence on the direction of effects is available (e.g. initial screenings), the per-allele model of analysis displays a superior performance compared to the use of either the LAT- or GOR-based approaches. In particular, compared to both GOR and LAT approaches, the gain in power for meta-analyses using the per-allele OR may range from 1.5- to 10-fold depending on the number of combined studies (Figure 4).

#### Power: when only one allele displays a significant effect

Power is comparable among the GOR, LAT and per-allele odds ratio when only one allele displays a significant effect. This is especially true when the high-risk allele is less common in the populations (*f* = 10%), particularly when *f* (A_{2}) = *f* (A_{3}) = 10%. Overall, for common variants acting multiplicatively, the best performance is achieved with both GOR and LAT. When the risk allele is either recessive or dominant and is common, the best approach may depend on the frequency of the remaining alleles, but power is comparable among the three tested approaches whenever *f* (A_{2}) ≅ *f* (A_{3}) [Additional file 1: Supplementary tables S8-S10].

### Real application

Results for the seven "top hits" variants associated with late-onset Alzheimer's disease are presented in Table 2. As expected, the largest association signal arose from the variant rs41377151, located at the 3' end of the apolipoprotein C-I (*APOC1*) gene within the apolipoprotein E (*APOE*)/*APOC1* gene cluster on chromosome 19q13.3. This polymorphism is only 10.9 kb away from the rs7412 variant (Arg176Cys) [2], which is one of the alleles that dictate the *APOE* ε status [3]. In addition, the remaining signals are also commensurate with results from previous [4] and more recent, large investigations [2, 5, 6].

In agreement with our simulation-based results, plots of summary ORs and *P*-values (Figure 5) based on real data suggest a good concordance between GOR and both LAT and per-allele approaches, followed by the dominant and recessive models, respectively.

## Discussion

The GOR was suggested as a model-free approach for the synthesis of genetic association studies. The rationale is that the GOR provides more flexibility for the true underlying genetic effect to describe the difference between two cumulative distribution functions of the latent variables, particularly when the assumption of proportional odds is violated. Furthermore, an additional advantage is that this ordinal measure of association is easily interpretable in practice [1].

Recent meta-analyses have applied the GOR claiming that it might be considered a different genetic model or an independent approach compared to the specification of a traditional genetic model of analysis [7, 8]. However, here we show that, since the GOR inherently assumes an ordinal mutation load (e.g. 1, 2 and 3 for genotypes *A*_{1}*A*_{1}, *A*_{1}*A*_{2} and *A*_{2}*A*_{2}, respectively), this measure of association performs like a multiplicative model of analysis for bi-allelic polymorphisms. For bi-allelic variants, our simulations show that GOR-based results are highly correlated with those obtained by both LAT and per-allele ORs, resulting in similar type-I error rates and power compared to these traditional multiplicative models of analysis. In addition, a real meta-analysis of three GWAs in Alzheimer's disease indicates that the practical differences among these approaches are limited. For example, under a fixed-effects framework and assuming a threshold of *P*<10^{-5} (probably realistic due to the small sample sizes available), the total number of markers considered promising for further replication [9] would be 10, 13, 13, 14 and two for the per-allele, LAT, GOR, dominant and recessive approaches, respectively. Under a random-effects model, the corresponding numbers would be two for the recessive model and eight for the remaining approaches.

Nonetheless, other important considerations in meta-analyses of genetic association studies involving bi-allelic polymorphisms are biases in the estimated effect size [10] and heterogeneity [11]. In this respect, the most negative aspect of using the GOR as a measure of association in practice is that it provides inflated effects for bi-allelic variants following a multiplicative model of action. Although this inflation may be only mild for less common markers (i.e. median bias of ~5% for variants with MAF = 10%), the average upward bias in the observed genetic effect grows with increasing MAF, reaching up to 20% for MAFs around 40%.

On the other hand, our data showed that the use of the GOR may be advantageous in meta-analyses involving tri-allelic polymorphisms as long as genotypes can be correctly ordered in terms of mutation load. In fact, a reasonable gain in power in the order of 2 to 15% may be achieved for the detection of association signals from variants with small frequencies (e.g. *f* ~10%) compared to the use of per-allele or LAT odds ratios. The observation that higher power might be obtained with the GOR in scenarios with a larger number of alleles of low frequency may serve as hypothesis-generating information to extend the use of the GOR to meta-analyses of different types of genetic variants. For example, a special case might be the use of the GOR in meta-analyses of structural variants such as copy-number variations (CNVs), which tend to exhibit a substantial number of alleles, yielding a correspondingly large number of possible genotype categories [12]. Since the GOR handles categories with zero counts [13], and a different number of genotypes may be considered per study (for instance, in the case of specific allele sizes in some populations), the properties of the GOR in meta-analyses of CNVs are a topic worthy of further investigation.

In summary, although there are differences in the statistical properties among the investigated approaches for bi-allelic variants, the absolute magnitude of these differences may actually be small and likely to be of very limited practical significance. An exception might be the use of the GOR in meta-analyses involving tri-allelic polymorphisms with less common alleles, since the GOR uses the complete genotypic distribution (e.g. the GOR is less affected by zero cells). For these scenarios, the use of the GOR as a measure of effect may be slightly more powerful than traditional measures. However, the performance of GOR-based meta-analyses will depend on some knowledge about the direction of the effects when there are two alleles modulating the risk of disease in opposite directions.

## Material and methods

### Simulation procedures and scenarios

We simulated meta-analyses of association studies using approaches that rely on multinomial distributions described in detail elsewhere (autosomal markers) [9, 10]. Hardy-Weinberg equilibrium is assumed to hold for the whole population, whereas the susceptibility alleles are considered the causal variants or surrogate markers in tight linkage disequilibrium (*r*^{2} = 1.0). For the bi-allelic case, we simulated data assuming the susceptibility variant *A*_{2} (minor allele) and non-risk allele *A*_{1}.

Under a three-allele model, we denote *A*_{1}, *A*_{2} and *A*_{3} as the possible alleles with frequencies *f*(*A*_{1}), *f*(*A*_{2}) and *f*(*A*_{3}), respectively, yielding six possible distinct genotypes (*A*_{1}*A*_{1}, *A*_{1}*A*_{2}, *A*_{1}*A*_{3}, *A*_{2}*A*_{2}, *A*_{2}*A*_{3} and *A*_{3}*A*_{3}).
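As a small illustration of the genotype bookkeeping described above, the following Python sketch enumerates the *m*(*m*+1)/2 unordered genotypes for a set of alleles and derives their expected frequencies under Hardy-Weinberg equilibrium. The function names and the example allele frequencies are assumptions made for illustration; they are not taken from the Stata code used in this study.

```python
from itertools import combinations_with_replacement


def enumerate_genotypes(alleles):
    """Return every unordered pair of alleles, e.g. ('A1', 'A2')."""
    return list(combinations_with_replacement(alleles, 2))


def hwe_genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    allele_freqs maps allele label -> population frequency (must sum to 1).
    Homozygotes have frequency f_i^2, heterozygotes 2 * f_i * f_j.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for a, b in enumerate_genotypes(sorted(allele_freqs)):
        if a == b:
            freqs[(a, b)] = allele_freqs[a] ** 2
        else:
            freqs[(a, b)] = 2 * allele_freqs[a] * allele_freqs[b]
    return freqs


# Hypothetical example: one common allele and two less common alleles.
example = hwe_genotype_frequencies({"A1": 0.8, "A2": 0.1, "A3": 0.1})
```

For three alleles the enumeration returns the six genotypes in the same order as listed in the text.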

For each possible combination of the parameters presented in Table 3, we considered meta-analyses that included from two up to 30 studies (case-to-control ratio of 1:1).

For the tri-allelic case, three possible scenarios were considered: (*i*) among the alleles, two were susceptibility variants (e.g. both increase the susceptibility for the trait with the same magnitude), (*ii*) two alleles were associated with the trait, but in opposite directions (i.e. one increases, while the other decreases the risk for the trait by a similar magnitude) and (*iii*) only one out of the three alleles displays significant effects on the trait. We further assumed that the mechanism of action is similar for both alleles when there are two alleles with genuine effects on the trait (e.g. both act multiplicatively, or both act dominantly, and so forth). For scenarios with two alleles modulating the risk of disease, two additional situations of practical interest were investigated: (*ii*-*a*) the two alleles are associated with the susceptibility of disease in opposite directions and investigators have *no prior* evidence on the direction of these effects (e.g. initial agnostic screenings) and (*ii*-*b*) two alleles are associated with the susceptibility of disease in opposite directions, but investigators *possess prior* evidence on the direction of the effects (e.g. meta-analyses from the literature). For consistency, allele *A*_{2} is designated as the protective variant, whereas allele *A*_{3} is the susceptibility one in these scenarios.
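The bi-allelic part of the sampling scheme can be sketched in Python as below. This is a minimal illustration rather than the simulation code used in this study: it assumes that control genotype frequencies equal the Hardy-Weinberg proportions (a rare-disease approximation), that case frequencies follow from reweighting those proportions by genotype-specific odds ratios, and it ignores between-study heterogeneity (τ^{2} = 0). All function and parameter names are hypothetical.

```python
import random


def case_genotype_frequencies(maf, or_het, or_hom):
    """Genotype frequencies in cases for genotypes A1A1, A1A2, A2A2.

    Controls are assumed to follow Hardy-Weinberg proportions; cases are
    obtained by weighting each genotype by its odds ratio versus A1A1.
    """
    p, q = 1.0 - maf, maf
    hwe = [p * p, 2 * p * q, q * q]
    weights = [hwe[0], hwe[1] * or_het, hwe[2] * or_hom]
    total = sum(weights)
    return [w / total for w in weights]


def draw_multinomial(n, probs, rng):
    """Draw genotype counts for n subjects from a multinomial distribution."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts


def simulate_study(maf, or_het, or_hom, n_cases, n_controls, rng):
    """Simulate genotype counts for one case-control study."""
    p, q = 1.0 - maf, maf
    control_freqs = [p * p, 2 * p * q, q * q]
    case_freqs = case_genotype_frequencies(maf, or_het, or_hom)
    cases = draw_multinomial(n_cases, case_freqs, rng)
    controls = draw_multinomial(n_controls, control_freqs, rng)
    return cases, controls


# Hypothetical multiplicative variant: OR = 1.2 per copy of the risk allele.
rng = random.Random(2011)
cases, controls = simulate_study(0.10, 1.2, 1.2 ** 2, 500, 500, rng)
```

Under this parameterisation a dominant variant would use the same odds ratio for both `or_het` and `or_hom`, and a recessive variant would set `or_het` to 1.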

### Bi-allelic polymorphisms

#### Assessment of bias

The percentage bias was computed as $100 \times (\hat{\mu} - \mu)/\mu$ and $100 \times (\hat{\tau}^{2} - \tau^{2})/\tau^{2}$ for genetic effect sizes and between-study variance, respectively, where $\hat{\mu}$ is the (average) observed summary effect, μ is the true average genetic effect across population-specific genetic effects, τ^{2} is the true between-study variance and $\hat{\tau}^{2}$ is the method-of-moments-based estimate of τ^{2}. Both $\hat{\mu}$ and μ are expressed as the natural logarithm of the odds ratio (Table 3). Use of alternative bias estimators (e.g. mean squared error) yielded qualitatively analogous results (data not shown).
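A minimal Python sketch of a percentage-bias calculation is given below, assuming the usual relative form 100 × (estimate − truth)/truth. The function name and the numbers in the example are hypothetical and serve only to show the arithmetic on the log odds ratio scale.

```python
import math


def percentage_bias(estimate, truth):
    """Relative bias of an estimate, in percent of the true value."""
    if truth == 0:
        raise ValueError("percentage bias is undefined when the true value is 0")
    return 100.0 * (estimate - truth) / truth


# Hypothetical example: true per-allele OR of 1.2, observed summary OR of 1.25.
# Both are converted to the natural-log scale before computing the bias.
bias_mu = percentage_bias(math.log(1.25), math.log(1.2))
```

The same function applies to the between-study variance, provided the true τ^{2} is greater than zero.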

### Tri-allelic polymorphisms

Meta-analyses involving three-allele polymorphisms may rely on a diversity of approaches to summarize effects across studies. However, because the assumption of multiplicative effects yields, on average, the lowest rates of false-positive results in bi-allelic markers [9, 10], we compared the GOR to two approaches that assume a multiplicative mode of action: the per-allele OR, which yields three correlated odds ratios (OR[A_{3} vs A_{1}], OR[A_{3} vs A_{2}] and OR[A_{2} vs A_{1}]) and the log-additive trend approach.

### The generalized odds ratio

For a binary trait (e.g. case-control studies), GOR measures the probability that a randomly sampled case has a genotype with a higher mutation load (i.e. a larger number of high-risk alleles) than a randomly sampled control divided by the probability that a randomly sampled case has a genotype with lower mutation load than a randomly sampled control [1].

The GOR for a binary trait and an *m*-allelic variant can be computed as [13]:

$$\mathrm{GOR} = \frac{\sum_{j=2}^{J} \pi_{1j} \sum_{k=1}^{j-1} \pi_{0k}}{\sum_{j=2}^{J} \pi_{0j} \sum_{k=1}^{j-1} \pi_{1k}}$$

where *J* is the total number of genotypes (categories) given the number of alleles, i.e. *J* = *m*(*m*+1)/2, *m* is the number of alleles, and $\pi_{ij}$ is the proportion of subjects with genotype *j* (for *j* = 1,...,*J*, in which the higher the value of *j*, the higher the mutation load) in group *i* (*i* = 0 or 1 for controls and cases, respectively). In the present investigation, the large-sample variance for the GOR was computed from the asymptotic standard error of the Goodman-Kruskal γ [1]. Stata and R codes to compute the GOR and its large-sample variance are available from the first author upon request.
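The verbal definition of the GOR (probability that a case carries a higher mutation load than a control, divided by the probability of the reverse) can be sketched in Python as follows. This is an illustrative point-estimate calculation only, not the authors' Stata or R code: the function name is hypothetical, genotype counts are assumed to be supplied in order of increasing mutation load, and the large-sample variance is not computed.

```python
def generalized_odds_ratio(case_counts, control_counts):
    """Point estimate of the GOR from ordered genotype counts.

    Both lists must be ordered from lowest to highest mutation load.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("cases and controls need the same genotype categories")
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    if n_cases == 0 or n_controls == 0:
        raise ValueError("both groups must contain at least one subject")
    p_case = [c / n_cases for c in case_counts]
    p_ctrl = [c / n_controls for c in control_counts]

    case_higher = 0.0     # P(case has higher load than control)
    control_higher = 0.0  # P(control has higher load than case)
    for j in range(len(p_case)):
        for k in range(j):  # every category k with lower load than j
            case_higher += p_case[j] * p_ctrl[k]
            control_higher += p_ctrl[j] * p_case[k]
    if control_higher == 0:
        raise ZeroDivisionError("no control outranks any case; GOR is undefined")
    return case_higher / control_higher


# Hypothetical counts for genotypes A1A1, A1A2, A2A2.
gor = generalized_odds_ratio([50, 30, 20], [60, 30, 10])
```

With only two ordered categories the expression reduces to the ordinary cross-product odds ratio, which offers a quick sanity check. Individual empty genotype categories do not break the calculation as long as the denominator stays positive.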

### Mutational load order

The order of the *j* th genotypic category (i.e. mutational load) for the GOR and log-additive trend is anticipated to impact statistical power. Hence, for the situation *ii*-a (initial agnostic screenings), we set as genotypic order and for situation *ii*-b (meta-analyses from the literature with prior information on the direction of effects).

### Assessment of power and type-I error

Empirical power and type-I error rates (i.e. false-positive discoveries) were computed as the proportion of simulations that gave a two-sided *P*-value < 5%. Because there are three correlated OR estimates for the tri-allelic case for the per-allele model, we corrected the α level using the Dunn-Šidák procedure. Specifically, power and type-I error rates for the per-allele model (tri-allelic case) were computed as the proportion of the simulations that gave one or more *P*-values < $\alpha_{\text{corrected}}$, where $\alpha_{\text{corrected}} = 1 - (1 - \alpha)^{1/3} \approx 0.017$ for α = 0.05.
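The Dunn-Šidák correction and the resulting empirical rejection rate can be sketched in Python as below. The standard Dunn-Šidák formula for *k* tests, 1 − (1 − α)^{1/k}, is used; the function names and the toy *P*-values are hypothetical and only illustrate how "one or more *P*-values below the corrected threshold" would be counted across simulations.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def rejection_rate(p_value_sets, threshold):
    """Proportion of simulations with at least one P-value below threshold.

    p_value_sets holds one sequence of P-values per simulated meta-analysis
    (three per simulation for the tri-allelic per-allele model).
    """
    hits = sum(1 for p_values in p_value_sets if min(p_values) < threshold)
    return hits / len(p_value_sets)


alpha_corrected = sidak_alpha(0.05, 3)

# Hypothetical P-values from two simulated meta-analyses.
toy_rate = rejection_rate([(0.01, 0.50, 0.90), (0.20, 0.30, 0.40)], alpha_corrected)
```

For models that yield a single *P*-value per simulation, the same `rejection_rate` function can be called with one-element sequences and the uncorrected 5% threshold.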

### Real application

We compared results based on the GOR as a summary effect to those obtained by usual approaches of model specification in a real meta-analysis of three independent genome-wide studies in late-onset Alzheimer's disease. After standard control measures, a total of 311,915 bi-allelic polymorphisms were scored in 1411 participants (861 cases and 550 controls). Detailed descriptions of the samples, genotyping platforms and diagnostic criteria are available elsewhere [4]. Results from individual studies were corrected for residual inflation of the test statistic using genomic control methods [14].
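One common form of genomic control can be sketched in Python as follows. The exact variant applied in this analysis is not specified in the text, so the sketch assumes the median-based inflation factor for 1-df chi-square statistics, with the factor floored at 1 so that statistics are never inflated. The function name and input values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return the inflation factor and the corrected 1-df chi-square statistics."""
    inflation = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    inflation = max(inflation, 1.0)  # assumption: do not correct for deflation
    corrected = [stat / inflation for stat in chi2_stats]
    return inflation, corrected


# Hypothetical test statistics from one study.
lam, corrected_stats = genomic_control([0.2, 0.9098728, 5.0])
```

In a genome-wide setting the input would be the full vector of per-marker statistics from a single study, and the correction would be applied before the studies are combined.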

### Meta-analysis methods

Meta-analyses were carried out under both fixed- and random-effects models, represented by the general inverse-variance and DerSimonian-Laird methods, respectively [15, 16]. For the real application, statistical heterogeneity was tested using Cochran's *Q* test [11], and quantified using the *I*^{2} index [17].
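The textbook forms of these estimators can be sketched in Python as below. This is an illustration of the general inverse-variance and DerSimonian-Laird (method-of-moments) calculations, not a reproduction of the Stata or PLINK routines used here; the function name, the returned dictionary keys and the example inputs are assumptions. Effects are taken to be study-specific log odds ratios (or log GORs) with their variances.

```python
import math


def meta_analyse(effects, variances):
    """Fixed- and random-effects summaries with Q, tau^2 and I^2."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effects summary.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird estimate of tau^2, truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    random_effect = sum(w * y for w, y in zip(re_weights, effects)) / sum_re

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_effect,
        "random_se": math.sqrt(1.0 / sum_re),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical log odds ratios and variances from two studies.
summary = meta_analyse([0.1, 0.3], [0.01, 0.01])
```

The *P*-value for Cochran's *Q* would come from a chi-square distribution with `len(effects) - 1` degrees of freedom; it is omitted here to keep the sketch free of external dependencies.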

All simulations were performed in the Stata 11.1 package (Stata Corporation), whereas the meta-analyses of real data sets were carried out in PLINK [18].

## References

1. Zintzaras E: The generalized odds ratio as a measure of genetic risk effect in the analysis and meta-analysis of association studies. Stat Appl Genet Mol Biol. 2010, 9: Article 21.
2. Lee JH, Cheng R, Graff-Radford N, Foroud T, Mayeux R: Analyses of the National Institute on Aging Late-Onset Alzheimer's Disease Family Study: implication of additional loci. Arch Neurol. 2008, 65: 1518-1526. 10.1001/archneur.65.11.1518.
3. Nyholt DR, Yu CE, Visscher PM: On Jim Watson's APOE status: genetic information is hard to hide. Eur J Hum Genet. 2009, 17: 147-149. 10.1038/ejhg.2008.198.
4. Reiman EM, Webster JA, Myers AJ, Hardy J, Dunckley T, Zismann VL, Joshipura KD, Pearson JV, Hu-Lince D, Huentelman MJ: GAB2 alleles modify Alzheimer's risk in APOE epsilon4 carriers. Neuron. 2007, 54: 713-720. 10.1016/j.neuron.2007.05.022.
5. Hu X, Pickering E, Liu YC, Hall S, Fournier H, Katz E, Dechairo B, John S, Van EP, Soares H: Meta-Analysis for Genome-Wide Association Study Identifies Multiple Variants at the BIN1 Locus Associated with Late-Onset Alzheimer's Disease. PLoS One. 2011, 6: e16616. 10.1371/journal.pone.0016616.
6. Shi H, Medway C, Bullock J, Brown K, Kalsheker N, Morgan K: Analysis of Genome-Wide Association Study (GWAS) data looking for replicating signals in Alzheimer's disease (AD). Int J Mol Epidemiol Genet. 2010, 1: 53-66.
7. Wang JL, Wang HG, Gao HQ, Zhai GX, Chang P, Chen YG: Endothelial nitric oxide synthase polymorphisms and erectile dysfunction: a meta-analysis. J Sex Med. 2010, 7: 3889-3898. 10.1111/j.1743-6109.2010.01968.x.
8. Zintzaras E: Is catechol-O-methyl transferase 472G/A gene polymorphism a marker associated with alcoholism?. Psychiatr Genet. 2011, 21: 29-36. 10.1097/YPG.0b013e3283413615.
9. Pereira TV, Patsopoulos NA, Pereira AC, Krieger JE: Strategies for genetic model specification in the screening of genome-wide meta-analysis signals for further replication. Int J Epidemiol. 2011, 40: 457-469. 10.1093/ije/dyq203.
10. Pereira TV, Patsopoulos NA, Salanti G, Ioannidis JP: Discovery properties of genome-wide association signals from cumulatively combined data sets. Am J Epidemiol. 2009, 170: 1197-1206. 10.1093/aje/kwp262.
11. Pereira TV, Patsopoulos NA, Salanti G, Ioannidis JP: Critical interpretation of Cochran's Q test depends on power and prior assumptions about heterogeneity. Research Synthesis Methods. 2010, 1: 149-161. 10.1002/jrsm.13.
12. McCarroll SA: Extending genome-wide association studies to copy-number variation. Hum Mol Genet. 2008, 17: R135-R142. 10.1093/hmg/ddn282.
13. Agresti A: Generalized odds ratios for ordinal data. Biometrics. 1980, 36: 59-67. 10.2307/2530495.
14. Bacanu SA, Devlin B, Roeder K: The power of genomic control. Am J Hum Genet. 2000, 66: 1933-1944. 10.1086/302929.
15. Borenstein M, Hedges L, Higgins JPT, Rothstein HR: A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods. 2010, 1: 97-111. 10.1002/jrsm.12.
16. DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials. 1986, 7: 177-188. 10.1016/0197-2456(86)90046-2.
17. Higgins JP, Thompson SG: Quantifying heterogeneity in a meta-analysis. Stat Med. 2002, 21: 1539-1558. 10.1002/sim.1186.
18. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.

## Acknowledgements

TVP is funded by grants from the Fundação de Amparo à Pesquisa do Estado de São Paulo (State of São Paulo Research Foundation, FAPESP). The authors are deeply indebted to the two anonymous reviewers for their extensive and valuable comments on the manuscript.

## Additional information

### Competing interests

The authors declare that they have no competing interests.

### Authors' contributions

TVP carried out the computational experiments, tabulated the data and drafted the manuscript. TVP and RCMN conceived the study. RCMN participated in its design and coordination and helped to draft the manuscript. Both authors read and approved the final manuscript.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Pereira, T.V., Mingroni-Netto, R.C. A note on the use of the generalized odds ratio in meta-analysis of association studies involving bi- and tri-allelic polymorphisms.
*BMC Res Notes* **4**, 172 (2011). https://doi.org/10.1186/1756-0500-4-172

### Keywords

- Multiplicative Model
- Summary Effect
- Binary Trait
- High Mutation Load
- Generalized Odds Ratio