One of the main goals of genetic epidemiology is the identification and characterization of polymorphisms that confer an increased risk of disease. It is increasingly assumed that complex diseases result from a myriad of genetic and environmental risk factors [1, 2]. This complex etiology limits the utility of traditional, parametric statistical approaches in genetic association studies [3, 4]. The ubiquitous nature of gene-gene and gene-environment interactions [1, 5, 6] has inspired the development of novel statistical approaches designed to detect epistasis [7–9].

Multifactor Dimensionality Reduction (MDR) is one such method [10]. MDR was designed to detect interactions among categorical independent variables that are associated with a dichotomous dependent variable (*i.e.* case/control status or drug treatment response/non-response). MDR performs an exhaustive search of all possible single-locus through *n*-locus interactions (as far as is computationally feasible), evaluating all possible high/low-risk models of disease. From these evaluations, MDR selects a single optimal model for each order of interaction. Permutation testing (PT) is then used to determine the significance of these models. MDR is nonparametric and model-free, so no hypotheses are made concerning the value of any statistical parameter or any genetic inheritance model [10]. MDR has successfully identified interactive effects in simulated data as well as in real data applications in diseases such as hypertension [3, 11, 12], cancer [10, 13, 14], and atrial fibrillation [15, 16].
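The high/low-risk pooling at the core of MDR can be illustrated with a simplified sketch. It assumes SNP genotypes coded 0/1/2, labels a multi-locus genotype cell high-risk when its case:control ratio exceeds a threshold (here assumed to be the overall case:control ratio), and scores each candidate model by balanced accuracy. The function names `mdr_classify`, `balanced_accuracy`, and `best_model` are hypothetical, and the sketch omits the cross-validation used in actual MDR analyses.

```python
from collections import Counter
from itertools import combinations


def mdr_classify(genotypes, status, loci, threshold=None):
    """Label each multi-locus genotype cell as high (1) or low (0) risk.

    genotypes: list of rows, one per individual, coded 0/1/2 per SNP.
    status: 1 for cases, 0 for controls (both classes must be present).
    loci: tuple of SNP column indices forming the candidate model.
    """
    cases, controls = Counter(), Counter()
    for row, s in zip(genotypes, status):
        cell = tuple(row[i] for i in loci)
        if s == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        # assumption: threshold = overall case:control ratio (1.0 when balanced)
        threshold = sum(cases.values()) / sum(controls.values())
    risk = {}
    for cell in set(cases) | set(controls):
        # a cell containing cases but no controls has an infinite ratio
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        risk[cell] = 1 if ratio > threshold else 0
    return risk


def balanced_accuracy(genotypes, status, loci, risk):
    """Mean of sensitivity and specificity of the high/low-risk labels."""
    tp = fn = tn = fp = 0
    for row, s in zip(genotypes, status):
        predicted = risk.get(tuple(row[i] for i in loci), 0)
        if s == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def best_model(genotypes, status, n_loci):
    """Exhaustively score every n_loci-SNP combination and return the best."""
    n_snps = len(genotypes[0])
    best = None
    for loci in combinations(range(n_snps), n_loci):
        risk = mdr_classify(genotypes, status, loci)
        score = balanced_accuracy(genotypes, status, loci, risk)
        if best is None or score > best[1]:
            best = (loci, score)
    return best
```

Calling `best_model` once for each order from 1 to *n* yields one candidate model per level of interaction. Because this sketch scores models on the same data used to label the cells, it overstates accuracy relative to a cross-validated estimate.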

The ultimate goal of an MDR analysis is hypothesis generation (or hypothesis refinement within candidate gene strategies) [17]. Hypothesis testing is used within the MDR analysis framework to determine whether the resulting models differ significantly from what would be expected by chance. A significant model is intended to flag an interesting result that should be followed up in replication cohorts or functional studies. Recent work has placed more emphasis on selecting all statistically significant models [17], in order to avoid missing a true signal (a false negative) at the cost of possibly selecting a few false positives. Generating multiple hypotheses in this way raises questions about the PT procedure used to assign significance to this final set of models.

PT is a commonly used nonparametric statistical procedure that resamples the data without replacement to construct the distribution of the test statistic under the null hypothesis empirically, rather than making specific distributional assumptions. If the value of the test statistic computed from the original sample is extreme relative to this distribution (i.e. if it falls far into the tail of the distribution), the null hypothesis is rejected [18]. The validity of PT relies only on exchangeability under the null hypothesis: the joint distribution of the data samples must remain invariant to permutations of the data subscripts. Permutation tests are therefore applicable under a much broader range of data and research conditions than most parametric tests [19]. In addition, PT requires minimal assumptions about the data being examined, yet it often has power equal to, or even greater than, that of parametric counterparts that require stronger and sometimes untenable assumptions [20]. Unlike many parametric and other nonparametric tests, permutation tests yield unbiased results (p-values) [18]. The chief drawback of the method is its computational cost, but the ready availability of fast computing has made it a practical approach even for large datasets.
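A generic permutation test can be sketched in a few lines. This is an illustration under stated assumptions, not MDR's own implementation: it permutes group labels (resampling without replacement), recomputes an arbitrary statistic, and reports a one-sided, upper-tail p-value using the common add-one convention. The names `permutation_test` and `mean_difference` are hypothetical.

```python
import random


def permutation_test(values, labels, statistic, n_perm=1000, seed=0):
    """One-sided (upper-tail) permutation test of statistic(values, labels)."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)  # copy so the caller's labels are not modified
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels: resampling without replacement
        if statistic(values, shuffled) >= observed:
            n_extreme += 1
    # add-one convention: the observed labeling counts as one permutation
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value


def mean_difference(values, labels):
    """Mean of group 1 minus mean of group 0."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)
```

For example, `permutation_test([1, 2, 3, 4, 5, 10, 11, 12, 13, 14], [0] * 5 + [1] * 5, mean_difference)` returns an observed difference of 9.0 with a small p-value, because only the original split of the ten values reaches that difference. The same loop applies to MDR if `statistic` is replaced by the score of the best model found in each permuted dataset.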

MDR uses PT to test the statistical significance of the best model(s) [21]. Typically, omnibus PT is used: a single null distribution is generated from the best model of each of at least one thousand randomized datasets. When the goal is to select all potentially interesting models from the final MDR set, this omnibus method may be too conservative. *n*-locus PT is an alternative in which a separate null distribution is created for each *n*-level of interaction. For example, if single-locus through five-locus interactions were evaluated in the original MDR analysis, separate null distributions would be created for the single-locus model, the two-locus model, and so on, for a total of five null distributions.
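The difference between the two strategies comes down to which permuted-data values enter each null distribution. The sketch below assumes the MDR search has already been run on every permuted dataset and that a higher statistic means a better model (for example, balanced accuracy); `upper_cutoff` and `pt_cutoffs` are hypothetical helpers, and the (1 − α) empirical quantile is one common choice of cut-off.

```python
import math


def upper_cutoff(null_values, alpha=0.05):
    """Empirical (1 - alpha) quantile of a permutation null distribution."""
    ordered = sorted(null_values)
    index = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[index]


def pt_cutoffs(perm_stats, alpha=0.05):
    """Significance cut-offs from permuted-data results.

    perm_stats[p][k] is the statistic of the best (k + 1)-locus model found
    in permuted dataset p (higher = better, e.g. balanced accuracy).
    Returns (omnibus cut-off, list of n-locus cut-offs).
    """
    # omnibus PT: one null distribution from each permutation's overall best model
    omnibus_null = [max(row) for row in perm_stats]
    omnibus = upper_cutoff(omnibus_null, alpha)
    # n-locus PT: a separate null distribution for each level of interaction
    n_levels = len(perm_stats[0])
    n_locus = [upper_cutoff([row[k] for row in perm_stats], alpha)
               for k in range(n_levels)]
    return omnibus, n_locus
```

Because each permuted dataset's overall best model is at least as good as its best model at any single level, the omnibus cut-off can never fall below any *n*-locus cut-off. This is one way to see why omnibus PT is the more conservative of the two.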

In the current study, we compare the significance cut-offs, power, and false-positive rates of omnibus PT and *n*-locus PT as implemented in MDR across a wide range of disease models. We also examine the overall false-positive rate of the MDR method under both types of PT. As the MDR method gains acceptance and is increasingly used in the genetics community, it is important that users understand how to apply PT properly.
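The overall false-positive rate matters because *n*-locus PT gives each of the *k* interaction levels its own test at level α. As an idealized illustration only, suppose the *k* tests were independent. This is unrealistic, since all levels are evaluated on the same data. The probability that a null dataset yields at least one significant model would then be:

$$
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{k},
\qquad
1 - (1 - 0.05)^{5} = 1 - 0.7737809375 \approx 0.226 .
$$

Without assuming independence, the Bonferroni inequality bounds the same probability by $k\alpha = 0.25$. Under omnibus PT, by contrast, a model at any level can exceed the single cut-off only if the overall best model does, so that probability stays at α.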

### Multifactor Dimensionality Reduction (MDR)

Figure 1 (adapted from [10]) outlines the MDR procedure. Details of the algorithm and of the alternative PT strategies implemented in the current study can be found in Additional file 1.

### Data Simulations and Analysis

Simulated datasets exhibiting gene-gene interactions were generated to evaluate the power and false-positive rate of MDR using either omnibus or *n*-locus PT. Multiple disease models, as well as null data with no disease model, were generated with varying allele frequencies, heritabilities, and numbers of interacting functional polymorphisms. Details of the simulations and analysis are given in Additional file 1.
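The general shape of such a simulation can be sketched as follows. It is not the study's simulation procedure, which is described in Additional file 1. The sketch assumes genotypes drawn under Hardy-Weinberg equilibrium with one minor allele frequency shared by all SNPs, a hypothetical two-locus penetrance table for SNPs 0 and 1, and a balanced case/control sample. It does not control heritability. `EXAMPLE_PENETRANCE`, `hwe_genotype`, and `simulate_dataset` are illustrative names.

```python
import random

# Hypothetical two-locus penetrance table: P(disease | genotype at SNP 0, SNP 1),
# with genotypes coded as the number of minor alleles (0, 1, 2).
# Illustrative only; it is not one of the study's disease models.
EXAMPLE_PENETRANCE = [
    [0.05, 0.10, 0.05],
    [0.10, 0.05, 0.10],
    [0.05, 0.10, 0.05],
]


def hwe_genotype(rng, maf):
    """Draw a genotype (0, 1 or 2 minor alleles) under Hardy-Weinberg equilibrium."""
    return sum(rng.random() < maf for _ in range(2))


def simulate_dataset(penetrance, maf, n_snps, n_per_class, rng):
    """Balanced case/control sample; SNPs 0 and 1 are functional, the rest noise."""
    cases, controls = [], []
    while len(cases) < n_per_class or len(controls) < n_per_class:
        row = [hwe_genotype(rng, maf) for _ in range(n_snps)]
        affected = rng.random() < penetrance[row[0]][row[1]]
        group = cases if affected else controls
        if len(group) < n_per_class:  # discard surplus individuals
            group.append(row)
    genotypes = cases + controls
    status = [1] * n_per_class + [0] * n_per_class
    return genotypes, status
```

Passing a constant penetrance table, such as `[[0.1] * 3 for _ in range(3)]`, makes disease status independent of genotype. This produces the kind of null data used to estimate false-positive rates.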