In the study, pairs of datasets were simulated, each consisting of two samples from a normal distribution. Four parameter settings were selected to model the absence of an effect and the presence of a small, medium and large effect size. For each of the four effect size settings, 10,000 datasets were generated. To investigate the influence of sample size on the indices, the simulations were repeated for sample sizes from n = 10 to n = 100 in steps of 10. In each case, the traditional p-value, the Bayes factor [7], the 95% and full ROPE [4], the probability of direction [11], the MAP-based p-value [12] and the e-value [13] were computed. The Bayes factor was computed as the Jeffreys-Zellner-Siow (JZS) Bayes factor for the null hypothesis \(H_{0} :\delta = 0\) of no effect against the alternative hypothesis \(H_{1} :\delta \ne 0\); see [14] for details. The quantities computed directly were thus the Bayes factor and the posterior distribution of \(\delta\), and the posterior was then used to compute the remaining posterior-based indices.
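The JZS Bayes factor for a two-sample t-test can be written as a one-dimensional integral over a variance parameter \(g\) (Rouder et al., 2009): \(\delta \mid g \sim N(0, g)\) with \(g \sim \text{InverseGamma}(1/2, r^2/2)\) makes \(\delta\) Cauchy \(C(0, r)\). The Python below is a minimal, standard-library-only sketch of that computation, not the authors' R code. It simulates one dataset pair, computes the pooled two-sample t statistic and evaluates \(BF_{01}\) (and \(BF_{10} = 1/BF_{01}\)) by the trapezoidal rule on \(\log g\). The function names, the integration grid, the seed and the simulated effect of 0.5 are assumptions chosen for illustration.

```python
import math
import random
import statistics


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def jzs_bf01(t, n1, n2, r=1.0, steps=4000, lo=-20.0, hi=20.0):
    """BF01 for H0: delta = 0 against H1: delta ~ C(0, r).

    Integral representation of Rouder et al. (2009) with g ~ InverseGamma(1/2, r^2/2);
    the integral over g is approximated by the trapezoidal rule in u = log(g).
    """
    nu = n1 + n2 - 2                 # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)      # effective sample size of the two-sample design
    null_lik = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        g = math.exp(u)
        # InverseGamma(1/2, r^2/2) density; exp() underflows harmlessly to 0 for tiny g
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        lik = (1 + n_eff * g) ** -0.5 * (1 + t * t / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2)
        return lik * prior * g       # extra factor g is the Jacobian dg/du

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * h) for k in range(1, steps))
    return null_lik / (total * h)


# Usage: one simulated dataset pair (hypothetical seed and effect size)
rng = random.Random(10)
n = 30
x = [rng.gauss(0.5, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
t = two_sample_t(x, y)
bf01 = jzs_bf01(t, n, n, r=1.0)
print(f"t = {t:.3f}, BF01 = {bf01:.3f}, BF10 = {1 / bf01:.3f}")
```

Working on \(u = \log g\) keeps the grid fine where the prior mass sits (around \(g \approx r^2\)) while still covering the heavy right tail.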
The entire procedure was repeated for three different prior settings on the effect size \(\delta\). In each setting, the noninformative Jeffreys prior was placed on the standard deviation of the normal population, and a Cauchy prior \(C\left( {0,\gamma } \right)\) was placed on the standardised effect size. The settings \(C\left( {0,\sqrt 2 /2} \right)\), \(C\left( {0,1} \right)\) and \(C\left( {0,\sqrt 2 } \right)\) were selected, which correspond to a medium, wide and ultrawide prior on the effect size \(\delta\). Data file 1 contains the R code to simulate the data for the medium prior setting, and data files 2 and 3 contain the corresponding code for the wide and ultrawide prior settings. Note that the replication scripts are provided here instead of the raw simulation data. Because the seed of the random number generator is fixed in each script to guarantee reproducibility, running the scripts regenerates both the raw simulation data and the analysis results, so other researchers should benefit more from the scripts than from the raw data alone.
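To make the labels medium, wide and ultrawide concrete, the Python sketch below (not the authors' R code) computes the prior probability that \(|\delta| \le a\) under \(C(0,\gamma)\), which is \(\frac{2}{\pi}\arctan(a/\gamma)\), and draws prior samples by inverse-CDF sampling. The cut-off \(a = 1\) and the helper names are illustrative choices, not part of the original study.

```python
import math
import random

# Cauchy scales gamma named in the text
PRIOR_SCALES = {
    "medium": math.sqrt(2) / 2,
    "wide": 1.0,
    "ultrawide": math.sqrt(2),
}


def cauchy_mass_within(a, gamma):
    """P(|delta| <= a) when delta ~ C(0, gamma)."""
    return 2.0 / math.pi * math.atan(a / gamma)


def sample_cauchy(gamma, rng):
    """One draw from C(0, gamma) via the inverse CDF gamma * tan(pi * (u - 1/2))."""
    return gamma * math.tan(math.pi * (rng.random() - 0.5))


for name, gamma in PRIOR_SCALES.items():
    print(f"{name:>9}: P(|delta| <= 1) = {cauchy_mass_within(1.0, gamma):.3f}")
```

With \(a = 1\), the three priors place roughly 61%, 50% and 39% of their mass on \(|\delta| \le 1\), so a larger scale \(\gamma\) shifts more prior mass towards large effects.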
To investigate the influence of noise, the above procedure was also repeated for the fixed sample size \(n = 30\). Gaussian noise \(N\left( {0,\varepsilon } \right)\) was added to the group data \(x\) and \(y\), where \(\varepsilon\) ranged from \(\varepsilon = 0.5\) to \(\varepsilon = 5\) in steps of \(0.5\). Data file 4 includes the R script to simulate the data for the analysis of noise.
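A minimal Python sketch of the noise step (the original script is in R). The notation \(N(0,\varepsilon)\) does not state whether \(\varepsilon\) is a standard deviation or a variance; the sketch assumes a standard deviation, as in R's `rnorm(n, mean, sd)`. The helper names, seed and simulated effect of 0.5 are hypothetical. Under this assumption, independent noise leaves the mean difference \(\delta\sigma\) unchanged but inflates the group standard deviation from \(\sigma\) to \(\sqrt{\sigma^2+\varepsilon^2}\), so the observable standardised effect shrinks to \(\delta\sigma/\sqrt{\sigma^2+\varepsilon^2}\).

```python
import math
import random

# Noise levels from the text: 0.5, 1.0, ..., 5.0 (built from integers to avoid float drift)
NOISE_LEVELS = [0.5 * k for k in range(1, 11)]


def add_gaussian_noise(values, eps, rng):
    """Add independent N(0, eps) noise to each value; eps is ASSUMED to be a standard deviation."""
    return [v + rng.gauss(0.0, eps) for v in values]


def attenuated_effect(delta, sigma, eps):
    """Standardised effect after adding noise: the mean difference delta * sigma is
    unchanged, but the group SD grows from sigma to sqrt(sigma^2 + eps^2)."""
    return delta * sigma / math.sqrt(sigma ** 2 + eps ** 2)


rng = random.Random(30)                           # hypothetical seed
x = [rng.gauss(0.5, 1.0) for _ in range(30)]      # hypothetical effect of 0.5, n = 30
y = [rng.gauss(0.0, 1.0) for _ in range(30)]
noisy = {eps: (add_gaussian_noise(x, eps, rng), add_gaussian_noise(y, eps, rng))
         for eps in NOISE_LEVELS}
```

If the R scripts instead treat \(\varepsilon\) as a variance, `eps` would have to be replaced by `math.sqrt(eps)` in both helpers.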
For samples of increasing size \(n\), the percentage of significant results was computed as the number of significant results divided by 10,000; under the no-effect setting, this is an estimate of the type I error probability of each index. The following significance thresholds were used in the data files: a Bayes factor had to be equal to or larger than 3, and the MAP-based p-value and the traditional p-value were significant when \(p_{MAP} < 0.05\) and \(p < 0.05\), respectively. Details on the significance thresholds for the ROPE, the probability of direction and the e-value can be found in the original study [15]. Data file 5 contains the code to simulate the data for the type I error rates of each index. In each case, the wide Cauchy prior \(C\left( {0,1} \right)\) was used to ensure comparability of the results.
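The type I error estimate is a Monte Carlo proportion. The Python sketch below (not the authors' R code) illustrates it for the traditional two-sided pooled t-test at \(p < 0.05\) only, simulating both groups from the same standard normal distribution so that the null hypothesis holds. The p-value is obtained from the regularised incomplete beta function, \(p = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)\), evaluated with the standard continued-fraction (modified Lentz) method. The seed and all helper names are hypothetical.

```python
import math
import random
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd terms of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last update factor close to 1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b   # symmetry relation for faster convergence


def t_test_p_value(t, nu):
    """Two-sided p-value of a t statistic with nu degrees of freedom."""
    return reg_inc_beta(nu / 2.0, 0.5, nu / (nu + t * t))


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def type_i_error_rate(n, n_sim=10_000, alpha=0.05, seed=1):
    """Share of significant results when both groups come from N(0, 1), i.e. H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if t_test_p_value(two_sample_t(x, y), 2 * n - 2) < alpha:
            hits += 1
    return hits / n_sim
```

For example, `type_i_error_rate(30)` runs 10,000 replications at \(n = 30\), matching the replication count in the text; the result should lie close to the nominal 0.05, with a Monte Carlo standard error of about \(\sqrt{0.05 \cdot 0.95/10000} \approx 0.002\). The other indices would replace the `p < alpha` check with their own thresholds.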