Methods
The model used for detection of AI in any condition and for comparing levels of AI between any two conditions has been described earlier and implemented in the package BayesASE [7, 22]. We give here the basic definitions and refer the reader to Additional file 1 for further details.
One important parameter in determining AI is the probability r of a read aligning to allele g1 (g2) given that it came from that allele, that we define as \({r}_{i,g1}\) (\({r}_{i,g2}\)). Low values of these probabilities correspond to a high degree of ambiguously mapped reads, which occurs when there is little sequence divergence between the two alleles. Reads that do not map ambiguously are termed allele specific reads or informative reads.
AI in condition i is measured by the parameter \({\theta }_{i}\) representing the proportion of reads originating from the allele g1, which that can be written as follows:
$${\theta }_{i}=\frac{{\mathbb{E}}({x}_{i,k}/{r}_{i,g1})}{{\mathbb{E}}({x}_{i,k}/{r}_{i,g1}+{y}_{i,k}/{r}_{i,g2})}=\frac{1/{\alpha }_{i}}{{\alpha }_{i}+1/{\alpha }_{i}}$$
When \({\theta }_{i}\) is close to 0, we have one extreme case of AI with almost all the reads originating from g2. When \({\theta }_{i}=0.5\), we have perfect allelic balance with 50% of the reads from each allele. With \({\theta }_{i}=1\), we are in the opposite direction of extreme AI with all the reads originating from g1.
The following null hypotheses are defined:
-
1.
Allelic balance in condition 1, i.e. null H1: \({\theta }_{1}=\) 0.5 or equivalently \({\alpha }_{1}=\) 1.
-
2.
Allelic balance in condition 2, i.e. null H2: \({\theta }_{2}=\) 0.5 or equivalently \({\alpha }_{2}=\) 1.
-
3.
Level of AI is the same in both conditions, i.e. null H3:\({\theta }_{1}={\theta }_{2}\) or equivalently \({\alpha }_{1}={\alpha }_{2}\).
To test these hypotheses, three cases are defined (Fig. 1):
-
1.
H1, H2 and H3 are satisfied
-
2.
H1 is satisfied, H2 and H3 are violated
-
3.
H1 and H2 are violated, H3 is satisfied
In our simulation, magnitudes of deviation from the null are reported as \(\Delta AI\). Given \({\theta }_{0}=0.5\),\({\Delta AI}_{1}=\frac{\left|{\theta }_{1}-{\theta }_{0}\right|}{{\theta }_{0}}\) for H1, \({\Delta AI}_{2}=\frac{\left|{\theta }_{2}-{\theta }_{0}\right|}{{\theta }_{0}}\) for H2, and \({\Delta AI}_{3}\) = \(\frac{\left|{\theta }_{2}-{\theta }_{1}\right|}{{\theta }_{1}}\) for H3. Simulated deviations of \(\Delta AI\) from the null are moderate, generally between 0.1 and 0.3, with a maximum of 0.5.
Scenarios that vary the number of allele specific reads, number of replicates, and AI in the different cases were simulated (Additional file 2).
The model used for the detection of AI in any condition and for comparing levels of AI between any two conditions has been described earlier and implemented in the package BayesASE (https://github.com/McIntyre-Lab/BayesASE) [7, 22].
Results
Type I error is controlled except when total allele specific reads exceed 2400 allele specific reads dispersed across 8 or more biological replicates (Fig. 2a, b). However, type I error is always less than 0.08.
Under all conditions Type I error is low (Fig. 2, Additional file 3), and only exceeds the nominal value of 5% in scenarios with very high numbers of allele specific reads (n > 2400) and biological replicates.
Power (Fig. 3) for detecting small deviations from the null (0.1) is less than 0.4 when the number of bioreps is 3 and only exceeds 0.6 when the number of bioreps is greater than 6 and the total number of allele specific reads is large. H1 is rejected with power > 80% when the total number of reads is at least 2400, and the number of independent biological replicates is at least 12 (an average of 200 allele specific reads per biological replicate). Power for rejecting H3 is, as expected, lower than H1 (Fig. 3, Additional file 4). For \(\Delta AI=\) 0.2 (central panels) and 960 informative reads, H1 is rejected with power > 80% with 3 biological replicates (average of 320 reads per replicate). H3 is rejected with power > 80% with 960 informative reads in 8 replicates (average of 120 reads per replicate). When \(\Delta AI=\) 0.3 (bottom panels), power approaches 100% except when the number of informative reads is low (120). As expected, the number of simulations does not affect estimates of power (Additional file 5). Power for the test of H3 for 3 biological replicates is maximal at 640 informative reads (Additional file 6). When \(\Delta AI=\) 0.3, most scenarios have power greater than 80% for both H1 and H3 (Fig. 3, Additional file 4). When \(\Delta AI\) is 0.5, power for both H1 and H3 is ~ 100% even when the total number of informative reads is low. This represents an extreme scenario, but one that is often observed in situations with loss of heterozygosity, indicating that in these scenarios relatively few reads are needed to detect AI with confidence (Additional file 7).
Discussion
Type I error rarely exceeds the nominal value of 5% even for very high numbers of allele specific reads, while increasing the number of allele specific reads substantially increases power. These observations are in agreement with other approaches [5, 15, 21, 36], including with simulations performed using a previous version of this model [22]. However, except for Zou et al. [5], the power of these approaches was not assessed for jointly changing the number of allele specific reads and biological replicates. BayesASE can directly test for a difference in AI between two conditions or genotypes and, accordingly, we can assess how variation in both the number of replicates and reads affects the power to not only detect AI in a condition but differences in AI between conditions.
BayesASE has adequate power to detect moderate deviations from the null hypothesis. However, the minimum number of reads and biological replicates to achieve this power is greater for smaller deviations from the null. Our simulations suggest that a minimum of 2400 informative reads across 12 replicates, 480 informative reads across 5 replicates (or a minimum of 3 replicates with a total of 960 informative reads), and 240 informative reads across 3 replicates results in > 80% power to detect \({\Delta AI}_{1}\) (or \({\Delta AI}_{2}\)) = 0.1, 0.2, and 0.3, respectively. While the power to detect \({\Delta AI}_{3}=\) 0.1 does not surpass 60% in our simulations, we can detect a difference in AI between conditions (\({\Delta AI}_{3})\) of 0.2 and 0.3 with comparable power for the same deviation from the null within a condition with the same number of informative reads but only when spread over more replicates (i.e. 8). A deviation from the null of \(\Delta AI\) = 0.3 has power > 80% in most scenarios and even higher deviations can be detected with almost 100% power. Such large differences are indicative of loss of heterozygosity as observed in cancers [17] and imprinting [9, 10].
The results presented here describe general trends. In order to estimate power (and type I error) for a particular scenario of interest, the simulator developed as a part of this work can be used. The simulator explicitly enables the exploration of the total number of reads relative to the number of informative reads. The number of informative reads depends on the length of the feature (exon, gene), and on the density of polymorphisms. While, it is possible to analyze individual SNPs, investigators should take care to ensure that individual reads are not used in support of multiple SNPs. In addition, both the number allele specific reads (dependent on the distribution of polymorphisms) and the overall number of reads (dependent on library size and expression levels) affect power and should be accounted for in any modeling approach comparing AI between conditions.
One aspect that will be interesting to test in future studies is the behavior of nearby genes. In organisms, such as D. melanogaster, in which topologically associated domains (TADs) aggregate genes with similar expression patterns [37], we could expect that TADs also discriminate different patterns of AI, if the imbalance is due to shared sequence (i.e. polymorphic enhancer), but not if the imbalance is due to gene-private sequence (i.e. polymorphic gene or promoter).
We present results of an extensive simulation study to quantify type I error and power in detecting AI using the model implemented in the BayesASE pipeline [7]. Both number of reads and number of replicates are important, and they both should be maximized. However, for any given number of reads, the best idea is to maximize the number of replicates. This is in agreement with previous studies that suggested that increased biological replication should be favored over increased depth of coverage [5, 38]. This of course should be balanced against the fact that having several replicates is more expensive. This said, we do not recommend designing any biological experiment with less than three biological replicates.