Genome-wide screens for effective siRNAs through assessing the size of siRNA effects
Xiaohua Douglas Zhang
https://doi.org/10.1186/1756-0500-1-33
© Zhang; licensee BioMed Central Ltd. 2008
Received: 17 May 2008
Accepted: 23 June 2008
Published: 23 June 2008
Abstract
Background
RNA interference (RNAi) has been seen as a revolution in functional genomics and systems biology. Genome-wide RNAi research relies on the development of RNAi high-throughput screening (HTS) assays. One of the most fundamental challenges in RNAi HTS is to glean biological significance from mounds of data, which in turn requires effective analytic methods for selecting effective small interfering RNAs (siRNAs).
Findings
Based on a recently proposed parameter, strictly standardized mean difference (SSMD), I propose an analytic method for genome-wide screens of effective siRNAs through assessing and testing the size of siRNA effects. Central to this method is the capability of SSMD to quantify siRNA effects. The SSMD-based method has so far relied on a normal approximation, which works in primary screens but not in confirmatory screens. In this paper, I explore the non-central t-distribution property of SSMD estimates and use this property to extend the SSMD-based method so that it works effectively in both primary and confirmatory screens, and in any HTS experiment with or without replicates. The SSMD-based method maintains a balanced control of false positives and false negatives.
Conclusion
The central interest in genome-wide RNAi research is the selection of effective siRNAs, which relies on the development of analytic methods to measure the size of siRNA effects. The new analytic method for hit selection provided in this paper offers a better tool for selecting effective siRNAs than current analytic methods, and thus may have broad utility in genome-wide RNAi research.
Findings
Background
Mean difference, fold change, percent inhibition, percent activity, percent viability, Z-score and their robust versions have been used to quantify the effect size of an siRNA or a compound in HTS assays [1–7]. However, these metrics either fail to capture data variability or are affected by sample size, and hence cannot effectively assess the size of an effect. The p-values from the Z-score method (or equivalently Mean ± k SD and its variant Median ± k MAD) and the classical t-test have been widely used to evaluate the chance of including siRNAs with no specific impact [1, 2, 5–7]. However, what these methods test is the mean difference, and it is well known that the mean difference cannot effectively measure the magnitude of impact. In addition, the p-value from the Z-score method or t-test is affected by both the sample size and the size of the siRNA effect.
A recently proposed parameter, strictly standardized mean difference (SSMD) [8], measures the magnitude of impact more effectively than the other metrics currently in use. SSMD has been applied for quality control in genome-scale RNAi research [8–10]. Utilizing the fact that SSMD effectively measures the size of effect, Zhang proposed an SSMD-based hit selection method that maintains a balanced control of both the false positive rate (FPR) and the false negative rate (FNR) [11]. This method has also been applied to select hits in RNAi HTS primary experiments [12]. However, it is based on a normal approximation, which works in primary screens but not in confirmatory screens. Here I construct a new analytic method for hit selection in HTS assays using the non-central t-distribution property of SSMD estimates. This method works effectively whether the sample size is small or large.
Issues of hit selection methods in primary screens
Issues of hit selection methods in screens with replicates
In all confirmatory HTS screens and some primary screens, there are several sets of source plates. Each set is unique and has replicates (usually triplicates), thus each siRNA has replicates. Because plate-to-plate variability is usually higher than within-plate variability, a paired t-test is often used for hit selection in a confirmatory screen. That is, for each siRNA, we calculate the difference between the measured intensity of the siRNA and average intensity of a negative control in a plate, then calculate the corresponding p-value of the paired t-test in which the null hypothesis of zero mean difference is tested.
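The paired procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function name `paired_t_statistic` and the triplicate values are hypothetical, and only the t-statistic and its degrees of freedom are computed (the p-value would then be read from a Student t-distribution with n - 1 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(sirna_intensity, neg_control_mean):
    """Return (t, df) for the null hypothesis of zero mean paired difference.

    sirna_intensity[i]  : measured intensity of the siRNA on replicate plate i
    neg_control_mean[i] : average negative-control intensity on the same plate
    """
    if len(sirna_intensity) != len(neg_control_mean):
        raise ValueError("need one siRNA value and one control mean per plate")
    # Pair within plate, so plate-to-plate variability cancels in the difference.
    diffs = [x - c for x, c in zip(sirna_intensity, neg_control_mean)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two replicate plates")
    # Raises ZeroDivisionError if all differences are identical (zero SD).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical triplicate; the numbers are made up for illustration only.
t, df = paired_t_statistic([6.0, 8.0, 10.0], [5.0, 6.0, 7.0])
```

Here the paired differences are 1, 2 and 3, so the statistic equals 2·√3 with 2 degrees of freedom.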
It is clear that the magnitude of paired difference for siRNA A is much larger than for siRNA B although the mean of the paired difference in Panel A1 (i.e., 2.5) is smaller than that in Panel B1 (i.e., 3). The black points in Panels A2 and B2 (i.e., random draws from the populations in Panels A1 and B1 respectively) also demonstrate that the magnitude of the paired difference in Panel A1 is larger than that in Panel B1. Therefore, a good metric for the assessment of siRNA impact should have a larger (or smaller) value for siRNA A than the value for siRNA B in the case where a large (or small) value of this metric indicates a large effect size. The t-value and p-value are both affected by sample size; thus we may obtain a larger p-value (or smaller t-value) corresponding to the samples from siRNA A than from siRNA B. For example, the p-values corresponding to the samples with 2 or 3 replicates from siRNA A are larger than the p-values corresponding to the samples with at least 10 replicates from siRNA B (Panels A4 and B4 of Figure 2). Therefore, the p-value from t-test or Z-score method cannot effectively measure the strength of siRNA impact.
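The dependence on sample size can be made explicit with a one-line identity. Writing $\bar{d}$ and $s_d$ for the sample mean and standard deviation of the $n$ paired differences (symbols introduced here for illustration), the paired t-statistic is the ratio $\bar{d}/s_d$ scaled by $\sqrt{n}$:

```latex
t \;=\; \frac{\bar{d}}{s_d/\sqrt{n}} \;=\; \sqrt{n}\,\frac{\bar{d}}{s_d}
```

As a rough plug-in check, treating the sample ratio as equal to the population value: for siRNA A (ratio 3.54) with $n = 3$, $t \approx \sqrt{3} \times 3.54 \approx 6.1$, whereas for siRNA B (ratio 0.71) with $n = 100$, $t \approx 10 \times 0.71 = 7.1$. The weaker siRNA obtains the larger t-value purely because it has more replicates.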
Assessment of siRNA effects using SSMD
SSMD is a statistical parameter that measures the magnitude of both paired and unpaired differences and thus can be used to measure the magnitude of impact of siRNAs in both primary and confirmatory screens. For example, the values of SSMD between the siRNA and the negative control are 1.33, 3.54 and 3.54 in Plates D, E and F respectively, which appropriately indicates that the effect of the siRNA in Plate D is smaller than in Plates E and F and that the effect of the siRNA in Plate E is the same as in Plate F (Figure 1). The population values of SSMD for siRNA A and siRNA B are 3.54 and 0.71 respectively (Figure 2). The estimated SSMD values (denoted by the green points) all fall around the population values of SSMD (denoted by the orange lines) and show no increasing trend as sample size increases (Panels A3 and B3 of Figure 2). All these results indicate that SSMD reflects the effect size of an siRNA more appropriately than percent inhibition/viability or the p-value from a t-test of no mean difference.
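For readers who want to compute SSMD from data, the following Python sketch gives simple plug-in estimates. It assumes the usual definition from [8]: the mean difference divided by the standard deviation of the difference, which for two independent groups is the square root of the sum of the two variances. The function names are hypothetical, and these plain moment estimates are not necessarily the estimators used in the additional file.

```python
from math import sqrt
from statistics import mean, stdev, variance


def ssmd_paired(diffs):
    """Plug-in SSMD for paired data: mean of the differences over their SD."""
    return mean(diffs) / stdev(diffs)


def ssmd_unpaired(sample, control):
    """Plug-in SSMD for two independent groups (assumes no correlation)."""
    return (mean(sample) - mean(control)) / sqrt(variance(sample) + variance(control))


# Hypothetical values for illustration only.
paired = ssmd_paired([1.0, 2.0, 3.0])                        # mean 2, SD 1
unpaired = ssmd_unpaired([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])   # (6 - 2) / sqrt(1 + 1)
```

Unlike the t-statistic, neither estimate is multiplied by a factor that grows with the number of replicates, which is why the estimates scatter around the population value instead of trending upward.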
Based on both original and probability meanings of SSMD, an SSMD-based 1-2-3 rule [11], along with its extended version, has been proposed for classifying siRNA impact. The SSMD-based 1-2-3 rules provide a guideline for classifying the strength of siRNA impact. For example, in Figure 1, the siRNA in Plate D is classified as "moderate inhibition effect" and the siRNAs in Plates E and F are both classified as "strong inhibition effect". In Figure 2, siRNAs A and B are classified as "strong inhibition effect" and "weak inhibition effect" respectively. The 1-2-3 rule and extended 1-2-3 rule work in the situation where the population value of SSMD is known; they also work reasonably when sample size is large. In practice, the population value of SSMD is unknown and sample size is small especially in confirmatory RNAi HTS experiments. In such a case, we can provide a point estimate and a confidence interval of SSMD for each siRNA based on its estimated SSMD value [see additional file 1].
A balanced control of false positives and false negatives
SSMD-based decision rules and their false negative levels (FNLs) and restricted false positive levels (RFPLs) for hit selection in RNAi HTS experiments

Selection criterion | FNL | RFPL |
---|---|---|
I: Select up-regulated siRNAs ($c_1 \ge c_2 \ge 0$) | | |
Ia: $\hat{\beta} \ge \beta^*$ | $F_{t(\nu, bc_1)}\left(\frac{\beta^*}{k}\right)$ | $1 - F_{t(\nu, bc_2)}\left(\frac{\beta^*}{k}\right)$ |
Ib: $\hat{\beta} \ge k\,Q_{t(\nu, bc_1)}(\alpha_1)$ | $\alpha_1$ | $1 - F_{t(\nu, bc_2)}\left(Q_{t(\nu, bc_1)}(\alpha_1)\right)$ |
Ic: $\hat{\beta} \ge k\,Q_{t(\nu, bc_2)}(1 - \alpha_2)$ | $F_{t(\nu, bc_1)}\left(Q_{t(\nu, bc_2)}(1 - \alpha_2)\right)$ | $\alpha_2$ |
II: Select down-regulated siRNAs ($c_1 \le c_2 \le 0$) | | |
IIa: $\hat{\beta} \le \beta^*$ | $1 - F_{t(\nu, bc_1)}\left(\frac{\beta^*}{k}\right)$ | $F_{t(\nu, bc_2)}\left(\frac{\beta^*}{k}\right)$ |
IIb: $\hat{\beta} \le k\,Q_{t(\nu, bc_1)}(1 - \alpha_1)$ | $\alpha_1$ | $F_{t(\nu, bc_2)}\left(Q_{t(\nu, bc_1)}(1 - \alpha_1)\right)$ |
IIc: $\hat{\beta} \le k\,Q_{t(\nu, bc_2)}(\alpha_2)$ | $1 - F_{t(\nu, bc_1)}\left(Q_{t(\nu, bc_2)}(\alpha_2)\right)$ | $\alpha_2$ |
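The entries for rules Ia and Ib can be evaluated numerically, for example with SciPy's non-central t-distribution (`scipy.stats.nct`). The sketch below reads $F_{t(\nu, bc)}$ as the cumulative distribution function and $Q_{t(\nu, bc)}$ as the quantile function of a non-central t-distribution with $\nu$ degrees of freedom and non-centrality parameter $b \cdot c$; this reading, the function names, and the example constants are assumptions. In particular, the values ν = n - 1, b = √n and k = 1/√n correspond to a paired screen with n replicates and the plain estimate mean/SD of the differences, and the constants in the paper's additional file may differ.

```python
from math import sqrt

from scipy.stats import nct


def levels_rule_Ia(beta_star, nu, b, k, c1, c2):
    """FNL and RFPL of rule Ia (select when the estimated SSMD >= beta_star)."""
    x = beta_star / k
    fnl = nct.cdf(x, nu, b * c1)   # F_{t(nu, b*c1)}(beta*/k)
    rfpl = nct.sf(x, nu, b * c2)   # 1 - F_{t(nu, b*c2)}(beta*/k)
    return float(fnl), float(rfpl)


def cutoff_rule_Ib(alpha1, nu, b, k, c1):
    """Cutoff k * Q_{t(nu, b*c1)}(alpha1), which fixes the FNL at alpha1."""
    return float(k * nct.ppf(alpha1, nu, b * c1))


# ASSUMPTION: paired screen, n = 3 replicates, plain estimate mean(d) / sd(d),
# for which sqrt(n) * estimate follows a non-central t with n - 1 df.
n = 3
nu, b, k = n - 1, sqrt(n), 1 / sqrt(n)
c1, c2 = 3.0, 1.0  # illustrative SSMD levels for "strong" and "weak" effects

cutoff = cutoff_rule_Ib(0.05, nu, b, k, c1)
fnl, rfpl = levels_rule_Ia(cutoff, nu, b, k, c1, c2)
print(f"cutoff={cutoff:.3f}  FNL={fnl:.3f}  RFPL={rfpl:.3f}")
```

Feeding the rule Ib cutoff back into the rule Ia formulas should return an FNL of α₁, which serves as a quick consistency check between the two rows of the table. The down-regulation rules follow by swapping the cumulative and survival functions.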
The choice of an exact cutoff between 1.4 and 2.1 (or between -2.1 and -1.4) in a real primary experiment depends on the tolerance for false positives and false negatives and on the capacity of the follow-up studies after that experiment. For example, if one has a low tolerance for missing hits with SSMD greater than 2 or 3 (or less than -2 or -3), one may choose a cutoff between 1.4 and 1.6 (or between -1.6 and -1.4). On the other hand, if follow-up studies can accommodate only a small number of selected hits, one may choose a cutoff between 1.8 and 2.1 (or between -2.1 and -1.8). These cutoffs may maintain a balanced control of both the restricted false positive rate (RFPR) for including siRNAs with weak or no effects and the false negative rate (FNR) for excluding siRNAs with strong effects.
Discussion
SSMD is usually applied to the measured intensity of each siRNA individually. In some screens, there may be a need to pool multiple measured values into a single value. For example, when there are two or more wells for each siRNA in a plate, we may use the mean or median of these replicates to represent the measured intensity of this siRNA. In screens where multiple siRNAs are designed to target the same gene to account for off-target effects, there may be a need to pool information across these siRNAs to form a single value for a gene. In those situations, SSMD can be applied to the pooled value for either an siRNA or a gene, especially when the pooled value has a symmetric or nearly normal distribution.
Declarations
Acknowledgements
The author would like to thank Drs. Daniel Holder, Keith Soper and Joseph Heyse for their support in this research.
References
1. Chung NJ, Zhang XD, Kreamer A, Locco L, Kuan PF, Bartz S, Linsley PS, Ferrer M, Strulovici B: Median absolute deviation to improve hit selection for genome-scale RNAi screens. Journal of Biomolecular Screening. 2008, 13: 149-158. 10.1177/1087057107312035.
2. Espeseth AS, Huang Q, Gates A, Xu M, Yu Y, Simon AJ, Shi XP, Zhang XD, Hodor P, Stone DJ, Burchard J, Cavet G, Bartz S, Linsley P, Ray WJ, Hazuda D: A genome wide analysis of ubiquitin ligases in APP processing identifies a novel regulator of BACE1 mRNA levels. Molecular and Cellular Neuroscience. 2006, 33: 227-235. 10.1016/j.mcn.2006.07.003.
3. Gou D, Narasaraju T, Chintagari NR, Jin N, Wang PC, Liu L: Gene silencing in alveolar type II cells using cell-specific promoter in vitro and in vivo. Nucleic Acids Research. 2004, 32:
4. Gou D, Zhang H, Baviskar PS, Liu L: Primer extension-based method for the generation of a siRNA/miRNA expression vector. Physiological Genomics. 2007, 31: 554-562. 10.1152/physiolgenomics.00005.2007.
5. Malo N, Hanley JA, Cerquozzi S, Pelletier J, Nadon R: Statistical practice in high-throughput screening data analysis. Nature Biotechnology. 2006, 24: 167-175. 10.1038/nbt1186.
6. Zhang XD, Yang XC, Chung NJ, Gates A, Stec E, Kunapuli P, Holder DJ, Ferrer M, Espeseth AS: Robust statistical methods for hit selection in RNA interference high-throughput screening experiments. Pharmacogenomics. 2006, 7: 299-309. 10.2217/14622416.7.3.299.
7. Zuck P, Murray EM, Stec E, Grobler JA, Simon AJ, Strulovici B, Inglese J, Flores OA, Ferrer M: A cell-based beta-lactamase reporter gene assay for the identification of inhibitors of hepatitis C virus replication. Analytical Biochemistry. 2004, 334: 344-355. 10.1016/j.ab.2004.07.031.
8. Zhang XD: A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics. 2007, 89: 552-561. 10.1016/j.ygeno.2006.12.014.
9. Zhang XD: Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens. Journal of Biomolecular Screening. 2008, 13: 363-377. 10.1177/1087057108317062.
10. Zhang XD, Espeseth AS, Johnson EN, Chin J, Gates A, Mitnaul LJ, Marine SD, Tian J, Stec EM, Kunapuli P, Holder DJ, Heyse JF, Strulovici B, Ferrer M: Integrating experimental and analytic approaches to improve data quality in genome-wide screens. Journal of Biomolecular Screening. 2008, 13: 378-389. 10.1177/1087057108317145.
11. Zhang XD: A new method with flexible and balanced control of false negatives and false positives for hit selection in RNA interference high-throughput screening assays. Journal of Biomolecular Screening. 2007, 12: 645-655. 10.1177/1087057107300645.
12. Zhang XD, Ferrer M, Espeseth AS, Marine SD, Stec EM, Crackower MA, Holder DJ, Heyse JF, Strulovici B: The use of strictly standardized mean difference for hit selection in primary RNA interference high-throughput screening experiments. Journal of Biomolecular Screening. 2007, 12: 497-509. 10.1177/1087057107300646.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.