Statistical considerations for high throughput screening data
Xian-Jin XIE
Division of Biostatistics, Department of Clinical Sciences & Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
Abstract High throughput screening (HTS) is a widely used and effective approach in genome-wide association studies, large-scale protein expression studies, drug discovery, and biomedical imaging research. The need to identify candidate ‘targets’ or biologically meaningful features accurately and with a high degree of confidence has motivated extensive statistical research aimed at minimizing both false-positive and false-negative rates. A large body of statistically in-depth literature on this topic is available. We review currently available statistical methods for HTS and summarize selected methods in a concise, easy-to-follow introduction for experimental biologists.
Keywords
high throughput screen
false-positive rate
false-negative rate
target discovery
predictive modeling
Issue Date: 01 August 2010