Frontiers in Biology

ISSN 1674-7984

ISSN 1674-7992(Online)

CN 11-5892/Q

Front. Biol. 2010, Vol. 5, Issue 4: 354-360. https://doi.org/10.1007/s11515-010-0053-2
Research articles
Statistical considerations for high throughput screening data
Xian-Jin XIE
Division of Biostatistics, Department of Clinical Sciences & Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
Abstract High throughput screening (HTS) is a widely used and effective approach in genome-wide association studies, large scale protein expression studies, drug discovery, and biomedical imaging research. The need to accurately identify candidate ‘targets’ or biologically meaningful features with a high degree of confidence has led to extensive statistical research aimed at minimizing both false-positive and false-negative rates. A large body of literature with in-depth statistical content is available on this topic. We examine currently available statistical methods for HTS and summarize selected methods in a concise, easy-to-follow introduction for experimental biologists.
Keywords high throughput screen, false-positive rate, false-negative rate, target discovery, predictive modeling
Issue Date: 01 August 2010
 Cite this article:   
Xian-Jin XIE. Statistical considerations for high throughput screening data[J]. Front. Biol., 2010, 5(4): 354-360.
 URL:  
https://academic.hep.com.cn/fib/EN/10.1007/s11515-010-0053-2
https://academic.hep.com.cn/fib/EN/Y2010/V5/I4/354