Statistical considerations for high throughput screening data

doi:10.1007/s11515-010-0053-2

Front. Biol., 2010, Vol. 5, Issue 4: 354-360. https://doi.org/10.1007/s11515-010-0053-2

Research articles


Xian-Jin XIE

Division of Biostatistics, Department of Clinical Sciences & Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA


Abstract High throughput screening (HTS) is a widely used and effective approach in genome-wide association studies, large-scale protein expression studies, drug discovery, and biomedical imaging research. The need to identify candidate ‘targets’ or biologically meaningful features accurately and with a high degree of confidence has led to extensive statistical research aimed at minimizing both false-positive and false-negative rates. A large body of literature with in-depth statistical content is available on this topic. We examine currently available statistical methods for HTS and summarize selected methods in a concise, easy-to-follow introduction for experimental biologists.

Keywords high throughput screen; false-positive rate; false-negative rate; target discovery; predictive modeling

Issue Date: 01 August 2010

Cite this article:

Xian-Jin XIE. Statistical considerations for high throughput screening data[J]. Front. Biol., 2010, 5(4): 354-360.

URL:

https://academic.hep.com.cn/fib/EN/10.1007/s11515-010-0053-2
https://academic.hep.com.cn/fib/EN/Y2010/V5/I4/354


