|
|
Application of Bayesian networks on large-scale
biological data |
Yi LIU1,Jing-Dong J. HAN1, 2, |
1.Chinese Academy of Sciences
Key Laboratory of Molecular Developmental Biology, Center for Molecular
Systems Biology, Institute of Genetics and Developmental Biology,
Chinese Academy of Sciences, Beijing 100101, China; 2.2010-07-06 15:15:16; |
|
|
Abstract The investigation of the interplay between genes, proteins, metabolites and diseases plays a central role in molecular and cellular biology. Whole genome sequencing has made it possible to examine the behavior of all the genes in a genome by high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, Bayesian network (BN) is a powerful tool for the ab-initial identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when we try to apply BNs to infer such interactions. In this paper, we not only introduce the Bayesian network formalism and its applications in systems biology, but also review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea, relative pros and cons of each technique and discuss possible ways to combine different algorithms towards making BN learning more accurate and much faster.
|
Keywords
Bayesian networks (BN)
large-scale biological data
|
Issue Date: 01 April 2010
|
|
|
Akaike H(1974). A new look at the statistical modelidentification. IEEE Trans Automat Control, 19(6): 716―723
doi: 10.1109/TAC.1974.1100705
|
|
Chickering D M(1995). A Transformational Characterizationof Equivalent Bayesian Network Structures. Proc 11th Ann Conf Uncertainty Artif Intell, 87―98
|
|
Cvijovic D, Klinowski J(1995). Taboo search- an approach to the multiple minima problem. Science, 267: 664―666
doi: 10.1126/science.267.5198.664
|
|
Efron B, Hastie T, Johnstone I, Tibshirani R(2004). Least angle regression. Ann Statis, 32: 407―499
doi: 10.1214/009053604000000067
|
|
Friedman N(1997). Learning Belief Networks in the Presenceof Missing Values and Hidden Variables. Proc 14th Intl Conf Mach Learn, 125―133
|
|
Fu S, Desmarais M(2008). Fast Markov Blanket Discovery Algorithm Via Local Learningwithin Single Pass. Canadian Conf AI, 96―107
|
|
Geiger D, Heckerman D(1995). Learning Gaussian Networks. Proc 10th Ann Conf Uncertainty Artif Intell, 235―243
|
|
Geman S, Geman D(1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restorationof images. IEEE Trans Automat Control, 6(6): 721―741
|
|
Giudici P, Castelo R(2003). Improving Markov chain Monte Carlo Model search fordata mining. Mach Learn, 50(1―2): 127―158
doi: 10.1023/A:1020202028934
|
|
Grünwald P(2007). The Minimum Description Length principle. Cambridge, MA: MIT Press
|
|
Heckerman D(1999). A Tutorial on Learning with BayesianNetworks. In: Jordan M, ed. Learning in Graphical Models. Cambridge, MA: MIT Press
|
|
Heckerman D, Geiger D, Chickering D M(1995). LearningBayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 20(3): 197―243
|
|
Koivisto M(2006). Advances in Exact Bayesian StructureDiscovery in Bayesian Networks. Proc 22ndConf Uncertainty Artif Intell
|
|
Koivisto M, Sood K(2004). Exact Bayesian structure discovery in Bayesian networks. J Mach Learn Res, 5: 549―573
|
|
Koller D, Friedman N(2009). Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA: MIT Press
|
|
Lauritzen S L, Spiegelhalter D J(1988). Local computations with probabilities on graphical structuresand their application to expert systems. J Royal Statist Society. Series B (Methodological), 50(2): 157―224
|
|
Meek C(1995). Causal inference and causal explanationwith background knowledge. Proc 11th AnnConf Uncertainty Artif Intell: 403―410
|
|
Moore A W, Lee M S(1998). Cached sufficient statistics for efficient machine learning withlarge datasets. J Artif Intell Res (JAIR)8: 67―91
|
|
Peña J M, Nilsson R, Björkegren J, Tegnér J(2007). Towards scalable and data efficientlearning of Markov boundaries. Intl J ApproxReasoning, 45(2): 211―232
doi: 10.1016/j.ijar.2006.06.008
|
|
Pearl J(1988). Probabilistic Reasoning in IntelligentSystems: Networks of Plausible Inference. San Fransisco, CA: Morgan KaufmannPublishers
|
|
Pearl J, Verma T(1991). A Theory of Inferred Causation. Proc 2ndIntl Conf Princip Knowledge Representation and Reasoning (KR'91): 441―452
|
|
Schwarz G E(1978). Estimating the dimension of a model. Ann Statis, 6(2): 461―464
doi: 10.1214/aos/1176344136
|
|
Silander T, Myllymäki P(2006). A Simple Approach for Finding the Globally Optimal BayesianNetwork Structure. Proc 22nd Conf UncertaintyArtif Intell
|
|
Spirtes P, Glymour C, Scheines R(2001). Causation,Prediction, and Search, 2nd ed. Cambridge,MA: MIT Press
|
|
Tsamardinos I, Brown L E, Aliferis C F(2006). The max-minhill-climbing Bayesian network structure learning algorithm. Mach Learn, 65(1): 31―78
doi: 10.1007/s10994-006-6889-7
|
|
van Steensel B, Braunschweig U, Filion G J, Chen M, van Bemmel J G, Ideker T(2010). Bayesiannetwork analysis of targeting interactions in chromatin. Genome Res, 20: 190―200
doi: 10.1101/gr.098822.109
|
|
Verma T, Pearl J(1991). Equivalence and synthesis of causal models. Proc Sixth Ann Conf Uncertainty Artif Intell, 255―270
|
|
Xie X, Geng Z(2008). A recursive method for structural learning of directed acyclic graphs. J Mach Learn Res, 9: 459―483
|
|
Yu H, Zhu S S, Zhou B, Xue H L, Han J D J(2008). Inferringcausal relationships among different histone modifications and geneexpression. Genome Res, 18(8): 1314―1324
doi: 10.1101/gr.073080.107
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|