On essential topics of BYY harmony learning: Current status, challenging issues, and gene analysis applications
Lei XU
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China |
Abstract  As a supplement to [Xu L. Front. Electr. Electron. Eng. China, 2010, 5(3): 281-328], this paper outlines the current status of efforts on Bayesian Ying-Yang (BYY) harmony learning, together with applications to gene analysis. First, a bird's-eye view is provided via Gaussian mixture, in comparison with typical learning algorithms and model selection criteria; in particular, semi-supervised learning is covered simply by choosing a scalar parameter. Then, essential topics and challenging issues in BYY system design and BYY harmony learning are systematically outlined: a modern perspective on the Yin-Yang viewpoint is discussed, an alternative Yang factorization is addressed, and coordinations across and within Ying-Yang are summarized. The BYY system acts as a unified framework that accommodates unsupervised, supervised, and semi-supervised learning in one formulation, while best harmony learning provides novelty and strength for automatic model selection. Moreover, the mathematical formulation of the harmony functional is addressed as a unified scheme for measuring the proximity considered in a BYY system, and shown to be the best choice among alternatives. Efforts are also made on a number of learning tasks, including a mode-switching factor analysis proposed as a semi-blind learning framework for several types of independent factor analysis, a hidden Markov model (HMM) gated temporal factor analysis suggested for modeling stationary temporal dependence, a two-level hierarchical Gaussian mixture extended to cover semi-supervised learning, and a manifold learning method modified to facilitate automatic model selection. Finally, these studies are applied to gene analysis problems, including genome-wide association, exome sequencing analysis, and gene transcriptional regulation.
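The bird's-eye view mentioned above contrasts Gaussian mixture learning algorithms with two-stage model selection criteria. As a minimal illustrative sketch only (not the BYY harmony algorithm itself; the function names and synthetic data below are our own assumptions), the following fits isotropic Gaussian mixtures by EM for several candidate component numbers k and selects k with the Bayesian information criterion (BIC):

```python
import numpy as np

def log_joint(X, pi, mu, var):
    """Log of pi_k * N(x | mu_k, var_k * I) for each point/component pair."""
    n, d = X.shape
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (n, k) squared distances
    return np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)

def em_gmm(X, k, iters=200, seed=0):
    """Fit an isotropic Gaussian mixture by EM; return the final log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)].astype(float)  # init means at data points
    var = np.full(k, X.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities via a numerically stable softmax
        lj = log_joint(X, pi, mu, var)
        r = np.exp(lj - lj.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, per-component variances
        nk = r.sum(axis=0) + 1e-10
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = (r * d2).sum(axis=0) / (d * nk) + 1e-10
    lj = log_joint(X, pi, mu, var)
    m = lj.max(axis=1, keepdims=True)
    return float((m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))).sum())

def bic(loglik, k, n, d):
    # free parameters: (k-1) mixing weights + k*d means + k variances
    p = (k - 1) + k * d + k
    return -2.0 * loglik + p * np.log(n)

# two well-separated synthetic clusters in 2-D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
# a few restarts per k to avoid poor EM local optima
scores = {k: bic(max(em_gmm(X, k, seed=s) for s in range(3)), k, len(X), X.shape[1])
          for k in (1, 2, 3)}
best = min(scores, key=scores.get)
print("BIC scores:", scores, "-> selected k =", best)  # BIC typically selects k = 2 here
```

This is the conventional two-stage scheme (fit each candidate k, then score); the point of contrast in the paper is that BYY harmony learning instead performs automatic model selection during a single learning run.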
Keywords
Bayesian Ying-Yang (BYY) harmony learning
harmony functional
automatic model selection
Gaussian mixture
hidden Markov model (HMM) gated temporal factor analysis
hierarchical Gaussian mixture
manifold learning
semi-supervised learning
semi-blind learning
genome-wide association
exome sequencing analysis
gene transcriptional regulation
Corresponding Author(s):
XU Lei, Email: lxu@cse.cuhk.edu.hk

Issue Date: 05 March 2012
References

1. Xu L. Bayesian Ying-Yang system, best harmony learning, and five action circling. A special issue on Emerging Themes on Information Theory and Bayesian Approach. Frontiers of Electrical and Electronic Engineering in China, 2010, 5(3): 281-328. doi: 10.1007/s11460-010-0108-9

2. Xu L. Bayesian-Kullback coupled YING-YANG machines: Unified learning and new results on vector quantization. In: Proceedings of the International Conference on Neural Information Processing. 1995, 977-988 (a further version in NIPS8. In: Touretzky D S, et al., eds. Cambridge: MIT Press, 444-450)

3. Xu L. Codimensional matrix pairing perspective of BYY harmony learning: Hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (A). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1): 86-119. doi: 10.1007/s11460-011-0135-1

4. Xu L. Advances on BYY harmony learning: Information theoretic perspective, generalized projection geometry, and independent factor autodetermination. IEEE Transactions on Neural Networks, 2004, 15(4): 885-902. doi: 10.1109/TNN.2004.828767

5. Xu L. Temporal BYY encoding, Markovian state spaces, and space dimension determination. IEEE Transactions on Neural Networks, 2004, 15(5): 1276-1295. doi: 10.1109/TNN.2004.833302

6. Xu L. Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family. In: Zurada et al., eds. Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science, 2008, 5050: 48-78

7. Shi L, Tu S K, Xu L. Learning Gaussian mixture with automatic model selection: A comparative study on three Bayesian related approaches. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 215-244. doi: 10.1007/s11460-011-0153-z

8. Shore J. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(2): 230-237. doi: 10.1109/TASSP.1981.1163539

9. Burg J P, Luenberger D G, Wenger D L. Estimation of structured covariance matrices. Proceedings of the IEEE, 1982, 70(9): 963-974. doi: 10.1109/PROC.1982.12427

10. Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106(4): 620-630. doi: 10.1103/PhysRev.106.620

11. Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2): 461-464. doi: 10.1214/aos/1176344136

12. MacKay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448-472. doi: 10.1162/neco.1992.4.3.448

13. Attias H. A variational Bayesian framework for graphical models. Advances in Neural Information Processing Systems, 2000, 12: 209-215

14. McGrory C A, Titterington D M. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis, 2007, 51(11): 5352-5367. doi: 10.1016/j.csda.2006.07.020

15. Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing Systems 8. Cambridge: MIT Press, 1996, 757-763

16. Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 1995, 7(6): 1129-1159. doi: 10.1162/neco.1995.7.6.1129

17. Xu L. Independent subspaces. In: Rabuñal Dopico J R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 903-912. doi: 10.4018/978-1-59904-849-9.ch132

18. Bahl L, Brown P, de Souza P, Mercer R. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In: Proceedings of the 1986 IEEE International Conference on Acoustics, Speech, and Signal Processing. 1986, 11: 49-52

19. Valtchev V, Odell J J, Woodland P C, Young S J. MMIE training of large vocabulary recognition systems. Speech Communication, 1997, 22(4): 303-314. doi: 10.1016/S0167-6393(97)00029-0

20. Liao J C, Boscolo R, Yang Y L, Tran L M, Sabatti C, Roychowdhury V P. Network component analysis: Reconstruction of regulatory signals in biological systems. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100(26): 15522-15527. doi: 10.1073/pnas.2136632100

21. Brynildsen M P, Tran L M, Liao J C. A Gibbs sampler for the identification of gene expression and network connectivity consistency. Bioinformatics, 2006, 22(24): 3040-3046. doi: 10.1093/bioinformatics/btl541

22. Redner R A, Walker H F. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review, 1984, 26(2): 195-239. doi: 10.1137/1026034

23. Xu L, Jordan M I. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129-151. doi: 10.1162/neco.1996.8.1.129

24. Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Transactions on Neural Networks, 1993, 4(4): 636-649. doi: 10.1109/72.238318

25. Xu L. Best harmony, unified RPCL and automated model selection for unsupervised and supervised learning on Gaussian mixtures, three-layer nets and ME-RBF-SVM models. International Journal of Neural Systems, 2001, 11(1): 43-69. doi: 10.1016/S0129-0657(01)00049-7

26. Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (I) Unsupervised and semi-unsupervised learning. In: Amari S, Kasabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag, 1997, 241-274

27. Salah A A, Alpaydin E. Incremental mixtures of factor analyzers. In: Proceedings of the 17th International Conference on Pattern Recognition. 2004, 1: 276-279

28. Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 1995, 7(1): 117-143. doi: 10.1162/neco.1995.7.1.117

29. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological, 1996, 58(1): 267-288

30. Figueiredo M A F, Jain A K. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(3): 381-396. doi: 10.1109/34.990138

31. Corduneanu A, Bishop C M. Variational Bayesian model selection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics. 2001, 27-34

32. Wallace C S, Dowe D L. Minimum message length and Kolmogorov complexity. Computer Journal, 1999, 42(4): 270-283. doi: 10.1093/comjnl/42.4.270

33. Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach (III): Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning. In: Wong K M, et al., eds. Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. Springer-Verlag, 1997, 43-60

34. Tu S K, Xu L. Parameterizations make different model selections: Empirical findings from factor analysis. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 256-274. doi: 10.1007/s11460-011-0150-2

35. Xu L. BYY harmony learning, structural RPCL, and topological self-organizing on mixture models. Neural Networks, 2002, 15(8-9): 1125-1151. doi: 10.1016/S0893-6080(02)00084-9

36. Ghahramani Z, Beal M. Variational inference for Bayesian mixtures of factor analysers. In: Advances in Neural Information Processing Systems 12. Cambridge, MA: MIT Press, 2000, 449-455

37. Utsugi A, Kumagai T. Bayesian analysis of mixtures of factor analyzers. Neural Computation, 2001, 13(5): 993-1002. doi: 10.1162/08997660151134299

38. Xu L. Learning algorithms for RBF functions and subspace based functions. In: Olivas E, et al., eds. Handbook of Research on Machine Learning, Applications and Trends: Algorithms, Methods and Techniques. Hershey, PA: IGI Global, 2009, 60-94. doi: 10.4018/978-1-60566-766-9.ch003

39. Xu L. BYY P-Q factor systems and harmony learning. Invited talk. In: Proceedings of the International Conference on Neural Information Processing (ICONIP'2000). 2000, 1: 548-558

40. Xu L. BYY harmony learning, independent state space, and generalized APT financial analyses. IEEE Transactions on Neural Networks, 2001, 12(4): 822-849. doi: 10.1109/72.935094

41. Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129-2153. doi: 10.1016/j.patcog.2006.12.016

42. Xu L. Bayesian Ying Yang learning. In: Zhong N, Liu J, eds. Intelligent Technologies for Information Analysis. Berlin: Springer, 2004, 615-706

43. Barron A, Rissanen J, Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 1998, 44(6): 2743-2760. doi: 10.1109/18.720554

44. Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 1995, 7(1): 108-116. doi: 10.1162/neco.1995.7.1.108

45. Zhou Z H. When semi-supervised learning meets ensemble learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (A). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(1): 6-16. doi: 10.1007/s11460-011-0126-2

46. Xu L. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing, 1998, 19(1-3): 223-257. doi: 10.1016/S0925-2312(97)00091-X

47. Xu L. Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing—Letters and Reviews, 2003, 1(1): 1-52

48. Xu L. BYY learning, regularized implementation, and model selection on modular networks with one hidden layer of binary units. Neurocomputing, 2003, 51: 277-301. doi: 10.1016/S0925-2312(02)00622-7

49. Shilov G E, Gurevich B L. Integral, Measure, and Derivative: A Unified Approach. Silverman R, trans. New York: Dover Publications, 1978

50. Povey D, Woodland P C. Minimum phone error and I-smoothing for improved discriminative training. In: Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. 2002, 1: 105-108

51. Juang B H, Katagiri S. Discriminative learning for minimum error classification. IEEE Transactions on Signal Processing, 1992, 40(12): 3043-3054. doi: 10.1109/78.175747

52. Juang B H, Chou W, Lee C H. Minimum classification error rate methods for speech recognition. IEEE Transactions on Speech and Audio Processing, 1997, 5(3): 257-265. doi: 10.1109/89.568732

53. Saul L K, Rahim M G. Maximum likelihood and minimum classification error factor analysis for automatic speech recognition. IEEE Transactions on Speech and Audio Processing, 2000, 8(2): 115-125. doi: 10.1109/89.824696

54. Rissanen J. Modeling by shortest data description. Automatica, 1978, 14(5): 465-471. doi: 10.1016/0005-1098(78)90005-5

55. Hinton G E, Dayan P, Frey B J, Neal R M. The “wake-sleep” algorithm for unsupervised neural networks. Science, 1995, 268(5214): 1158-1161. doi: 10.1126/science.7761831

56. Xu L, Oja E, Suen C Y. Modified Hebbian learning for curve and surface fitting. Neural Networks, 1992, 5(3): 441-457. doi: 10.1016/0893-6080(92)90006-5

57. Xu L, Krzyzak A, Oja E. A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 1991, 2(3): 169-184. doi: 10.1142/S0129065791000169

58. Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994, 449-455

59. Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rival penalized competitive learning. In: Proceedings of the 11th International Conference on Pattern Recognition. 1992, I: 672-675

60. Xu L. BYY data smoothing based learning on a small size of samples. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 1: 546-551

61. Xu L. Temporal BYY learning for state space approach, hidden Markov model, and blind source separation. IEEE Transactions on Signal Processing, 2000, 48(7): 2132-2144. doi: 10.1109/78.847796

62. Xu L. Machine learning problems from optimization perspective. Journal of Global Optimization, 2010, 47(3): 369-401. doi: 10.1007/s10898-008-9364-0

63. Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (II) From unsupervised learning to supervised learning, and temporal modeling. In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective. 1997, 29-42

64. Xu L. Bayesian-Kullback YING-YANG machines for supervised learning. In: Proceedings of the 1996 World Congress on Neural Networks. San Diego, CA, 1996, 193-200

65. Xu L. Bayesian Kullback Ying-Yang dependence reduction theory. Neurocomputing, 1998, 22(1-3): 81-111. doi: 10.1016/S0925-2312(98)00051-4

66. Xu L. Bayesian Ying-Yang system and theory as a unified statistical learning approach: (V) Temporal modeling for temporal perception and control. In: Proceedings of the International Conference on Neural Information Processing. 1998, 2: 877-884

67. Xu L. New advances on Bayesian Ying-Yang learning system with Kullback and non-Kullback separation functionals. In: Proceedings of the 1997 IEEE-INNS Conference on Neural Networks. 1997, 3: 1942-1947

68. Xu L. Bayesian Ying-Yang machine, clustering and number of clusters. Pattern Recognition Letters, 1997, 18(11-13): 1167-1178. doi: 10.1016/S0167-8655(97)00121-9

69. Xu L. How many clusters?: A YING-YANG machine based theory for a classical open problem in pattern recognition. In: Proceedings of the 1996 IEEE International Conference on Neural Networks. 1996, 3: 1546-1551

70. Xu L. Bayesian Ying-Yang theory for empirical learning, regularization, and model selection: General formulation. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 1: 552-557

71. Xu L. Temporal BYY learning and its applications to extended Kalman filtering, hidden Markov model, and sensorimotor integration. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 2: 949-954

72. Xu L. Temporal factor analysis: Stable-identifiable family, orthogonal flow learning, and automated model selection. In: Proceedings of the International Joint Conference on Neural Networks. 2002, 472-476

73. Csiszár I, Tusnády G. Information geometry and alternating minimization procedures. Statistics and Decisions, 1984, (Suppl 1): 205-237

74. Xu L. Temporal Bayesian Ying-Yang dependence reduction, blind source separation and principal independent components. In: Proceedings of the International Joint Conference on Neural Networks. 1999, 2: 1071-1076

75. Pang Z H, Tu S K, Su D, Wu X H, Xu L. Discriminative training of GMM-HMM acoustic model by RPCL learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 283-290. doi: 10.1007/s11460-011-0152-0

76. Amari S, Nagaoka H. Methods of Information Geometry. London, U.K.: Oxford University Press, 2000

77. Belouchrani A, Cardoso J. Maximum likelihood source separation by the expectation maximization technique: Deterministic and stochastic implementation. In: Proceedings of NOLTA95. 1995, 49-53

78. McLachlan G J, Krishnan T. The EM Algorithm and Extensions. New York: John Wiley and Sons, 1997

79. Shi L, Tu S K, Xu L. Gene clustering by structural prior based local factor analysis model under Bayesian Ying-Yang harmony learning. In: Proceedings of the 2010 International Conference on Bioinformatics and Biomedicine. 2010, 696-699

81. Park M Y, Hastie T. Penalized logistic regression for detecting gene interactions. Biostatistics, 2008, 9(1): 30-50. doi: 10.1093/biostatistics/kxm010

82. Brown R G, Hwang P Y C. Introduction to Random Signals and Applied Kalman Filtering. 3rd ed. New York: John Wiley and Sons, 1997

83. Roweis S, Ghahramani Z. A unifying review of linear Gaussian models. Neural Computation, 1999, 11(2): 305-345. doi: 10.1162/089976699300016674

84. Ghahramani Z, Hinton G E. Variational learning for switching state-space models. Neural Computation, 2000, 12(4): 831-864. doi: 10.1162/089976600300015619

85. Shumway R H, Stoffer D S. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 1982, 3(4): 253-264. doi: 10.1111/j.1467-9892.1982.tb00349.x

86. Shumway R H, Stoffer D S. Dynamic linear models with switching. Journal of the American Statistical Association, 1991, 86(415): 763-769. doi: 10.2307/2290410

87. Digalakis V, Rohlicek J R, Ostendorf M. ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition. IEEE Transactions on Speech and Audio Processing, 1993, 1(4): 431-442. doi: 10.1109/89.242489

88. Wang P H, Shi L, Du L, Liu H W, Xu L, Bao Z. Radar HRRP statistical recognition with temporal factor analysis by automatic Bayesian Ying-Yang harmony learning. A special issue on Machine Learning and Intelligence Science: IScIDE2010 (B). Frontiers of Electrical and Electronic Engineering in China, 2011, 6(2): 300-317. doi: 10.1007/s11460-011-0149-8

89. Gales M J F, Young S. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304. doi: 10.1561/2000000004

90. Cordell H J. Detecting gene-gene interactions that underlie human diseases. Nature Reviews Genetics, 2009, 10(6): 392-404. doi: 10.1038/nrg2579

91. Phillips P C. Epistasis: The essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics, 2008, 9(11): 855-867. doi: 10.1038/nrg2452

92. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M A, Bender D, Maller J, Sklar P, de Bakker P I, Daly M J, Sham P C. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 2007, 81(3): 559-575. doi: 10.1086/519795

93. Ritchie M D, Hahn L W, Moore J H. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genetic Epidemiology, 2003, 24(2): 150-157. doi: 10.1002/gepi.10218

94. Xu L, Amari S. Combining classifiers and learning mixture-of-experts. In: Rabuñal Dopico J R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 318-326. doi: 10.4018/978-1-59904-849-9.ch049

95. Tu S K, Chen R S, Xu L. A binary matrix factorization algorithm for protein complex prediction. Proteome Science, 2011, 9(Suppl 1): S18. doi: 10.1186/1477-5956-9-S1-S18