Bayesian Ying-Yang system, best harmony learning, and five action circling
Lei XU
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China |
Abstract First proposed in 1995 and systematically developed in the past decade, Bayesian Ying-Yang learning (“Ying” is spelled “Yin” in Chinese Pinyin; to keep its original harmony with “Yang”, we have deliberately adopted the term “Ying-Yang” since 1995) is a statistical approach for a two-pathway intelligent system built on two complementary Bayesian representations of the joint distribution of the external observation X and its inner representation R, which can be understood from a perspective of the ancient Ying-Yang philosophy. We have q(X,R)=q(X|R)q(R) as the Ying, which is primary, with its structure designed according to the tasks of the system, and p(X,R)=p(R|X)p(X) as the Yang, which is secondary, with p(X) given by samples of X and the structure of p(R|X) designed from the Ying according to a Ying-Yang variety preservation principle, i.e., p(R|X) is designed as a functional with q(X|R) and q(R) as its arguments. We call this pair a Bayesian Ying-Yang (BYY) system. A Ying-Yang best harmony principle is proposed for learning all the unknowns in the system, with the help of an implementation featuring a five-action circling under the name of the A5 paradigm. Interestingly, this coincides with the famous ancient WuXing theory, which provides a general guide for keeping the A5 circling well balanced towards a Ying-Yang best harmony. BYY learning provides not only a general framework that accommodates typical learning approaches from a unified perspective, but also a new road that leads to improved model selection criteria, Ying-Yang alternative learning with automatic model selection, as well as coordinated implementation of Ying-based model selection and Yang-based learning regularization. This paper aims at an introduction to BYY learning with a twofold purpose.
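The two complementary factorizations above can be illustrated numerically with a toy discrete joint distribution; this sketch is purely illustrative (the distribution values and variable sizes are arbitrary assumptions, not taken from the paper):

```python
import numpy as np

# Toy joint over a discrete observation X (3 values) and inner code R (2 values).
joint = np.array([[0.20, 0.10],
                  [0.15, 0.25],
                  [0.05, 0.25]])   # joint[x, r] = P(X=x, R=r)

# Ying decomposition: q(X, R) = q(X | R) q(R)
qR = joint.sum(axis=0)             # q(R)
qX_given_R = joint / qR            # q(X | R); each column sums to 1

# Yang decomposition: p(X, R) = p(R | X) p(X)
pX = joint.sum(axis=1)             # p(X)
pR_given_X = joint / pX[:, None]   # p(R | X); each row sums to 1

# Both pathways reconstruct the same joint: perfect Ying-Yang matching.
assert np.allclose(qX_given_R * qR, pR_given_X * pX[:, None])
```

In BYY learning the Ying structure is designed and the Yang posterior is derived from it, so in general the two sides only approximately agree; the best harmony principle drives them toward this matching case.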
On the one hand, we introduce the fundamentals of BYY learning, including the system design principles of least redundancy versus variety preservation, the global learning principles of Ying-Yang harmony versus Ying-Yang matching, and the local updating mechanisms of rival penalized competitive learning (RPCL) versus maximum a posteriori (MAP) competitive learning, as well as learning regularization by data smoothing and the induced bias cancellation (IBC) prior. We also introduce basic implementation techniques, including apex approximation, primal gradient flow, Ying-Yang alternation, and the Sheng-Ke-Cheng-Hui law. On the other hand, we provide a tutorial on learning algorithms for a number of typical learning tasks, including Gaussian mixtures, factor analysis (FA) with independent Gaussian, binary, and non-Gaussian factors, local FA, temporal FA (TFA), hidden Markov models (HMMs), hierarchical BYY, three-layer networks, mixtures of experts, radial basis functions (RBFs), and subspace based functions (SBFs). This tutorial introduces BYY learning algorithms in comparison with typical algorithms, particularly against the benchmark of the expectation-maximization (EM) algorithm for maximum likelihood. These algorithms are summarized in a unified Ying-Yang alternation procedure whose major parts share the same expression, with differences characterized simply by a few options in some subroutines. Additionally, a new insight is provided on the ancient Chinese philosophy of Yin-Yang and WuXing from the perspective of information science and intelligent systems.
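As a concrete anchor for the RPCL mechanism named above, here is a minimal sketch of a single rival-penalized competitive learning update: the winning center is attracted toward the sample while the rival (second winner) is pushed slightly away, so that over many samples extra centers are driven out of the data, effecting automatic model selection. The learning rates and function name here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rpcl_step(x, centers, alpha_win=0.05, alpha_rival=0.005):
    """One rival-penalized competitive learning (RPCL) update.

    The winner (closest center) moves toward sample x; the rival
    (second-closest center) is de-learned, i.e., pushed away from x
    with a much smaller rate. Both rates are illustrative choices.
    """
    dists = np.linalg.norm(centers - x, axis=1)
    winner, rival = np.argsort(dists)[:2]
    centers[winner] += alpha_win * (x - centers[winner])    # attract winner
    centers[rival] -= alpha_rival * (x - centers[rival])    # repel rival
    return centers

# Example: with centers at (0,0), (1,0), (5,5) and a sample near the first,
# the first center is attracted while the second (the rival) is pushed away.
c = rpcl_step(np.array([0.2, 0.0]),
              np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]))
```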
Keywords
Bayesian Ying-Yang (BYY) system
Yin-Yang philosophy
best harmony
WuXing
A5 paradigm
randomized Hough transform (RHT)
rival penalized competitive learning (RPCL)
maximum a posteriori (MAP)
semi-supervised learning
automatic model selection
Gaussian mixture
factor analysis (FA)
binary FA
non-Gaussian FA
local FA
temporal FA
three layer networks
mixture of experts
radial basis function (RBF) networks
subspace based function (SBF)
state space modeling
hidden Markov model (HMM)
hierarchical BYY
apex approximation
Ying-Yang alternation
Corresponding Author(s):
Lei XU, Email: lxu@cse.cuhk.edu.hk
Issue Date: 05 September 2010
|
|
1 |
Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd ed. New York: John Wiley & Sons, 2001
|
|
Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd ed. New York: John Wiley & Sons, 2001
|
2 |
Xu L. Machine learning problems from optimization perspective. Journal of Global Optimization , 2010, 47: 369-401 doi: 10.1007/s10898-008-9364-0
|
|
Xu L. Machinelearning problems from optimization perspective. Journal of Global Optimization, 2010, 47: 369―401
doi: 10.1007/s10898-008-9364-0
|
3 |
Xu L. Bayesian Ying Yang learning. Scholarpedia , 2007, 2(3): 1809http://scholarpedia.org/article/Bayesian_Ying_Yang_learning
|
|
scholarpedia.org/article/Bayesian_Ying_Yang_learning
|
4 |
Aster R, Borchers B, Thurber C. Parameter Estimation and Inverse Problems. New York: Elsevier Academic Press, 2004
|
|
Aster R, Borchers B, Thurber C. Parameter Estimation and Inverse Problems. New York: Elsevier Academic Press, 2004
|
5 |
Brown R G, Hwang P Y C. Introduction to Random Signals and Applied Kalman Filtering. 3rd ed. New York: John Wiley & Sons, 1997
|
|
Brown R G, Hwang P Y C. Introduction to Random Signalsand Applied Kalman Filtering. 3rd ed. New York: John Wiley & Sons, 1997
|
6 |
Narendra K S, Parthasarathy K. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks , 1990, 1(1): 4-27 doi: 10.1109/72.80202
|
|
Narendra K S, Parthasarathy K. Identification and controlof dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1990, 1(1): 4―27
doi: 10.1109/72.80202
|
7 |
Redner R A, Walker H F. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review , 1984, 26(2): 195-239 doi: 10.1137/1026034
|
|
Redner R A, Walker H F. Mixture densities, maximumlikelihood, and the EM algorithm. SIAMReview, 1984, 26(2): 195―239
doi: 10.1137/1026034
|
8 |
Xu L, Jordan M I. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation , 1996, 8(1): 129-151
|
|
Xu L, Jordan M I. On convergence propertiesof the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129―151
|
9 |
Anderson T W, Rubin H. Statistical inference in factor analysis. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability . 1956, 5: 111-150
|
|
Anderson T W, Rubin H. Statistical inference infactor analysis. In: Proceedings of theThird Berkeley Symposium on Mathematical Statistics and Probability. 1956, 5: 111―150
|
10 |
Rubi D, Thayer D. EM algorithm for ML factor analysis. Psychometrika , 1976, 57: 69-76
|
|
Rubi D, Thayer D. EM algorithm for ML factoranalysis. Psychometrika, 1976, 57: 69―76
|
11 |
Bozdogan H, Ramirez D E. FACAIC: Model selection algorithm for the orthogonal factor model using AIC and FACAIC. Psychometrika , 1988, 53(3): 407-415 doi: 10.1007/BF02294221
|
|
Bozdogan H, Ramirez D E. FACAIC: Model selection algorithmfor the orthogonal factor model using AIC and FACAIC. Psychometrika, 1988, 53(3): 407―415
doi: 10.1007/BF02294221
|
12 |
Burnham K P, Anderson D R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002
|
|
Burnham K P, Anderson D R. Model Selection and MultimodelInference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002
|
13 |
Tikhonov A N, Arsenin V Y. Solutions of Ill-Posed Problems. Washington: Winston and Sons, 1977
|
|
Tikhonov A N, Arsenin V Y. Solutions of Ill-Posed Problems. Washington: Winston and Sons, 1977
|
14 |
Poggio T, Girosi F. Networks for approximation and learning. Proceedings of the IEEE , 1990, 78(9): 1481-1497 doi: 10.1109/5.58326
|
|
Poggio T, Girosi F. Networks for approximationand learning. Proceedings of the IEEE, 1990, 78(9): 1481―1497
doi: 10.1109/5.58326
|
15 |
Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing System 8 . Cambridge: MIT Press, 1996, 757-763
|
|
Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing System 8. Cambridge: MIT Press, 1996, 757―763
|
16 |
Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation , 1995, 7(6): 1129-1159 doi: 10.1162/neco.1995.7.6.1129
|
|
Bell A J, Sejnowski T J. An information-maximizationapproach to blind separation and blind deconvolution. Neural Computation, 1995, 7(6): 1129―1159
doi: 10.1162/neco.1995.7.6.1129
|
17 |
Xu L. Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing — Letters and Reviews , 2003, 1(1): 1-52
|
|
Xu L. Independentcomponent analysis and extensions with noise and time: A BayesianYing-Yang learning perspective. NeuralInformation Processing — Letters and Reviews, 2003, 1(1): 1―52
|
18 |
Xu L. Independent subspaces. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence, Hershey(PA): IGI Global. 2008, 903-912
|
|
Xu L. Independentsubspaces. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence, Hershey(PA): IGI Global. 2008, 903―912
|
19 |
Xu L. Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks , 1993, 6(5): 627-648 doi: 10.1016/S0893-6080(05)80107-8
|
|
Xu L. Leastmean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 1993, 6(5): 627―648
doi: 10.1016/S0893-6080(05)80107-8
|
20 |
McLachlan G J, Krishnan T. The EM Algorithms and Extensions. New York: John Wiley & Sons, 1997
|
|
McLachlan G J, Krishnan T. The EM Algorithms and Extensions. New York: John Wiley & Sons, 1997
|
21 |
Dempster A P, Laird N M, Rubin D B. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B , 1977, 39(1): 1-38
|
|
Dempster A P, Laird N M, Rubin D B. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: SeriesB, 1977, 39(1): 1―38
|
22 |
Amari S. Information geometry of the EM and EM algorithms for neural networks. Neural Networks , 1995, 8(9): 1379-1408 doi: 10.1016/0893-6080(95)00003-8
|
|
Amari S. Informationgeometry of the EM and EM algorithms for neural networks. Neural Networks, 1995, 8(9): 1379―1408
doi: 10.1016/0893-6080(95)00003-8
|
23 |
Grenander U, Miller M. Pattern theory: From representation to inference. Oxford: Oxford University Press, 2007
|
|
Grenander U, Miller M. Pattern theory: From representationto inference. Oxford: Oxford University Press, 2007
|
24 |
Mumford D. On the computational architecture of the neocortex II: The role of cortico-cortical loops. Biological Cybernetics , 1992, 66(3): 241-251 doi: 10.1007/BF00198477
|
|
Mumford D. Onthe computational architecture of the neocortex II: The role of cortico-corticalloops. Biological Cybernetics, 1992, 66(3): 241―251
doi: 10.1007/BF00198477
|
25 |
Friston K. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences , 2005, 360(1456): 815-836 doi: 10.1098/rstb.2005.1622
|
|
Friston K. Atheory of cortical responses. PhilosophicalTransactions of the Royal Society of London. Series B: BiologicalSciences, 2005, 360(1456): 815―836
doi: 10.1098/rstb.2005.1622
|
26 |
Yuille A L, Kersten D. Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences , 2006, 10(7): 301-308 doi: 10.1016/j.tics.2006.05.002
|
|
Yuille A L, Kersten D. Vision as Bayesian inference:Analysis by synthesis? Trends in CognitiveSciences, 2006, 10(7): 301―308
doi: 10.1016/j.tics.2006.05.002
|
27 |
Schwarz G. Estimating the dimension of a model. Annals of Statistics , 1978, 6(2): 461-464 doi: 10.1214/aos/1176344136
|
|
Schwarz G. Estimatingthe dimension of a model. Annals of Statistics, 1978, 6(2): 461―464
doi: 10.1214/aos/1176344136
|
28 |
Rissanen J. Modeling by shortest data description. Automatica , 1978, 14: 465-471 doi: 10.1016/0005-1098(78)90005-5
|
|
Rissanen J. Modelingby shortest data description. Automatica, 1978, 14: 465―471
doi: 10.1016/0005-1098(78)90005-5
|
29 |
Rissanen J. Information and Complexity in Statistical Modeling. New York: Springer, 2007
|
|
Rissanen J. Informationand Complexity in Statistical Modeling. New York: Springer, 2007
|
30 |
DeGroot M H. Optimal Statistical Decisions. Hooken: Wiley Classics Library, 2004
|
|
DeGroot M H. Optimal Statistical Decisions. Hooken: Wiley Classics Library, 2004
|
31 |
Mackay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation , 1992, 4(3): 448-472 doi: 10.1162/neco.1992.4.3.448
|
|
Mackay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448―472
doi: 10.1162/neco.1992.4.3.448
|
32 |
MacKay D. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003
|
|
MacKay D. InformationTheory, Inference, and Learning Algorithms. Cambridge: Cambridge UniversityPress, 2003
|
33 |
Wallace C S, Boulton D M. An information measure for classification. Computer Journal , 1968, 11(2): 185-194
|
|
Wallace C S, Boulton D M. An information measure forclassification. Computer Journal, 1968, 11(2): 185―194
|
34 |
Wallace C S, Dowe D R. Minimum message length and Kolmogorov complexity. Computer Journal , 1999, 42(4): 270-280 doi: 10.1093/comjnl/42.4.270
|
|
Wallace C S, Dowe D R. Minimum message length andKolmogorov complexity. Computer Journal, 1999, 42(4): 270―280
doi: 10.1093/comjnl/42.4.270
|
35 |
Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics , 1988, 59: 291-294 doi: 10.1007/BF00332918
|
|
Bourlard H, Kamp Y. Auto-association by multilayerperceptrons and singular value decomposition. Biological Cybernetics, 1988, 59: 291―294
doi: 10.1007/BF00332918
|
36 |
Palmieri F, Zhu J, Chang C. Anti-Hebbian learning in topologically constrained linear networks: A tutorial. IEEE Transactions on Neural Networks , 1993, 4(5): 748-761 doi: 10.1109/72.248453
|
|
Palmieri F, Zhu J, Chang C. Anti-Hebbian learning in topologically constrained linearnetworks: A tutorial. IEEE Transactionson Neural Networks, 1993, 4(5): 748―761
doi: 10.1109/72.248453
|
37 |
Grossberg S, Carpenter G A. Adaptive resonance theory. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 87-90
|
|
Grossberg S, Carpenter G A. Adaptive resonance theory. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 87―90
|
38 |
Carpenter G A, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing , 1987, 37: 54-115 doi: 10.1016/S0734-189X(87)80014-2
|
|
Carpenter G A, Grossberg S. A massively parallel architecturefor a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 1987, 37: 54―115
doi: 10.1016/S0734-189X(87)80014-2
|
39 |
Kawato M. Cerebellum and motor control. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 190-195
|
|
Kawato M. Cerebellumand motor control. In: Arbib M A, ed. The Handbook of Brain Theoryand Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 190―195
|
40 |
Shidara M, Kawano K, Gomi H, Kawato M. Inversedynamics model eye movement control by Purkinje cells in the cerebellum. Nature , 1993, 365(6441): 50-52 doi: 10.1038/365050a0
|
|
Shidara M, Kawano K, Gomi H, Kawato M. Inversedynamicsmodel eye movement control by Purkinje cells in the cerebellum. Nature, 1993, 365(6441): 50―52
doi: 10.1038/365050a0
|
41 |
Wolpert D, Kawato M. Multiple paired forward and inverse models for motor control. Neural Networks , 1998, 11(7-8): 1317-1329 doi: 10.1016/S0893-6080(98)00066-5
|
|
Wolpert D, Kawato M. Multiple paired forward andinverse models for motor control. NeuralNetworks, 1998, 11(7―8): 1317―1329
doi: 10.1016/S0893-6080(98)00066-5
|
42 |
Hinton G E, Dayan P, Frey B J, Neal R N. The wake-sleep algorithm for unsupervised learning neural networks. Science , 1995, 268(5214): 1158-1160 doi: 10.1126/science.7761831
|
|
Hinton G E, Dayan P, Frey B J, Neal R N. The wake-sleepalgorithm for unsupervised learning neural networks. Science, 1995, 268(5214): 1158―1160
doi: 10.1126/science.7761831
|
43 |
Dayan P, Hinton G E, Neal R M, Zemel R S. The Helmholtz machine. Neural Computation , 1995, 7(5): 889-904 doi: 10.1162/neco.1995.7.5.889
|
|
Dayan P, Hinton G E, Neal R M, Zemel R S. The Helmholtzmachine. Neural Computation, 1995, 7(5): 889―904
doi: 10.1162/neco.1995.7.5.889
|
44 |
Jaakkola T S. Tutorial on variational approximation methods. In: Opper M, Saad D, eds. Advanced Mean Field Methods: Theory and Practice . Cambridge: MIT press, 2001, 129-160
|
|
Jaakkola T S. Tutorial on variational approximation methods. In: Opper M, Saad D, eds. Advanced Mean Field Methods: Theory and Practice. Cambridge: MIT press, 2001, 129―160
|
45 |
Jordan M, Ghahramani Z, Jaakkola T, Saul L. Introduction to variational methods for graphical models. Machine Learning , 1999, 37(2): 183-233 doi: 10.1023/A:1007665907178
|
|
Jordan M, Ghahramani Z, Jaakkola T, Saul L. Introductionto variational methods for graphical models. Machine Learning, 1999, 37(2): 183―233
doi: 10.1023/A:1007665907178
|
46 |
Corduneanu A, Bishop CM. Variational Bayesian model selection for mixture distributions. In: Jaakkola T, Richardson T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics . 2001, 27-34
|
|
Corduneanu A, Bishop CM. Variational Bayesian modelselection for mixture distributions. In: Jaakkola T, Richardson T, eds. Proceedings of the Eighth International Conference on ArtificialIntelligence and Statistics. 2001, 27―34
|
47 |
Xu L. Bayesian-Kullback coupled YING-YANG machines: Unified learning and new results on vector quantization. In: Proceedings of the International Conference on Neural Information Processing . 1995, 977-988 (A further version in NIPS8. In: Touretzky D S, et al. eds. Cambridge: MIT Press, 444-450)
|
|
Xu L. Bayesian-Kullbackcoupled YING-YANG machines: Unified learning and new results on vectorquantization. In: Proceedings of the InternationalConference on Neural Information Processing. 1995, 977―988 (A further version in NIPS8. In: Touretzky D S, et al. eds. Cambridge:MIT Press, 444―450)
|
48 |
Xu L. Ying-Yang learning. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 1231-1237
|
|
Xu L. Ying-Yanglearning. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 1231―1237
|
49 |
Xu L. Advances on BYY harmony learning: Information theoretic perspective, generalized projection geometry, and independent factor auto-determination. IEEE Transactions on Neural Networks , 2004, 15(4): 885-902 doi: 10.1109/TNN.2004.828767
|
|
Xu L. Advanceson BYY harmony learning: Information theoretic perspective, generalizedprojection geometry, and independent factor auto-determination. IEEE Transactions on Neural Networks, 2004, 15(4): 885―902
doi: 10.1109/TNN.2004.828767
|
50 |
Xu L. Learning algorithms for RBF functions and subspace based functions. In: Olivas E, . eds. Handbook of Research on Machine Learning, Applications and Trends: Algorithms, Methods and Techniques. Hershey(PA): IGI Global, 2009, 60-94
|
|
Xu L. Learningalgorithms for RBF functions and subspace based functions. In: Olivas E, et al. eds. Handbook of Research on Machine Learning, Applications and Trends:Algorithms, Methods and Techniques. Hershey(PA): IGI Global, 2009, 60―94
|
51 |
Xu L. Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family. In: Zurada. eds. Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science , 2008, 5050: 48-78
|
|
Xu L. BayesianYing Yang system, best harmony learning, and Gaussian manifold basedfamily. In: Zuradaet al. eds. ComputationalIntelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures.Lecture Notes in Computer Science, 2008, 5050: 48―78
|
52 |
Xu L, Oja E. Randomized Hough transform. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey(PA): IGI Global, 2008, 1354-1361
|
|
Xu L, Oja E. Randomized Hough transform. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey(PA): IGI Global, 2008, 1354―1361
|
53 |
Veith I. The Yellow Emperor’s Classic of Internal Medicine. Berkeley: University of California Press, 1972
|
|
Veith I. TheYellow Emperor’s Classic of Internal Medicine. Berkeley: University of CaliforniaPress, 1972
|
54 |
Vapnik, V. Estimation of Dependences Based on Empirical Data. Springer, 2006
|
|
Vapnik, V. Estimationof Dependences Based on Empirical Data. Springer, 2006
|
55 |
Stone M. Cross-validation: A review. Mathematics, Operations and Statistics , 1978, 9(1): 127-140
|
|
Stone M. Cross-validation:A review. Mathematics, Operations and Statistics, 1978, 9(1): 127―140
|
56 |
Rivals I, Personnaz L. On cross validation for model selection. Neural Computation , 1999, 11(4): 863-870 doi: 10.1162/089976699300016476
|
|
Rivals I, Personnaz L. On cross validation for modelselection. Neural Computation, 1999, 11(4): 863―870
doi: 10.1162/089976699300016476
|
57 |
Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control , 1974, 19(6): 714-723 doi: 10.1109/TAC.1974.1100705
|
|
Akaike H. Anew look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6): 714―723
doi: 10.1109/TAC.1974.1100705
|
58 |
Bozdogan H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extension. Psychometrika , 1987, 52(3): 345-370 doi: 10.1007/BF02294361
|
|
Bozdogan H. Modelselection and Akaike’s information criterion (AIC): The generaltheory and its analytical extension. Psychometrika, 1987, 52(3): 345―370
doi: 10.1007/BF02294361
|
59 |
Cavanaugh J E. Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters , 1997, 33(2): 201-208 doi: 10.1016/S0167-7152(96)00128-9
|
|
Cavanaugh J E. Unifying the derivations for the Akaike and corrected Akaike informationcriteria. Statistics & ProbabilityLetters, 1997, 33(2): 201―208
doi: 10.1016/S0167-7152(96)00128-9
|
60 |
Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation , 1995, 7(1): 117-143 doi: 10.1162/neco.1995.7.1.117
|
|
Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 1995, 7(1): 117―143
doi: 10.1162/neco.1995.7.1.117
|
61 |
Tibshirani R, Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B , 1996, 58(1): 267-288
|
|
Tibshirani R, Regressionshrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 1996, 58(1): 267―288
|
62 |
MacKay D J C. Bayesian interpolation. Neural Computation , 1992, 4(3): 415-447 doi: 10.1162/neco.1992.4.3.415
|
|
MacKay D J C. Bayesian interpolation. Neural Computation, 1992, 4(3): 415―447
doi: 10.1162/neco.1992.4.3.415
|
63 |
Salah A A, Alpaydin E. Incremental mixtures of factor analyzers. In: Proceedings the 17th International Conference on Pattern Recognition . 2004, 1: 276-279
|
|
Salah A A, Alpaydin E. Incremental mixtures of factor analyzers.In: Proceedings the 17th International Conferenceon Pattern Recognition. 2004, 1: 276―279
|
64 |
Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net and curve detection. IEEE Transactions on Neural Networks , 1993, 4(4): 636-649 doi: 10.1109/72.238318
|
|
Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis,RBF net and curve detection. IEEE Transactionson Neural Networks, 1993, 4(4): 636―649
doi: 10.1109/72.238318
|
65 |
Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rival penalized competitive learning. In: Proceedings of the 11th International Conference on Pattern Recognition . 1992, I: 672-675
|
|
Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rivalpenalized competitive learning. In: Proceedingsof the 11th International Conference on Pattern Recognition. 1992, I: 672―675
|
66 |
Xu L. Rival penalized competitive learning. Scholarpedia , 2007, 2(8): 1810http://www.scholarpedia.org/article/Rival_penalized_competitive_learning
|
|
www.scholarpedia.org/article/Rival_penalized_competitive_learning
|
67 |
Corduneanu A, Bishop C M. Variational Bayesian model selection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics . 2001, 27-34
|
|
Corduneanu A, Bishop C M. Variational Bayesian modelselection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on ArtificialIntelligence and Statistics. 2001, 27―34
|
68 |
McGrory C A, Titterington D M. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis , 2007, 51(11): 5352-5367 doi: 10.1016/j.csda.2006.07.020
|
|
McGrory C A, Titterington D M. Variational approximationsin Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis, 2007, 51(11): 5352―5367
doi: 10.1016/j.csda.2006.07.020
|
69 |
Tu S, Xu L. A study of several model selection criteria for determining the number of signals. In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing . 2010, 1966-1969
|
|
Tu S, Xu L. A study of several modelselection criteria for determining the number of signals. In: Proceedings of 2010 IEEE International Conferenceon Acoustics, Speech and Signal Processing. 2010, 1966―1969
|
70 |
Xu L. Fundamentals, challenges, and advances of statistical learning for knowledge discovery and problem solving: A BYY harmony perspective, keynote talk. In: Proceedings of the International Conference on Neural Networks and Brain . 2005, 1: 24-55
|
|
Xu L. Fundamentals,challenges, and advances of statistical learning for knowledge discoveryand problem solving: A BYY harmony perspective, keynote talk. In: Proceedings of the International Conferenceon Neural Networks and Brain. 2005, 1: 24―55
|
71 |
Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6 . San Mateo: Morgan Kaufmann, 1994, 449-455
|
|
Hinton G E, Zemel R S. Autoencoders, minimum descriptionlength and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994, 449―455
|
72 |
Xu L. Data smoothing regularization, multi-sets-learning, and problem solving strategies. Neural Networks , 2003, 16(5-6): 817-825 doi: 10.1016/S0893-6080(03)00119-9
|
|
Xu L. Datasmoothing regularization, multi-sets-learning, and problem solvingstrategies. Neural Networks, 2003, 16(5―6): 817―825
doi: 10.1016/S0893-6080(03)00119-9
|
73 |
Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (I) Unsupervised and semi-unsupervised learning. In: Amari S, Kassabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag , 1997, 241-274
|
|
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach:(I) Unsupervised and semi-unsupervised learning. In: Amari S, Kassabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag, 1997, 241―274
|
74 |
Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (II) From unsupervised learning to supervised learning and temporal modeling and (III) Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning. In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective . 1997: 25-60
|
|
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach:(II) From unsupervised learning to supervised learning and temporalmodeling and (III) Models and algorithms for dependence reduction,data dimension reduction, ICA and supervised learning.In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A MultidisciplinaryPerspective. 1997: 25―60
|
75 |
Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach (VII): Data smoothing. In: Proceedings of the International Conference on Neural Information Processing . 1998, 1: 243-248
|
|
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach(VII): Data smoothing. In: Proceedingsof the International Conference on Neural Information Processing. 1998, 1: 243―248
|
76 |
Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation , 1995, 7(1): 108-116 doi: 10.1162/neco.1995.7.1.108
|
|
Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 1995, 7(1): 108―116
doi: 10.1162/neco.1995.7.1.108
|
77 |
Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition , 2007, 40(8): 2129-2153 doi: 10.1016/j.patcog.2006.12.016
|
|
Xu L. Aunified perspective and new results on RHT computing, mixture basedlearning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129―2153
doi: 10.1016/j.patcog.2006.12.016
|
78 |
Xu L, Oja E, Kultanen P. A new curve detection method randomized Hough transform (RHT). Pattern Recognition Letters , 1990, 11(5): 331-338 doi: 10.1016/0167-8655(90)90042-Z
|
|
Xu L, Oja E, Kultanen P. A new curve detection method randomized Hough transform(RHT). Pattern Recognition Letters, 1990, 11(5): 331―338
doi: 10.1016/0167-8655(90)90042-Z
|
79 |
Hough P V C. Method and means for recognizing complex patterns. US Patent, 3069654, 1962-12-18
|
|
Hough P V C. Method and means for recognizing complex patterns. US Patent, 3069654, 1962-12-18
|
80 |
Xu L. Best harmony, unified RPCL and automated model selection for unsupervised and supervised learning on Gaussian mixtures, ME-RBF models and three-layer nets. International Journal of Neural Systems , 2001, 11(1): 3-69 doi: 10.1016/S0129-0657(01)00049-7
|
|
Xu L. Bestharmony, unified RPCL and automated model selection for unsupervisedand supervised learning on Gaussian mixtures, ME-RBF models and three-layernets. International Journal of Neural Systems, 2001, 11(1): 3―69
doi: 10.1016/S0129-0657(01)00049-7
|
81 |
Xu L. Bayesian Ying-Yang learning theory for data dimension reduction and determination. Journal of Computational Intelligence in Finance , 1998, 6(5): 6-18
|
|
Xu L. BayesianYing-Yang learning theory for data dimension reduction and determination. Journal of Computational Intelligence in Finance, 1998, 6(5): 6―18
|
82. Tu S, Xu L. Theoretical analysis and comparison of several criteria on linear model dimension reduction. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, 2009, 5441: 154-162
83. Xu L. BYY harmony learning, independent state space and generalized APT financial analyses. IEEE Transactions on Neural Networks, 2001, 12(4): 822-849. doi: 10.1109/72.935094
84. Xu L. Temporal BYY encoding, Markovian state spaces, and space dimension determination. IEEE Transactions on Neural Networks, 2004, 15(5): 1276-1295. doi: 10.1109/TNN.2004.833302
85. Kalman R E. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 1960, 82(1): 35-45
86. Sun K, Tu S, Gao D Y, Xu L. Canonical dual approach to binary factor analysis. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, 2009, 5441: 346-353
87. Sivin N. Science and medicine in imperial China — the state of the field. The Journal of Asian Studies, 1988, 47(1): 41-90. doi: 10.2307/2056359
88. Wilhelm R, Baynes C. The I Ching or Book of Changes, with Foreword by Carl Jung. 3rd ed. Bollingen Series XIX. Princeton: Princeton University Press, 1967
89. Hansen C. A Daoist Theory of Chinese Thought: A Philosophical Interpretation. New York: Oxford University Press, 2000
90. Shilov G E, Gurevich B L. Integral, Measure, and Derivative: A Unified Approach. Silverman R trans. New York: Dover Publications, 1978
91. Ali S M, Silvey S D. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B, 1966, 28(1): 131-140
92. Kullback S, Leibler R A. On information and sufficiency. Annals of Mathematical Statistics, 1951, 22(1): 79-86. doi: 10.1214/aoms/1177729694
93. Shore J. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(2): 230-237. doi: 10.1109/TASSP.1981.1163539
94. Burg J P, Luenberger D G, Wenger D L. Estimation of structured covariance matrices. Proceedings of the IEEE, 1982, 70(9): 963-974. doi: 10.1109/PROC.1982.12427
95. Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106(4): 620-630. doi: 10.1103/PhysRev.106.620
96. Xu L. Temporal BYY learning for state space approach, hidden Markov model and blind source separation. IEEE Transactions on Signal Processing, 2000, 48(7): 2132-2144. doi: 10.1109/78.847796
97. Jeffreys H. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A: Mathematical and Physical Sciences, 1946, 186(1007): 453-461. doi: 10.1098/rspa.1946.0056
98. Xu L. BYY learning system and theory for parameter estimation, data smoothing based regularization and model selection. Neural, Parallel and Scientific Computations, 2000, 8(1): 55-82
99. Xu L. BYY Σ-Π factor systems and harmony learning. Invited talk. In: Proceedings of International Conference on Neural Information Processing (ICONIP’2000). 2000, 1: 548-558
100. Xu L. Bayesian Ying Yang learning. In: Zhong N, Liu J, eds. Intelligent Technologies for Information Analysis. Berlin: Springer, 2004, 615-706
101. Barron A, Rissanen J, Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 1998, 44(6): 2743-2760. doi: 10.1109/18.720554
102. Xu L, Amari S. Combining classifiers and learning mixture-of-experts. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 318-326
103. Xu L. BYY learning, regularized implementation, and model selection on modular networks with one hidden layer of binary units. Neurocomputing, 2003, 51: 277-301 (Errata: Neurocomputing, 2003, 55(1-2): 405-406)
104. Gales M J F, Young S. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304. doi: 10.1561/2000000004
105. Su D, Wu X H, Xu L. GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection. In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. 2010, 4890-4893
106. Rosti A V, Gales M. Factor analysed hidden Markov models for speech recognition. Computer Speech and Language, 2004, 18(2): 181-200. doi: 10.1016/j.csl.2003.09.004
107. Gales M J F. Discriminative models for speech recognition. In: Proceedings of Information Theory and Applications Workshop. 2007, 170-176
108. Woodland P C, Povey D. Large scale discriminative training of hidden Markov models for speech recognition. Computer Speech and Language, 2002, 16(1): 25-47. doi: 10.1006/csla.2001.0182
109. Csiszár I, Tusnády G. Information geometry and alternating minimization procedures. Statistics and Decisions, 1984, (Suppl. 1): 205-237
110. Xu L, Oja E, Suen C Y. Modified Hebbian learning for curve and surface fitting. Neural Networks, 1992, 5(3): 441-457. doi: 10.1016/0893-6080(92)90006-5
111. Xu L, Krzyzak A, Oja E. A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 1991, 2(3): 169-184. doi: 10.1142/S0129065791000169