Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng Chin    0, Vol. Issue () : 281-328
Bayesian Ying-Yang system, best harmony learning, and five action circling
Lei XU
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China

First proposed in 1995 and systematically developed over the past decade, Bayesian Ying-Yang learning is a statistical approach for an intelligent system featuring two pathways, built on two complementary Bayesian representations of the joint distribution of the external observation X and its inner representation R, which can be understood from the perspective of the ancient Ying-Yang philosophy. We have q(X,R)=q(X|R)q(R) as Ying, which is primary, with its structure designed according to the tasks of the system, and p(X,R)=p(R|X)p(X) as Yang, which is secondary, with p(X) given by samples of X and the structure of p(R|X) designed from Ying according to a Ying-Yang variety preservation principle, i.e., p(R|X) is designed as a functional with q(X|R) and q(R) as its arguments. We call this pair the Bayesian Ying-Yang (BYY) system. A Ying-Yang best harmony principle is proposed for learning all the unknowns in the system, with the help of an implementation featuring a five action circling under the name of the A5 paradigm. Interestingly, it coincides with the famous ancient WuXing theory, which provides a general guide for keeping the A5 circling well balanced towards a Ying-Yang best harmony. This BYY learning provides not only a general framework that accommodates typical learning approaches from a unified perspective but also a new road that leads to improved model selection criteria, Ying-Yang alternative learning with automatic model selection, and coordinated implementation of Ying-based model selection and Yang-based learning regularization.

Note: “Ying” is spelled “Yin” in Chinese Pinyin. To keep its original harmony with Yang, we have deliberately used the term “Ying-Yang” since 1995.
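As a rough illustration of the Ying-Yang pair, the sketch below takes a one-dimensional Gaussian mixture as the Ying machine, with q(R) given by mixing weights and q(X|R) by Gaussian components, and builds the Yang pathway p(R|X) from them by Bayes' rule, which is one simple way of making p(R|X) a functional of q(X|R) and q(R). The harmony value is computed as the sample average of p(R|X) ln[q(X|R)q(R)]; this form, like the function names `yang_posterior` and `harmony`, is an assumption made for illustration and is not spelled out in the abstract.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def yang_posterior(x, alphas, means, variances):
    """Yang pathway p(R=j | X=x), built from Ying via Bayes' rule (an assumed choice)."""
    joint = [a * gaussian_pdf(x, m, v) for a, m, v in zip(alphas, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def harmony(samples, alphas, means, variances):
    """Assumed harmony measure: average over samples of sum_j p(j|x) ln[alpha_j N(x|mu_j, var_j)]."""
    total = 0.0
    for x in samples:
        post = yang_posterior(x, alphas, means, variances)
        for p, a, m, v in zip(post, alphas, means, variances):
            if p > 0.0:  # skip zero-weight terms to avoid log(0)
                total += p * math.log(a * gaussian_pdf(x, m, v))
    return total / len(samples)


if __name__ == "__main__":
    data = [-2.1, -1.9, 2.0, 2.2]
    print(yang_posterior(-2.1, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]))
    print(harmony(data, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]))
```

Here p(X) is represented simply by the list of samples, matching the statement that p(X) is given by samples of X. Parameters that place the components near the data should yield a larger harmony value than parameters that place them far away.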

This paper introduces BYY learning with a twofold purpose. On the one hand, we introduce the fundamentals of BYY learning, including the system design principles of least redundancy versus variety preservation, the global learning principles of Ying-Yang harmony versus Ying-Yang matching, and the local updating mechanisms of rival penalized competitive learning (RPCL) versus maximum a posteriori (MAP) competitive learning, as well as learning regularization by data smoothing and an induced bias cancelation (IBC) prior. We also introduce basic implementation techniques, including apex approximation, primal gradient flow, Ying-Yang alternation, and the Sheng-Ke-Cheng-Hui law. On the other hand, we provide a tutorial on learning algorithms for a number of typical learning tasks, including Gaussian mixture, factor analysis (FA) with independent Gaussian, binary, and non-Gaussian factors, local FA, temporal FA (TFA), hidden Markov model (HMM), hierarchical BYY, three layer networks, mixture of experts, radial basis functions (RBFs), and subspace based functions (SBFs). The tutorial introduces BYY learning algorithms in comparison with typical algorithms, in particular using the expectation maximization (EM) algorithm for maximum likelihood as a benchmark. These algorithms are summarized in a unified Ying-Yang alternation procedure whose major parts share the same expressions, with the differences characterized simply by a few options in some subroutines. Additionally, a new insight is provided on the ancient Chinese philosophy of Yin-Yang and WuXing from the perspective of information science and intelligent systems.
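The contrast between the two local updating mechanisms can be sketched in a few lines. The following is a simplified, hypothetical version of one RPCL step on one-dimensional cluster centers: the winner (the nearest center) is moved toward the sample, while the rival (the second nearest) is pushed slightly away. It omits refinements such as the winning-frequency weighting used in the original RPCL formulation, and the function name `rpcl_step` and the default rates are assumptions chosen for illustration. Setting `delearn_rate` to zero reduces the step to plain winner-take-all competitive learning, which is used here only as a stand-in for the MAP-style update.

```python
def rpcl_step(x, centers, learn_rate=0.05, delearn_rate=0.005):
    """One simplified RPCL update on 1-D centers; returns a new list.

    Requires at least two centers. The rates are illustrative values only.
    """
    # Rank centers by squared distance to the sample.
    order = sorted(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
    winner, rival = order[0], order[1]

    updated = list(centers)
    # The winner learns: it moves toward the sample.
    updated[winner] += learn_rate * (x - centers[winner])
    # The rival is penalized: it moves away from the sample by a smaller amount.
    updated[rival] -= delearn_rate * (x - centers[rival])
    return updated


if __name__ == "__main__":
    centers = [0.0, 1.0, 5.0]
    for sample in [0.2, 0.1, 4.8, 5.1, -0.1]:
        centers = rpcl_step(sample, centers)
    print(centers)
```

Repeatedly penalizing the rival tends to drive redundant centers away from the data, which is the intuition behind using RPCL for selecting the number of clusters automatically during learning.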

Keywords: Bayesian Ying-Yang (BYY) system; Yin-Yang philosophy; best harmony; WuXing; A5 paradigm; randomized Hough transform (RHT); rival penalized competitive learning (RPCL); maximum a posteriori (MAP); semi-supervised learning; automatic model selection; Gaussian mixture; factor analysis (FA); binary FA; non-Gaussian FA; local FA; temporal FA; three layer networks; mixture of experts; radial basis function (RBF) networks; subspace based function (SBF); state space modeling; hidden Markov model (HMM); hierarchical BYY; apex approximation; Ying-Yang alternation
Corresponding Author(s): XU Lei
Issue Date: 05 September 2010
 Cite this article:   
Lei XU. Bayesian Ying-Yang system, best harmony learning, and five action circling[J]. Front Elect Electr Eng Chin, 0, (): 281-328.
1 Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd ed. New York: John Wiley & Sons, 2001
Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd ed. New York: John Wiley & Sons, 2001
2 Xu L. Machine learning problems from optimization perspective. Journal of Global Optimization , 2010, 47: 369-401
doi: 10.1007/s10898-008-9364-0
Xu L. Machinelearning problems from optimization perspective. Journal of Global Optimization, 2010, 47: 369―401

doi: 10.1007/s10898-008-9364-0
3 Xu L. Bayesian Ying Yang learning. Scholarpedia , 2007, 2(3): 1809
4 Aster R, Borchers B, Thurber C. Parameter Estimation and Inverse Problems. New York: Elsevier Academic Press, 2004
Aster R, Borchers B, Thurber C. Parameter Estimation and Inverse Problems. New York: Elsevier Academic Press, 2004
5 Brown R G, Hwang P Y C. Introduction to Random Signals and Applied Kalman Filtering. 3rd ed. New York: John Wiley & Sons, 1997
Brown R G, Hwang P Y C. Introduction to Random Signalsand Applied Kalman Filtering. 3rd ed. New York: John Wiley & Sons, 1997
6 Narendra K S, Parthasarathy K. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks , 1990, 1(1): 4-27
doi: 10.1109/72.80202
Narendra K S, Parthasarathy K. Identification and controlof dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1990, 1(1): 4―27

doi: 10.1109/72.80202
7 Redner R A, Walker H F. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review , 1984, 26(2): 195-239
doi: 10.1137/1026034
Redner R A, Walker H F. Mixture densities, maximumlikelihood, and the EM algorithm. SIAMReview, 1984, 26(2): 195―239

doi: 10.1137/1026034
8 Xu L, Jordan M I. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation , 1996, 8(1): 129-151
Xu L, Jordan M I. On convergence propertiesof the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129―151
9 Anderson T W, Rubin H. Statistical inference in factor analysis. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability . 1956, 5: 111-150
Anderson T W, Rubin H. Statistical inference infactor analysis. In: Proceedings of theThird Berkeley Symposium on Mathematical Statistics and Probability. 1956, 5: 111―150
10 Rubi D, Thayer D. EM algorithm for ML factor analysis. Psychometrika , 1976, 57: 69-76
Rubi D, Thayer D. EM algorithm for ML factoranalysis. Psychometrika, 1976, 57: 69―76
11 Bozdogan H, Ramirez D E. FACAIC: Model selection algorithm for the orthogonal factor model using AIC and FACAIC. Psychometrika , 1988, 53(3): 407-415
doi: 10.1007/BF02294221
Bozdogan H, Ramirez D E. FACAIC: Model selection algorithmfor the orthogonal factor model using AIC and FACAIC. Psychometrika, 1988, 53(3): 407―415

doi: 10.1007/BF02294221
12 Burnham K P, Anderson D R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002
Burnham K P, Anderson D R. Model Selection and MultimodelInference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002
13 Tikhonov A N, Arsenin V Y. Solutions of Ill-Posed Problems. Washington: Winston and Sons, 1977
Tikhonov A N, Arsenin V Y. Solutions of Ill-Posed Problems. Washington: Winston and Sons, 1977
14 Poggio T, Girosi F. Networks for approximation and learning. Proceedings of the IEEE , 1990, 78(9): 1481-1497
doi: 10.1109/5.58326
Poggio T, Girosi F. Networks for approximationand learning. Proceedings of the IEEE, 1990, 78(9): 1481―1497

doi: 10.1109/5.58326
15 Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing System 8 . Cambridge: MIT Press, 1996, 757-763
Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing System 8. Cambridge: MIT Press, 1996, 757―763
16 Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation , 1995, 7(6): 1129-1159
doi: 10.1162/neco.1995.7.6.1129
Bell A J, Sejnowski T J. An information-maximizationapproach to blind separation and blind deconvolution. Neural Computation, 1995, 7(6): 1129―1159

doi: 10.1162/neco.1995.7.6.1129
17 Xu L. Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing — Letters and Reviews , 2003, 1(1): 1-52
Xu L. Independentcomponent analysis and extensions with noise and time: A BayesianYing-Yang learning perspective. NeuralInformation Processing — Letters and Reviews, 2003, 1(1): 1―52
18 Xu L. Independent subspaces. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence, Hershey(PA): IGI Global. 2008, 903-912
Xu L. Independentsubspaces. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence, Hershey(PA): IGI Global. 2008, 903―912
19 Xu L. Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks , 1993, 6(5): 627-648
doi: 10.1016/S0893-6080(05)80107-8
Xu L. Leastmean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 1993, 6(5): 627―648

doi: 10.1016/S0893-6080(05)80107-8
20 McLachlan G J, Krishnan T. The EM Algorithms and Extensions. New York: John Wiley & Sons, 1997
McLachlan G J, Krishnan T. The EM Algorithms and Extensions. New York: John Wiley & Sons, 1997
21 Dempster A P, Laird N M, Rubin D B. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B , 1977, 39(1): 1-38
Dempster A P, Laird N M, Rubin D B. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: SeriesB, 1977, 39(1): 1―38
22 Amari S. Information geometry of the EM and EM algorithms for neural networks. Neural Networks , 1995, 8(9): 1379-1408
doi: 10.1016/0893-6080(95)00003-8
Amari S. Informationgeometry of the EM and EM algorithms for neural networks. Neural Networks, 1995, 8(9): 1379―1408

doi: 10.1016/0893-6080(95)00003-8
23 Grenander U, Miller M. Pattern theory: From representation to inference. Oxford: Oxford University Press, 2007
Grenander U, Miller M. Pattern theory: From representationto inference. Oxford: Oxford University Press, 2007
24 Mumford D. On the computational architecture of the neocortex II: The role of cortico-cortical loops. Biological Cybernetics , 1992, 66(3): 241-251
doi: 10.1007/BF00198477
Mumford D. Onthe computational architecture of the neocortex II: The role of cortico-corticalloops. Biological Cybernetics, 1992, 66(3): 241―251

doi: 10.1007/BF00198477
25 Friston K. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences , 2005, 360(1456): 815-836
doi: 10.1098/rstb.2005.1622
Friston K. Atheory of cortical responses. PhilosophicalTransactions of the Royal Society of London. Series B: BiologicalSciences, 2005, 360(1456): 815―836

doi: 10.1098/rstb.2005.1622
26 Yuille A L, Kersten D. Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences , 2006, 10(7): 301-308
doi: 10.1016/j.tics.2006.05.002
Yuille A L, Kersten D. Vision as Bayesian inference:Analysis by synthesis? Trends in CognitiveSciences, 2006, 10(7): 301―308

doi: 10.1016/j.tics.2006.05.002
27 Schwarz G. Estimating the dimension of a model. Annals of Statistics , 1978, 6(2): 461-464
doi: 10.1214/aos/1176344136
Schwarz G. Estimatingthe dimension of a model. Annals of Statistics, 1978, 6(2): 461―464

doi: 10.1214/aos/1176344136
28 Rissanen J. Modeling by shortest data description. Automatica , 1978, 14: 465-471
doi: 10.1016/0005-1098(78)90005-5
Rissanen J. Modelingby shortest data description. Automatica, 1978, 14: 465―471

doi: 10.1016/0005-1098(78)90005-5
29 Rissanen J. Information and Complexity in Statistical Modeling. New York: Springer, 2007
Rissanen J. Informationand Complexity in Statistical Modeling. New York: Springer, 2007
30 DeGroot M H. Optimal Statistical Decisions. Hooken: Wiley Classics Library, 2004
DeGroot M H. Optimal Statistical Decisions. Hooken: Wiley Classics Library, 2004
31 Mackay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation , 1992, 4(3): 448-472
doi: 10.1162/neco.1992.4.3.448
Mackay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448―472

doi: 10.1162/neco.1992.4.3.448
32 MacKay D. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003
MacKay D. InformationTheory, Inference, and Learning Algorithms. Cambridge: Cambridge UniversityPress, 2003
33 Wallace C S, Boulton D M. An information measure for classification. Computer Journal , 1968, 11(2): 185-194
Wallace C S, Boulton D M. An information measure forclassification. Computer Journal, 1968, 11(2): 185―194
34 Wallace C S, Dowe D R. Minimum message length and Kolmogorov complexity. Computer Journal , 1999, 42(4): 270-280
doi: 10.1093/comjnl/42.4.270
Wallace C S, Dowe D R. Minimum message length andKolmogorov complexity. Computer Journal, 1999, 42(4): 270―280

doi: 10.1093/comjnl/42.4.270
35 Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics , 1988, 59: 291-294
doi: 10.1007/BF00332918
Bourlard H, Kamp Y. Auto-association by multilayerperceptrons and singular value decomposition. Biological Cybernetics, 1988, 59: 291―294

doi: 10.1007/BF00332918
36 Palmieri F, Zhu J, Chang C. Anti-Hebbian learning in topologically constrained linear networks: A tutorial. IEEE Transactions on Neural Networks , 1993, 4(5): 748-761
doi: 10.1109/72.248453
Palmieri F, Zhu J, Chang C. Anti-Hebbian learning in topologically constrained linearnetworks: A tutorial. IEEE Transactionson Neural Networks, 1993, 4(5): 748―761

doi: 10.1109/72.248453
37 Grossberg S, Carpenter G A. Adaptive resonance theory. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 87-90
Grossberg S, Carpenter G A. Adaptive resonance theory. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 87―90
38 Carpenter G A, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing , 1987, 37: 54-115
doi: 10.1016/S0734-189X(87)80014-2
Carpenter G A, Grossberg S. A massively parallel architecturefor a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 1987, 37: 54―115

doi: 10.1016/S0734-189X(87)80014-2
39 Kawato M. Cerebellum and motor control. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 190-195
Kawato M. Cerebellumand motor control. In: Arbib M A, ed. The Handbook of Brain Theoryand Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 190―195
40 Shidara M, Kawano K, Gomi H, Kawato M. Inversedynamics model eye movement control by Purkinje cells in the cerebellum. Nature , 1993, 365(6441): 50-52
doi: 10.1038/365050a0
Shidara M, Kawano K, Gomi H, Kawato M. Inversedynamicsmodel eye movement control by Purkinje cells in the cerebellum. Nature, 1993, 365(6441): 50―52

doi: 10.1038/365050a0
41 Wolpert D, Kawato M. Multiple paired forward and inverse models for motor control. Neural Networks , 1998, 11(7-8): 1317-1329
doi: 10.1016/S0893-6080(98)00066-5
Wolpert D, Kawato M. Multiple paired forward andinverse models for motor control. NeuralNetworks, 1998, 11(7―8): 1317―1329

doi: 10.1016/S0893-6080(98)00066-5
42 Hinton G E, Dayan P, Frey B J, Neal R N. The wake-sleep algorithm for unsupervised learning neural networks. Science , 1995, 268(5214): 1158-1160
doi: 10.1126/science.7761831
Hinton G E, Dayan P, Frey B J, Neal R N. The wake-sleepalgorithm for unsupervised learning neural networks. Science, 1995, 268(5214): 1158―1160

doi: 10.1126/science.7761831
43 Dayan P, Hinton G E, Neal R M, Zemel R S. The Helmholtz machine. Neural Computation , 1995, 7(5): 889-904
doi: 10.1162/neco.1995.7.5.889
Dayan P, Hinton G E, Neal R M, Zemel R S. The Helmholtzmachine. Neural Computation, 1995, 7(5): 889―904

doi: 10.1162/neco.1995.7.5.889
44 Jaakkola T S. Tutorial on variational approximation methods. In: Opper M, Saad D, eds. Advanced Mean Field Methods: Theory and Practice . Cambridge: MIT press, 2001, 129-160
Jaakkola T S. Tutorial on variational approximation methods. In: Opper M, Saad D, eds. Advanced Mean Field Methods: Theory and Practice. Cambridge: MIT press, 2001, 129―160
45 Jordan M, Ghahramani Z, Jaakkola T, Saul L. Introduction to variational methods for graphical models. Machine Learning , 1999, 37(2): 183-233
doi: 10.1023/A:1007665907178
Jordan M, Ghahramani Z, Jaakkola T, Saul L. Introductionto variational methods for graphical models. Machine Learning, 1999, 37(2): 183―233

doi: 10.1023/A:1007665907178
46 Corduneanu A, Bishop CM. Variational Bayesian model selection for mixture distributions. In: Jaakkola T, Richardson T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics . 2001, 27-34
Corduneanu A, Bishop CM. Variational Bayesian modelselection for mixture distributions. In: Jaakkola T, Richardson T, eds. Proceedings of the Eighth International Conference on ArtificialIntelligence and Statistics. 2001, 27―34
47 Xu L. Bayesian-Kullback coupled YING-YANG machines: Unified learning and new results on vector quantization. In: Proceedings of the International Conference on Neural Information Processing . 1995, 977-988 (A further version in NIPS8. In: Touretzky D S, et al. eds. Cambridge: MIT Press, 444-450)
Xu L. Bayesian-Kullbackcoupled YING-YANG machines: Unified learning and new results on vectorquantization. In: Proceedings of the InternationalConference on Neural Information Processing. 1995, 977―988 (A further version in NIPS8. In: Touretzky D S, et al. eds. Cambridge:MIT Press, 444―450)
48 Xu L. Ying-Yang learning. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 1231-1237
Xu L. Ying-Yanglearning. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 1231―1237
49 Xu L. Advances on BYY harmony learning: Information theoretic perspective, generalized projection geometry, and independent factor auto-determination. IEEE Transactions on Neural Networks , 2004, 15(4): 885-902
doi: 10.1109/TNN.2004.828767
Xu L. Advanceson BYY harmony learning: Information theoretic perspective, generalizedprojection geometry, and independent factor auto-determination. IEEE Transactions on Neural Networks, 2004, 15(4): 885―902

doi: 10.1109/TNN.2004.828767
50 Xu L. Learning algorithms for RBF functions and subspace based functions. In: Olivas E, . eds. Handbook of Research on Machine Learning, Applications and Trends: Algorithms, Methods and Techniques. Hershey(PA): IGI Global, 2009, 60-94
Xu L. Learningalgorithms for RBF functions and subspace based functions. In: Olivas E, et al. eds. Handbook of Research on Machine Learning, Applications and Trends:Algorithms, Methods and Techniques. Hershey(PA): IGI Global, 2009, 60―94
51 Xu L. Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family. In: Zurada. eds. Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science , 2008, 5050: 48-78
Xu L. BayesianYing Yang system, best harmony learning, and Gaussian manifold basedfamily. In: Zuradaet al. eds. ComputationalIntelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures.Lecture Notes in Computer Science, 2008, 5050: 48―78
52 Xu L, Oja E. Randomized Hough transform. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey(PA): IGI Global, 2008, 1354-1361
Xu L, Oja E. Randomized Hough transform. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey(PA): IGI Global, 2008, 1354―1361
53 Veith I. The Yellow Emperor’s Classic of Internal Medicine. Berkeley: University of California Press, 1972
Veith I. TheYellow Emperor’s Classic of Internal Medicine. Berkeley: University of CaliforniaPress, 1972
54 Vapnik, V. Estimation of Dependences Based on Empirical Data. Springer, 2006
Vapnik, V. Estimationof Dependences Based on Empirical Data. Springer, 2006
55 Stone M. Cross-validation: A review. Mathematics, Operations and Statistics , 1978, 9(1): 127-140
Stone M. Cross-validation:A review. Mathematics, Operations and Statistics, 1978, 9(1): 127―140
56 Rivals I, Personnaz L. On cross validation for model selection. Neural Computation , 1999, 11(4): 863-870
doi: 10.1162/089976699300016476
Rivals I, Personnaz L. On cross validation for modelselection. Neural Computation, 1999, 11(4): 863―870

doi: 10.1162/089976699300016476
57 Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control , 1974, 19(6): 714-723
doi: 10.1109/TAC.1974.1100705
Akaike H. Anew look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6): 714―723

doi: 10.1109/TAC.1974.1100705
58 Bozdogan H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extension. Psychometrika , 1987, 52(3): 345-370
doi: 10.1007/BF02294361
Bozdogan H. Modelselection and Akaike’s information criterion (AIC): The generaltheory and its analytical extension. Psychometrika, 1987, 52(3): 345―370

doi: 10.1007/BF02294361
59 Cavanaugh J E. Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters , 1997, 33(2): 201-208
doi: 10.1016/S0167-7152(96)00128-9
Cavanaugh J E. Unifying the derivations for the Akaike and corrected Akaike informationcriteria. Statistics & ProbabilityLetters, 1997, 33(2): 201―208

doi: 10.1016/S0167-7152(96)00128-9
60 Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation , 1995, 7(1): 117-143
doi: 10.1162/neco.1995.7.1.117
Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 1995, 7(1): 117―143

doi: 10.1162/neco.1995.7.1.117
61 Tibshirani R, Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B , 1996, 58(1): 267-288
Tibshirani R, Regressionshrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 1996, 58(1): 267―288
62 MacKay D J C. Bayesian interpolation. Neural Computation , 1992, 4(3): 415-447
doi: 10.1162/neco.1992.4.3.415
MacKay D J C. Bayesian interpolation. Neural Computation, 1992, 4(3): 415―447

doi: 10.1162/neco.1992.4.3.415
63 Salah A A, Alpaydin E. Incremental mixtures of factor analyzers. In: Proceedings the 17th International Conference on Pattern Recognition . 2004, 1: 276-279
Salah A A, Alpaydin E. Incremental mixtures of factor analyzers.In: Proceedings the 17th International Conferenceon Pattern Recognition. 2004, 1: 276―279
64 Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net and curve detection. IEEE Transactions on Neural Networks , 1993, 4(4): 636-649
doi: 10.1109/72.238318
Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis,RBF net and curve detection. IEEE Transactionson Neural Networks, 1993, 4(4): 636―649

doi: 10.1109/72.238318
65 Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rival penalized competitive learning. In: Proceedings of the 11th International Conference on Pattern Recognition . 1992, I: 672-675
Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rivalpenalized competitive learning. In: Proceedingsof the 11th International Conference on Pattern Recognition. 1992, I: 672―675
66 Xu L. Rival penalized competitive learning. Scholarpedia , 2007, 2(8): 1810
67 Corduneanu A, Bishop C M. Variational Bayesian model selection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics . 2001, 27-34
Corduneanu A, Bishop C M. Variational Bayesian modelselection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on ArtificialIntelligence and Statistics. 2001, 27―34
68 McGrory C A, Titterington D M. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis , 2007, 51(11): 5352-5367
doi: 10.1016/j.csda.2006.07.020
McGrory C A, Titterington D M. Variational approximationsin Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis, 2007, 51(11): 5352―5367

doi: 10.1016/j.csda.2006.07.020
69 Tu S, Xu L. A study of several model selection criteria for determining the number of signals. In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing . 2010, 1966-1969
Tu S, Xu L. A study of several modelselection criteria for determining the number of signals. In: Proceedings of 2010 IEEE International Conferenceon Acoustics, Speech and Signal Processing. 2010, 1966―1969
70 Xu L. Fundamentals, challenges, and advances of statistical learning for knowledge discovery and problem solving: A BYY harmony perspective, keynote talk. In: Proceedings of the International Conference on Neural Networks and Brain . 2005, 1: 24-55
Xu L. Fundamentals,challenges, and advances of statistical learning for knowledge discoveryand problem solving: A BYY harmony perspective, keynote talk. In: Proceedings of the International Conferenceon Neural Networks and Brain. 2005, 1: 24―55
71 Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6 . San Mateo: Morgan Kaufmann, 1994, 449-455
Hinton G E, Zemel R S. Autoencoders, minimum descriptionlength and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994, 449―455
72 Xu L. Data smoothing regularization, multi-sets-learning, and problem solving strategies. Neural Networks , 2003, 16(5-6): 817-825
doi: 10.1016/S0893-6080(03)00119-9
Xu L. Datasmoothing regularization, multi-sets-learning, and problem solvingstrategies. Neural Networks, 2003, 16(5―6): 817―825

doi: 10.1016/S0893-6080(03)00119-9
73 Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (I) Unsupervised and semi-unsupervised learning. In: Amari S, Kassabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag , 1997, 241-274
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach:(I) Unsupervised and semi-unsupervised learning. In: Amari S, Kassabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag, 1997, 241―274
74 Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (II) From unsupervised learning to supervised learning and temporal modeling and (III) Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning. In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective . 1997: 25-60
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach:(II) From unsupervised learning to supervised learning and temporalmodeling and (III) Models and algorithms for dependence reduction,data dimension reduction, ICA and supervised learning.In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A MultidisciplinaryPerspective. 1997: 25―60
75 Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach (VII): Data smoothing. In: Proceedings of the International Conference on Neural Information Processing . 1998, 1: 243-248
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach(VII): Data smoothing. In: Proceedingsof the International Conference on Neural Information Processing. 1998, 1: 243―248
76 Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation , 1995, 7(1): 108-116
doi: 10.1162/neco.1995.7.1.108
Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 1995, 7(1): 108―116

doi: 10.1162/neco.1995.7.1.108
77 Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition , 2007, 40(8): 2129-2153
doi: 10.1016/j.patcog.2006.12.016
Xu L. Aunified perspective and new results on RHT computing, mixture basedlearning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129―2153

doi: 10.1016/j.patcog.2006.12.016
78 Xu L, Oja E, Kultanen P. A new curve detection method randomized Hough transform (RHT). Pattern Recognition Letters , 1990, 11(5): 331-338
doi: 10.1016/0167-8655(90)90042-Z
Xu L, Oja E, Kultanen P. A new curve detection method randomized Hough transform(RHT). Pattern Recognition Letters, 1990, 11(5): 331―338

doi: 10.1016/0167-8655(90)90042-Z
79 Hough P V C. Method and means for recognizing complex patterns. US Patent, 3069654, 1962-12-18
Hough P V C. Method and means for recognizing complex patterns. US Patent, 3069654, 1962-12-18
80 Xu L. Best harmony, unified RPCL and automated model selection for unsupervised and supervised learning on Gaussian mixtures, ME-RBF models and three-layer nets. International Journal of Neural Systems , 2001, 11(1): 3-69
doi: 10.1016/S0129-0657(01)00049-7
Xu L. Bestharmony, unified RPCL and automated model selection for unsupervisedand supervised learning on Gaussian mixtures, ME-RBF models and three-layernets. International Journal of Neural Systems, 2001, 11(1): 3―69

doi: 10.1016/S0129-0657(01)00049-7
81 Xu L. Bayesian Ying-Yang learning theory for data dimension reduction and determination. Journal of Computational Intelligence in Finance , 1998, 6(5): 6-18
Xu L. BayesianYing-Yang learning theory for data dimension reduction and determination. Journal of Computational Intelligence in Finance, 1998, 6(5): 6―18
82 Tu S, Xu L. Theoretical analysis and comparison of several criteria on linear model dimension reduction. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science , 2009, 5441: 154-162
Tu S, Xu L. Theoretical analysis andcomparison of several criteria on linear model dimension reduction. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and SignalSeparation. Lecture Notes in Computer Science, 2009, 5441: 154―162
83 Xu L. BYY harmony learning, independent state space and generalized APT financial analyses. IEEE Transactions on Neural Networks , 2001, 12(4): 822-849
doi: 10.1109/72.935094
84 Xu L. Temporal BYY encoding, Markovian state spaces, and space dimension determination. IEEE Transactions on Neural Networks, 2004, 15(5): 1276-1295
doi: 10.1109/TNN.2004.833302
85 Kalman R E. A new approach to linear filtering and prediction problems. Transactions of the ASME Journal of Basic Engineering, 1960, 35-45
86 Sun K, Tu S, Gao D Y, Xu L. Canonical dual approach to binary factor analysis. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, 2009, 5441: 346-353
87 Sivin N. Science and medicine in imperial China - The state of the field. The Journal of Asian Studies, 1988, 47(1): 41-90
doi: 10.2307/2056359
88 Wilhelm R, Baynes C. The I Ching or Book of Changes, with Foreword by Carl Jung. 3rd ed. Bollingen Series XIX. Princeton: Princeton University Press, 1967
89 Hansen C. A Daoist Theory of Chinese Thought: A Philosophical Interpretation. New York: Oxford University Press, 2000
90 Shilov G E, Gurevich B L. Integral, Measure, and Derivative: A Unified Approach. Silverman R trans. New York: Dover Publications, 1978
91 Ali S M, Silvey S D. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B, 1966, 28(1): 131-140
92 Kullback S, Leibler R A. On information and sufficiency. Annals of Mathematical Statistics, 1951, 22(1): 79-86
doi: 10.1214/aoms/1177729694
93 Shore J. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech and Signal Processing, 1981, 29(2): 230-237
doi: 10.1109/TASSP.1981.1163539
94 Burg J P, Luenberger D G, Wenger D L. Estimation of structured covariance matrices. Proceedings of the IEEE, 1982, 70(9): 963-974
doi: 10.1109/PROC.1982.12427
95 Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106(4): 620-630
doi: 10.1103/PhysRev.106.620
96 Xu L. Temporal BYY learning for state space approach, hidden Markov model and blind source separation. IEEE Transactions on Signal Processing, 2000, 48(7): 2132-2144
doi: 10.1109/78.847796
97 Jeffreys H. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A: Mathematical and Physical Sciences, 1946, 186(1007): 453-461
doi: 10.1098/rspa.1946.0056
98 Xu L. BYY learning system and theory for parameter estimation, data smoothing based regularization and model selection. Neural, Parallel and Scientific Computations, 2000, 8(1): 55-82
99 Xu L. BYY Σ-Π factor systems and harmony learning. Invited talk. In: Proceedings of International Conference on Neural Information Processing (ICONIP’2000). 2000, 1: 548-558
100 Xu L. Bayesian Ying Yang learning. In: Zhong N, Liu J, eds. Intelligent Technologies for Information Analysis. Berlin: Springer, 2004, 615-706
101 Barron A, Rissanen J, Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 1998, 44(6): 2743-2760
doi: 10.1109/18.720554
102 Xu L, Amari S. Combining classifiers and learning mixture-of-experts. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey (PA): IGI Global, 2008, 318-326
103 Xu L. BYY learning, regularized implementation, and model selection on modular networks with one hidden layer of binary units. Neurocomputing, 2003, 51: 277-301 (Errata on Neurocomputing, 2003, 55(1-2): 405-406)
104 Gales M J F, Young S. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304
doi: 10.1561/2000000004
105 Su D, Wu X H, Xu L. GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection. In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. 2010, 4890-4893
106 Rosti A V, Gales M. Factor analysed hidden Markov models for speech recognition. Computer Speech and Language, 2004, 18(2): 181-200
doi: 10.1016/j.csl.2003.09.004
107 Gales M J F. Discriminative models for speech recognition. In: Proceedings of Information Theory and Applications Workshop. 2007, 170-176
108 Woodland P C, Povey D. Large scale discriminative training of hidden Markov models for speech recognition. Computer Speech and Language, 2002, 16(1): 25-47
doi: 10.1006/csla.2001.0182
109 Csiszár I, Tusnády G. Information geometry and alternating minimization procedures. Statistics and Decisions, 1984, (Suppl. 1): 205-237
110 Xu L, Oja E, Suen C Y. Modified Hebbian learning for curve and surface fitting. Neural Networks, 1992, 5(3): 441-457
doi: 10.1016/0893-6080(92)90006-5
111 Xu L, Krzyzak A, Oja E. A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 1991, 2(3): 169-184
doi: 10.1142/S0129065791000169


