Bayesian Ying-Yang system, best harmony learning, and five action circling
Lei XU
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China |
Abstract First proposed in 1995 and systematically developed in the past decade, Bayesian Ying-Yang learning (“Ying” is spelled “Yin” in Chinese Pinyin; to keep its original harmony with “Yang”, we have deliberately adopted the term “Ying-Yang” since 1995) is a statistical approach for a two-pathway intelligent system built on two complementary Bayesian representations of the joint distribution of the external observation X and its inner representation R, which can be understood from a perspective of the ancient Ying-Yang philosophy. We have q(X,R)=q(X|R)q(R) as the Ying, which is primary, with its structure designed according to the tasks of the system, and p(X,R)=p(R|X)p(X) as the Yang, which is secondary, with p(X) given by samples of X and the structure of p(R|X) designed from the Ying according to a Ying-Yang variety preservation principle, i.e., p(R|X) is designed as a functional with q(X|R) and q(R) as its arguments. We call this pair a Bayesian Ying-Yang (BYY) system. A Ying-Yang best harmony principle is proposed for learning all the unknowns in the system, with the help of an implementation featuring a five-action circling under the name of the A5 paradigm. Interestingly, this coincides with the famous ancient WuXing theory, which provides a general guide for keeping the A5 circling well balanced towards a Ying-Yang best harmony. BYY learning provides not only a general framework that accommodates typical learning approaches from a unified perspective, but also a new road that leads to improved model selection criteria, Ying-Yang alternative learning with automatic model selection, as well as coordinated implementation of Ying-based model selection and Yang-based learning regularization. This paper aims at an introduction to BYY learning with a twofold purpose.
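The two complementary factorizations above can be illustrated numerically with a toy discrete joint distribution; this sketch is purely illustrative (the distribution values and variable sizes are arbitrary assumptions, not taken from the paper):

```python
import numpy as np

# Toy joint over a discrete observation X (3 values) and inner code R (2 values).
joint = np.array([[0.20, 0.10],
                  [0.15, 0.25],
                  [0.05, 0.25]])   # joint[x, r] = P(X=x, R=r)

# Ying decomposition: q(X, R) = q(X | R) q(R)
qR = joint.sum(axis=0)             # q(R)
qX_given_R = joint / qR            # q(X | R); each column sums to 1

# Yang decomposition: p(X, R) = p(R | X) p(X)
pX = joint.sum(axis=1)             # p(X)
pR_given_X = joint / pX[:, None]   # p(R | X); each row sums to 1

# Both pathways reconstruct the same joint: perfect Ying-Yang matching.
assert np.allclose(qX_given_R * qR, pR_given_X * pX[:, None])
```

In BYY learning the Ying structure is designed and the Yang posterior is derived from it, so in general the two sides only approximately agree; the best harmony principle drives them toward this matching case.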
On the one hand, we introduce the fundamentals of BYY learning, including the system design principles of least redundancy versus variety preservation, the global learning principles of Ying-Yang harmony versus Ying-Yang matching, and the local updating mechanisms of rival penalized competitive learning (RPCL) versus maximum a posteriori (MAP) competitive learning, as well as learning regularization by data smoothing and the induced bias cancellation (IBC) prior. We also introduce basic implementation techniques, including apex approximation, primal gradient flow, Ying-Yang alternation, and the Sheng-Ke-Cheng-Hui law. On the other hand, we provide a tutorial on learning algorithms for a number of typical learning tasks, including Gaussian mixtures, factor analysis (FA) with independent Gaussian, binary, and non-Gaussian factors, local FA, temporal FA (TFA), hidden Markov models (HMMs), hierarchical BYY, three-layer networks, mixtures of experts, radial basis functions (RBFs), and subspace based functions (SBFs). This tutorial introduces BYY learning algorithms in comparison with typical algorithms, particularly against the benchmark of the expectation-maximization (EM) algorithm for maximum likelihood. These algorithms are summarized in a unified Ying-Yang alternation procedure whose major parts share the same expression, with differences characterized simply by a few options in some subroutines. Additionally, a new insight is provided on the ancient Chinese philosophy of Yin-Yang and WuXing from the perspective of information science and intelligent systems.
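As a concrete anchor for the RPCL mechanism named above, here is a minimal sketch of a single rival-penalized competitive learning update: the winning center is attracted toward the sample while the rival (second winner) is pushed slightly away, so that over many samples extra centers are driven out of the data, effecting automatic model selection. The learning rates and function name here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def rpcl_step(x, centers, alpha_win=0.05, alpha_rival=0.005):
    """One rival-penalized competitive learning (RPCL) update.

    The winner (closest center) moves toward sample x; the rival
    (second-closest center) is de-learned, i.e., pushed away from x
    with a much smaller rate. Both rates are illustrative choices.
    """
    dists = np.linalg.norm(centers - x, axis=1)
    winner, rival = np.argsort(dists)[:2]
    centers[winner] += alpha_win * (x - centers[winner])    # attract winner
    centers[rival] -= alpha_rival * (x - centers[rival])    # repel rival
    return centers

# Example: with centers at (0,0), (1,0), (5,5) and a sample near the first,
# the first center is attracted while the second (the rival) is pushed away.
c = rpcl_step(np.array([0.2, 0.0]),
              np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]))
```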
Keywords
Bayesian Ying-Yang (BYY) system
Yin-Yang philosophy
best harmony
WuXing
A5 paradigm
randomized Hough transform (RHT)
rival penalized competitive learning (RPCL)
maximum a posteriori (MAP)
semi-supervised learning
automatic model selection
Gaussian mixture
factor analysis (FA)
binary FA
non-Gaussian FA
local FA
temporal FA
three layer networks
mixture of experts
radial basis function (RBF) networks
subspace based function (SBF)
state space modeling
hidden Markov model (HMM)
hierarchical BYY
apex approximation
Ying-Yang alternation
Corresponding Author(s):
Lei XU, Email: lxu@cse.cuhk.edu.hk
Issue Date: 05 September 2010
|
|
1 |
Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd ed. New York: John Wiley & Sons, 2001
|
|
Duda R O, Hart P E, Stork D G. Pattern Classification. 2nd ed. New York: John Wiley & Sons, 2001
|
2 |
Xu L. Machine learning problems from optimization perspective. Journal of Global Optimization , 2010, 47: 369-401 doi: 10.1007/s10898-008-9364-0
|
|
Xu L. Machinelearning problems from optimization perspective. Journal of Global Optimization, 2010, 47: 369―401
doi: 10.1007/s10898-008-9364-0
|
3 |
Xu L. Bayesian Ying Yang learning. Scholarpedia , 2007, 2(3): 1809http://scholarpedia.org/article/Bayesian_Ying_Yang_learning
|
|
scholarpedia.org/article/Bayesian_Ying_Yang_learning
|
4 |
Aster R, Borchers B, Thurber C. Parameter Estimation and Inverse Problems. New York: Elsevier Academic Press, 2004
|
|
Aster R, Borchers B, Thurber C. Parameter Estimation and Inverse Problems. New York: Elsevier Academic Press, 2004
|
5 |
Brown R G, Hwang P Y C. Introduction to Random Signals and Applied Kalman Filtering. 3rd ed. New York: John Wiley & Sons, 1997
|
|
Brown R G, Hwang P Y C. Introduction to Random Signalsand Applied Kalman Filtering. 3rd ed. New York: John Wiley & Sons, 1997
|
6 |
Narendra K S, Parthasarathy K. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks , 1990, 1(1): 4-27 doi: 10.1109/72.80202
|
|
Narendra K S, Parthasarathy K. Identification and controlof dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1990, 1(1): 4―27
doi: 10.1109/72.80202
|
7 |
Redner R A, Walker H F. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review , 1984, 26(2): 195-239 doi: 10.1137/1026034
|
|
Redner R A, Walker H F. Mixture densities, maximumlikelihood, and the EM algorithm. SIAMReview, 1984, 26(2): 195―239
doi: 10.1137/1026034
|
8 |
Xu L, Jordan M I. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation , 1996, 8(1): 129-151
|
|
Xu L, Jordan M I. On convergence propertiesof the EM algorithm for Gaussian mixtures. Neural Computation, 1996, 8(1): 129―151
|
9 |
Anderson T W, Rubin H. Statistical inference in factor analysis. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability . 1956, 5: 111-150
|
|
Anderson T W, Rubin H. Statistical inference infactor analysis. In: Proceedings of theThird Berkeley Symposium on Mathematical Statistics and Probability. 1956, 5: 111―150
|
10 |
Rubi D, Thayer D. EM algorithm for ML factor analysis. Psychometrika , 1976, 57: 69-76
|
|
Rubi D, Thayer D. EM algorithm for ML factoranalysis. Psychometrika, 1976, 57: 69―76
|
11 |
Bozdogan H, Ramirez D E. FACAIC: Model selection algorithm for the orthogonal factor model using AIC and FACAIC. Psychometrika , 1988, 53(3): 407-415 doi: 10.1007/BF02294221
|
|
Bozdogan H, Ramirez D E. FACAIC: Model selection algorithmfor the orthogonal factor model using AIC and FACAIC. Psychometrika, 1988, 53(3): 407―415
doi: 10.1007/BF02294221
|
12 |
Burnham K P, Anderson D R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002
|
|
Burnham K P, Anderson D R. Model Selection and MultimodelInference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002
|
13 |
Tikhonov A N, Arsenin V Y. Solutions of Ill-Posed Problems. Washington: Winston and Sons, 1977
|
|
Tikhonov A N, Arsenin V Y. Solutions of Ill-Posed Problems. Washington: Winston and Sons, 1977
|
14 |
Poggio T, Girosi F. Networks for approximation and learning. Proceedings of the IEEE , 1990, 78(9): 1481-1497 doi: 10.1109/5.58326
|
|
Poggio T, Girosi F. Networks for approximationand learning. Proceedings of the IEEE, 1990, 78(9): 1481―1497
doi: 10.1109/5.58326
|
15 |
Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing System 8 . Cambridge: MIT Press, 1996, 757-763
|
|
Amari S I, Cichocki A, Yang H. A new learning algorithm for blind separation of sources. In: Touretzky D S, Mozer M C, Hasselmo M E, eds. Advances in Neural Information Processing System 8. Cambridge: MIT Press, 1996, 757―763
|
16 |
Bell A J, Sejnowski T J. An information-maximization approach to blind separation and blind deconvolution. Neural Computation , 1995, 7(6): 1129-1159 doi: 10.1162/neco.1995.7.6.1129
|
|
Bell A J, Sejnowski T J. An information-maximizationapproach to blind separation and blind deconvolution. Neural Computation, 1995, 7(6): 1129―1159
doi: 10.1162/neco.1995.7.6.1129
|
17 |
Xu L. Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing — Letters and Reviews , 2003, 1(1): 1-52
|
|
Xu L. Independentcomponent analysis and extensions with noise and time: A BayesianYing-Yang learning perspective. NeuralInformation Processing — Letters and Reviews, 2003, 1(1): 1―52
|
18 |
Xu L. Independent subspaces. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence, Hershey(PA): IGI Global. 2008, 903-912
|
|
Xu L. Independentsubspaces. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence, Hershey(PA): IGI Global. 2008, 903―912
|
19 |
Xu L. Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks , 1993, 6(5): 627-648 doi: 10.1016/S0893-6080(05)80107-8
|
|
Xu L. Leastmean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 1993, 6(5): 627―648
doi: 10.1016/S0893-6080(05)80107-8
|
20 |
McLachlan G J, Krishnan T. The EM Algorithms and Extensions. New York: John Wiley & Sons, 1997
|
|
McLachlan G J, Krishnan T. The EM Algorithms and Extensions. New York: John Wiley & Sons, 1997
|
21 |
Dempster A P, Laird N M, Rubin D B. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B , 1977, 39(1): 1-38
|
|
Dempster A P, Laird N M, Rubin D B. Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: SeriesB, 1977, 39(1): 1―38
|
22 |
Amari S. Information geometry of the EM and EM algorithms for neural networks. Neural Networks , 1995, 8(9): 1379-1408 doi: 10.1016/0893-6080(95)00003-8
|
|
Amari S. Informationgeometry of the EM and EM algorithms for neural networks. Neural Networks, 1995, 8(9): 1379―1408
doi: 10.1016/0893-6080(95)00003-8
|
23 |
Grenander U, Miller M. Pattern theory: From representation to inference. Oxford: Oxford University Press, 2007
|
|
Grenander U, Miller M. Pattern theory: From representationto inference. Oxford: Oxford University Press, 2007
|
24 |
Mumford D. On the computational architecture of the neocortex II: The role of cortico-cortical loops. Biological Cybernetics , 1992, 66(3): 241-251 doi: 10.1007/BF00198477
|
|
Mumford D. Onthe computational architecture of the neocortex II: The role of cortico-corticalloops. Biological Cybernetics, 1992, 66(3): 241―251
doi: 10.1007/BF00198477
|
25 |
Friston K. A theory of cortical responses. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences , 2005, 360(1456): 815-836 doi: 10.1098/rstb.2005.1622
|
|
Friston K. Atheory of cortical responses. PhilosophicalTransactions of the Royal Society of London. Series B: BiologicalSciences, 2005, 360(1456): 815―836
doi: 10.1098/rstb.2005.1622
|
26 |
Yuille A L, Kersten D. Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences , 2006, 10(7): 301-308 doi: 10.1016/j.tics.2006.05.002
|
|
Yuille A L, Kersten D. Vision as Bayesian inference:Analysis by synthesis? Trends in CognitiveSciences, 2006, 10(7): 301―308
doi: 10.1016/j.tics.2006.05.002
|
27 |
Schwarz G. Estimating the dimension of a model. Annals of Statistics , 1978, 6(2): 461-464 doi: 10.1214/aos/1176344136
|
|
Schwarz G. Estimatingthe dimension of a model. Annals of Statistics, 1978, 6(2): 461―464
doi: 10.1214/aos/1176344136
|
28 |
Rissanen J. Modeling by shortest data description. Automatica , 1978, 14: 465-471 doi: 10.1016/0005-1098(78)90005-5
|
|
Rissanen J. Modelingby shortest data description. Automatica, 1978, 14: 465―471
doi: 10.1016/0005-1098(78)90005-5
|
29 |
Rissanen J. Information and Complexity in Statistical Modeling. New York: Springer, 2007
|
|
Rissanen J. Informationand Complexity in Statistical Modeling. New York: Springer, 2007
|
30 |
DeGroot M H. Optimal Statistical Decisions. Hooken: Wiley Classics Library, 2004
|
|
DeGroot M H. Optimal Statistical Decisions. Hooken: Wiley Classics Library, 2004
|
31 |
Mackay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation , 1992, 4(3): 448-472 doi: 10.1162/neco.1992.4.3.448
|
|
Mackay D J C. A practical Bayesian framework for backpropagation networks. Neural Computation, 1992, 4(3): 448―472
doi: 10.1162/neco.1992.4.3.448
|
32 |
MacKay D. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003
|
|
MacKay D. InformationTheory, Inference, and Learning Algorithms. Cambridge: Cambridge UniversityPress, 2003
|
33 |
Wallace C S, Boulton D M. An information measure for classification. Computer Journal , 1968, 11(2): 185-194
|
|
Wallace C S, Boulton D M. An information measure forclassification. Computer Journal, 1968, 11(2): 185―194
|
34 |
Wallace C S, Dowe D R. Minimum message length and Kolmogorov complexity. Computer Journal , 1999, 42(4): 270-280 doi: 10.1093/comjnl/42.4.270
|
|
Wallace C S, Dowe D R. Minimum message length andKolmogorov complexity. Computer Journal, 1999, 42(4): 270―280
doi: 10.1093/comjnl/42.4.270
|
35 |
Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics , 1988, 59: 291-294 doi: 10.1007/BF00332918
|
|
Bourlard H, Kamp Y. Auto-association by multilayerperceptrons and singular value decomposition. Biological Cybernetics, 1988, 59: 291―294
doi: 10.1007/BF00332918
|
36 |
Palmieri F, Zhu J, Chang C. Anti-Hebbian learning in topologically constrained linear networks: A tutorial. IEEE Transactions on Neural Networks , 1993, 4(5): 748-761 doi: 10.1109/72.248453
|
|
Palmieri F, Zhu J, Chang C. Anti-Hebbian learning in topologically constrained linearnetworks: A tutorial. IEEE Transactionson Neural Networks, 1993, 4(5): 748―761
doi: 10.1109/72.248453
|
37 |
Grossberg S, Carpenter G A. Adaptive resonance theory. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 87-90
|
|
Grossberg S, Carpenter G A. Adaptive resonance theory. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 87―90
|
38 |
Carpenter G A, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing , 1987, 37: 54-115 doi: 10.1016/S0734-189X(87)80014-2
|
|
Carpenter G A, Grossberg S. A massively parallel architecturefor a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 1987, 37: 54―115
doi: 10.1016/S0734-189X(87)80014-2
|
39 |
Kawato M. Cerebellum and motor control. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 190-195
|
|
Kawato M. Cerebellumand motor control. In: Arbib M A, ed. The Handbook of Brain Theoryand Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 190―195
|
40 |
Shidara M, Kawano K, Gomi H, Kawato M. Inversedynamics model eye movement control by Purkinje cells in the cerebellum. Nature , 1993, 365(6441): 50-52 doi: 10.1038/365050a0
|
|
Shidara M, Kawano K, Gomi H, Kawato M. Inversedynamicsmodel eye movement control by Purkinje cells in the cerebellum. Nature, 1993, 365(6441): 50―52
doi: 10.1038/365050a0
|
41 |
Wolpert D, Kawato M. Multiple paired forward and inverse models for motor control. Neural Networks , 1998, 11(7-8): 1317-1329 doi: 10.1016/S0893-6080(98)00066-5
|
|
Wolpert D, Kawato M. Multiple paired forward andinverse models for motor control. NeuralNetworks, 1998, 11(7―8): 1317―1329
doi: 10.1016/S0893-6080(98)00066-5
|
42 |
Hinton G E, Dayan P, Frey B J, Neal R N. The wake-sleep algorithm for unsupervised learning neural networks. Science , 1995, 268(5214): 1158-1160 doi: 10.1126/science.7761831
|
|
Hinton G E, Dayan P, Frey B J, Neal R N. The wake-sleepalgorithm for unsupervised learning neural networks. Science, 1995, 268(5214): 1158―1160
doi: 10.1126/science.7761831
|
43 |
Dayan P, Hinton G E, Neal R M, Zemel R S. The Helmholtz machine. Neural Computation , 1995, 7(5): 889-904 doi: 10.1162/neco.1995.7.5.889
|
|
Dayan P, Hinton G E, Neal R M, Zemel R S. The Helmholtzmachine. Neural Computation, 1995, 7(5): 889―904
doi: 10.1162/neco.1995.7.5.889
|
44 |
Jaakkola T S. Tutorial on variational approximation methods. In: Opper M, Saad D, eds. Advanced Mean Field Methods: Theory and Practice . Cambridge: MIT press, 2001, 129-160
|
|
Jaakkola T S. Tutorial on variational approximation methods. In: Opper M, Saad D, eds. Advanced Mean Field Methods: Theory and Practice. Cambridge: MIT press, 2001, 129―160
|
45 |
Jordan M, Ghahramani Z, Jaakkola T, Saul L. Introduction to variational methods for graphical models. Machine Learning , 1999, 37(2): 183-233 doi: 10.1023/A:1007665907178
|
|
Jordan M, Ghahramani Z, Jaakkola T, Saul L. Introductionto variational methods for graphical models. Machine Learning, 1999, 37(2): 183―233
doi: 10.1023/A:1007665907178
|
46 |
Corduneanu A, Bishop CM. Variational Bayesian model selection for mixture distributions. In: Jaakkola T, Richardson T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics . 2001, 27-34
|
|
Corduneanu A, Bishop CM. Variational Bayesian modelselection for mixture distributions. In: Jaakkola T, Richardson T, eds. Proceedings of the Eighth International Conference on ArtificialIntelligence and Statistics. 2001, 27―34
|
47 |
Xu L. Bayesian-Kullback coupled YING-YANG machines: Unified learning and new results on vector quantization. In: Proceedings of the International Conference on Neural Information Processing . 1995, 977-988 (A further version in NIPS8. In: Touretzky D S, et al. eds. Cambridge: MIT Press, 444-450)
|
|
Xu L. Bayesian-Kullbackcoupled YING-YANG machines: Unified learning and new results on vectorquantization. In: Proceedings of the InternationalConference on Neural Information Processing. 1995, 977―988 (A further version in NIPS8. In: Touretzky D S, et al. eds. Cambridge:MIT Press, 444―450)
|
48 |
Xu L. Ying-Yang learning. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks . 2nd ed. Cambridge: MIT Press, 2002, 1231-1237
|
|
Xu L. Ying-Yanglearning. In: Arbib M A, ed. The Handbook of Brain Theory and Neural Networks. 2nd ed. Cambridge: MIT Press, 2002, 1231―1237
|
49 |
Xu L. Advances on BYY harmony learning: Information theoretic perspective, generalized projection geometry, and independent factor auto-determination. IEEE Transactions on Neural Networks , 2004, 15(4): 885-902 doi: 10.1109/TNN.2004.828767
|
|
Xu L. Advanceson BYY harmony learning: Information theoretic perspective, generalizedprojection geometry, and independent factor auto-determination. IEEE Transactions on Neural Networks, 2004, 15(4): 885―902
doi: 10.1109/TNN.2004.828767
|
50 |
Xu L. Learning algorithms for RBF functions and subspace based functions. In: Olivas E, . eds. Handbook of Research on Machine Learning, Applications and Trends: Algorithms, Methods and Techniques. Hershey(PA): IGI Global, 2009, 60-94
|
|
Xu L. Learningalgorithms for RBF functions and subspace based functions. In: Olivas E, et al. eds. Handbook of Research on Machine Learning, Applications and Trends:Algorithms, Methods and Techniques. Hershey(PA): IGI Global, 2009, 60―94
|
51 |
Xu L. Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family. In: Zurada. eds. Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science , 2008, 5050: 48-78
|
|
Xu L. BayesianYing Yang system, best harmony learning, and Gaussian manifold basedfamily. In: Zuradaet al. eds. ComputationalIntelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures.Lecture Notes in Computer Science, 2008, 5050: 48―78
|
52 |
Xu L, Oja E. Randomized Hough transform. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey(PA): IGI Global, 2008, 1354-1361
|
|
Xu L, Oja E. Randomized Hough transform. In: Ram?n J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey(PA): IGI Global, 2008, 1354―1361
|
53 |
Veith I. The Yellow Emperor’s Classic of Internal Medicine. Berkeley: University of California Press, 1972
|
|
Veith I. TheYellow Emperor’s Classic of Internal Medicine. Berkeley: University of CaliforniaPress, 1972
|
54 |
Vapnik, V. Estimation of Dependences Based on Empirical Data. Springer, 2006
|
|
Vapnik, V. Estimationof Dependences Based on Empirical Data. Springer, 2006
|
55 |
Stone M. Cross-validation: A review. Mathematics, Operations and Statistics , 1978, 9(1): 127-140
|
|
Stone M. Cross-validation:A review. Mathematics, Operations and Statistics, 1978, 9(1): 127―140
|
56 |
Rivals I, Personnaz L. On cross validation for model selection. Neural Computation , 1999, 11(4): 863-870 doi: 10.1162/089976699300016476
|
|
Rivals I, Personnaz L. On cross validation for modelselection. Neural Computation, 1999, 11(4): 863―870
doi: 10.1162/089976699300016476
|
57 |
Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control , 1974, 19(6): 714-723 doi: 10.1109/TAC.1974.1100705
|
|
Akaike H. Anew look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6): 714―723
doi: 10.1109/TAC.1974.1100705
|
58 |
Bozdogan H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extension. Psychometrika , 1987, 52(3): 345-370 doi: 10.1007/BF02294361
|
|
Bozdogan H. Modelselection and Akaike’s information criterion (AIC): The generaltheory and its analytical extension. Psychometrika, 1987, 52(3): 345―370
doi: 10.1007/BF02294361
|
59 |
Cavanaugh J E. Unifying the derivations for the Akaike and corrected Akaike information criteria. Statistics & Probability Letters , 1997, 33(2): 201-208 doi: 10.1016/S0167-7152(96)00128-9
|
|
Cavanaugh J E. Unifying the derivations for the Akaike and corrected Akaike informationcriteria. Statistics & ProbabilityLetters, 1997, 33(2): 201―208
doi: 10.1016/S0167-7152(96)00128-9
|
60 |
Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation , 1995, 7(1): 117-143 doi: 10.1162/neco.1995.7.1.117
|
|
Williams P M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 1995, 7(1): 117―143
doi: 10.1162/neco.1995.7.1.117
|
61 |
Tibshirani R, Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B , 1996, 58(1): 267-288
|
|
Tibshirani R, Regressionshrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 1996, 58(1): 267―288
|
62 |
MacKay D J C. Bayesian interpolation. Neural Computation , 1992, 4(3): 415-447 doi: 10.1162/neco.1992.4.3.415
|
|
MacKay D J C. Bayesian interpolation. Neural Computation, 1992, 4(3): 415―447
doi: 10.1162/neco.1992.4.3.415
|
63 |
Salah A A, Alpaydin E. Incremental mixtures of factor analyzers. In: Proceedings the 17th International Conference on Pattern Recognition . 2004, 1: 276-279
|
|
Salah A A, Alpaydin E. Incremental mixtures of factor analyzers.In: Proceedings the 17th International Conferenceon Pattern Recognition. 2004, 1: 276―279
|
64 |
Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis, RBF net and curve detection. IEEE Transactions on Neural Networks , 1993, 4(4): 636-649 doi: 10.1109/72.238318
|
|
Xu L, Krzyzak A, Oja E. Rival penalized competitive learning for clustering analysis,RBF net and curve detection. IEEE Transactionson Neural Networks, 1993, 4(4): 636―649
doi: 10.1109/72.238318
|
65 |
Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rival penalized competitive learning. In: Proceedings of the 11th International Conference on Pattern Recognition . 1992, I: 672-675
|
|
Xu L, Krzyzak A, Oja E. Unsupervised and supervised classifications by rivalpenalized competitive learning. In: Proceedingsof the 11th International Conference on Pattern Recognition. 1992, I: 672―675
|
66 |
Xu L. Rival penalized competitive learning. Scholarpedia , 2007, 2(8): 1810http://www.scholarpedia.org/article/Rival_penalized_competitive_learning
|
|
www.scholarpedia.org/article/Rival_penalized_competitive_learning
|
67 |
Corduneanu A, Bishop C M. Variational Bayesian model selection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on Artificial Intelligence and Statistics . 2001, 27-34
|
|
Corduneanu A, Bishop C M. Variational Bayesian modelselection for mixture distributions. In: Richardson T, Jaakkola T, eds. Proceedings of the Eighth International Conference on ArtificialIntelligence and Statistics. 2001, 27―34
|
68 |
McGrory C A, Titterington D M. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis , 2007, 51(11): 5352-5367 doi: 10.1016/j.csda.2006.07.020
|
|
McGrory C A, Titterington D M. Variational approximationsin Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis, 2007, 51(11): 5352―5367
doi: 10.1016/j.csda.2006.07.020
|
69 |
Tu S, Xu L. A study of several model selection criteria for determining the number of signals. In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing . 2010, 1966-1969
|
|
Tu S, Xu L. A study of several modelselection criteria for determining the number of signals. In: Proceedings of 2010 IEEE International Conferenceon Acoustics, Speech and Signal Processing. 2010, 1966―1969
|
70 |
Xu L. Fundamentals, challenges, and advances of statistical learning for knowledge discovery and problem solving: A BYY harmony perspective, keynote talk. In: Proceedings of the International Conference on Neural Networks and Brain . 2005, 1: 24-55
|
|
Xu L. Fundamentals,challenges, and advances of statistical learning for knowledge discoveryand problem solving: A BYY harmony perspective, keynote talk. In: Proceedings of the International Conferenceon Neural Networks and Brain. 2005, 1: 24―55
|
71 |
Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6 . San Mateo: Morgan Kaufmann, 1994, 449-455
|
|
Hinton G E, Zemel R S. Autoencoders, minimum descriptionlength and Helmholtz free energy. In: Cowan J D, Tesauro G, Alspector J, eds. Advances in Neural Information Processing Systems 6. San Mateo: Morgan Kaufmann, 1994, 449―455
|
72 |
Xu L. Data smoothing regularization, multi-sets-learning, and problem solving strategies. Neural Networks , 2003, 16(5-6): 817-825 doi: 10.1016/S0893-6080(03)00119-9
|
|
Xu L. Datasmoothing regularization, multi-sets-learning, and problem solvingstrategies. Neural Networks, 2003, 16(5―6): 817―825
doi: 10.1016/S0893-6080(03)00119-9
|
73 |
Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (I) Unsupervised and semi-unsupervised learning. In: Amari S, Kassabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag , 1997, 241-274
|
|
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach:(I) Unsupervised and semi-unsupervised learning. In: Amari S, Kassabov N, eds. Brain-like Computing and Intelligent Information Systems. Springer-Verlag, 1997, 241―274
|
74 |
Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach: (II) From unsupervised learning to supervised learning and temporal modeling and (III) Models and algorithms for dependence reduction, data dimension reduction, ICA and supervised learning. In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective . 1997: 25-60
|
|
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach:(II) From unsupervised learning to supervised learning and temporalmodeling and (III) Models and algorithms for dependence reduction,data dimension reduction, ICA and supervised learning.In: Wong K M, King I, Yeung D Y, eds. Proceedings of Theoretical Aspects of Neural Computation: A MultidisciplinaryPerspective. 1997: 25―60
|
75 |
Xu L. Bayesian Ying Yang system and theory as a unified statistical learning approach (VII): Data smoothing. In: Proceedings of the International Conference on Neural Information Processing . 1998, 1: 243-248
|
|
Xu L. BayesianYing Yang system and theory as a unified statistical learning approach(VII): Data smoothing. In: Proceedingsof the International Conference on Neural Information Processing. 1998, 1: 243―248
|
76 |
Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation , 1995, 7(1): 108-116 doi: 10.1162/neco.1995.7.1.108
|
|
Bishop C M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 1995, 7(1): 108―116
doi: 10.1162/neco.1995.7.1.108
|
77 |
Xu L. A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition , 2007, 40(8): 2129-2153 doi: 10.1016/j.patcog.2006.12.016
|
|
Xu L. Aunified perspective and new results on RHT computing, mixture basedlearning, and multi-learner based problem solving. Pattern Recognition, 2007, 40(8): 2129―2153
doi: 10.1016/j.patcog.2006.12.016
|
78 |
Xu L, Oja E, Kultanen P. A new curve detection method randomized Hough transform (RHT). Pattern Recognition Letters , 1990, 11(5): 331-338 doi: 10.1016/0167-8655(90)90042-Z
|
|
Xu L, Oja E, Kultanen P. A new curve detection method randomized Hough transform(RHT). Pattern Recognition Letters, 1990, 11(5): 331―338
doi: 10.1016/0167-8655(90)90042-Z
|
79 |
Hough P V C. Method and means for recognizing complex patterns. US Patent, 3069654, 1962-12-18
|
|
Hough P V C. Method and means for recognizing complex patterns. US Patent, 3069654, 1962-12-18
|
80 |
Xu L. Best harmony, unified RPCL and automated model selection for unsupervised and supervised learning on Gaussian mixtures, ME-RBF models and three-layer nets. International Journal of Neural Systems , 2001, 11(1): 3-69 doi: 10.1016/S0129-0657(01)00049-7
|
|
Xu L. Bestharmony, unified RPCL and automated model selection for unsupervisedand supervised learning on Gaussian mixtures, ME-RBF models and three-layernets. International Journal of Neural Systems, 2001, 11(1): 3―69
doi: 10.1016/S0129-0657(01)00049-7
|
81 |
Xu L. Bayesian Ying-Yang learning theory for data dimension reduction and determination. Journal of Computational Intelligence in Finance , 1998, 6(5): 6-18
|
|
Xu L. BayesianYing-Yang learning theory for data dimension reduction and determination. Journal of Computational Intelligence in Finance, 1998, 6(5): 6―18
|
82. Tu S, Xu L. Theoretical analysis and comparison of several criteria on linear model dimension reduction. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, 2009, 5441: 154-162
83. Xu L. BYY harmony learning, independent state space and generalized APT financial analyses. IEEE Transactions on Neural Networks, 2001, 12(4): 822-849. doi: 10.1109/72.935094
84. Xu L. Temporal BYY encoding, Markovian state spaces, and space dimension determination. IEEE Transactions on Neural Networks, 2004, 15(5): 1276-1295. doi: 10.1109/TNN.2004.833302
85. Kalman R E. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 1960, 82(1): 35-45
86. Sun K, Tu S, Gao D Y, Xu L. Canonical dual approach to binary factor analysis. In: Adali T, Jutten C, Romano J M T, Barros A K, eds. Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, 2009, 5441: 346-353
87. Sivin N. Science and medicine in imperial China — the state of the field. The Journal of Asian Studies, 1988, 47(1): 41-90. doi: 10.2307/2056359
88. Wilhelm R, Baynes C. The I Ching or Book of Changes, with Foreword by Carl Jung. 3rd ed. Bollingen Series XIX. Princeton: Princeton University Press, 1967
89. Hansen C. A Daoist Theory of Chinese Thought: A Philosophical Interpretation. New York: Oxford University Press, 2000
90. Shilov G E, Gurevich B L. Integral, Measure, and Derivative: A Unified Approach. Silverman R trans. New York: Dover Publications, 1978
91. Ali S M, Silvey S D. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B, 1966, 28(1): 131-140
92. Kullback S, Leibler R A. On information and sufficiency. Annals of Mathematical Statistics, 1951, 22(1): 79-86. doi: 10.1214/aoms/1177729694
93. Shore J. Minimum cross-entropy spectral analysis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(2): 230-237. doi: 10.1109/TASSP.1981.1163539
94. Burg J P, Luenberger D G, Wenger D L. Estimation of structured covariance matrices. Proceedings of the IEEE, 1982, 70(9): 963-974. doi: 10.1109/PROC.1982.12427
95. Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106(4): 620-630. doi: 10.1103/PhysRev.106.620
96. Xu L. Temporal BYY learning for state space approach, hidden Markov model and blind source separation. IEEE Transactions on Signal Processing, 2000, 48(7): 2132-2144. doi: 10.1109/78.847796
97. Jeffreys H. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A: Mathematical and Physical Sciences, 1946, 186(1007): 453-461. doi: 10.1098/rspa.1946.0056
98. Xu L. BYY learning system and theory for parameter estimation, data smoothing based regularization and model selection. Neural, Parallel and Scientific Computations, 2000, 8(1): 55-82
99. Xu L. BYY Σ-Π factor systems and harmony learning. Invited talk. In: Proceedings of International Conference on Neural Information Processing (ICONIP’2000). 2000, 1: 548-558
100. Xu L. Bayesian Ying Yang learning. In: Zhong N, Liu J, eds. Intelligent Technologies for Information Analysis. Berlin: Springer, 2004, 615-706
101. Barron A, Rissanen J, Yu B. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 1998, 44(6): 2743-2760. doi: 10.1109/18.720554
102. Xu L, Amari S. Combining classifiers and learning mixture-of-experts. In: Ramón J, Dopico R, Dorado J, Pazos A, eds. Encyclopedia of Artificial Intelligence. Hershey, PA: IGI Global, 2008, 318-326
103. Xu L. BYY learning, regularized implementation, and model selection on modular networks with one hidden layer of binary units. Neurocomputing, 2003, 51: 277-301 (Errata: Neurocomputing, 2003, 55(1-2): 405-406)
104. Gales M J F, Young S. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304. doi: 10.1561/2000000004
105. Su D, Wu X H, Xu L. GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection. In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. 2010, 4890-4893
106. Rosti A V, Gales M. Factor analysed hidden Markov models for speech recognition. Computer Speech and Language, 2004, 18(2): 181-200. doi: 10.1016/j.csl.2003.09.004
107. Gales M J F. Discriminative models for speech recognition. In: Proceedings of Information Theory and Applications Workshop. 2007, 170-176
108. Woodland P C, Povey D. Large scale discriminative training of hidden Markov models for speech recognition. Computer Speech and Language, 2002, 16(1): 25-47. doi: 10.1006/csla.2001.0182
109. Csiszár I, Tusnády G. Information geometry and alternating minimization procedures. Statistics and Decisions, 1984, (Suppl. 1): 205-237
110. Xu L, Oja E, Suen C Y. Modified Hebbian learning for curve and surface fitting. Neural Networks, 1992, 5(3): 441-457. doi: 10.1016/0893-6080(92)90006-5
111. Xu L, Krzyzak A, Oja E. A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 1991, 2(3): 169-184. doi: 10.1142/S0129065791000169