Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740 (Online)

CN 10-1028/TM

Front Elect Electr Eng Chin, 2011, 6(2): 215-244    https://doi.org/10.1007/s11460-011-0153-z
RESEARCH ARTICLE
Learning Gaussian mixture with automatic model selection: A comparative study on three Bayesian related approaches
Lei SHI, Shikui TU, Lei XU
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Abstract

Three Bayesian-related approaches, namely variational Bayesian (VB), minimum message length (MML), and Bayesian Ying-Yang (BYY) harmony learning, have been applied to automatically determining an appropriate number of components while learning a Gaussian mixture model (GMM). This paper provides a comparative investigation of these approaches under not only a Jeffreys prior but also a conjugate Dirichlet-Normal-Wishart (DNW) prior on the GMM parameters. In addition to adopting existing algorithms, either directly or with some modifications, this paper develops an algorithm for VB with the Jeffreys prior and an algorithm for BYY with the DNW prior to fill the gaps in the literature. Automatic model selection performance is evaluated through extensive experiments, with several empirical findings: 1) With priors merely on the mixing weights, each of the three approaches makes biased selection mistakes; placing priors on all GMM parameters reduces the bias and improves the performance of each approach. 2) Replacing the Jeffreys prior with the DNW prior improves the performance of all three approaches. Moreover, under the Jeffreys prior MML performs slightly better than VB, whereas under the DNW prior VB performs better than MML. 3) When the hyperparameters of the DNW prior are further optimized by each approach's own learning principle, BYY improves its performance, while VB and MML deteriorate when there are too many free hyperparameters; in effect, VB and MML lack a good guide for optimizing the DNW hyperparameters. 4) BYY considerably outperforms both VB and MML for either type of prior and whether or not the hyperparameters are optimized. Unlike VB and MML, which rely on appropriate priors to perform model selection, BYY does not depend strongly on the type of prior: it retains model selection ability even without priors, already performs very well with the Jeffreys prior, and improves further as the Jeffreys prior is replaced by the DNW prior. Finally, all algorithms are applied to the Berkeley segmentation database of real-world images. Again, BYY considerably outperforms both VB and MML, especially in detecting objects of interest against a confusing background.
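For orientation, the conjugate DNW prior discussed above factorizes into a Dirichlet distribution on the mixing weights and a joint Normal-Wishart distribution on each component's mean and precision. A minimal sketch of its standard form, with notation assumed here rather than taken from the paper:

```latex
% Standard conjugate Dirichlet-Normal-Wishart (DNW) prior on a k-component GMM;
% \lambda_0, \mathbf{m}_0, \beta_0, \mathbf{W}_0, \nu_0 are the hyperparameters
% (symbols assumed for illustration, not the paper's notation).
\begin{aligned}
p(\boldsymbol{\alpha}) &= \mathrm{Dir}(\boldsymbol{\alpha}\mid\lambda_0)
  \;\propto\; \prod_{j=1}^{k} \alpha_j^{\lambda_0-1}, \\
p(\boldsymbol{\mu}_j,\mathbf{T}_j) &=
  \mathcal{N}\!\bigl(\boldsymbol{\mu}_j \mid \mathbf{m}_0,(\beta_0\mathbf{T}_j)^{-1}\bigr)\,
  \mathcal{W}\bigl(\mathbf{T}_j \mid \mathbf{W}_0,\nu_0\bigr),
\end{aligned}
```

where α collects the mixing weights and μ_j, T_j are the mean and precision matrix of component j. Marginalizing a Gaussian over the Normal-Wishart part yields the Student's t-distribution listed among the keywords.

As a hedged, self-contained sketch (not the authors' implementation) of how VB with such a conjugate prior performs automatic model selection, scikit-learn's BayesianGaussianMixture implements this Dirichlet plus Gaussian-Wishart family: starting from deliberately too many components, a small Dirichlet concentration drives redundant mixing weights toward zero, and the surviving components give the selected model size.

```python
# Illustrative sketch only: VB learning of a GMM under a conjugate
# Dirichlet + Gaussian-Wishart prior, with automatic pruning of extra
# components via shrinking posterior mixing weights.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: three well-separated 2-D Gaussian clusters (illustrative only).
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.5, size=(200, 2)),
])

vb = BayesianGaussianMixture(
    n_components=10,                        # deliberately more components than needed
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-3,        # small Dirichlet hyperparameter favors sparse weights
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

# Components whose posterior mixing weight stays non-negligible are the
# selected model; the rest have been effectively pruned by the prior.
effective_k = int(np.sum(vb.weights_ > 1e-2))
print("selected number of components:", effective_k)  # expected: 3
```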

Keywords: Bayesian Ying-Yang (BYY) harmony learning; variational Bayesian (VB); minimum message length (MML); empirical comparison; Gaussian mixture model (GMM); automatic model selection; Jeffreys prior; Dirichlet; joint Normal-Wishart (NW); conjugate distributions; marginalized Student's t-distribution
Corresponding author: Lei XU, Email: lxu@cse.cuhk.edu.hk
Issue Date: 05 June 2011