Learning Gaussian mixture with automatic model selection: A comparative study on three Bayesian related approaches
Lei SHI, Shikui TU, Lei XU
Front Elect Electr Eng Chin. 2011, 6 (2): 215-244.
https://doi.org/10.1007/s11460-011-0153-z
Three Bayesian related approaches, namely, variational Bayesian (VB), minimum message length (MML) and Bayesian Ying-Yang (BYY) harmony learning, have been applied to automatically determining an appropriate number of components while learning a Gaussian mixture model (GMM). This paper provides a comparative investigation of these approaches with not only a Jeffreys prior but also a conjugate Dirichlet-Normal-Wishart (DNW) prior on GMM. In addition to adopting the existing algorithms either directly or with some modifications, the algorithm for VB with a Jeffreys prior and the algorithm for BYY with a DNW prior are developed in this paper to fill this gap. The performances of automatic model selection are evaluated through extensive experiments, with several empirical findings: 1) With priors merely on the mixing weights, each of the three approaches makes biased mistakes, while placing priors on all the parameters of GMM reduces each approach's bias and improves its performance. 2) As the Jeffreys prior is replaced by the DNW prior, all three approaches improve their performances. Moreover, the Jeffreys prior makes MML slightly better than VB, while the DNW prior makes VB better than MML. 3) As the hyperparameters of the DNW prior are further optimized by each approach's own learning principle, BYY improves its performance, while VB and MML deteriorate when there are too many free hyperparameters. In effect, VB and MML lack a good guide for optimizing the hyperparameters of the DNW prior. 4) BYY considerably outperforms both VB and MML for any type of prior and regardless of whether hyperparameters are optimized. Unlike VB and MML, which rely on appropriate priors to perform model selection, BYY does not depend strongly on the type of prior. It has model selection ability even without priors, already performs very well with a Jeffreys prior, and improves further as the Jeffreys prior is replaced by the DNW prior.
Finally, all algorithms are applied to the Berkeley segmentation database of real-world images. Again, BYY considerably outperforms both VB and MML, especially in detecting the objects of interest against a confusing background.
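None of the three Bayesian algorithms compared above reduces to a few lines, but the conventional two-phase baseline that such approaches improve upon (fit each candidate number of components k by maximum-likelihood EM, then score with BIC) can be sketched in plain Python. The 1-D model, the initialization scheme, and the 3k - 1 free-parameter count are illustrative assumptions, not details from the paper:

```python
import math, random

def em_gmm_1d(xs, k, iters=60, seed=0):
    """Fit a 1-D Gaussian mixture with k components by plain maximum-likelihood EM."""
    rng = random.Random(seed)
    mu = rng.sample(xs, k)                      # initialize means at data points
    var = [1.0] * k
    w = [1.0 / k] * k

    def pdf(x, j):
        return w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j])) / math.sqrt(2 * math.pi * var[j])

    for _ in range(iters):
        # E-step: component responsibilities for each point
        resp = []
        for x in xs:
            p = [pdf(x, j) for j in range(k)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            w[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return sum(math.log(sum(pdf(x, j) for j in range(k)) + 1e-300) for x in xs)

def select_k_by_bic(xs, k_max=4):
    """Two-phase selection: fit each candidate k, then score by BIC (3k - 1 free parameters)."""
    scores = {k: -2 * em_gmm_1d(xs, k) + (3 * k - 1) * math.log(len(xs))
              for k in range(1, k_max + 1)}
    return min(scores, key=scores.get)

# Two well-separated clusters; BIC should recover k = 2.
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(150)] +
        [rng.gauss(8.0, 1.0) for _ in range(150)])
k_hat = select_k_by_bic(data)
```

Automatic model selection as in BYY instead prunes extra components during a single learning run, rather than fitting every candidate k separately.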
An investigation of several typical model selection criteria for detecting the number of signals
Shikui TU, Lei XU
Front Elect Electr Eng Chin. 2011, 6 (2): 245-255.
https://doi.org/10.1007/s11460-011-0146-y
For the problem of detecting the number of signals, this paper provides a systematic empirical investigation of the model selection performances of several classical criteria and recently developed methods (including Akaike’s information criterion (AIC), Schwarz’s Bayesian information criterion, Bozdogan’s consistent AIC, the Hannan-Quinn information criterion, Minka’s (MK) principal component analysis (PCA) criterion, Kritchman & Nadler’s hypothesis tests (KN), Perry & Wolfe’s minimax rank estimation thresholding algorithm (MM), and Bayesian Ying-Yang (BYY) harmony learning), by varying the signal-to-noise ratio (SNR) and the training sample size N. A family of model selection indifference curves is defined by the contour lines of model selection accuracy, so that the joint effect of N and SNR can be examined, rather than merely the effect of either one with the other fixed, as is usually done in the literature. The indifference curves visually reveal that the relative advantages of all methods emerge clearly within a region of moderate N and SNR. Moreover, the importance of studying this region is confirmed by an alternative reference criterion that maximizes the testing likelihood. Extensive simulations show that AIC and BYY harmony learning, as well as MK, KN, and MM, are relatively more robust than the others against decreasing N and SNR, and that BYY is superior for small sample sizes.
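As a minimal illustration of how order-selection criteria such as AIC and BIC are applied in practice, the sketch below scores hypothetical maximized log-likelihoods for candidate signal counts. The log-likelihood values and the toy free-parameter count q + 1 are invented for illustration; they also exhibit the familiar tendency of AIC to select a larger order than BIC:

```python
import math

def aic(ll, k):
    """Akaike's information criterion: 2k - 2*loglik (smaller is better)."""
    return 2 * k - 2 * ll

def bic(ll, k, n):
    """Schwarz's Bayesian information criterion: k*log(n) - 2*loglik."""
    return k * math.log(n) - 2 * ll

# Hypothetical maximized log-likelihoods for candidate signal numbers q = 0..3,
# with a toy free-parameter count of q + 1; n is the training sample size.
n = 100
ll = {0: -520.0, 1: -430.0, 2: -428.5, 3: -428.0}
q_aic = min(ll, key=lambda q: aic(ll[q], q + 1))
q_bic = min(ll, key=lambda q: bic(ll[q], q + 1, n))
```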
Parameterizations make different model selections: Empirical findings from factor analysis
Shikui TU, Lei XU
Front Elect Electr Eng Chin. 2011, 6 (2): 256-274.
https://doi.org/10.1007/s11460-011-0150-2
How parameterizations affect model selection performance is an issue that has been ignored or seldom studied, since traditional model selection criteria, such as Akaike’s information criterion (AIC), Schwarz’s Bayesian information criterion (BIC), and the difference of negative log-likelihood (DNLL), perform equivalently on different parameterizations that have equivalent likelihood functions. For factor analysis (FA), in addition to one traditional model (denoted FA-a), it was previously found that there is another parameterization (denoted FA-b), and that Bayesian Ying-Yang (BYY) harmony learning gets different model selection performances on FA-a and FA-b. This paper investigates a family of FA parameterizations with equivalent likelihood functions, where each member (denoted FA-r) is indexed by an integer r, with FA-a at one end (r = 0) and FA-b at the other (r at its upper bound). In addition to BYY learning in comparison with AIC, BIC, and DNLL, we also implement variational Bayes (VB). Several empirical findings have been obtained via extensive experiments. First, both BYY and VB perform clearly better on FA-b than on FA-a, and this superiority of FA-b is reliable and robust. Second, both BYY and VB outperform AIC, BIC, and DNLL, while BYY further outperforms VB considerably, especially on FA-b. Moreover, when FA-a is replaced by FA-b, the gain obtained by BYY is clearly higher than that by VB, whereas AIC, BIC, and DNLL gain nothing. Third, this paper also demonstrates how each part of the priors incrementally and jointly improves the performances, and further shows that using VB to optimize the hyperparameters of the priors deteriorates the performances, while using BYY for this purpose improves them further.
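The reason AIC, BIC, and DNLL cannot distinguish likelihood-equivalent parameterizations is that they depend only on the maximized likelihood and the count of free parameters. A minimal sketch of that count, using the standard degrees-of-freedom formula for a d-dimensional FA model with m factors (not the paper's FA-r family):

```python
def fa_free_params(d, m):
    """Free parameters of a d-dimensional factor analysis model with m
    factors: d*m loadings + d unique noise variances, minus m(m-1)/2 for
    the rotational indeterminacy of the loading matrix."""
    return d * m + d - m * (m - 1) // 2

# If two parameterizations share the maximized log-likelihood and this count,
# then AIC = 2*count - 2*loglik is identical for both, so AIC cannot prefer
# one over the other; the same holds for BIC and DNLL.
```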
Discriminative training of GMM-HMM acoustic model by RPCL learning
Zaihu PANG, Shikui TU, Dan SU, Xihong WU, Lei XU
Front Elect Electr Eng Chin. 2011, 6 (2): 283-290.
https://doi.org/10.1007/s11460-011-0152-0
This paper presents a new discriminative approach for training the Gaussian mixture models (GMMs) of a hidden Markov model (HMM) based acoustic model in a large vocabulary continuous speech recognition (LVCSR) system. The approach is featured by embedding a rival penalized competitive learning (RPCL) mechanism at the level of hidden Markov states. For every input, the correct identity state (the winner), obtained by Viterbi forced alignment, is enhanced to describe this input, while its most competitive rival is penalized by de-learning, which makes the GMM-based states more discriminative. Without the extensive computing burden required by typical discriminative learning methods for one-pass recognition of the training set, the new approach saves computing costs considerably. Experiments show that the proposed method has good convergence and better performance than the classical maximum likelihood estimation (MLE) based method. Compared with two conventional discriminative methods, the proposed method demonstrates improved generalization ability, especially when the test set is not well matched with the training set.
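The RPCL mechanism itself is easy to sketch in its generic clustering form (the paper embeds it at the level of hidden Markov states instead): the winner is pulled toward the input, while the runner-up rival is pushed away with a much smaller de-learning rate. The rates eta and gamma below are illustrative choices:

```python
def rpcl_step(x, centers, eta=0.05, gamma=0.005):
    """One RPCL update: move the winning center toward input x with learning
    rate eta, push the runner-up (rival) away with de-learning rate gamma."""
    d = [(sum((xi - ci) ** 2 for xi, ci in zip(x, c)), j)
         for j, c in enumerate(centers)]
    d.sort()                                   # rank centers by squared distance
    win, rival = d[0][1], d[1][1]
    centers[win] = [ci + eta * (xi - ci) for xi, ci in zip(x, centers[win])]
    centers[rival] = [ci - gamma * (xi - ci) for xi, ci in zip(x, centers[rival])]

centers = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
rpcl_step([0.2, 0.2], centers)   # winner: center 0; rival: center 1
```

Over many updates, de-learning drives redundant rival centers away from the data, which is what gives RPCL its automatic model selection flavor.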
Data-based intelligent modeling and control for nonlinear systems
Chaoxu MU, Changyin SUN
Front Elect Electr Eng Chin. 2011, 6 (2): 291-299.
https://doi.org/10.1007/s11460-011-0143-1
With the ever increasing complexity of industrial systems, model-based control has encountered growing difficulties, while interest in data-based control has been booming. This paper gives an overview of data-based control, dividing it into two subfields: intelligent modeling and direct controller design. In both subfields, important data-based control methods are investigated in depth. Within the framework of data-based modeling, the main modeling technologies and control strategies are discussed, and then fundamental concepts and various algorithms are presented for the design of a data-based controller. Finally, some remaining challenges are outlined.
Radar HRRP statistical recognition with temporal factor analysis by automatic Bayesian Ying-Yang harmony learning
Penghui WANG, Lei SHI, Lan DU, Hongwei LIU, Lei XU, Zheng BAO
Front Elect Electr Eng Chin. 2011, 6 (2): 300-317.
https://doi.org/10.1007/s11460-011-0149-8
Radar high-resolution range profiles (HRRPs) are typical high-dimensional data with dependence across dimensions, and their statistical modeling is a challenging task for HRRP-based target recognition. Supposing that HRRP samples are independent and jointly Gaussian distributed, a recent work [Du L, Liu H W, Bao Z. IEEE Transactions on Signal Processing, 2008, 56(5): 1931-1944] applied factor analysis (FA) to model HRRP data with a two-phase approach for model selection, which achieved satisfactory recognition performance. Theoretical analysis and experimental results reveal high temporal correlation among adjacent HRRPs. This paper is thus motivated to model the spatial and temporal structure of HRRP data simultaneously by employing the temporal factor analysis (TFA) model. For a limited amount of high-dimensional HRRP data, the two-phase approach to parameter learning and model selection suffers from an intensive computational burden and degraded evaluation. To tackle these problems, this work adopts Bayesian Ying-Yang (BYY) harmony learning, which has automatic model selection ability during parameter learning. Experimental results show stepwise improved recognition and rejection performances from two-phase learning based FA, to two-phase learning based TFA, and to BYY harmony learning based TFA with automatic model selection. In addition, because a general linear dynamical system adds many extra free parameters to the classic FA model and is thus even harder to identify, it performs even worse than the classic FA model.
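The temporal correlation that motivates TFA over plain FA can be checked with a simple statistic: the mean Pearson correlation between adjacent profiles. A minimal sketch (the toy profiles are invented; real HRRP data would be high-dimensional):

```python
def adjacent_correlation(profiles):
    """Mean Pearson correlation between adjacent profiles: a quick check of
    the temporal dependence that motivates TFA over plain FA."""
    def corr(u, v):
        n = len(u)
        mu, mv = sum(u) / n, sum(v) / n
        cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
        su = sum((a - mu) ** 2 for a in u) ** 0.5
        sv = sum((b - mv) ** 2 for b in v) ** 0.5
        return cov / (su * sv)
    return sum(corr(p, q) for p, q in zip(profiles, profiles[1:])) / (len(profiles) - 1)

# Toy "profiles": each one a shifted copy of its neighbor, so correlation is 1.
r = adjacent_correlation([[1.0, 2.0, 3.0, 4.0],
                          [2.0, 3.0, 4.0, 5.0],
                          [3.0, 4.0, 5.0, 6.0]])
```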
Natural scene recognition using weighted histograms of gradient orientation descriptor
Li ZHOU, Dewen HU, Zongtan ZHOU, Zhaowen ZHUANG
Front Elect Electr Eng Chin. 2011, 6 (2): 318-327.
https://doi.org/10.1007/s11460-011-0140-4
The automatic recognition of the contents of a scene is an important issue in the computer vision field. Though considerable progress has been made, the complexity of scenes remains an important challenge to computer vision research. Most previous scene recognition models are based on the so-called “bag of visual words” method, which uses some clustering method to quantize the numerous local region descriptors into a codebook. The size of the codebook and the selection of the initial clustering centers have great influence on performance; furthermore, a big codebook incurs high computational cost and memory consumption. To overcome these drawbacks, we present an unsupervised natural scene recognition approach that is not based on the “bag of visual words” method. The approach works by creating multiple-resolution images and partitioning them into sub-regions at different scales. The descriptors of all sub-regions in the same resolution image are directly concatenated for support vector machine (SVM) classifiers. To represent images more effectively, we present a new visual descriptor: weighted histograms of gradient orientation (WHGO). We evaluate our approach on three data sets: the 8 scene categories of Oliva et al., the 13 scene categories of Fei-Fei et al., and the 15 scene categories of Lazebnik et al. Experiments show that the WHGO descriptor outperforms the classical scale invariant feature transform (SIFT) descriptor in natural scene recognition, and that our approach achieves good performance with respect to state-of-the-art methods.
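The abstract does not give WHGO's exact weighting scheme, but the core ingredient, a histogram of gradient orientations with each pixel's vote weighted by its gradient magnitude, can be sketched as follows. Finite-difference gradients and L1 normalization are illustrative choices, not the paper's:

```python
import math

def gradient_orientation_histogram(img, bins=8):
    """Magnitude-weighted histogram of gradient orientations over a grayscale
    image given as a list of rows: the HOG-style core of descriptors like WHGO."""
    h = [0.0] * bins
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = img[y][x + 1] - img[y][x - 1]       # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            ang = math.atan2(gy, gx) % (2 * math.pi)
            h[min(int(ang / (2 * math.pi) * bins), bins - 1)] += mag
    total = sum(h) or 1.0
    return [v / total for v in h]                     # L1-normalized descriptor

# A vertical step edge: all gradient energy points along orientation 0.
desc = gradient_orientation_histogram([[0, 0, 1, 1]] * 4)
```

In the full approach, such histograms would be computed per sub-region at each resolution and concatenated into one feature vector for the SVM.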
Thresholding-based detection of fine and sparse details
Alexander DROBCHENKO, Joni-Kristian KAMARAINEN, Lasse LENSU, Jarkko VARTIAINEN, Heikki KÄLVIÄINEN, Tuomas EEROLA
Front Elect Electr Eng Chin. 2011, 6 (2): 328-338.
https://doi.org/10.1007/s11460-011-0139-x
Fine and sparse details appear in many quality inspection applications requiring machine vision. Especially on flat surfaces, such as paper or board, the details can be made detectable by oblique illumination. In this study, a general definition of such details is given by specifying sufficient statistical properties of their histograms. The statistical model allows simulation of data and comparison of methods designed for detail detection. Based on the definition, utilization of existing thresholding methods is shown to be well motivated. The comparison shows that minimum error thresholding outperforms the other standard methods. Finally, the results are successfully applied to a paper printability inspection application, the IGT picking assessment, in which small surface defects must be detected. The proposed method and measurement system prototype provide automated assessment with results comparable to manual expert evaluations in this laborious task.
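Minimum error thresholding (the Kittler-Illingworth method) fits a Gaussian to each side of a candidate threshold and minimizes a classification-error criterion over the histogram. A minimal sketch under that standard formulation:

```python
import math

def min_error_threshold(hist):
    """Kittler-Illingworth minimum error thresholding: model each side of a
    candidate threshold t as a Gaussian and minimize the criterion
    J(t) = 1 + p1*log(v1) + p2*log(v2) - 2*(p1*log(p1) + p2*log(p2))."""
    total = sum(hist)
    best_t, best_j = None, float("inf")
    for t in range(1, len(hist) - 1):
        lo, hi = hist[:t], hist[t:]
        p1, p2 = sum(lo) / total, sum(hi) / total
        if p1 == 0 or p2 == 0:
            continue
        m1 = sum(i * h for i, h in enumerate(lo)) / (p1 * total)
        m2 = sum((t + i) * h for i, h in enumerate(hi)) / (p2 * total)
        v1 = sum(h * (i - m1) ** 2 for i, h in enumerate(lo)) / (p1 * total)
        v2 = sum(h * (t + i - m2) ** 2 for i, h in enumerate(hi)) / (p2 * total)
        if v1 <= 0 or v2 <= 0:        # a zero-variance side cannot be modeled
            continue
        j = (1 + p1 * math.log(v1) + p2 * math.log(v2)
               - 2 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = t, j
    return best_t

# Bimodal toy histogram: modes near bins 2 and 12; the threshold lands in the valley.
hist = [1, 4, 8, 4, 1, 0, 0, 0, 0, 0, 1, 4, 8, 4, 1, 0]
t = min_error_threshold(hist)
```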
3D face recognition based on principal axes registration and fusing features
Hongxia ZHANG, Yanning ZHANG, Zhe GUO, Zenggang LIN, Chao ZHANG
Front Elect Electr Eng Chin. 2011, 6 (2): 347-352.
https://doi.org/10.1007/s11460-011-0155-x
A 3D face recognition approach is presented that uses principal axes registration (PAR) and three face representation features computed from the re-sampled depth image: Eigenfaces, Fisherfaces, and Zernike moments. In this approach, 3D face registration is achieved directly by PAR. Because each facial feature has its own advantages, limitations, and scope of use, different features complement each other, so the fused features can learn more expressive characterizations than any single feature. The support vector machine (SVM) is applied for classification. Based on the complementarity between different features, weighted decision-level fusion gives the recognition system a degree of fault tolerance. Experimental results show that the proposed approach achieves superior performance, with a rank-1 recognition rate of 98.36% on the GavabDB database.
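Weighted decision-level fusion of the three feature channels can be sketched as a weighted sum of per-class scores. The score values and channel weights below are hypothetical, chosen only to show how one strong channel can be outvoted or confirmed by the others:

```python
def fuse_scores(score_lists, weights):
    """Weighted decision-level fusion: combine per-channel class scores
    into a single ranking and return the top-ranked class."""
    fused = {}
    for scores, w in zip(score_lists, weights):
        for cls, s in scores.items():
            fused[cls] = fused.get(cls, 0.0) + w * s
    return max(fused, key=fused.get)

# Hypothetical normalized scores from the three channels for subjects A and B.
eigen   = {"A": 0.6, "B": 0.4}
fisher  = {"A": 0.3, "B": 0.7}
zernike = {"A": 0.8, "B": 0.2}
best = fuse_scores([eigen, fisher, zernike], [0.4, 0.3, 0.3])
```

Here the Fisherfaces channel favors B, but the weighted combination still ranks A first, which is the fault tolerance the fusion is meant to provide.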
Classification of schizophrenic patients and healthy controls using multiple spatially independent components of structural MRI data
Lubin WANG, Hui SHEN, Baojuan LI, Dewen HU
Front Elect Electr Eng Chin. 2011, 6 (2): 353-362.
https://doi.org/10.1007/s11460-011-0142-2
Several meta-analyses were recently conducted in attempts to identify the core brain regions exhibiting pathological changes in schizophrenia, which could potentially act as disease markers. Based on the findings of these meta-analyses, we developed a multivariate pattern analysis method to classify schizophrenic patients and healthy controls using structural magnetic resonance imaging (sMRI) data. Independent component analysis (ICA) was used to decompose gray matter density images into a set of spatially independent components. Spatial multiple regression of a region of interest (ROI) mask against each of the components was then performed to determine pathological patterns, whose voxels were taken as features for classification. After dimensionality reduction using principal component analysis (PCA), a nonlinear support vector machine (SVM) classifier was trained to discriminate schizophrenic patients from healthy controls. The performance of the classifier was tested using a 10-fold cross-validation strategy. Experimental results showed that two distinct spatial patterns displayed discriminative power for schizophrenia, one mainly covering the prefrontal cortex (PFC) and the other subcortical regions. Using these two patterns simultaneously improved classification performance compared to using either alone. Moreover, the two pathological patterns constitute a prefronto-subcortical network, suggesting that schizophrenia involves abnormalities in networks of brain regions.
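The 10-fold cross-validation strategy used to test the classifier can be sketched with a plain index splitter; the ICA, regression, and SVM stages themselves require domain libraries and imaging data, so only the fold logic is shown:

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Index splits for k-fold cross-validation: each sample falls in exactly
    one test fold, and the remaining k - 1 folds form the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [(idx[:i * n // k] + idx[(i + 1) * n // k:],   # training indices
             idx[i * n // k:(i + 1) * n // k])            # test indices
            for i in range(k)]

# 25 subjects split into 5 folds of 5 (k=5 keeps the toy example small).
folds = k_fold_indices(25, k=5)
```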
MRI image segmentation based on fast kernel clustering analysis
Liang LIAO, Yanning ZHANG
Front Elect Electr Eng Chin. 2011, 6 (2): 363-373.
https://doi.org/10.1007/s11460-011-0154-y
Kernel-based clustering is expected to provide a better analysis tool for pattern classification, since it implicitly maps input samples to a high-dimensional space to improve pattern separability. For this implicit mapping, the kernel trick is believed to elegantly tackle the “curse of dimensionality”; in practice, however, the high-dimensional space makes kernel-based clustering more challenging in terms of computational complexity and classification accuracy, which traditional kernelized algorithms cannot effectively deal with. In this paper, we propose a novel kernel clustering algorithm, called KFCM-III, that addresses this problem by replacing the traditional isotropic Gaussian kernel with an anisotropic kernel formulated by the Mahalanobis distance. Moreover, a reduced-set represented kernelized center is employed to reduce the computational complexity of the KFCM-I algorithm and circumvent the model deficiency of the KFCM-II algorithm. The proposed KFCM-III has been evaluated for segmenting magnetic resonance imaging (MRI) images. For this task, an image intensity inhomogeneity correction is employed during the segmentation process. With a scheme called preclassification, the proposed intensity correction scheme further speeds up image segmentation. Experimental results on public image data show the superiority of KFCM-III.
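Replacing the isotropic Gaussian kernel with an anisotropic one formulated by the Mahalanobis distance amounts to the following, sketched here for 2-D inputs with a user-supplied inverse covariance (the full KFCM-III algorithm around this kernel is not reproduced):

```python
import math

def mahalanobis_kernel(x, y, s_inv):
    """Anisotropic Gaussian kernel k(x, y) = exp(-(x - y)^T S^-1 (x - y) / 2),
    with s_inv the inverse covariance matrix (here 2x2, row-major)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    q = sum(d[i] * s_inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-q / 2)

# With S = I the kernel reduces to the usual isotropic Gaussian kernel.
k_iso = mahalanobis_kernel([1.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

A non-identity S stretches the kernel's region of influence along the data's principal directions, which is what makes the clustering anisotropic.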
Development of color density concept with color difference formulas in respect to human vision system
Arto KAARNA, Wei LIU, Heikki KÄLVIÄINEN
Front Elect Electr Eng Chin. 2011, 6 (2): 381-387.
https://doi.org/10.1007/s11460-011-0144-0
The aims of this study are to develop the color density concept and to propose color difference formulas based on it. The color density is defined using metric coefficients based on discrimination ellipses and the locations of the colors in the color space. The ellipse sets are the MacAdam ellipses in the CIE 1931 xy-chromaticity diagram and the chromaticity-discrimination ellipses in the CIELAB space; the latter set was originally used to develop the CIEDE2000 color difference formula. The color difference can then be calculated from the color densities of the two colors under consideration. As a result, the color density represents the perceived color difference more accurately, and it could be used to characterize a color by a quantitative attribute that matches the perceived color difference from this color more closely. In this way, the color density concept simply provides a correction term for the estimation of color differences. In the experiments, the line element formula and the CIEDE2000 color difference formula performed better than the color density based difference measures. The reason lies in the current modeling of the color density concept: the discrimination ellipses are typically described by three parameters, the major axis, the minor axis, and the inclination angle, whereas the proposed color density is only a one-dimensional corrector for color differences and thus cannot capture all the details of the ellipse information. Still, the color density gives clearly more accurate estimates of perceived color differences than Euclidean distances computed directly from the coordinates of the color space.
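For reference, the Euclidean baseline that the color density concept corrects is the CIE76 color difference, a plain Euclidean distance between CIELAB coordinates (the CIEDE2000 formula and the color density measure themselves are far more involved and are not sketched here):

```python
def delta_e_76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two CIELAB colors
    (L*, a*, b*); the baseline that perceptual formulas aim to improve on."""
    return sum((u - v) ** 2 for u, v in zip(lab1, lab2)) ** 0.5

# Two colors differing only in a* and b*: deltaE = sqrt(3^2 + 4^2) = 5.
de = delta_e_76((50.0, 10.0, 10.0), (50.0, 13.0, 14.0))
```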