Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng Chin, 2011, Vol. 6, Issue 2: 283-290. https://doi.org/10.1007/s11460-011-0152-0
RESEARCH ARTICLE
Discriminative training of GMM-HMM acoustic model by RPCL learning
Zaihu PANG1, Shikui TU2, Dan SU1, Xihong WU1, Lei XU1,2
1. Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, China; 2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Abstract

This paper presents a new discriminative approach for training the Gaussian mixture models (GMMs) of a hidden Markov model (HMM) based acoustic model in a large vocabulary continuous speech recognition (LVCSR) system. The approach embeds a rival penalized competitive learning (RPCL) mechanism at the level of hidden Markov states. For every input, the correct state, called the winner and obtained by Viterbi forced alignment, is enhanced to describe this input, while its most competitive rival is penalized by de-learning, which makes the GMM-based states more discriminative. Because it does not require the one-pass recognition of the training set that typical discriminative learning methods need, the new approach considerably reduces computing costs. Experiments show that the proposed method converges well and performs better than the classical maximum likelihood estimation (MLE) based method. Compared with two conventional discriminative methods, the proposed method demonstrates improved generalization ability, especially when the test set is not well matched with the training set.
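
The winner/rival idea can be illustrated with a minimal sketch of a classic RPCL-style update on plain mean vectors. This is not the paper's GMM-HMM training procedure: the function name `rpcl_step`, the learning rates, and the use of squared Euclidean distance to pick the rival are all assumptions made for illustration. As in the abstract, the winner is supplied from outside (there, by forced alignment) rather than chosen by competition.

```python
def rpcl_step(means, x, winner, lr_win=0.05, lr_rival=0.005):
    """One RPCL-style update on a list of mean vectors (modified in place).

    `winner` is the index of the correct unit, supplied externally.
    The rates and the distance measure are illustrative assumptions.
    Returns the index of the rival that was penalized.
    """
    def sq_dist(m):
        return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

    # Rival: the closest competitor other than the winner.
    rival = min((k for k in range(len(means)) if k != winner),
                key=lambda k: sq_dist(means[k]))

    # Learning: pull the winner toward the input.
    means[winner] = [m + lr_win * (xi - m) for xi, m in zip(x, means[winner])]
    # De-learning: push the rival away from the input with a smaller rate.
    means[rival] = [m - lr_rival * (xi - m) for xi, m in zip(x, means[rival])]
    return rival


# Hypothetical toy data: three units, one input assigned to unit 0.
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
rival = rpcl_step(means, x=[0.4, 0.4], winner=0)
```

In this toy call the second unit is the nearest competitor, so it is nudged away from the input while the first unit moves toward it; the distant third unit is left unchanged.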

Keywords: discriminative training; hidden Markov model; rival penalized competitive learning; Bayesian Ying-Yang harmony learning; large vocabulary continuous speech recognition
Corresponding Author(s): WU Xihong, Email: wxh@cis.pku.edu.cn; XU Lei, Email: lxu@cse.cuhk.edu.hk
Issue Date: 05 June 2011
 Cite this article:   
Zaihu PANG, Shikui TU, Dan SU, et al. Discriminative training of GMM-HMM acoustic model by RPCL learning[J]. Front Elect Electr Eng Chin, 2011, 6(2): 283-290.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-011-0152-0
https://academic.hep.com.cn/fee/EN/Y2011/V6/I2/283