Frontiers of Electrical and Electronic Engineering

ISSN 2095-2732

ISSN 2095-2740(Online)

CN 10-1028/TM

Front Elect Electr Eng    2012, Vol. 7 Issue (1) : 107-115    https://doi.org/10.1007/s11460-012-0189-8
RESEARCH ARTICLE
A computational model for assessment of speech intelligibility in informational masking
Xihong WU, Jing CHEN
Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, China
Abstract

Existing auditory computational models for evaluating speech intelligibility account only for energetic masking; the effect of informational masking is rarely described in these models. This study aimed to develop a computational model that takes the mechanism of informational masking into account. Several psychoacoustic experiments were conducted to test the effect of informational masking on speech intelligibility by manipulating the number of masking talkers, the speech rate, and the similarity of the F0 contours of the target and the masker. The results showed that the speech reception threshold for the target increased as the F0 contour of the masker became more similar to that of the target, suggesting that difficulty in segregating the target harmonics from the masker harmonics may underlie the informational masking effect. Based on these studies, a new auditory computational model, named the harmonic extraction (HF) model, was built by introducing the auditory function of harmonic extraction into the traditional speech intelligibility index (SII) model. The predictions of the HF model are highly consistent with the experimental results.
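
For readers unfamiliar with the SII framework that the abstract builds on, the following is a minimal sketch of the widely used simplified form of the index: each frequency band's signal-to-noise ratio is mapped linearly onto an audibility value between 0 and 1 over a 30 dB range, and the audibilities are summed with band-importance weights. It is not the authors' HF model and omits the harmonic extraction stage, whose details are not given on this page. The function names and the equal band weights in the usage lines are illustrative assumptions, not values from the paper or from ANSI S3.5-1997.

```python
from typing import Sequence


def band_audibility(snr_db: float) -> float:
    """Map a band SNR (dB) linearly onto [0, 1]: -15 dB -> 0, +15 dB -> 1."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def simplified_sii(snr_db: Sequence[float], importance: Sequence[float]) -> float:
    """Importance-weighted sum of band audibilities (simplified SII).

    `importance` holds hypothetical band weights; they are normalised
    here so that they sum to 1.
    """
    if len(snr_db) != len(importance):
        raise ValueError("snr_db and importance must have the same length")
    total = sum(importance)
    if total <= 0:
        raise ValueError("importance weights must sum to a positive value")
    return sum(w / total * band_audibility(s) for s, w in zip(snr_db, importance))


# Illustrative usage with three equally weighted bands (assumed values).
index = simplified_sii([-15.0, 0.0, 15.0], [1.0, 1.0, 1.0])
```

Because this index depends only on per-band signal-to-noise ratios, it reflects energetic masking alone, which is the limitation the abstract attributes to existing models.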

Keywords: auditory computational model, speech intelligibility, informational masking, F0 contour, harmonic extraction
Corresponding Author(s): WU Xihong, Email: wxh@cis.pku.edu.cn; CHEN Jing, Email: chenj@cis.pku.edu.cn
Issue Date: 05 March 2012
 Cite this article:   
Xihong WU, Jing CHEN. A computational model for assessment of speech intelligibility in informational masking[J]. Front Elect Electr Eng, 2012, 7(1): 107-115.
 URL:  
https://academic.hep.com.cn/fee/EN/10.1007/s11460-012-0189-8
https://academic.hep.com.cn/fee/EN/Y2012/V7/I1/107