Speech Parameters for the Robust Emotional Speech Recognition

감정에 강인한 음성 인식을 위한 음성 파라메터

  • 김원구 (Dept. of Electrical Engineering, Kunsan National University)
  • Received : 2010.09.10
  • Accepted : 2010.12.01
  • Published : 2010.12.01


This paper investigates speech parameters that are less affected by human emotion, with the goal of developing a robust speech recognition system. To this end, the effect of emotion on speech recognition and the robustness of different speech parameters were studied using a speech database containing various emotions. Mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, and frequency-warped mel-cepstral coefficients were used as feature parameters, and cepstral mean subtraction (CMS) was used as a signal-bias-removal technique. Experimental results showed that an HMM-based speaker-independent word recognizer using vocal-tract-length-normalized mel-cepstral coefficients, their derivatives, and CMS achieved the best performance, with a word error rate of 0.78%. This corresponds to a reduction in word error rate of about 50% compared with the baseline system using mel-cepstral coefficients, their derivatives, and CMS.
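Two of the feature-processing steps named in the abstract, cepstral mean subtraction and delta (derivative) cepstral features, can be sketched briefly. The paper does not specify its exact front-end implementation, so the function names and the ±2-frame regression window below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """CMS: subtract the per-utterance mean from each cepstral dimension.
    A stationary convolutional channel distortion appears as an additive
    offset in the cepstral domain, so removing the time average of each
    coefficient suppresses that bias."""
    cepstra = np.asarray(cepstra, dtype=float)  # shape: (frames, coeffs)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra, N=2):
    """First-order delta (derivative) features using the standard
    regression formula over +/- N frames; N=2 is a common default."""
    cepstra = np.asarray(cepstra, dtype=float)
    T = len(cepstra)
    # Replicate edge frames so the regression is defined at the borders.
    padded = np.pad(cepstra, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:T + N + n] - padded[N - n:T + N - n])
               for n in range(1, N + 1)) / denom
```

In a recognizer front-end such as the one described, the static cepstra would typically be mean-normalized per utterance and then concatenated with their deltas before HMM training and decoding.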


Supported by : National Research Foundation of Korea


  1. M. Benzeghiba et al., “Impact of variabilities on speech recognition,” in Proc. SPECOM 2006, 11th International Conference on Speech and Computer, pp. 3-16, June 2006.
  2. D. O'Shaughnessy, “Invited paper: Automatic speech recognition: History, methods and challenges,” Pattern Recognition, vol. 41, no. 10, pp. 2965-2979, Oct. 2008.
  3. H. Hermansky, N. Morgan, and H. G. Hirsch, “Recognition of speech in additive and convolutional noise based on RASTA spectral processing,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 83-86, 1993.
  4. Y. Sun, Y. Zhou, Q. Zhao, and Y. Yan, “Acoustic feature optimization for emotion affected speech recognition,” in Proc. International Conference on Information Engineering and Computer Science, pp. 1-4, Dec. 2009.
  5. J. Koehler, N. Morgan, H. Hermansky, H. G. Hirsch, and G. Tong, “Integrating RASTA-PLP into speech recognition,” in Proc. ICASSP, pp. 421-424, Apr. 1994.
  6. B. Schuller, J. Stadermann, and G. Rigoll, “Affect-robust speech recognition by dynamic emotional adaptation,” in Proc. Speech Prosody 2006, May 2006.
  7. J. H. Hansen and S. Patil, “Speech under stress: Analysis, modeling and recognition,” Berlin, Heidelberg: Springer-Verlag, pp. 108-137, 2007.
  8. N. Amir, “Classifying emotions in speech: A comparison of methods,” in Proc. Eurospeech 2001, Aalborg, Denmark, vol. 1, pp. 127-130, 2001.
  9. A. Nogueiras et al., “Speech emotion recognition using hidden Markov models,” in Proc. Eurospeech 2001, Aalborg, Denmark, vol. 4, pp. 2679-2682, 2001.
  10. R. W. Picard, Affective Computing, The MIT Press, 1997.
  11. M. Pitz and H. Ney, “Vocal tract normalization equals linear transformation in cepstral space,” IEEE Trans. Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, 2005.
  12. S. Wegmann, D. McAllaster, J. Orloff, and B. Peskin, “Speaker normalization on conversational telephone speech,” in Proc. ICASSP, Atlanta, GA, pp. 339-342, May 1996.
  13. L. Welling, R. Haeb-Umbach, X. Aubert, and N. Haberland, “A study on speaker normalization using vocal tract normalization and speaker adaptive training,” in Proc. ICASSP, Seattle, WA, vol. 2, pp. 797-800, May 1998.
  14. S. Molau, S. Kanthak, and H. Ney, “Efficient vocal tract normalization in automatic speech recognition,” in Proc. ESSV 2000, Cottbus, Germany, pp. 209-216, 2000.
  15. 강봉석, “Text-independent emotion recognition system using speech signals,” M.S. thesis, Yonsei University, 2000.