Robust Speech Parameters for the Emotional Speech Recognition

  • Lee, Guehyun (Department of Electrical Engineering, Kunsan National University)
  • Kim, Weon-Goo (Department of Electrical Engineering, Kunsan National University)
  • Received : 2012.10.12
  • Accepted : 2012.11.30
  • Published : 2012.12.25

Abstract

This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust emotional speech recognition system. To this end, the effect of emotion on a speech recognition system and on its speech parameters was analyzed using a speech database containing various emotions. The feature parameters examined were mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and frequency-warped mel-cepstral coefficients obtained with vocal tract length normalization (VTLN). Cepstral mean subtraction (CMS) and signal bias removal (SBR) were used as signal bias removal techniques. Experimental results showed that an HMM-based speaker-independent isolated-word recognizer performed best when it used frequency-warped RASTA mel-cepstral coefficients obtained with VTLN, together with their delta coefficients, and CMS for signal bias removal.
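
The abstract names CMS, delta coefficients and RASTA filtering without spelling them out. As a minimal sketch (not the authors' implementation), the code below shows their textbook forms, assuming each utterance is stored as a list of T frames of D cepstral coefficients. The function names are hypothetical, N = 2 for the delta regression window is an assumption, and the RASTA filter uses the band-pass transfer function of Hermansky et al. with a pole of 0.98, implemented causally. RASTA was originally applied to log critical-band energy trajectories; because both RASTA (over time) and the DCT (over channels) are linear, filtering log mel-energy trajectories or the resulting cepstral trajectories gives the same cepstra.

```python
from typing import List

Frames = List[List[float]]  # one utterance: T frames, each with D cepstral coefficients


def cepstral_mean_subtraction(c: Frames) -> Frames:
    """CMS: subtract each coefficient's mean over all frames of the utterance."""
    T, D = len(c), len(c[0])
    mean = [sum(frame[d] for frame in c) / T for d in range(D)]
    return [[frame[d] - mean[d] for d in range(D)] for frame in c]


def delta(c: Frames, N: int = 2) -> Frames:
    """Regression-based delta coefficients over +/-N frames (N=2 is an assumption).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated at the utterance edges.
    """
    T, D = len(c), len(c[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out: Frames = []
    for t in range(T):
        row = []
        for d in range(D):
            acc = 0.0
            for n in range(1, N + 1):
                ahead = c[min(t + n, T - 1)][d]   # clamp at the last frame
                behind = c[max(t - n, 0)][d]      # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def rasta_filter(x: List[float], pole: float = 0.98) -> List[float]:
    """Causal RASTA band-pass filter for one feature trajectory over time.

    y[t] = 0.1 * (2*x[t] + x[t-1] - x[t-3] - 2*x[t-4]) + pole * y[t-1]
    The numerator taps sum to zero, so a constant (channel) offset decays away.
    """
    taps = [0.2, 0.1, 0.0, -0.1, -0.2]
    y: List[float] = []
    for t in range(len(x)):
        acc = sum(b * x[t - k] for k, b in enumerate(taps) if t - k >= 0)
        if t > 0:
            acc += pole * y[t - 1]
        y.append(acc)
    return y


if __name__ == "__main__":
    utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(cepstral_mean_subtraction(utt))  # [[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]]
    print(delta(utt, N=1))                 # [[1.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
```

A fixed convolutional channel appears as a constant additive offset in the cepstral domain. CMS removes it exactly, the RASTA filter's zero-sum numerator makes it decay, and delta coefficients are unaffected by it.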

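The static feature types compared in the study differ mainly in how the frequency axis is warped before the mel filterbank and how the filterbank energies are compressed before the cepstral transform. The sketch below assumes a piecewise-linear VTLN warping function with a breakpoint at 80% of the Nyquist frequency and a warp-factor grid of 0.88 to 1.12; both are common choices in the VTLN literature, not details taken from this paper. The hypothetical `cepstrum` function takes the DCT-II of the compressed energies: a logarithm gives the mel-cepstrum, and a power 0 < γ < 1 gives the root cepstrum.

```python
import math
from typing import List

# Assumed warp-factor grid (0.88 ... 1.12 in steps of 0.02); not taken from the paper.
ALPHA_GRID = [round(0.88 + 0.02 * i, 2) for i in range(13)]


def vtln_warp(f: float, alpha: float, f_nyq: float, cutoff: float = 0.8) -> float:
    """Piecewise-linear VTLN frequency warping (an assumed, commonly used form).

    Below the breakpoint f0 the frequency is scaled by alpha; above it a second
    linear segment joins (f0, alpha*f0) to (f_nyq, f_nyq), so 0 Hz and the
    Nyquist frequency map to themselves.
    """
    f0 = cutoff * f_nyq / max(alpha, 1.0)  # keeps alpha * f0 below f_nyq
    if f <= f0:
        return alpha * f
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    return alpha * f0 + slope * (f - f0)


def cepstrum(energies: List[float], n_ceps: int, gamma: float = 0.0) -> List[float]:
    """DCT-II of compressed filterbank energies.

    gamma == 0 uses log compression (mel-cepstrum); 0 < gamma < 1 uses
    root compression energies**gamma (root cepstrum).
    """
    M = len(energies)
    if gamma == 0.0:
        comp = [math.log(e) for e in energies]
    else:
        comp = [e ** gamma for e in energies]
    return [
        sum(comp[m] * math.cos(math.pi * k * (m + 0.5) / M) for m in range(M))
        for k in range(n_ceps)
    ]


if __name__ == "__main__":
    f_nyq = 4000.0  # assumes 8 kHz sampling (illustrative only)
    for a in (0.9, 1.0, 1.1):
        print(a, [round(vtln_warp(f, a, f_nyq)) for f in (500.0, 1500.0, 3500.0)])
```

In a typical VTLN setup, the warp factor for each speaker is chosen from the grid by maximizing the likelihood of that speaker's data under the recognizer's HMMs, and the mel filters are then placed on the warped frequency axis.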

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)

References

  1. M. Benzeghiba et al., "Impact of variabilities on speech recognition," in Proc. SPECOM'2006, 11th International Conference on Speech and Computer, pp. 3-16, June 2006.
  2. Y. Sun, Y. Zhou, Q. Zhao, and Y. Yan, "Acoustic feature optimization for emotion affected speech recognition," in Proc. International Conference on Information Engineering and Computer Science, pp. 1-4, Dec. 2009.
  3. J. Koehler, N. Morgan, H. Hermansky, H. G. Hirsch, and G. Tong, "Integrating RASTA-PLP into Speech Recognition," in Proc. ICASSP, pp. 421-424, Apr. 1994.
  4. B. Schuller, J. Stadermann, and G. Rigoll, "Affect-robust speech recognition by dynamic emotional adaptation," in Proc. Speech Prosody, 2006.
  5. J. H. Hansen and S. Patil, "Speech under stress: Analysis, modeling and recognition," Berlin, Heidelberg: Springer-Verlag, pp. 108-137, 2007.
  6. M. Pitz and H. Ney, "Vocal tract normalization equals linear transformation in cepstral space," IEEE Trans. Speech & Audio Processing, vol. 13, no. 5, pp. 930-944, 2005. https://doi.org/10.1109/TSA.2005.848881
  7. S. Wegmann, D. McAllaster, J. Orloff, and B. Peskin, "Speaker normalization on conversational telephone speech," in Proc. ICASSP, Atlanta, GA, pp. 339-342, May 1996.
  8. L. Welling, R. Haeb-Umbach, X. Aubert, and N. Haberland, "A study on speaker normalization using vocal tract normalization and speaker adaptive training," in Proc. ICASSP, Seattle, WA, vol. 2, pp. 797-800, May 1998.
  9. S. Molau, S. Kanthak, and H. Ney, "Efficient vocal tract normalization in automatic speech recognition," in Proc. ESSV'00, Cottbus, Germany, pp. 209-216, 2000.
  10. B. S. Kang, Context Independent Emotion Recognition System using Speech Signal, Yonsei University, Master of Engineering Thesis, 2000.
  11. M. G. Rahim and B. H. Juang, "Signal Bias Removal by Maximum Likelihood Estimation for Robust Telephone Speech Recognition," IEEE Trans. Speech & Audio Processing, vol. 4, no. 1, pp. 19-30, 1996. https://doi.org/10.1109/TSA.1996.481449
  12. P. Alexandre et al., "Root Cepstral Analysis: A Unified View. Application to Speech Processing in Car Noise Environments," Speech Communication, vol. 12, no. 3, pp. 277-288, 1993. https://doi.org/10.1016/0167-6393(93)90099-7
  13. H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "Compensation for the Effect of the Communication Channel in Auditory-Like Analysis of Speech (RASTA-PLP)," in Proc. EUROSPEECH, vol. 3, pp. 1367-1370, Sep. 1991.

Cited by

  1. "A Nonuniform Sampling Technique and Its Application to Speech Coding," vol. 24, no. 1, 2014, https://doi.org/10.5391/JKIIS.2014.24.1.028