Emotion Recognition using Pitch Parameters of Speech

  • Lee, Guehyun (Department of Electrical Engineering, Kunsan National University) ;
  • Kim, Weon-Goo (Department of Electrical Engineering, Kunsan National University)
  • Received : 2013.11.19
  • Accepted : 2015.04.03
  • Published : 2015.06.25

Abstract

This paper studies various methods of extracting parameters from the pitch information of speech for the development of an emotion recognition system. For this purpose, pitch parameters were extracted from a Korean speech database containing various emotions, using statistical information and numerical analysis techniques. A GMM (Gaussian Mixture Model) based emotion recognition system was implemented to compare the performance of the pitch parameters, and a sequential feature selection method was used to select the parameters showing the best emotion recognition performance. In experiments recognizing four emotions, a combination of 15 of the 56 pitch parameters achieved a 63.5% recognition rate. In experiments detecting the presence of emotion, a combination of 14 parameters achieved an 80.3% recognition rate.
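The paper does not enumerate its 56 pitch parameters in this abstract, but the kind of statistical and numerical-analysis descriptors it refers to can be sketched as follows. This is an illustrative example only: the function names and the hypothetical F0 contour are not from the paper.

```python
# Illustrative sketch: typical statistical parameters computed from a pitch
# (F0) contour, in the spirit of the statistical / numerical-analysis
# features described in the abstract. Not the paper's actual feature set.
from statistics import mean, stdev, median

def pitch_features(f0):
    """Compute basic statistical descriptors of a voiced-frame pitch contour (Hz)."""
    n = len(f0)
    mu = mean(f0)
    # Slope of a least-squares line over frame index: a simple
    # numerical-analysis style parameter capturing pitch movement.
    xs = range(n)
    x_mu = mean(xs)
    slope = sum((x - x_mu) * (y - mu) for x, y in zip(xs, f0)) \
            / sum((x - x_mu) ** 2 for x in xs)
    return {
        "mean": mu,
        "std": stdev(f0),
        "median": median(f0),
        "min": min(f0),
        "max": max(f0),
        "range": max(f0) - min(f0),
        "slope": slope,
    }

contour = [180.0, 185.0, 190.0, 200.0, 210.0, 205.0]  # hypothetical F0 track
feats = pitch_features(contour)
```

In a full system, descriptors like these would be computed per utterance and concatenated into the feature vector fed to the recognizer.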

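The sequential feature selection step described in the abstract can be sketched as a greedy forward search. In the paper the evaluation function would be the recognition rate of the GMM system on a given parameter subset; the toy scorer and feature names below are purely illustrative stand-ins.

```python
# Hedged sketch of sequential forward selection (SFS): greedily add the
# feature that most improves the score until no candidate helps.
# `evaluate` stands in for training/testing the GMM recognizer on a subset.

def sequential_forward_selection(candidates, evaluate):
    """Return the greedily selected feature subset and its score."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored)
        if score <= best_score:
            break  # no remaining feature improves performance
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score

# Toy evaluation: each hypothetical pitch feature has an individual
# "usefulness", with a per-feature penalty so the search stops early
# (a stand-in for the diminishing returns of a real recognition rate).
usefulness = {"f0_mean": 0.40, "f0_std": 0.25, "f0_range": 0.20, "f0_slope": 0.05}
score = lambda subset: sum(usefulness[f] for f in subset) - 0.07 * len(subset)

subset, rate = sequential_forward_selection(usefulness, score)
```

With this toy scorer the search keeps adding features only while each addition raises the score, mirroring how the paper arrives at 15-of-56 and 14-parameter subsets.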


Cited by

  1. A Low Bit Rate Speech Coder Based on the Inflection Point Detection vol.15, pp.4, 2015, https://doi.org/10.5391/IJFIS.2015.15.4.300
  2. A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection vol.16, pp.4, 2016, https://doi.org/10.5391/IJFIS.2016.16.4.276