Utilizing Korean Ending Boundary Tones for Accurately Recognizing Emotions in Utterances

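  • 장인창 (Department of Industrial Systems and Information Engineering, Korea University)
  • 이태승 (CAD/CAM Research Center, Korea Institute of Science and Technology)
  • 박미경 (CAD/CAM Research Center, Korea Institute of Science and Technology)
  • 김태수 (CAD/CAM Research Center, Korea Institute of Science and Technology)
  • 장동식 (Department of Industrial Systems and Information Engineering, Korea University)
  • Published: 2005.06.01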

Abstract

Autonomous machines that interact with humans should be able to perceive emotional states and attitudes from implicit messages in order to obtain voluntary cooperation from their clients. Voice is the easiest and most natural channel for exchanging messages between humans. Automatic systems capable of understanding emotional states and attitudes have so far relied on features based on the pitch and energy of uttered sentences. The performance of existing emotion recognition systems can be further improved with the support of the linguistic knowledge that specific tonal sections of a sentence are related to emotional states and attitudes. In this paper, we attempt to improve the emotion recognition rate by applying such linguistic knowledge about Korean ending boundary tones to an automatic system implemented with pitch-related features and multilayer perceptrons. An experiment on a Korean emotional speech database confirms an improvement of $4\%$.

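The abstract does not list the exact pitch-related features, so the following is only a minimal sketch of the general idea: describe the pitch contour of the whole utterance, then describe the utterance-final stretch separately so that the ending boundary tone contributes its own inputs to the classifier. It assumes a frame-level F0 track is already available (in Hz, with 0 marking unvoiced frames) and approximates the ending boundary tone by the last fraction of voiced frames. The names `pitch_features` and `tail_fraction`, and the choice of mean, range, and slope as statistics, are illustrative assumptions rather than the authors' design.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against frame index (Hz per frame)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def contour_stats(f0):
    """Mean, range, and slope of a voiced F0 segment."""
    return [mean(f0), max(f0) - min(f0), slope(f0)]


def pitch_features(f0_track, tail_fraction=0.2):
    """Return 6 values: 3 for the whole utterance, 3 for its final part.

    tail_fraction is a hypothetical parameter: the share of voiced frames
    treated as the ending boundary tone region.
    """
    voiced = [f for f in f0_track if f > 0]  # drop unvoiced frames
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    tail_len = max(2, round(len(voiced) * tail_fraction))
    tail = voiced[-tail_len:]
    return contour_stats(voiced) + contour_stats(tail)


# A flat contour that rises at the very end of the utterance.
track = [0.0] + [200.0] * 8 + [210.0, 220.0] + [0.0]
features = pitch_features(track)
print(features)
```

In a system of the kind the abstract describes, a vector like this would be the input to a multilayer perceptron. Keeping the final-region statistics as separate inputs lets the classifier weight the end of the sentence differently from the utterance as a whole.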
