A Study on the Improvement of Speech Emotion Recognition by Gender Discrimination

  • Cho, Youn-Ho (Dept. of Computer Science & Engineering, Dankook University)
  • Park, Kyu-Sik (Dept. of Computer Science & Engineering, Dankook University)
  • Published: July 25, 2008

Abstract

In this paper, we construct a speech emotion recognition system that classifies speech into four emotional states - neutral, happiness, sadness, and anger - based on male/female gender discrimination. The proposed system first determines the speaker's gender from the query speech and then performs emotion recognition with a feature vector set optimized separately for each gender, which improves the overall recognition rate. As emotion features, the system adopts ZCPA (Zero Crossings with Peak Amplitudes), a representation from the speech recognition field known for its robustness to noise, and the feature set for each gender is optimized with SFS (Sequential Forward Selection). For emotion pattern classification, k-NN and SVM classifiers are compared experimentally. Experimental results show that the proposed system achieves a recognition rate of about 85.3% over the four emotional states, suggesting that it can supply useful emotion information to applications such as emotion-aware call centers, humanoid robots, and ubiquitous computing environments.
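
The two-stage design described above lends itself to a compact illustration. The following is a minimal sketch, assuming precomputed per-utterance feature vectors (e.g., statistics derived from ZCPA and prosodic features) in a matrix X, with gender labels y_gender and emotion labels y_emotion; the use of scikit-learn, the RBF-kernel SVM settings, and the subset size of 10 features are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of gender-gated emotion recognition with SFS-selected
# features. X, y_gender, y_emotion, and all hyperparameters below are
# illustrative assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train(X, y_gender, y_emotion):
    """Stage 1: gender classifier. Stage 2: per-gender emotion models."""
    # Stage 1: discriminate male/female from the acoustic features.
    gender_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    gender_clf.fit(X, y_gender)

    # Stage 2: for each gender, pick a feature subset by sequential
    # forward selection (SFS), then train an emotion classifier on it.
    emotion_models = {}
    for g in np.unique(y_gender):
        mask = y_gender == g
        selector = SequentialFeatureSelector(
            make_pipeline(StandardScaler(), SVC(kernel="rbf")),
            n_features_to_select=10,  # assumed subset size
            direction="forward",
            cv=5,
        )
        selector.fit(X[mask], y_emotion[mask])
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        clf.fit(selector.transform(X[mask]), y_emotion[mask])
        emotion_models[g] = (selector, clf)
    return gender_clf, emotion_models


def predict(x, gender_clf, emotion_models):
    """Route a query utterance through its predicted gender's model."""
    x = x.reshape(1, -1)
    g = gender_clf.predict(x)[0]
    selector, clf = emotion_models[g]
    return clf.predict(selector.transform(x))[0]
```

Swapping SVC for KNeighborsClassifier would mirror the paper's k-NN/SVM comparison; the key design point is that gating on gender first lets each branch commit to a different SFS-selected feature subset rather than a single compromise set for both genders.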

References

  1. Duda, R., Hart, P., and Stork, D., "Pattern Classification," 2nd ed., John Wiley & Sons, 2000
  2. Dellaert, F., Polzin, T., and Waibel, A., "Recognizing Emotion in Speech," Proceedings of the International Conference on Spoken Language Processing (ICSLP '96), Vol. 3, pp. 1970-1973, Oct. 1996
  3. Lee, C. M. and Narayanan, S. S., "Towards Detecting Emotions in Spoken Dialogs," IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 2, March 2005
  4. Rong, J., Chen, Y., Chowdhury, M., and Li, G., "Acoustic Features Extraction for Emotion Recognition," 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), pp. 419-424, July 2007
  5. Vogt, T. and Andre, E., "Improving Automatic Emotion Recognition from Speech via Gender Differentiation," Proceedings of the Language Resources and Evaluation Conference (LREC '06), Italy, May 2006
  6. Lugger, M. and Yang, B., "The Relevance of Voice Quality Features in Speaker Independent Emotion Recognition," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), Vol. 4, pp. IV-17-IV-20, April 2007
  7. Kim, Doh-Suk, Lee, Soo-Young, and Kil, Rhee M., "Auditory Processing of Speech Signals for Robust Speech Recognition in Real-World Noisy Environments," IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 1, pp. 55-69, Jan. 1999. https://doi.org/10.1109/89.736331
  8. Jain, A. and Zongker, D., "Feature Selection: Evaluation, Application, and Small Sample Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, pp. 153-158, 1997. https://doi.org/10.1109/34.574797
  9. Gu, Lingyun and Zahorian, Stephen A., "A New Robust Algorithm for Isolated Word Endpoint Detection," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), Orlando, FL, May 2002
  10. Sun, Xuejing, "A Pitch Determination Algorithm Based on Subharmonic-to-Harmonic Ratio," International Conference on Spoken Language Processing (ICSLP 2000), pp. 676-679, 2000
  11. Kang, Bong-Seok, "A Sentence-Independent Emotion Recognition System Using Speech Signals," Master's Thesis, Yonsei University, 2001 (in Korean)