Statistical Speech Feature Selection for Emotion Recognition

Kwon Oh-Wook; Chan Kwokleung; Lee Te-Won

The Journal of the Acoustical Society of Korea

Volume 24, Issue 4E, Pages 144-151, 2005, pISSN 1225-4428

The Acoustical Society of Korea (한국음향학회)

Kwon Oh-Wook (Chungbuk National University)
Chan Kwokleung (University of California)
Lee Te-Won (University of California)

Published : 2005.12.01

Abstract

We evaluate the performance of emotion recognition from speech signals when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formants, band energies, mel-frequency cepstral coefficients (MFCCs), and the velocity and acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant when compared with the performance of foreign human listeners.
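The step from variable-length frame sequences to a fixed-length utterance vector can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which statistics are used, so the five chosen here (mean, standard deviation, minimum, maximum, range) are assumptions, as are the function names `deltas`, `track_stats`, and `utterance_vector`. Velocity and acceleration are approximated by simple first and second differences rather than a regression window.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def deltas(track: Sequence[float]) -> List[float]:
    """First-order difference of a per-frame track (a simple 'velocity')."""
    return [b - a for a, b in zip(track, track[1:])]


def track_stats(track: Sequence[float]) -> List[float]:
    """Summarise one per-frame track with five statistics (assumed set)."""
    if not track:
        # Too few frames to form this track: fall back to zeros.
        return [0.0] * 5
    lo, hi = min(track), max(track)
    return [mean(track), pstdev(track), lo, hi, hi - lo]


def utterance_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Map a variable-length list of frames to a fixed-length vector.

    Each frame holds D raw features (e.g. pitch, energy). For every
    feature, the raw track, its velocity, and its acceleration are
    each summarised, giving D * 3 * 5 values per utterance.
    """
    n_dims = len(frames[0])
    vector: List[float] = []
    for d in range(n_dims):
        raw = [frame[d] for frame in frames]
        vel = deltas(raw)   # velocity: difference of the raw track
        acc = deltas(vel)   # acceleration: difference of the velocity
        for track in (raw, vel, acc):
            vector.extend(track_stats(track))
    return vector


# Two toy utterances of different lengths, two features per frame.
short_utt = [[100.0, 0.5], [110.0, 0.6], [120.0, 0.4]]
long_utt = [[200.0, 0.9], [190.0, 0.8], [195.0, 0.7], [185.0, 0.9], [180.0, 0.6]]

v_short = utterance_vector(short_utt)
v_long = utterance_vector(long_utt)
print(len(v_short), len(v_long))
```

Both toy utterances map to vectors of the same length (2 features x 3 tracks x 5 statistics = 30), even though they contain different numbers of frames. That fixed length is what lets a discriminative classifier such as an SVM take a whole utterance as one input, whereas an HMM can model the frame sequence directly.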
