Statistical Speech Feature Selection for Emotion Recognition

  • Published : 2005.12.01

Abstract

We evaluate the performance of speech-based emotion recognition when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formants, band energies, mel-frequency cepstral coefficients (MFCCs), and the velocity and acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For five-class emotion classification (angry, bored, happy, neutral, sad), the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant when compared with the performance of foreign human listeners.
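The step from frame-based features to a fixed-length utterance vector can be illustrated with a minimal sketch. The abstract does not say which statistics are used, so the mean, standard deviation, minimum, and maximum below are assumptions, as are the function names `deltas` and `utterance_vector`. For brevity the sketch also computes velocity and acceleration for every feature track, whereas the abstract applies them only to pitch and MFCCs.

```python
from statistics import mean, pstdev


def deltas(track):
    """Frame-to-frame slope of one feature track (a simple velocity estimate).

    Uses a central difference in the interior and a one-sided difference
    at the two ends, so the output has the same length as the input.
    """
    n = len(track)
    if n < 2:
        return [0.0] * n
    out = [track[1] - track[0]]
    out += [(track[i + 1] - track[i - 1]) / 2.0 for i in range(1, n - 1)]
    out.append(track[-1] - track[-2])
    return out


def utterance_vector(frames):
    """Map a variable-length list of per-frame feature lists to a fixed-length vector.

    The output length is num_features * 3 * 4: each feature contributes its
    static, velocity, and acceleration tracks, and each track is summarized
    by four statistics (an assumed choice, not taken from the paper).
    """
    tracks = [list(column) for column in zip(*frames)]  # one track per feature
    vector = []
    for track in tracks:
        velocity = deltas(track)
        acceleration = deltas(velocity)
        for series in (track, velocity, acceleration):
            vector += [mean(series), pstdev(series), min(series), max(series)]
    return vector


# Two utterances of different lengths, each with two hypothetical features per frame
short_utt = [[100.0 + i, 0.5] for i in range(5)]
long_utt = [[120.0 - 0.5 * i, 0.7] for i in range(40)]
print(len(utterance_vector(short_utt)), len(utterance_vector(long_utt)))
```

Both utterances map to vectors of the same length regardless of their duration, which is what a discriminative classifier such as an SVM needs as input. A sequence model such as an HMM would presumably consume the frame-level features directly.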
