
Speech Emotion Recognition with SVM, KNN and DSVM

  • Hadhami Aouani (National School of Engineers (ENIS), University of Sfax)
  • Yassine Ben Ayed (Multimedia InfoRmation systems and Advanced Computing Laboratory (MIRACL), University of Sfax)
  • Received : 2023.08.05
  • Published : 2023.08.30

Abstract

Speech emotion recognition has become an active research theme in speech processing and in applications based on human-machine interaction. In this work, our system is a two-stage approach, namely feature extraction and a classification engine. Firstly, two feature sets are investigated: the first extracts only 13 Mel-Frequency Cepstral Coefficients (MFCC) from the emotional speech samples, while the second fuses the MFCC features with three additional features: Zero Crossing Rate (ZCR), Teager Energy Operator (TEO), and Harmonic to Noise Rate (HNR). Secondly, we use two classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN), and compare their performance. Beyond that, we investigate the contribution of recent advances in machine learning, including deep kernel learning. A large set of experiments is conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. Our experimental results show good accuracy compared with previous studies.
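The two-stage pipeline described above (hand-crafted acoustic features followed by an SVM or k-NN classifier) can be sketched on toy data. This is a minimal illustration under stated assumptions, not the paper's implementation: it computes only the ZCR and TEO features (MFCC and HNR extraction would require an audio library, which is not assumed here), uses synthetic tones in place of the SAVEE recordings, and assumes scikit-learn for the two classifiers.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def zero_crossing_rate(x):
    """Fraction of consecutive samples whose sign changes."""
    return np.mean(np.abs(np.diff(np.sign(x))) > 0)

def mean_teager_energy(x):
    """Mean of the Teager Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return np.mean(x[1:-1] ** 2 - x[:-2] * x[2:])

def extract_features(x):
    # In the paper's fused setting, MFCC and HNR values would be
    # concatenated here as well; this sketch keeps only ZCR and TEO.
    return np.array([zero_crossing_rate(x), mean_teager_energy(x)])

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)  # 1 s of signal at ~8 kHz

# Toy stand-in for two emotion classes: low- vs high-frequency noisy tones.
X, y = [], []
for label, base_freq in [(0, 200), (1, 1200)]:
    for _ in range(20):
        freq = base_freq + rng.normal(0, 20)
        sig = np.sin(2 * np.pi * freq * t) + 0.05 * rng.normal(size=t.size)
        X.append(extract_features(sig))
        y.append(label)
X, y = np.array(X), np.array(y)

# Shuffled train/test split, then the two classifiers compared in the paper.
order = rng.permutation(len(X))
train, test = order[:28], order[28:]
svm = SVC(kernel="rbf").fit(X[train], y[train])
knn = KNeighborsClassifier(n_neighbors=3).fit(X[train], y[train])
print("SVM accuracy:", svm.score(X[test], y[test]))
print("k-NN accuracy:", knn.score(X[test], y[test]))
```

On real emotional speech, the feature vector per utterance would be built the same way (frame-level features pooled per sample) before being passed to the fitted classifiers.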


References

  1. Simina Emerich, Eugen Lupu - Improving Speech Emotion Recognition using Frequency and Time Domain Acoustic features, EURSAIP 2011.
  2. Park, J.-S., J.-H. Kim and Y.-H. Oh, Feature vector classification based speech emotion recognition for service robots. IEEE Transactions on Consumer Electronics, 2009. 55(3).
  3. A Dictionary of Physics. 7 ed. 2015: Oxford University Press.
  4. Zhibing, X., Audiovisual Emotion Recognition Using Entropy estimation- based Multimodal Information Fusion. 2015, Ryerson University.
  5. Hinton, G. E., and Salakhutdinov, R. R.Reducing the dimensionality of data with neural networks. Science 313(5786):504-507, 2006 https://doi.org/10.1126/science.1127647
  6. P. Song, S. Ou, W.Zheng, Y. Jin, & L. Zhao: "Speech emotion recognition using transfer non-negative matrix factorization". In Proceedings of IEEE international conference ICASSP, pp. 5180-5184, 2016.
  7. Papakostas, M., et al., Recognizing Emotional States Using Speech Information, in GeNeDis 2016. 2017, Springer. p. 155-164.
  8. E. Ramdinmawii, A.Mohanta, V.K. Mittal: "Emotion Recognition from Speech Signal ", IEEE 10 Conference (TENCON), Malaysia, November 5-8, 2017.
  9. P. Shi: "Speech Emotion Recognition Based on Deep Belief Network", IEEE, 2018.
  10. Siddique Latif, R.R., Shahzad Younis, Junaid Qadir, Julien Epps, Transfer Learning for Improving Speech Emotion Classification Accuracy. ArXiv:1801.06353v3 [cs.CV] 2018.
  11. Aouani H, Ben Ayed Y: "Emotion recognition in speech using MFCC with SVM, DSVM and auto-encoder",IEEE, 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) 2018.
  12. L.X.Hung : Detection des emotions dans des enonces audio multilingues. Institut polytechnique de Grenoble, 2009.
  13. Ferrand, C: Speech science: An integrated approach to theory and clinical practice. Boston, MA: Pearson, 2007.
  14. Noroozi, F., et al., Vocal-based emotion recognition using random forests and decision tree. International Journal of Speech Technology, 20(2): p. 239-246, 2017. https://doi.org/10.1007/s10772-017-9396-2
  15. M. Swerts and E. Krahmer. Gender-related differences in the production and perception of emotion. In Proc. Interspeech, pages {334,337}, 2008.
  16. Eric V. Strobl & Shyam Visweswaran:' Deep Multiple Kernel Learning' ICMLA, 2013.
  17. http:// personal.ee.surrey.ac.uk/Personal/PJackson/SAVEE.
  18. Y. Ben Ayed : Detection de mots cles dans un flux de parole. These de doctorat, Ecole Nationale Superieure des Telecommunications ENST, 2003.
  19. DELLAAERT F., POLZIN T., WAIBEL A., "Recognizing Emotion in Speech ", Proc.of ICSLP,Philadelphie , 1996.
  20. L. Bottou, C. Cortes, J. Drucker, I. Guyon, Y. LeCunn, U. Muller, E. Sackinger, P. Simard et V. Vapnik : "Comparaison of classifier methods : a case study in handwriting digit recognition", dans Proceedings of the International Conference on Pattern Recognition, p.77_87, 1994.
  21. S. Knerr, L. Personnaz et G. Dreyfus : "Single-layer learning revisited: a stepwise procedure for building and training a neural network", Neurocomputing: Algoritms, Architectures and Applications, p.68, 1990.
  22. J. C. Platt, N. Cristianini et J. Shawe-Taylor : "Large margin dags for multiclass classification", dans Advances in Neural Information Processing Systems, MIT Press, 12, p.547_553, 2000.
  23. V. Vapnik : "Statistical learning theory", John Wiley and Sons, 1998.
  24. J. Weston et C. Watkins : "Support vector machines for multiclass pattern recognition", In Proceedings of the Seventh European Symposium On Artificial Neural Networks, 1999.
  25. A. Amina, A. Mouhamed, and C. Morad. Identification des personnes par systeme multimodale.
  26. Sucksmith, E., Allison, C., Baron-Cohen, S., Chakrabarti, B., & Hoekstra, R. A. Empathy and emotion recognition in people with autism, first-degree relatives, and controls. Neuropsychologia, 51(1), 98-105,2013 https://doi.org/10.1016/j.neuropsychologia.2012.11.013