
Multimodal Parametric Fusion for Emotion Recognition

  • Kim, Jonghwa (Dept. of Intelligent System Engineering, Cheju Halla University)
  • Received : 2020.02.27
  • Accepted : 2020.03.10
  • Published : 2020.03.31

Abstract

The main objective of this study is to investigate the impact of additional modalities on the performance of emotion recognition using speech, facial expressions, and physiological measurements. To compare different approaches, we designed a feature-based recognition system as a benchmark, which performs linear supervised classification followed by leave-one-out cross-validation. For the classification of four emotions, bimodal fusion in our experiment improved the recognition accuracy of the unimodal approaches, while the performance of trimodal fusion varied strongly across individuals. Furthermore, we observed an extremely high disparity between single-class recognition rates, and no single modality performed best across all subjects in our experiment. Based on these observations, we developed a novel fusion method, called parametric decision fusion (PDF), which builds emotion-specific classifiers and exploits the advantages of a parameterized decision process. Using the PDF scheme, we achieved a 16% improvement in the accuracy of subject-dependent recognition and a 10% improvement for subject-independent recognition compared to the best unimodal results.
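The abstract does not spell out how the PDF scheme parameterizes its decision process, but the core idea of class-specific decision fusion can be illustrated with a minimal sketch. In the code below, the modality ordering, the posterior values, the per-class weights, and the weighted-sum fusion rule are all illustrative assumptions, not the paper's actual parameterization:

```python
import numpy as np

# Hypothetical per-modality class posteriors for one sample over four emotions.
# Rows: modalities (speech, facial expression, physiology); columns: emotion classes.
posteriors = np.array([
    [0.10, 0.55, 0.25, 0.10],   # speech classifier
    [0.40, 0.30, 0.20, 0.10],   # facial-expression classifier
    [0.25, 0.25, 0.35, 0.15],   # physiological classifier
])

# Emotion-specific fusion weights: one parameter per (modality, class) pair.
# A PDF-style scheme would tune these on training data; the values here are made up.
weights = np.array([
    [0.5, 0.8, 0.4, 0.3],
    [0.9, 0.4, 0.3, 0.6],
    [0.3, 0.5, 0.9, 0.7],
])

def parametric_decision_fusion(posteriors, weights):
    """Combine per-modality class scores using class-specific weights."""
    fused = (weights * posteriors).sum(axis=0)  # weighted vote for each class
    return fused / fused.sum()                  # renormalize to a distribution

fused = parametric_decision_fusion(posteriors, weights)
print("fused scores:", fused)
print("predicted class:", int(np.argmax(fused)))
```

Because each class carries its own weight per modality, a modality that is reliable for only one emotion can still dominate the decision for that emotion without distorting the others, which is the motivation behind emotion-specific classifiers.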
