A study on the effectiveness of intermediate features in deep learning on facial expression recognition

  • KyeongTeak Oh (Department of Biomedical Engineering, Yonsei University College of Medicine) ;
  • Sun K. Yoo (Department of Biomedical Engineering, Yonsei University College of Medicine)
  • Received : 2023.03.10
  • Accepted : 2023.03.17
  • Published : 2023.06.30

Abstract

The purpose of this study is to evaluate the impact of intermediate features on facial expression recognition (FER) performance. To this end, intermediate features were extracted from input images at specific layers (FM1~FM4) of a pre-trained network (ResNet-18). These intermediate features and the original images were used as inputs to a vision transformer (ViT), and FER performance was compared. When a single input was used, the intermediate features extracted from FM2 yielded the best performance (training accuracy: 94.35%, testing accuracy: 75.51%), while the original image alone achieved a training accuracy of 91.32% and a testing accuracy of 74.68%. When the original image was combined with intermediate features, the best FER performance was obtained by combining the original image with FM2, FM3, and FM4 (training accuracy: 97.88%, testing accuracy: 79.21%). These results imply that incorporating intermediate features alongside the original image can lead to superior performance. The findings can be referenced when designing the preprocessing stages of a deep learning model for FER; by considering the effectiveness of intermediate features, practitioners can make informed decisions to enhance the performance of FER systems.
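
Below is a minimal PyTorch sketch of the pipeline the abstract describes. The mapping of ResNet-18's four residual stages (layer1~layer4) to FM1~FM4, the hook-based extraction, and the resize-and-concatenate fusion into a ViT-ready tensor are illustrative assumptions, not the authors' published implementation.

```python
# A hedged sketch (not the authors' code): extract intermediate feature maps
# (FM1~FM4, assumed here to be the outputs of ResNet-18's layer1~layer4) and
# combine them with the original image as input for a ViT-style model.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

features = {}

def save_to(name):
    # Forward hook that stores the layer's output under the given name.
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

for stage in ["layer1", "layer2", "layer3", "layer4"]:  # FM1..FM4 (assumption)
    getattr(backbone, stage).register_forward_hook(save_to(stage))

image = torch.randn(1, 3, 224, 224)  # stand-in for a normalized face crop
with torch.no_grad():
    backbone(image)  # running the backbone populates `features`

# Single-input case: FM2 alone. Upsample to image resolution and reduce to
# 3 channels with a 1x1 convolution so a standard ViT patch embedding can
# consume it (this reduction scheme is an assumption).
fm2 = features["layer2"]  # (1, 128, 28, 28) for a 224x224 input
fm2_up = F.interpolate(fm2, size=(224, 224), mode="bilinear", align_corners=False)
to_rgb = torch.nn.Conv2d(fm2_up.shape[1], 3, kernel_size=1)
vit_input_single = to_rgb(fm2_up)  # (1, 3, 224, 224)

# Combined case: original image + FM2, FM3, FM4, fused here by channel-wise
# concatenation; a ViT with a correspondingly wider patch embedding would
# take this tensor as input.
upsampled = [
    F.interpolate(features[s], size=(224, 224), mode="bilinear", align_corners=False)
    for s in ["layer2", "layer3", "layer4"]
]
vit_input_combined = torch.cat([image] + upsampled, dim=1)  # (1, 899, 224, 224)
print(vit_input_single.shape, vit_input_combined.shape)
```

Channel-wise concatenation is only one possible fusion strategy; token-level concatenation or separate patch embeddings per feature map would serve the same purpose of exposing multi-scale intermediate features to the transformer.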

Acknowledgement

This work was supported by the Industrial Technology Innovation Program (No. 20012603, Development of Emotional Cognitive and Sympathetic AI Service Technology for Remote (Non-face-to-face) Learning and Industrial Sites) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea).

References

  1. Dalal, N., & Triggs, B., "Histograms of oriented gradients for human detection." In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), vol. 1, pp. 886-893, Jun 2005. DOI: https://doi.org/ 10.1109/CVPR.2005.177
  2. Shan, C., Gong, S., & McOwan, P. W., "Robust facial expression recognition using local binary patterns." In IEEE International Conference on Image Processing 2005, vol. 2, pp. II-370, Sep 2005. DOI: https://doi.org/ 10.1109/ICIP.2005.1530069
  3. Lee, S. H., Plataniotis, K. N., & Ro, Y. M., "Intra-class variation reduction using training expression images for sparse representation based facial expression recognition." IEEE Transactions on Affective Computing, Vol. 5, Issue 3, pp.340-351, Aug 2014. DOI: https://doi.org/ 10.1109/TAFFC.2014.2346515
  4. Li, Y., Zeng, J., Shan, S., & Chen, X., "Occlusion aware facial expression recognition using CNN with attention mechanism." IEEE Transactions on Image Processing, Vol. 28, No. 5, pp.2439-2450, Dec 2018 DOI: https://doi.org/ 10.1109/TIP.2018.2886767
  5. Saeed, S., Shah, A. A., Ehsan, M. K., Amirzada, M. R., Mahmood, A., & Mezgebo, T., "Automated facial expression recognition framework using deep learning." Journal of Healthcare Engineering, Vol. 2022, Mar 2022. DOI: https://doi.org/10.1155/2022/5707930
  6. Zhi, R., Zhou, C., Li, T., Liu, S., & Jin, Y., "Action unit analysis enhanced facial expression recognition by deep neural network evolution." Neurocomputing, Vol. 425, pp. 135-148, Feb 2021. DOI: https://doi.org/10.1016/j.neucom.2020.03.036
  7. Liang, D., Liang, H., Yu, Z., & Zhang, Y., "Deep convolutional BiLSTM fusion network for facial expression recognition." The Visual Computer, Vol. 36, pp. 499-508, Feb 2020. DOI: https://doi.org/10.1007/s00371-019-01636-3
  8. Sun, N., Li, Q., Huan, R., Liu, J., & Han, G., "Deep spatial-temporal feature fusion for facial expression recognition in static images." Pattern Recognition Letters, Vol. 119, pp. 49-61, Mar 2019. DOI: https://doi.org/10.1016/j.patrec.2017.10.022
  9. Valstar, M., & Pantic, M., "Induced disgust, happiness and surprise: an addition to the mmi facial expression database." In Proc. 3rd Intern. Workshop on EMOTION (satellite of LREC): Corpora for Research on Emotion and Affect, p. 65, May
  10. Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I., "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression." In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, pp. 94-101, Jun 2010. DOI: https://doi.org/10.1109/CVPRW.2010.5543262
  11. Barsoum, E., Zhang, C., Ferrer, C. C., & Zhang, Z., "Training deep networks for facial expression recognition with crowd-sourced label distribution." In Proceedings of the 18th ACM international conference on multimodal interaction, pp. 279-283, Oct 2016. DOI: https://doi.org/10.48550/arXiv.1608.01041
  12. Li, S., Deng, W., & Du, J., "Reliable Crowdsourcing and Deep Locality-Preserving Learning for Unconstrained Facial Expression Recognition." IEEE Transactions on Image Processing, Vol. 28, Issue 1, pp. 356-370, Jan 2019. DOI: https://doi.org/10.1109/TIP.2018.2868382
  13. Mollahosseini, A., Hasani, B., & Mahoor, M. H., "Affectnet: A database for facial expression, valence, and arousal computing in the wild." IEEE Transactions on Affective Computing, Vol. 10, No. 1, pp. 18-31, Aug 2017. DOI: https://doi.org/10.1109/TAFFC.2017.2740923
  14. Georgescu, M. I., Ionescu, R. T., & Popescu, M., "Local learning with deep and handcrafted features for facial expression recognition." IEEE Access, Vol. 7, pp. 64827-64836, May 2019. DOI: https://doi.org/10.1109/ACCESS.2019.2917266
  15. Ruan, D., Yan, Y., Lai, S., Chai, Z., Shen, C., & Wang, H., "Feature decomposition and reconstruction learning for effective facial expression recognition." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7660-7669, Apr 2021. DOI: https://doi.org/10.48550/arXiv.2104.05160
  16. Liu, Y., Feng, C., Yuan, X., Zhou, L., Wang, W., Qin, J., & Luo, Z., "Clip-aware expressive feature learning for video-based facial expression recognition." Information Sciences, Vol. 598, pp.182-195, Jun 2022. DOI: https://doi.org/10.1016/j.ins.2022.03.062
  17. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I., "Attention is all you need." Advances in Neural Information Processing Systems, Jun 2017. DOI: https://doi.org/10.48550/arXiv.1706.03762
  18. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929, Oct 2020. DOI: https://doi.org/10.48550/arXiv.2010.11929
  19. Ma, F., Sun, B. and Li, S., "Facial expression recognition with visual transformers and attentional selective fusion." IEEE Transactions on Affective Computing, pp. 1-1, Oct 2021. DOI: https://doi.org/10.1109/TAFFC.2021.3122146
  20. Liu, C., Hirota, K., & Dai, Y., "Patch attention convolutional vision transformer for facial expression recognition with occlusion.", Information Sciences, Vol. 619, pp. 781-794, Jan 2023. DOI: https://doi.org/10.1016/j.ins.2022.11.068
  21. Pong, K. H., & Lam, K. M., "Multi-resolution feature fusion for face recognition." Pattern Recognition, Vol. 47, No. 2, pp.556-567, Feb 2014. DOI: https://doi.org/10.1016/j.patcog.2013.08.023
  22. Lin, T. Y., Dollar, P., Girshick, R., He, K., Hariharan, B., & Belongie, S., "Feature pyramid networks for object detection." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117-2125, Jul 2017. DOI: https://doi.org/10.1109/CVPR.2017.106
  23. He, K., Zhang, X., Ren, S., & Sun, J., "Deep residual learning for image recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778. 2016. DOI: https://doi.org/10.48550/arXiv.1512.03385
  24. Kingma, D. P., & Ba, J., "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980, Dec 2014. DOI: https://doi.org/10.48550/arXiv.1412.6980
  25. Cubuk, E. D., Zoph, B., Shlens, J., & Le, Q. V., "Randaugment: Practical automated data augmentation with a reduced search space." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 702-703, Jun 2020. DOI: https://doi.org/10.48550/arXiv.1909.13719