Improving Accuracy of Violence Detection in CCTV Camera Using Pose Estimation and Face Emotion Recognition

  • Seong Un Noh (Graduate School of Software Convergence, Kookmin University)
  • Dae Young Heo (College of Software Convergence, Kookmin University)
  • Received: 2024.06.07
  • Reviewed: 2024.08.06
  • Published: 2024.08.31

Abstract

As violent crime has become more frequent and public anxiety about it has grown, so has the need for intelligent video analysis in CCTV systems, both for crime prevention and for rapid incident response. One approach to detecting violent behavior in video is action-based detection using pose estimation. However, relying solely on joint angles and their changes obtained from pose estimation produces false positives: non-violent actions such as patting someone's head or hugging are misclassified as violent. This study aims to reduce such false positives in action-based violence detection methods that use pose estimation alone. We propose a violence detection method that combines the facial emotion recognition result (anger, disgust, fear, sadness, surprise, happiness, or neutrality) of the expected victim with the existing pose estimation-based method. On a video dataset consisting of YouTube videos and self-made videos, combining pose estimation with facial emotion recognition achieved an accuracy of 92.5%, higher than that of the traditional method relying solely on pose estimation. Future research will study violence detection in actual CCTV scenarios to improve the reliability of the results.
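The idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two signals are fused, so the function names (`joint_angle`, `is_violent`), the grouping of emotions into `NEGATIVE_EMOTIONS`, and the rule of keeping a pose-based alarm only when the victim's emotion is negative are all assumptions made for illustration.

```python
import math

# Assumption: which of the seven emotions count as distress cues.
# The abstract lists the emotions but does not state this grouping.
NEGATIVE_EMOTIONS = {"anger", "disgust", "fear", "sadness", "surprise"}


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b formed by 2D keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case: two keypoints coincide
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def is_violent(pose_flag, victim_emotion):
    """Hypothetical fusion rule (an assumption, not the paper's method).

    pose_flag: True if the pose-only detector raised an alarm.
    victim_emotion: emotion label of the expected victim, or None if no
    face was recognized.
    """
    if victim_emotion is None:
        return pose_flag  # no face available: fall back to pose only
    return pose_flag and victim_emotion in NEGATIVE_EMOTIONS
```

Under this rule, a hug flagged by the pose detector while the expected victim shows happiness would be suppressed, which is the kind of false positive the study targets. Falling back to the pose result when no face is recognized is a design choice of this sketch only.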
