Semantic Occlusion Augmentation for Effective Human Pose Estimation

A Semantic Occlusion Augmentation Technique for Pose Estimation of Occluded People

  • 배현재 (Department of Software, Sungkyunkwan University)
  • 김진평 (Advanced Institute of Convergence Technology)
  • 이지형 (Department of Software, Sungkyunkwan University)
  • Received : 2022.08.12
  • Accepted : 2022.10.04
  • Published : 2022.12.31

Abstract

Human pose estimation infers a person's posture by extracting the key points of the body joints. When occlusion occurs, the joints are hidden and key-point extraction accuracy drops. Occlusion falls broadly into three types: self-occlusion caused by the person's own movement, occlusion by other objects, and occlusion by the background. Although pose estimation has been studied continuously, research on how it behaves under occlusion remains relatively scarce. To address this, we propose a data augmentation technique that targets human joints and intentionally occludes them, and we use this occlusion augmentation for effective pose estimation. The experimental results show that training with intentional occlusion augmentation makes the model more robust to occlusion and improves its performance.
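
The idea of deliberately covering joints during training can be illustrated with a minimal sketch. The abstract does not specify how the occluding regions are chosen or filled, so everything below is an assumption made for illustration: the function name `occlude_joints`, the square patch shape, the constant fill value, the per-joint probability `prob`, and the `(x, y, visible)` keypoint layout are all hypothetical. Plain nested lists stand in for an image array so that the example has no dependencies.

```python
import random


def occlude_joints(image, keypoints, patch_size=3, prob=0.5, fill=0, rng=None):
    """Return a copy of `image` with square patches drawn over some joints.

    image:     list of rows, each a list of pixel values (H x W).
    keypoints: list of (x, y, visible) tuples in integer pixel coordinates.
    prob:      chance that each visible joint is covered (assumed parameter).

    Hypothetical sketch only; not the authors' implementation.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input image is left untouched
    half = patch_size // 2
    occluded = []

    for idx, (x, y, visible) in enumerate(keypoints):
        # Skip joints that are not annotated as visible, then sample per joint.
        if not visible or rng.random() >= prob:
            continue
        # Clip the patch to the image borders.
        x0, x1 = max(0, x - half), min(width, x + half + 1)
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        for r in range(y0, y1):
            for c in range(x0, x1):
                out[r][c] = fill
        occluded.append(idx)

    # The keypoint labels themselves are unchanged: the model is still asked
    # to localize the joints that were covered.
    return out, occluded
```

In a training pipeline, a function of this kind would typically be applied on the fly to each sample before it is passed to the pose network. Returning the indices of the covered joints makes it possible to report accuracy separately for occluded and non-occluded joints.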
