A Method for 3D Human Pose Estimation based on 2D Keypoint Detection using RGB-D Information


  • Park, Seohee (Human Care System Research Center, Korea Electronics Technology Institute (KETI)) ;
  • Ji, Myunggeun (Department of Computer Science, Kyonggi University) ;
  • Chun, Junchul (Department of Computer Science, Kyonggi University)
  • Received : 2018.08.03
  • Accepted : 2018.10.10
  • Published : 2018.12.31

Abstract

Recently, in the field of video surveillance, deep learning based methods have been applied to intelligent video surveillance systems, so that various events such as crime, fire, and abnormal phenomena can be detected robustly. However, since occlusion arises from the loss of 3D information that occurs when the 3D real world is projected onto a 2D image, the occlusion problem must be considered in order to detect objects and estimate poses accurately. Therefore, in this paper, we detect moving objects by adding depth information to the existing RGB information, thereby resolving the occlusion problem that arises in the object detection process. Then, using a convolutional neural network on the detected region, the positions of the 14 keypoints corresponding to human joints are predicted. Finally, to address the self-occlusion problem that occurs in the pose estimation process, we describe a method for 3D human pose estimation that extends the range of estimation into 3D space using the predicted 2D keypoints and a deep neural network. In the future, the 2D and 3D pose estimation results of this research can serve as convenient input data for human behavior recognition and contribute to the development of industrial technology.
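The final step of the pipeline, lifting the 14 predicted 2D keypoints into 3D space with a deep neural network, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hidden-layer width, ReLU activation, and random weights are assumptions chosen only to make the input and output shapes concrete.

```python
import numpy as np

# Sketch of the 2D-to-3D lifting idea described in the abstract: a fully
# connected network maps the 14 predicted 2D keypoints (28 values) to
# 14 3D joint positions (42 values). Layer sizes and the use of ReLU are
# illustrative assumptions, not the paper's exact architecture.

def relu(x):
    return np.maximum(0.0, x)

def lift_2d_to_3d(keypoints_2d, params):
    """keypoints_2d: (14, 2) array of pixel coordinates -> (14, 3) pose."""
    x = keypoints_2d.reshape(-1)                 # flatten to a 28-dim vector
    h = relu(params["W1"] @ x + params["b1"])    # hidden layer
    y = params["W2"] @ h + params["b2"]          # linear output layer
    return y.reshape(14, 3)                      # 14 joints in 3D space

rng = np.random.default_rng(0)
hidden = 64  # illustrative hidden width
params = {
    "W1": rng.standard_normal((hidden, 28)) * 0.1,
    "b1": np.zeros(hidden),
    "W2": rng.standard_normal((42, hidden)) * 0.1,
    "b2": np.zeros(42),
}

pose_2d = rng.uniform(0, 480, size=(14, 2))  # dummy detected 2D keypoints
pose_3d = lift_2d_to_3d(pose_2d, params)
print(pose_3d.shape)  # (14, 3)
```

In the actual method the weights would be learned from paired 2D/3D poses (e.g. Human3.6M [6]), in the spirit of the simple lifting baseline the paper builds on [14].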



(Figure 1) The problem of occlusion [1, 2]

(Figure 2) The overview of 3D human pose estimation based on 2D keypoint detection using RGB-D information

(Figure 3) Object detection based on RGB-D information
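The idea behind combining RGB and depth cues for detection (Figure 3) can be illustrated with a simple per-pixel sketch: a foreground object whose color blends into the background is still detected through its depth difference. The fixed background frames and thresholds below are assumptions for illustration only; the paper builds on an adaptive Gaussian mixture background model [4] rather than naive frame differencing.

```python
import numpy as np

# Illustrative RGB-D foreground detection: a pixel is foreground if it
# differs from the background in color OR in depth, so depth recovers
# objects that color alone would miss.

def foreground_mask(rgb, rgb_bg, depth, depth_bg, t_rgb=30.0, t_depth=0.2):
    """All inputs are per-pixel images; depth values are in meters."""
    rgb_move = np.abs(rgb.astype(float) - rgb_bg.astype(float)).max(axis=-1) > t_rgb
    depth_move = np.abs(depth - depth_bg) > t_depth
    return rgb_move | depth_move  # either cue marks the pixel as foreground

rgb_bg = np.full((4, 4, 3), 100, dtype=np.uint8)
rgb = rgb_bg.copy()               # object has the same color as the background
depth_bg = np.full((4, 4), 3.0)   # background surface 3 m away
depth = depth_bg.copy()
depth[1:3, 1:3] = 1.5             # object 1.5 m away in a 2x2 region

mask = foreground_mask(rgb, rgb_bg, depth, depth_bg)
print(mask.sum())  # 4 pixels detected from depth alone
```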

(Figure 4) The structure of the convolutional neural network for 2D keypoint detection [16]
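The convolutional network of Figure 4 (a convolutional pose machine [16]) outputs one belief map per joint, and each joint's 2D keypoint is read off as the location of maximum belief, as visualized in Figure 7. The synthetic single-peak maps below are a stand-in for real network output.

```python
import numpy as np

# Reading 2D keypoints from per-joint belief (heat) maps: for each of the
# 14 joints, take the (row, col) of the highest belief value.

def keypoints_from_belief_maps(belief_maps):
    """belief_maps: (num_joints, H, W) -> (num_joints, 2) array of (row, col)."""
    num_joints, h, w = belief_maps.shape
    flat = belief_maps.reshape(num_joints, -1).argmax(axis=1)
    return np.stack([flat // w, flat % w], axis=1)

maps = np.zeros((14, 32, 32))
for j in range(14):
    maps[j, 2 * j, 5] = 1.0  # synthetic belief peak for joint j

kps = keypoints_from_belief_maps(maps)
print(kps[3])  # [6 5]
```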

(Figure 5) Skeleton model

(Figure 6) The structure of the deep neural network for 3D human pose estimation [14]

(Figure 7) Distribution plot of belief

(Figure 8) 3D human pose estimation

(Figure 9) Comparison of object detection results

(Figure 10) The result of 3D human pose estimation using the Human3.6M dataset

(Table 1) Experimental environments

(Table 2) Comparison of 3D human pose estimation results on the Human3.6M dataset [6] (mean per joint position error, MPJPE)

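The metric reported in Table 2, mean per joint position error (MPJPE), is the Euclidean distance between each predicted and ground-truth 3D joint position, averaged over the joints (and, in practice, over all evaluation frames). A minimal sketch:

```python
import numpy as np

# MPJPE: mean Euclidean distance between predicted and ground-truth
# 3D joints, here over the 14 joints of a single pose.

def mpjpe(pred, gt):
    """pred, gt: (num_joints, 3) arrays of 3D joint positions (e.g. in mm)."""
    return np.linalg.norm(pred - gt, axis=1).mean()

gt = np.zeros((14, 3))
pred = gt + np.array([3.0, 0.0, 4.0])  # every joint off by a 3-4-5 offset
print(mpjpe(pred, gt))  # 5.0
```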

References

  1. Seohee Park, Myunggeun Ji, and Junchul Chun, "2D Human Pose Estimation based on Object Detection using RGB-D information", KSII Transactions on Internet & Information Systems, Vol. 12, No. 2, pp. 800-816, 2018. https://doi.org/10.3837/tiis.2018.02.015
  2. Ramakrishna, Varun, Takeo Kanade, and Yaser Sheikh, "Reconstructing 3d human pose from 2d image landmarks", European conference on computer vision. Springer, Berlin, Heidelberg, pp. 573-586, 2012. https://doi.org/10.1007/978-3-642-33765-9_41
  3. Parekh, Himani S., Darshak G. Thakore, and Udesang K. Jaliya, "A survey on object detection and tracking methods", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, No. 2, pp. 2970-2978, 2014. http://www.ijircce.com/upload/2014/february/7J_A%20Survey.pdf
  4. Zivkovic, Zoran, "Improved adaptive Gaussian mixture model for background subtraction", Proceedings of the 17th International Conference on Pattern Recognition (ICPR), 2004. https://doi.org/10.1109/icpr.2004.1333992
  5. Hirschmuller, Heiko, "Stereo processing by semiglobal matching and mutual information", IEEE Transactions on pattern analysis and machine intelligence, Vol. 30, No. 2, pp. 328-341, 2008. https://doi.org/10.1109/tpami.2007.1166
  6. Ionescu, Catalin, et al, "Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments", IEEE transactions on pattern analysis and machine intelligence, Vol. 36, No. 7, pp. 1325-1339, 2014. https://doi.org/10.1109/tpami.2013.248
  7. Tekin, Bugra, et al, "Direct prediction of 3d body poses from motion compensated sequences", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://doi.org/10.1109/cvpr.2016.113
  8. Chen, Ching-Hang, and Deva Ramanan, "3d human pose estimation = 2d pose estimation + matching", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. https://doi.org/10.1109/cvpr.2017.610
  9. Zhou, Xiaowei, et al, "Sparseness meets deepness: 3D human pose estimation from monocular video", Proceedings of the IEEE conference on computer vision and pattern recognition, 2016. https://doi.org/10.1109/cvpr.2016.537
  10. Du, Yu, et al, "Marker-less 3d human motion capture with monocular image sequence and height-maps", European Conference on Computer Vision. Springer, Cham, 2016. https://doi.org/10.1007/978-3-319-46493-0_2
  11. Park, Sungheon, Jihye Hwang, and Nojun Kwak, "3D human pose estimation using convolutional neural networks with 2D pose information", European Conference on Computer Vision. Springer, Cham, 2016. https://arxiv.org/abs/1608.03075
  12. Zhou, et al, "Deep kinematic pose regression", European Conference on Computer Vision. Springer, Cham, 2016. https://arxiv.org/abs/1609.05317
  13. Tome, Denis, Christopher Russell, and Lourdes Agapito, "Lifting from the deep: Convolutional 3d pose estimation from a single image", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2500-2509, 2017. https://doi.org/10.1109/cvpr.2017.603
  14. Martinez, et al, "A simple yet effective baseline for 3d human pose estimation", Proceedings of the IEEE International Conference on Computer Vision, 2017. https://doi.org/10.1109/iccv.2017.288
  15. OpenPose: A Real-Time Multi-Person Keypoint Detection and Multi-Threading C++ Library, 2017.
  16. Wei, Shih-En, et al, "Convolutional pose machines", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://doi.org/10.1109/cvpr.2016.511
  17. Ramakrishna, Varun, et al, "Pose machines: Articulated pose estimation via inference machines", European Conference on Computer Vision. Springer, Cham, 2014. https://doi.org/10.1007/978-3-319-10605-2_3
  18. Newell, Alejandro, Kaiyu Yang, and Jia Deng, "Stacked hourglass networks for human pose estimation", European Conference on Computer Vision. Springer, Cham, 2016. https://doi.org/10.1007/978-3-319-46484-8_29
  19. Sigal, Leonid, et al, "Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion", International journal of computer vision, 2010. https://doi.org/10.1007/s11263-009-0273-6
