A Method for Body Keypoint Localization based on Object Detection using the RGB-D information

  • Park, Seohee (Department of Computer Science, Kyonggi University) ;
  • Chun, Junchul (Department of Computer Science, Kyonggi University)
  • Received : 2017.09.25
  • Accepted : 2017.10.24
  • Published : 2017.12.31

Abstract

Deep learning methods have recently been applied in video surveillance to detect moving people in video and to analyze the behavior of the detected people. Human activity recognition, one of the fields in which this intelligent image analysis technology can be applied, first detects an object and then detects its body keypoints in order to recognize its behavior. In this paper, we propose a method for body keypoint localization based on object detection using RGB-D information. First, a moving object is segmented from the background and detected using color and depth information generated by two cameras. The detected object region is then rescaled using the RGB-D information to produce the input image for Convolutional Pose Machines (CPM), which estimate the pose of a single person. CPM generates belief maps for 14 body parts per person, and the body keypoints are detected from these belief maps. This method provides an accurate region for the object whose keypoints are to be detected, and it can be extended from single-person to multi-person body keypoint localization by integrating the individual localizations. In future work, the detected keypoints can be used to generate a model for human pose estimation and can contribute to the field of human activity recognition.
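
The final step of the pipeline, reading body keypoints off the per-part belief maps, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each belief map is a 2-D grid of confidences covering the detected person region, takes the highest-scoring cell as the keypoint, and maps it back to original image pixels. The function names, the `(x, y, width, height)` box layout, and the `threshold` value are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# One 2-D grid of confidences per body part (assumed representation).
BeliefMap = List[List[float]]
# (x, y, width, height) of the detected person region (assumed layout).
Box = Tuple[float, float, float, float]


def locate_peak(belief_map: BeliefMap,
                threshold: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the highest-scoring cell, or None if the
    best score is below the threshold (part treated as not visible)."""
    best_score = float("-inf")
    best_cell = None
    for r, row in enumerate(belief_map):
        for c, score in enumerate(row):
            if score > best_score:
                best_score = score
                best_cell = (r, c)
    if best_cell is None or best_score < threshold:
        return None
    return best_cell


def keypoints_in_image(belief_maps: List[BeliefMap],
                       box: Box,
                       threshold: float = 0.1
                       ) -> List[Optional[Tuple[float, float]]]:
    """Convert one belief map per body part into image-space keypoints.

    Each peak is mapped from belief-map cells back to pixel coordinates
    of the original frame, using the centre of the winning cell.
    """
    x0, y0, width, height = box
    keypoints = []
    for belief_map in belief_maps:
        peak = locate_peak(belief_map, threshold)
        if peak is None:
            keypoints.append(None)
            continue
        r, c = peak
        rows, cols = len(belief_map), len(belief_map[0])
        px = x0 + (c + 0.5) * width / cols
        py = y0 + (r + 0.5) * height / rows
        keypoints.append((px, py))
    return keypoints


if __name__ == "__main__":
    # Two toy 4x4 belief maps: one clear peak, one with no confident peak.
    maps = [[[0.0] * 4 for _ in range(4)],
            [[0.01] * 4 for _ in range(4)]]
    maps[0][1][2] = 0.9
    print(keypoints_in_image(maps, (100, 50, 40, 80)))
```

Because every keypoint is expressed in the coordinates of the original frame, results computed separately for several detected regions can simply be collected into one list, which is one way to picture the extension from single-person to multi-person localization described above.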
