Techniques for 3D Human Pose and Shape Reconstruction from Images and Video

  • Published : 2021.07.30

Abstract

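In future metaverse environments, representing 3D virtual humans will be a very important technology, and modeling 3D virtual humans from images or video is central to it. This article aims to provide sufficient background knowledge of the field. As a growing number of studies address the human reconstruction problem, we survey research on 3D human reconstruction from a single image or from video and present the findings systematically as follows. First, we define the background concepts of 3D human reconstruction. Second, we analyze existing methods in detail according to the proposed taxonomy, their contributions, and their quantitative results. Third, we summarize the relevant datasets and qualitative results so that researchers can readily make use of them. Finally, we analyze each study and present the strengths and weaknesses of its method.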
