AI-Based Object Recognition Research for Augmented Reality Character Implementation

  • Received : 2023.10.27
  • Reviewed : 2023.12.27
  • Published : 2023.12.31

Abstract

This study addresses 3D pose estimation of multiple people from a single image, as part of generating characters for use in augmented reality. Existing top-down methods first detect every person in the image and then reconstruct each one independently. As a result, the reconstructions can be mutually inconsistent: bodies may overlap, or their depth ordering may contradict the image. The goal of this study is to solve these problems with a single network that produces a coherent 3D reconstruction of all people in a scene. A key design choice is to integrate a human body model based on the SMPL parametric model into a top-down framework, which makes it possible to introduce two losses: a collision loss based on a distance field and a loss that accounts for depth ordering. The first loss prevents overlap between reconstructed people. The second adjusts the depth ordering of people so that, with occlusion taken into account, the rendered result is consistent with the annotated instance segmentation. In this way the network receives depth information without explicit 3D annotations of the image. In experiments, the existing Interpenetration loss method yielded 114 collisions on MuPoTS-3D and 654 on PoseTrack, whereas training the network with the proposed Lp loss reduced the collision counts to 34 and 202, respectively. The proposed method outperforms existing methods on standard 3D pose benchmarks, and the proposed losses yield more coherent reconstructions on natural images.
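The abstract names the two losses but gives no formulas, so the following is only a minimal sketch of one plausible form of each, not the authors' implementation. It assumes a signed distance field that is negative inside a body (approximated here by a sphere, purely for illustration) and a per-pixel logistic penalty for depth-order mismatches. All function names, the sphere proxy, and the toy data are hypothetical.

```python
import math

def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, positive outside.
    The sphere is a stand-in (assumption) for a body's distance field."""
    return math.dist(point, center) - radius

def collision_loss(vertices, center, radius):
    """Sum of penetration depths of one person's vertices into another
    person's distance field. Vertices outside the body contribute zero."""
    return sum(max(0.0, -sphere_sdf(v, center, radius)) for v in vertices)

def depth_order_loss(depth_maps, annotated_front):
    """Penalize pixels where the person rendered in front differs from the
    person the instance segmentation says is visible.

    depth_maps: {person_id: [depth at each pixel]}
    annotated_front: [person_id visible at each pixel, or None]
    """
    total = 0.0
    for p, target in enumerate(annotated_front):
        if target is None:
            continue  # no annotation at this pixel
        # The rendered front person is the one with the smallest depth.
        rendered = min(depth_maps, key=lambda k: depth_maps[k][p])
        if rendered != target:
            # Logistic penalty (assumed form): grows as the annotated
            # person lies further behind the wrongly rendered one.
            gap = depth_maps[target][p] - depth_maps[rendered][p]
            total += math.log1p(math.exp(gap))
    return total

# Hypothetical toy data: three vertices of person A tested against
# person B, modeled as a unit sphere at the origin.
person_a_vertices = [(0.0, 0.0, 0.5), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
overlap = collision_loss(person_a_vertices, center=(0.0, 0.0, 0.0), radius=1.0)

# Two pixels; the annotation says A is visible at both, but B is
# closer to the camera at the second pixel.
depth_maps = {"A": [2.0, 2.0], "B": [3.0, 1.0]}
order_penalty = depth_order_loss(depth_maps, ["A", "A"])
```

In this toy case only the first vertex lies inside the sphere, so only it contributes to the collision term, and only the second pixel contributes to the depth-order term. Neither term uses a 3D ground-truth annotation: the first needs only the reconstructed bodies, and the second needs only a 2D instance segmentation.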

Keywords

Funding

This paper was supported by a 2021 Sunchon National University academic research grant (project number: 2021-0320), awarded through an open call.
