
Deep Learning Based On-Device Augmented Reality System using Multiple Images


  • Jeong, Taehyeon (Inha University, Department of Electrical & Computer Engineering) ;
  • Park, In Kyu (Inha University, Department of Electrical & Computer Engineering)
  • Received : 2022.02.22
  • Accepted : 2022.04.12
  • Published : 2022.05.30

Abstract

In this paper, we propose a deep learning based on-device augmented reality (AR) system that uses multiple input images to render correct occlusion of virtual objects by the real environment. The proposed system consists of three technical steps: camera pose estimation, depth estimation, and object augmentation. Each step employs a mobile framework to optimize processing in the on-device environment. First, in the camera pose estimation stage, the computationally heavy feature extraction is parallelized using OpenCL, a GPU parallel programming framework. Next, in the depth estimation stage, monocular and multi-view depth inference is accelerated using the TensorFlow Lite mobile deep learning framework. Finally, object augmentation and occlusion handling are performed with the OpenGL ES mobile graphics framework. The proposed AR system is implemented as an Android application with a GUI, and we evaluate its performance in terms of augmentation accuracy and processing time in both mobile and PC environments.
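To make the depth-estimation step concrete, below is a minimal Kotlin sketch of how an on-device inference call through TensorFlow Lite's Interpreter API might look. The model file, the 256x256 resolution, the tensor shapes, and the DepthEstimator wrapper are illustrative assumptions, not details taken from the paper.

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder
import org.tensorflow.lite.Interpreter

// Hypothetical wrapper around a depth network converted to a .tflite model.
// Input resolution and tensor shapes below are assumptions for illustration.
class DepthEstimator(modelFile: File) {

    private val interpreter = Interpreter(
        modelFile,
        Interpreter.Options().apply {
            setNumThreads(4) // plain CPU threading; a GPU delegate could be added here
        }
    )

    /** Runs inference on one 256x256 RGB frame and returns a 256x256 depth map. */
    fun estimate(rgb: FloatArray): Array<FloatArray> {
        require(rgb.size == 256 * 256 * 3) { "expected a 256x256x3 float image" }

        // Pack the image into a direct buffer matching a [1, 256, 256, 3] input tensor.
        val input = ByteBuffer.allocateDirect(rgb.size * 4).order(ByteOrder.nativeOrder())
        for (v in rgb) input.putFloat(v)
        input.rewind()

        // Output tensor assumed to have shape [1, 256, 256].
        val output = Array(1) { Array(256) { FloatArray(256) } }
        interpreter.run(input, output)
        return output[0]
    }
}
```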

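For the occlusion-handling step, one common OpenGL ES technique, sketched here as an assumption since the paper does not publish its shader code, is a depth pre-pass: the inferred real-scene depth map is written into the depth buffer first, and the virtual object is then drawn with the ordinary depth test enabled so that fragments behind real surfaces are discarded.

```kotlin
import android.opengl.GLES30

// Fragment shader for the depth pre-pass. Writing gl_FragDepth requires
// OpenGL ES 3.0 (GLSL ES 3.00); names and normalization are assumptions.
const val DEPTH_PREPASS_FS = """#version 300 es
precision highp float;
uniform sampler2D uDepthMap;   // depth map inferred by the network
in vec2 vTexCoord;
out vec4 fragColor;
void main() {
    // Assume the depth map is already normalized to the [0, 1] depth range.
    gl_FragDepth = texture(uDepthMap, vTexCoord).r;
    fragColor = vec4(0.0);     // color writes are masked off during this pass
}"""

fun drawFrame(prepassProgram: Int, objectProgram: Int) {
    GLES30.glEnable(GLES30.GL_DEPTH_TEST)
    GLES30.glClear(GLES30.GL_COLOR_BUFFER_BIT or GLES30.GL_DEPTH_BUFFER_BIT)

    // Pass 1: fill the depth buffer from the estimated depth map, no color output.
    GLES30.glColorMask(false, false, false, false)
    GLES30.glUseProgram(prepassProgram)
    // ... bind uDepthMap and draw a full-screen quad ...

    // Pass 2: render the virtual object; fragments behind real geometry
    // now fail the depth test, producing the occlusion effect.
    GLES30.glColorMask(true, true, true, true)
    GLES30.glUseProgram(objectProgram)
    // ... draw the augmented object ...
}
```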


Acknowledgement

This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University); No. 2021-0-02068, Artificial Intelligence Innovation Hub).
