360 RGBD Image Synthesis from a Sparse Set of Images with Narrow Field-of-View

  • Kim, Soojie (Inha University, Department of Electrical and Computer Engineering)
  • Park, In Kyu (Inha University, Department of Electrical and Computer Engineering)
  • Received : 2022.05.09
  • Accepted : 2022.06.08
  • Published : 2022.07.30

Abstract

A depth map is an image that represents the distance information of a 3D scene on a 2D plane and is useful in various 3D vision tasks. Most existing depth estimation studies use narrow field-of-view (FoV) images, which lose a significant portion of the scene. In this paper, we propose a technique for synthesizing a 360° omnidirectional RGBD image from a sparse set of narrow-FoV images. The proposed generative adversarial network (GAN) based model estimates the relative FoV of four non-overlapping input images with respect to the full panorama and generates the 360° RGB and depth images simultaneously; because the two modalities share features, the results complement each other. In addition, the network is structured to reflect the spherical characteristics of the 360° image, which further improves performance.
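
To make the abstract's design points concrete, below is a minimal PyTorch sketch, offered as an illustration rather than the authors' implementation: horizontal wrap-around padding is one common way to reflect the spherical continuity of an equirectangular 360° image, and a shared encoder with separate RGB and depth heads is one way to share features between the two modalities. The names spherical_pad, SphericalConv, and RGBDGenerator, as well as all layer sizes, are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def spherical_pad(x: torch.Tensor, pad: int) -> torch.Tensor:
        # Wrap the width dimension (longitude) circularly, since the left and
        # right edges of an equirectangular image meet on the sphere. Height
        # (latitude) is replicated instead: the top/bottom edges are poles.
        x = F.pad(x, (pad, pad, 0, 0), mode="circular")
        x = F.pad(x, (0, 0, pad, pad), mode="replicate")
        return x

    class SphericalConv(nn.Module):
        # 3x3 convolution preceded by spherical padding instead of zero padding.
        def __init__(self, in_ch: int, out_ch: int):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=0)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.conv(spherical_pad(x, 1))

    class RGBDGenerator(nn.Module):
        # Shared encoder with two output heads: RGB and depth are predicted
        # from the same features, so the two modalities can complement each other.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                SphericalConv(4, 32), nn.ReLU(inplace=True),   # input: RGB + partial depth
                SphericalConv(32, 64), nn.ReLU(inplace=True),
            )
            self.rgb_head = nn.Sequential(SphericalConv(64, 3), nn.Tanh())
            self.depth_head = nn.Sequential(SphericalConv(64, 1), nn.Sigmoid())

        def forward(self, x: torch.Tensor):
            feat = self.encoder(x)
            return self.rgb_head(feat), self.depth_head(feat)

    # Toy usage: a 4-channel equirectangular canvas (the narrow-FoV inputs
    # placed at their estimated positions, zeros elsewhere) of size 256x512.
    canvas = torch.randn(1, 4, 256, 512)
    rgb, depth = RGBDGenerator()(canvas)
    print(rgb.shape, depth.shape)  # (1, 3, 256, 512) and (1, 1, 256, 512)

Wrap-around padding matters here because the left and right edges of an equirectangular image are adjacent on the sphere; ordinary zero padding would introduce an artificial seam at that boundary.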

Acknowledgement

This research was supported by the Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1702-54. This work was also supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University); RS-2022-00155915, Artificial Intelligence Convergence Innovation Human Resources Development (Inha University)).
