Techniques for Virtual View Synthesis from Images and Videos

  • Published: 2021.10.30

Abstract

Immersive media requires large-scale content composed of multi-view images or videos. Because such content is acquired by deploying a large number of cameras arranged for the target application, it suffers from rapidly growing capture complexity and content size. Virtual view synthesis, which can provide diverse viewpoints suited to the application while minimizing the number of cameras, is therefore a core technology for 3D media environments. This article presents a systematic survey of learning-based virtual view synthesis from multi-view images and videos, organized as follows. First, we define the background concepts of virtual view synthesis. Second, we analyze the existing methods in detail according to a proposed taxonomy. Third, we survey the datasets commonly used for virtual view synthesis. Finally, we analyze the distinctive features of each method and compare their quantitative and qualitative evaluation results.

Acknowledgement

This work was supported by the Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1702-54. This work was also supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)).
