Video classifier with adaptive blur network to determine horizontally extrapolatable video content

  • Minsun Kim (KAIST Visual Media Lab) ;
  • Changwook Seo (Anigma Technologies) ;
  • Hyunho Yoon (KAIST Visual Media Lab) ;
  • Junyong Noh (KAIST Visual Media Lab)
  • Received : 2024.06.15
  • Accepted : 2024.07.05
  • Published : 2024.07.25

Abstract

While the demand for extrapolating video content horizontally or vertically is increasing, even the most advanced techniques cannot successfully extrapolate all videos. It is therefore important to determine whether a given video can be extrapolated well before attempting the actual extrapolation, which helps avoid wasting computing resources. This paper proposes a video classifier that identifies whether a video is suitable for horizontal extrapolation. The classifier utilizes optical flow and an adaptive Gaussian blur network, and it can be applied to flow-based video extrapolation methods. The labels for training were rigorously assigned through user tests and quantitative evaluations. Training on this labeled dataset yielded a network that determines the extrapolation capability of a given video. By effectively capturing the characteristics of a video through optical flow and the adaptive Gaussian blur network, the proposed classifier achieved much more accurate classification than methods that use only the original video or a fixed blur. The classifier can be used in various fields in conjunction with automatic video extrapolation techniques for immersive viewing experiences.
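
To make the described pipeline concrete, the following is a minimal PyTorch sketch of one way such a classifier could be structured: a small network regresses a per-clip Gaussian blur strength from the optical flow, the frames are blurred with that adaptive kernel, and a classification head predicts whether the video is extrapolatable. All module names, layer sizes, and hyperparameters are illustrative assumptions and do not reproduce the authors' implementation, labeling procedure, or training setup.

```python
# Hypothetical sketch of a flow-driven adaptive-blur classifier.
# Shapes, layer sizes, and names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma: torch.Tensor, size: int = 11) -> torch.Tensor:
    """Build a (B, 1, size, size) Gaussian kernel from per-sample sigmas."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    yy, xx = torch.meshgrid(coords, coords, indexing="ij")
    dist2 = (xx ** 2 + yy ** 2).view(1, 1, size, size)
    sigma = sigma.view(-1, 1, 1, 1).clamp(min=1e-3)
    kernel = torch.exp(-dist2 / (2 * sigma ** 2))
    return kernel / kernel.sum(dim=(-2, -1), keepdim=True)

class AdaptiveBlurClassifier(nn.Module):
    """Predict a blur strength from optical flow, blur the frames
    accordingly, and classify extrapolation suitability from the result."""
    def __init__(self, kernel_size: int = 11):
        super().__init__()
        self.kernel_size = kernel_size
        # Small CNN that regresses a positive blur sigma from 2-channel flow maps.
        self.sigma_net = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1), nn.Softplus(),
        )
        # Binary head: "extrapolatable" vs. "not extrapolatable".
        self.classifier = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2),
        )

    def forward(self, frames: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # frames: (B, 3, H, W) RGB frames; flow: (B, 2, H, W) optical flow.
        sigma = self.sigma_net(flow)                       # (B, 1) adaptive blur strength
        kernel = gaussian_kernel(sigma, self.kernel_size)  # (B, 1, k, k)
        b, c, h, w = frames.shape
        # Apply each sample's own kernel to its frames via a grouped convolution.
        blurred = F.conv2d(
            frames.reshape(1, b * c, h, w),
            kernel.repeat_interleave(c, dim=0),
            padding=self.kernel_size // 2, groups=b * c,
        ).reshape(b, c, h, w)
        return self.classifier(blurred)                    # (B, 2) class logits
```

In this sketch a single blur strength is predicted per clip and applied uniformly; a spatially varying adaptive blur would instead filter each region with its own kernel, but the overall flow-to-blur-to-classification structure would stay the same.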

Keywords

Acknowledgement

This research was conducted as part of the 2023 Culture, Sports and Tourism R&D Program of the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency (Project Title: Development of Universal Fashion Creation Platform Technology for Avatar Personality Expression, Project Number: RS-2023-00228331, Contribution Rate: 100%).
