Deep Video Stabilization via Optical Flow in Unstable Scenes

  • Bohee Lee (Department of Digital Media and Communications Engineering, Sungkyunkwan University) ;
  • Kwangsu Kim (College of Computing and Informatics, Sungkyunkwan University)
  • Received : 2023.04.20
  • Reviewed : 2023.05.12
  • Published : 2023.06.30

Abstract

Video stabilization is a camera technology whose importance has been growing as the personal media market expands. Existing deep learning-based methods use pairs of videos captured before and after stabilization, but because of the nature of video, producing synchronized before/after pairs takes considerable time and effort. To alleviate this problem, unsupervised methods that train on unstable video alone have recently been proposed. In this paper, we propose a network that learns a stabilized trajectory from unstable video only, without paired stable/unstable data, using a Convolutional Autoencoder, one of the standard unsupervised learning structures. Optical flow serves as both the network input and output, and the flow is mapped onto a coarse grid to lighten the network and suppress noise. To generate a stabilized trajectory in this unsupervised setting, we define a loss function that smooths the optical flow, and a comparison of the results confirms that the network learns to produce smooth trajectories as the loss function intends.
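The grid mapping step can be illustrated with a short sketch. This is not the paper's implementation; it is a minimal PyTorch illustration that assumes a dense flow field of shape (2, H, W) from any off-the-shelf optical-flow estimator, with an assumed grid size of 16×16.

```python
import torch
import torch.nn.functional as F

def flow_to_grid(flow: torch.Tensor, grid_size: int = 16) -> torch.Tensor:
    """Average a dense optical-flow field into coarse grid cells.

    flow: dense flow of shape (2, H, W); grid_size=16 is an assumed
    value, not taken from the paper. Each output cell holds the mean
    motion vector of the pixels it covers, which both shrinks the
    network input and averages out per-pixel estimation noise.
    """
    # Adaptive pooling handles H and W that are not multiples of grid_size.
    return F.adaptive_avg_pool2d(flow.unsqueeze(0), grid_size).squeeze(0)
```

Averaging, rather than subsampling, is the natural choice here: isolated per-pixel flow outliers are suppressed by the mean over each cell.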
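The Convolutional Autoencoder itself can be any network that compresses and reconstructs the grid flow. The sketch below is a minimal assumed architecture; the layer widths and depth are illustrative and not the paper's, and it operates on batches of (N, 2, G, G) grid flows.

```python
import torch.nn as nn

class FlowAutoencoder(nn.Module):
    """Minimal convolutional autoencoder over grid-mapped flows.

    All channel counts and depths are illustrative assumptions.
    Input and output shape: (N, 2, G, G), e.g. G = 16.
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # (N, 2, 16, 16) -> (N, 64, 4, 4)
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # (N, 64, 4, 4) -> (N, 2, 16, 16)
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # The bottleneck forces the network to keep only the dominant
        # (camera) motion and discard high-frequency shake.
        return self.decoder(self.encoder(x))
```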
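Finally, the smoothing loss can be written as a temporal-difference penalty on the predicted flow sequence. The paper's exact formulation is not reproduced here; the sketch below uses an assumed second-difference penalty, a common way to encourage a slowly varying trajectory.

```python
import torch

def smoothness_loss(pred_flows: torch.Tensor) -> torch.Tensor:
    """Temporal smoothness penalty over a predicted flow sequence.

    pred_flows: (T, 2, G, G) grid flows for T consecutive frames,
    with T >= 3. Penalizing the second temporal difference pushes
    the implied camera trajectory toward constant velocity, i.e. a
    smooth path; the paper's actual loss may differ.
    """
    second_diff = pred_flows[2:] - 2.0 * pred_flows[1:-1] + pred_flows[:-2]
    return second_diff.abs().mean()
```

In practice such a term is usually balanced against a fidelity term (for example, an L1 distance between output and input flow) so the network cannot trivially minimize the loss by predicting zero motion; the weighting between the two terms is an assumption here, as the abstract does not specify it.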
