Latent Shifting and Compensation for Learned Video Compression

Directional Shifting and Compensation of Latent Information for Neural Network-Based Video Compression

  • Received : 2021.11.24
  • Accepted : 2021.12.21
  • Published : 2022.01.30

Abstract

Traditional video compression has evolved around hybrid coding schemes that combine motion prediction, residual coding, and quantization. With the rapid progress of artificial neural networks in recent years, research on neural network-based image and video compression is also advancing quickly and has become competitive with traditional video codecs. This paper presents a new method for improving the performance of such neural network-based video compression models. The method keeps the rate-distortion optimization with an auto-encoder and an entropy model used by existing learned video compression models. When the compressed latent representation is transmitted from the encoder to the decoder, it shifts the latent components that are difficult for the entropy model to estimate, which reduces the number of bits to transmit, and then compensates for the distortion caused by the lost information. Applied to MFVC (Motion Free Video Compression), an existing neural network-based video compression framework, the method achieves a BDBR (Bjøntegaard Delta-Rate) of -27% relative to H.264, nearly double the bitrate saving of MFVC (-14%). The proposed method is not limited to MFVC: it is broadly applicable to neural network-based image or video compression models that use latent information and an entropy model.

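Traditional video compression has so far developed on the basis of hybrid compression that combines motion prediction with residual signal transform and quantization. As neural network technology has advanced rapidly in recent years, research on neural network-based image and video compression has also progressed quickly and now shows strong competitiveness against traditional video compression codecs. This paper presents a new method that can improve the performance of such neural network-based video compression models. It uses the rate-distortion optimization adopted by existing neural network-based video compression models, built on transform and reconstruction networks and an entropy model. When compressed latent information is transmitted from the encoder side to the decoder side, the values that the entropy model finds difficult to estimate are shifted to reduce the number of bits to transmit, and the lost information is additionally transmitted to compensate for the resulting distortion. With this method we improved MFVC (Motion Free Video Compression), an existing neural network-based video compression technique, and the experimental results demonstrate a BDBR (Bjøntegaard Delta-Bitrate) of -27% measured against H.264, a bitrate reduction nearly twice that of MFVC (-14%). The proposed method has the advantage of being broadly applicable not only to MFVC but also to neural network-based image or video compression techniques that use latent information and an entropy model.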
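
The shifting idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the Gaussian entropy model, the one-step shift rule, the 8-bit threshold, and the names `gaussian_bits`, `shift_latents`, and `total_bits` are all assumptions made for illustration. The sketch only shows why moving a hard-to-predict latent value toward the entropy model's predicted mean lowers its estimated bit cost, and that the amount removed must be kept somewhere if the decoder is to compensate for it.

```python
import math


def _cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_bits(y, mu, sigma):
    """Estimated bits for integer symbol y under N(mu, sigma^2),
    discretized into unit-width bins (an assumed entropy model)."""
    p = _cdf((y + 0.5 - mu) / sigma) - _cdf((y - 0.5 - mu) / sigma)
    return -math.log2(max(p, 1e-9))


def shift_latents(latents, mus, sigmas, threshold_bits=8.0):
    """Hypothetical shift rule: any latent whose estimated cost exceeds
    threshold_bits is moved one step toward the rounded predicted mean.
    Returns the shifted latents and the per-element residual removed."""
    shifted, residual = [], []
    for y, mu, sigma in zip(latents, mus, sigmas):
        y_new = y
        if gaussian_bits(y, mu, sigma) > threshold_bits:
            center = round(mu)
            if y > center:
                y_new = y - 1
            elif y < center:
                y_new = y + 1
        shifted.append(y_new)
        residual.append(y - y_new)  # what compensation would have to restore
    return shifted, residual


def total_bits(latents, mus, sigmas):
    """Sum of estimated bits over all latent elements."""
    return sum(gaussian_bits(y, m, s) for y, m, s in zip(latents, mus, sigmas))


# Made-up quantized latents and entropy-model parameters.
latents = [0, 1, 6, -5, 2]
mus = [0.2, 0.8, 1.0, 0.0, 1.5]
sigmas = [1.0, 1.0, 1.0, 1.0, 1.0]

shifted, residual = shift_latents(latents, mus, sigmas)
print("bits before:", round(total_bits(latents, mus, sigmas), 2))
print("bits after: ", round(total_bits(shifted, mus, sigmas), 2))
print("residual to compensate:", residual)
```

In this toy, only the two outlying values (6 and -5) are shifted, and the residual records exactly what was removed. How the compensation information is actually represented and transmitted is not specified in the abstract, so the residual list here is only a stand-in.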

Acknowledgement

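This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (Ministry of Science and ICT) in 2021 (No. 2017-0-00072, Development of Audio/Video Coding and Light Field Media Fundamental Technologies for Ultra Realistic Tera-media).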
