Reinforcement Learning based Inactive Region Padding Method

  • Dongsin Kim (School of Electronics and Information Engineering, Korea Aerospace University)
  • Kutub Uddin (School of Electronics and Information Engineering, Korea Aerospace University)
  • Byung Tae Oh (School of Electronics and Information Engineering, Korea Aerospace University)
  • Received : 2021.09.14
  • Accepted : 2021.10.26
  • Published : 2021.09.30

Abstract

An inactive region is a region filled with invalid pixel values that is introduced to represent a specific image. Inactive regions generally occur when a non-rectangular image is converted to a rectangular format, especially when 3D images are represented in 2D. Because these inactive regions significantly degrade compression efficiency, filtering approaches are often applied to the boundaries between active and inactive regions. However, such filtering does not carefully consider the image characteristics. In the proposed method, inactive regions are instead padded through reinforcement learning, which can take both the compression process and the image characteristics into account. Experimental results show that the proposed method performs an average of 3.4% better than the conventional padding method.
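For context, the kind of conventional padding the paper improves on can be sketched as a simple nearest-neighbour fill: inactive pixels are repeatedly assigned the average of their already-active neighbours until the frame is full. The sketch below is an illustrative baseline only, not the authors' reinforcement-learning method; the function name `pad_inactive` and the binary `active_mask` input are assumptions for the example.

```python
import numpy as np

def pad_inactive(image, active_mask, max_iters=None):
    """Fill inactive pixels of a 2D grayscale image by repeatedly
    copying in the average of adjacent active pixels (a simple
    dilation-style baseline, not the paper's RL-based padding)."""
    img = image.astype(float).copy()
    mask = active_mask.astype(bool).copy()
    h, w = mask.shape
    if max_iters is None:
        max_iters = h + w  # enough steps to reach any pixel
    for _ in range(max_iters):
        if mask.all():
            break
        acc = np.zeros_like(img)   # sum of active neighbour values
        cnt = np.zeros_like(img)   # number of active neighbours
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            shifted_mask = np.roll(mask, (dy, dx), axis=(0, 1))
            # np.roll wraps around; invalidate the wrapped row/column
            if dy == 1: shifted_mask[0, :] = False
            if dy == -1: shifted_mask[-1, :] = False
            if dx == 1: shifted_mask[:, 0] = False
            if dx == -1: shifted_mask[:, -1] = False
            acc += np.where(shifted_mask, shifted, 0.0)
            cnt += shifted_mask
        grow = (~mask) & (cnt > 0)          # inactive pixels touching active ones
        img[grow] = acc[grow] / cnt[grow]   # fill with neighbour average
        mask |= grow
    return img
```

This baseline extends boundary values outward without regard to image content or the downstream codec, which is exactly the limitation the proposed reinforcement-learning padding addresses.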

Acknowledgement

This work was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Education in 2020 (NRF-2019R1F1A1063229), and by the Gyeonggi-do Regional Research Center (GRRC) program (2017-B02, Research on 3D spatial data processing and application technologies).
