Pyramid Feature Compression with Inter-Level Feature Restoration-Prediction Network

Kim, Minsub; Sim, Donggyu

doi:10.5909/JBE.2022.27.3.283

Journal of Broadcast Engineering

Vol. 27, No. 3 / pp. 283-294 / 2022 / 1226-7953 (pISSN) / 2287-9137 (eISSN)

The Korean Institute of Broadcast and Media Engineers

Kim, Minsub (Department of Computer Engineering, Kwangwoon University) ;
Sim, Donggyu (Department of Computer Engineering, Kwangwoon University)

Received: 2022.04.12
Reviewed: 2022.05.03
Published: 2022.05.30

https://doi.org/10.5909/JBE.2022.27.3.283

Abstract

Feature maps used in deep learning networks are generally larger than the input image, so transmitting them requires a higher compression ratio than image compression does. This paper proposes a method for transmitting, at a high compression ratio, the pyramid feature maps used in networks with a feature pyramid network (FPN) structure, which gives deep learning-based image processing robustness to object size. Only some levels of the pyramid feature map are compressed and transmitted. The proposed restoration-prediction network then predicts the pyramid feature maps of the levels that were not transmitted from those that were, and restores the damage caused by compression. For object detection on the COCO dataset 2017 Train images, the proposed method showed, on the rate-precision (mAP) curve, a BD-rate improvement of 31.25% over compressing the feature maps with VTM12.0, and a BD-rate improvement of 57.79% over a method that compresses them with PCA and DeepCABAC.
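The BD-rate figures quoted above summarize the average bitrate difference between two rate-precision curves at equal detection accuracy. The following is a minimal sketch of the commonly used Bjontegaard calculation (cubic fit of log-rate against the quality metric, integrated over the shared quality range). It is not the authors' evaluation script; the function name `bd_rate` and the sample numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate difference (%) of the test curve relative to the anchor.

    Negative values mean the test codec needs fewer bits for the same quality.
    Each curve needs at least four (rate, quality) points for the cubic fit.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log(rate) as a cubic polynomial of the quality metric (e.g. mAP).
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the quality interval the two curves share.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate gap over the interval, converted to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (kbit) and mAP (%) points, not taken from the paper.
    anchor_rate = [100.0, 200.0, 400.0, 800.0]
    anchor_map = [25.0, 30.0, 34.0, 37.0]
    test_rate = [50.0, 100.0, 200.0, 400.0]  # half the anchor rate at each mAP
    test_map = [25.0, 30.0, 34.0, 37.0]
    print(f"BD-rate: {bd_rate(anchor_rate, anchor_map, test_rate, test_map):.2f}%")
```

Because the hypothetical test curve reaches each mAP value at exactly half the anchor's rate, the sketch reports a BD-rate of about -50%. Under this sign convention a negative BD-rate is a bitrate saving, so an "improvement of 31.25%" would correspond to a value of -31.25%.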

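Funding

This work was supported by a 2022 Kwangwoon University research grant and by the Basic Research Program of the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) (NRF-2021R1A2C2092848).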

