Evaluation of Video Codecs for AI-based Multiple Tasks

  • Kim, Shin (Dept. of Computer Science and Engineering, Konkuk University) ;
  • Lee, Yegi (Dept. of Computer Science and Engineering, Konkuk University) ;
  • Yoon, Kyoungro (Dept. of Computer Science and Engineering, Konkuk University) ;
  • Choo, Hyon-Gon (Immersive Media Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Lim, Hanshin (Immersive Media Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Seo, Jeongil (Immersive Media Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2022.04.12
  • Accepted : 2022.05.06
  • Published : 2022.05.30

Abstract

MPEG VCM (Video Coding for Machines) aims to standardize a video codec for machines. VCM provides data sets and anchors, which serve as reference data for comparison, for several machine vision tasks, including object detection, object segmentation, and object tracking. An evaluation template can be used to compare the compression and machine vision performance of proposed video codecs against the anchor data. However, this performance comparison is currently carried out separately for each machine vision task, and no means is provided for evaluating the performance of multiple machine vision tasks on a single bitstream. In this paper, we propose a performance evaluation method for video codecs aimed at AI-based multi-tasks. Based on bits per pixel (BPP), which measures the size of a single bitstream, and mean average precision (mAP), which measures the accuracy of each task, we define three criteria for multi-task performance evaluation: the arithmetic average, the weighted average, and the harmonic average of the per-task mAP values. In addition, because the dynamic range of mAP can differ considerably from task to task, the multi-task performance results are calculated and evaluated on normalized mAP values to prevent the problems such differences would otherwise cause.
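The paper's evaluation code is not reproduced on this page, but the three criteria are simple to state. Below is a minimal Python sketch, assuming min-max scaling for the normalized mAP (the abstract only says a normalization is applied) and hypothetical helper names (normalize_map, multi_task_scores); it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def normalize_map(map_value, map_min, map_max):
    # Min-max normalization so that tasks whose mAP values live in
    # different dynamic ranges contribute comparably. The exact
    # scaling used in the paper is not given here; this is an
    # assumption for illustration.
    return (map_value - map_min) / (map_max - map_min)

def multi_task_scores(maps, weights=None):
    # Combine per-task (normalized) mAP values measured on a single
    # bitstream into the three proposed criteria: arithmetic average,
    # weighted average, and harmonic average.
    maps = np.asarray(maps, dtype=float)
    n = len(maps)
    if weights is None:
        weights = np.full(n, 1.0 / n)  # equal weights by default
    weights = np.asarray(weights, dtype=float)
    return {
        "arithmetic": float(maps.mean()),
        "weighted": float(np.dot(weights, maps)),
        # The harmonic mean penalizes a codec that serves one task
        # well but another poorly (assumes all mAP values are > 0).
        "harmonic": float(n / np.sum(1.0 / maps)),
    }

# Illustrative numbers (not from the paper): detection, segmentation,
# and tracking mAP obtained from the same bitstream at one BPP point.
per_task_map = [
    normalize_map(0.46, 0.0, 1.0),
    normalize_map(0.38, 0.0, 1.0),
    normalize_map(0.52, 0.0, 1.0),
]
print(multi_task_scores(per_task_map, weights=[0.5, 0.25, 0.25]))
```

Plotting any of the three combined scores against BPP would then yield a single rate-accuracy curve per codec, analogous to the per-task curves used in the existing VCM evaluation template.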

Acknowledgement

This research is a result of the Electronics and Telecommunications Research Institute (ETRI) project "Development of Video Coding Technology for Machines" (2020-0-00011), funded by the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP) of Korea.
