Compression Method for MPEG CDVA Global Feature Descriptors

  • Kim, Joonsoo (Electronics and Telecommunications Research Institute) ;
  • Jo, Won (Department of Artificial Intelligence, Sejong University) ;
  • Lim, Guentaek (Department of Intelligent Mechatronics Engineering, Sejong University) ;
  • Yun, Joungil (Electronics and Telecommunications Research Institute) ;
  • Kwak, Sangwoon (Electronics and Telecommunications Research Institute) ;
  • Jung, Soon-heung (Electronics and Telecommunications Research Institute) ;
  • Cheong, Won-Sik (Electronics and Telecommunications Research Institute) ;
  • Choo, Hyon-Gon (Electronics and Telecommunications Research Institute) ;
  • Seo, Jeongil (Electronics and Telecommunications Research Institute) ;
  • Choi, Yukyung (Department of Intelligent Mechatronics Engineering, Sejong University)
  • Received : 2022.04.12
  • Accepted : 2022.05.13
  • Published : 2022.05.30


In this paper, we propose a novel compression method for scalable Fisher vectors (SCFVs), which are used as the global visual feature descriptors of individual video frames in the MPEG CDVA standard. The CDVA standard adopts a temporal descriptor redundancy removal technique that exploits the correlation between the global feature descriptors of adjacent keyframes. However, because SCFVs are variable in length, this temporal redundancy removal scheme often compresses poorly, and in some cases its output is even larger than the uncompressed SCFVs. To improve compression efficiency, we propose an asymmetric SCFV difference computation method and an SCFV reconstruction method. Experiments on the FIVR dataset show that the proposed method significantly improves compression efficiency over the original CDVA Experimental Model implementation.

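This paper proposes a new compression method for the scalable Fisher vector (SCFV), which represents the global features of individual frames in MPEG CDVA, the standard for extracting visual features from video. The CDVA standard introduced a temporal redundancy removal technique for global feature descriptors: exploiting the fact that the SCFVs within a coding-unit segment are likely to be similar to one another, it encodes the differences between SCFVs. However, because of the structural characteristics of the SCFV, the encoded SCFV difference is sometimes larger than the original data. To prevent this, we propose an asymmetric method for computing the SCFV difference and a new method for reconstructing the original SCFV from the modified difference. Experimental results on the FIVR dataset show that the compression efficiency of the global feature descriptors increases meaningfully compared with the existing CDVA Experimental Model.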
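The abstract does not spell out the difference computation, so the following is only a toy sketch of why differencing variable-length descriptors can expand the data, and of one possible reading of an "asymmetric" difference. Everything in it is an assumption made for illustration: the descriptor is modeled as a dictionary from selected component index to a fixed-width bit pattern, the constants `NUM_COMPONENTS` and `BITS_PER_COMPONENT` are hypothetical, no entropy coding is modeled, and none of the function names come from the CDVA Experimental Model or from the paper.

```python
# Toy model only: sizes and layout are assumptions, not the CDVA bitstream format.
NUM_COMPONENTS = 8        # hypothetical number of selectable components
BITS_PER_COMPONENT = 32   # hypothetical payload bits per selected component


def raw_cost(desc):
    """Bits to send a descriptor as-is: selection mask plus selected payloads."""
    return NUM_COMPONENTS + BITS_PER_COMPONENT * len(desc)


def symmetric_diff_cost(ref, cur):
    """Bits for a naive difference covering components selected in either descriptor."""
    union = set(ref) | set(cur)
    return NUM_COMPONENTS + BITS_PER_COMPONENT * len(union)


def asymmetric_diff(ref, cur):
    """XOR against the reference only for components selected in the current descriptor."""
    return {k: bits ^ ref.get(k, 0) for k, bits in cur.items()}


def reconstruct(ref, diff):
    """Invert asymmetric_diff by XOR-ing each carried component with the reference again."""
    return {k: bits ^ ref.get(k, 0) for k, bits in diff.items()}


# Reference keyframe selects four components; the current keyframe selects two others.
ref = {2: 0xAAAA0000, 3: 0x1, 4: 0x2, 5: 0x3}
cur = {0: 0xFFFF0000, 1: 0x0F0F0F0F}
diff = asymmetric_diff(ref, cur)

print(raw_cost(cur), symmetric_diff_cost(ref, cur), raw_cost(diff))
print(reconstruct(ref, diff) == cur)
```

In this toy model the naive difference costs 200 bits against 72 bits for the raw descriptor, because it must carry every component selected in either keyframe, while the asymmetric variant never carries more components than the current descriptor itself and still reconstructs it exactly.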



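This research was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) (No. 2020-0-00011, Video Coding for Machine; No. 2021-0-02067, Development of Next-Generation Artificial Neural Network Technology for Multi-Purpose Video Search).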

