Spatiotemporal Saliency-Based Video Summarization on a Smartphone

  • Lee, Won Beom (School of Information and Communication Engineering, Inha University) ;
  • Williem, Williem (School of Information and Communication Engineering, Inha University) ;
  • Park, In Kyu (School of Information and Communication Engineering, Inha University)
  • Received : 2012.08.20
  • Accepted : 2013.03.04
  • Published : 2013.03.30

Abstract

In this paper, we propose a spatiotemporal saliency-based video summarization technique that runs on a smartphone. The proposed technique detects scene changes by computing color-histogram differences, which are robust to camera and object motion. The similarity between adjacent frames, face regions, and per-frame saliency are then computed to analyze the spatiotemporal saliency of the video clip. An over-segmented hierarchical tree is created from the detected scene changes and is updated iteratively using the merging and maintenance energies computed during the analysis. From the updated hierarchical tree, frames are extracted by applying a greedy algorithm to nodes with high saliency, subject to the reduction ratio and the minimum segment length requested by the user. Experimental results show that the proposed method summarizes a 2-minute video in about 10 seconds on a commercial smartphone, and that the summarization quality is superior to that of the commercial video-editing software Muvee.

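Two of the steps described in the abstract can be sketched in a few lines: shot-boundary detection by thresholding color-histogram differences between adjacent frames, and greedy selection of high-saliency segments under a user-requested length budget. This is a minimal illustration, not the paper's implementation: the frame data, saliency scores, histogram bin count, and threshold below are synthetic placeholders, and the hierarchical tree with merging/maintenance energies is omitted.

```python
def color_histogram(frame, bins=8):
    """Normalized histogram over quantized color values in [0, 255]."""
    hist = [0] * bins
    for v in frame:
        hist[min(v * bins // 256, bins - 1)] += 1
    n = len(frame)
    return [c / n for c in hist]

def histogram_difference(h1, h2):
    """L1 distance between two normalized histograms (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_scene_changes(frames, threshold=0.5):
    """Indices where the histogram difference to the previous frame
    exceeds the threshold; small camera/object motion barely changes
    the global color histogram, so only true cuts fire."""
    cuts = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if histogram_difference(prev, cur) > threshold:
            cuts.append(i)
        prev = cur
    return cuts

def greedy_select(segments, budget):
    """Greedily keep segments in decreasing saliency order until the
    requested total length is reached.  Each segment is a tuple
    (start_frame, length, saliency)."""
    chosen, total = [], 0
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if total + seg[1] <= budget:
            chosen.append(seg)
            total += seg[1]
    return sorted(chosen)  # restore chronological order

# Toy example: two synthetic "shots" of dark then bright frames.
frames = [[10] * 100] * 5 + [[240] * 100] * 5
print(detect_scene_changes(frames))  # cut detected at frame 5
print(greedy_select([(0, 3, 0.9), (5, 4, 0.2), (9, 2, 0.7)], budget=5))
```

In the actual system the per-segment saliency would combine inter-frame similarity, detected face regions, and frame saliency, and the greedy pass would also enforce the minimum segment length.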

References

  1. E. P. Bennett and L. McMillan, "Computational time-lapse video," ACM Trans. on Graphics, vol. 26, no. 3, Article No. 102, July 2007.
  2. D. DeMenthon, V. Kobla, and D. Doermann, "Video summarization by curve simplification," Proc. of 6th ACM Conference on Multimedia, September 1998.
  3. Y. Zhuang, Y. Rui, T. S. Huang, and S. Mehrotra, "Adaptive key frame extraction using unsupervised clustering," Proc. of International Conference on Image Processing, vol. 1, pp. 866-870, October 1998.
  4. A. Hanjalic and H. Zhang, "An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1280-1289, December 1999. https://doi.org/10.1109/76.809162
  5. H. S. Chang, S. Sull, and S. U. Lee, "Efficient video indexing scheme for content-based retrieval," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1269-1279, December 1999. https://doi.org/10.1109/76.809161
  6. H. W. Kang, X. Q. Chen, Y. Matsushita, and X. Tang, "Space-time video montage," Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1331-1338, June 2006.
  7. Y. Pritch, A. Rav-Acha, and S. Peleg, "Nonchronological video synopsis and indexing," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1971-1984, November 2008. https://doi.org/10.1109/TPAMI.2008.29
  8. T. Mei, B. Yang, S. Q. Yang, and X. S. Hua, "Video collage: presenting a video sequence using a single image," The Visual Computer, vol. 25, no. 1, pp. 39-51, December 2008.
  9. C. W. Ngo, Y. F. Ma, and H. J. Zhang, "Video summarization and scene detection by graph modeling," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 296-305, February 2005. https://doi.org/10.1109/TCSVT.2004.841694
  10. Y. Gong and X. Liu, "Video summarization using singular value decomposition," Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 174-180, June 2000.
  11. B. Yu, W. Y. Ma, K. Nahrstedt, and H. J. Zhang, "Video summarization based on user log enhanced link analysis," Proc. of 11th ACM Conference on Multimedia, pp. 382-391, November 2003.
  12. L. Herranz and J. M. Martinez, "A framework for scalable summarization of video," IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 9, pp. 1265-1270, September 2010. https://doi.org/10.1109/TCSVT.2010.2057020
  13. Y. Fu, Y. Guo, Y. Zhu, F. Liu, C. Song, and Z. H. Zhou, "Multi-view video summarization," IEEE Trans. on Multimedia, vol. 12, no. 7, pp. 717-729, November 2010. https://doi.org/10.1109/TMM.2010.2052025
  14. B. L. Tseng, C. Y. Lin, and J. R. Smith, "Video summarization and personalization for pervasive mobile devices," Proc. SPIE: Storage and Retrieval for Media Databases, vol. 4676, pp. 359-370, December 2001.
  15. L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, November 1998. https://doi.org/10.1109/34.730558
  16. P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, May 2004. https://doi.org/10.1023/B:VISI.0000013087.49260.fb
  17. L. Xu, C. Lu, Y. Xu, and J. Jia, "Image smoothing via $L_0$ gradient minimization," ACM Trans. on Graphics, vol. 30, no. 6, Article No. 174, December 2011.
  18. Muvee, http://www.muvee.com/en/

Cited by

  1. User-centred personalised video abstraction approach adopting SIFT features vol.76, pp.2, 2017, https://doi.org/10.1007/s11042-015-3210-4