
Video Highlight Prediction Using GAN and Multiple Time-Interval Information of Audio and Image

  • Lee, Hansol (Dept. of Media IT Engineering, Graduate School, Seoul National University of Science and Technology) ;
  • Lee, Gyemin (Dept. of Media IT Engineering, Graduate School, Seoul National University of Science and Technology)
  • Received : 2019.12.18
  • Accepted : 2020.02.04
  • Published : 2020.03.30

Abstract

Huge amounts of content are uploaded to various streaming platforms every day, and game and sports videos account for a large portion of it. For viewers' convenience, broadcasting companies create and provide highlight videos that collect the most interesting scenes of a match, but producing these by hand is time-consuming and costly. In this paper, we propose models that automatically predict highlights in games and sports matches. Whereas most previous approaches use visual information exclusively, our models use both audio and visual information and capture both the short-term context and the longer-term flow of a video. We also describe models that incorporate a GAN to learn better highlight features. The proposed models are evaluated on e-sports and baseball videos.

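As a concrete illustration of the approach described in the abstract, the sketch below (Python/PyTorch) shows one way a model could fuse audio and image features over a short and a long time window, score each video segment as a highlight, and use a GAN-style discriminator to shape the fused features. This is a minimal sketch under assumed interfaces, not the authors' implementation; all module names, feature dimensions, and window lengths are illustrative assumptions.

import torch
import torch.nn as nn

class HighlightPredictor(nn.Module):
    def __init__(self, audio_dim=128, image_dim=2048, hidden=256):
        super().__init__()
        # Short-term branch: per-step audio/image features of the current segment.
        self.short_rnn = nn.GRU(audio_dim + image_dim, hidden, batch_first=True)
        # Long-term branch: features over a wider surrounding window.
        self.long_rnn = nn.GRU(audio_dim + image_dim, hidden, batch_first=True)
        # Scores the concatenated short/long representations as a highlight probability.
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, short_seq, long_seq):
        # short_seq: (B, T_short, audio_dim + image_dim), e.g. a few seconds
        # long_seq:  (B, T_long,  audio_dim + image_dim), e.g. tens of seconds
        _, h_short = self.short_rnn(short_seq)
        _, h_long = self.long_rnn(long_seq)
        fused = torch.cat([h_short[-1], h_long[-1]], dim=-1)
        return self.scorer(fused).squeeze(-1), fused  # highlight prob., fused feature

class Discriminator(nn.Module):
    # GAN discriminator: tells fused features of true highlight segments apart
    # from those of non-highlight segments, pushing the predictor toward
    # features that better separate the two classes.
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, feat):
        return self.net(feat).squeeze(-1)

# Usage with random tensors standing in for extracted audio/image features.
model, disc = HighlightPredictor(), Discriminator()
short_seq = torch.randn(4, 5, 128 + 2048)   # assumed 5-step short window
long_seq = torch.randn(4, 30, 128 + 2048)   # assumed 30-step long window
prob, fused = model(short_seq, long_seq)
print(prob.shape, disc(fused).shape)        # torch.Size([4]) torch.Size([4])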

