Development Status of Intelligent Media Content Editing Technology

  • Published: 2023.04.30

Abstract

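Media content editing technology is essential to the content production process and can be divided into video-feature-based genre classification, automatic scene segmentation, object recognition, and related techniques. The media content market has grown explosively since COVID-19, and as demand has risen for using artificial intelligence to make content production easier, research and development on AI-based media content production and editing technology is being actively pursued. This article reviews the development status of AI-based media editing technologies that can be applied to the media content production process.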

Funding

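This research was supported in 2023 by the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (Ministry of Science and ICT) (No. 2021-0-00804, Development of Media Production Technology Applying Learning-Based Directing Techniques).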
