Acknowledgement
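This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (Ministry of Science and ICT) in 2023 (No. 2021-0-00804, Development of media production technology applying learning-based directing techniques).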
References
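- Socialerus (Korea's first and largest YouTube channel analytics service), "2021 Korea YouTube Analysis Report," https://socialerus.com/
- E. Choi, "20 Hours of Work for 30,000 Won... Video Editors Left in a Cut-Rate Market Without a Single Contract," Hankook Ilbo, Feb. 5, 2022.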
- S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, and S. Vijayanarasimhan, "YouTube-8M: A Large-Scale Video Classification Benchmark," arXiv preprint arXiv:1609.08675, 2016.
- F. Mao, X. Wu, H. Xue, and R. Zhang, "Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network," in Proceedings of the European Conference on Computer Vision Workshop (ECCVW), 2018.
- S. Bhardwaj, M. Srinivasan, and M. M. Khapra, "Efficient Video Classification using Fewer Frames," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, M. Suleyman, and A. Zisserman, "The Kinetics Human Action Video Dataset," arXiv preprint arXiv:1705.06950, 2017.
- J. Carreira, E. Noland, A. Banki-Horvath, C. Hillier, and A. Zisserman, "A Short Note about Kinetics-600," arXiv preprint arXiv:1808.01340, 2018.
- J. Carreira, E. Noland, C. Hillier, and A. Zisserman, "A Short Note on the Kinetics-700 Human Action Dataset," arXiv preprint arXiv:1907.06987, 2019.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
- N. Dalal and B. Triggs, "Histogram of Oriented Gradients for Human Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
- C. Wei, H. Fan, S. Xie, C.-Y. Wu, A. Yuille, and C. Feichtenhofer, "Masked Feature Prediction for Self-Supervised Visual Pre-Training," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- K. He, X. Chen, S. Xie, Y. Li, P. Dollar, and R. Girshick, "Masked Autoencoders Are Scalable Vision Learners," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- C. Feichtenhofer, H. Fan, Y. Li, and K. He, "Masked Autoencoders As Spatiotemporal Learners," arXiv preprint arXiv:2205.09113, 2022.
- Q. Huang, Y. Xiong, A. Rao, J. Wang, and D. Lin, "MovieNet: A Holistic Dataset for Movie Understanding," in Proceedings of the European Conference on Computer Vision (ECCV), 2020.
- K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum Contrast for Unsupervised Visual Representation Learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- S. Chen, X. Nie, D. Fan, D. Zhang, V. Bhat, and R. Hamid, "Shot Contrastive Self-Supervised Learning for Scene Boundary Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- H. Wu, K. Chen, Y. Luo, R. Qiao, B. Ren, H. Liu, W. Xie, and L. Shen, "Scene Consistency Representation Learning for Video Scene Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- M. Z. Shou, S. W. Lei, W. Wang, D. Ghadiyaram, and M. Feiszli, "Generic Event Boundary Detection: A Benchmark for Event Segmentation," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2021.
- C. Li, X. Wang, L. Wen, D. Hong, T. Luo, and L. Zhang, "End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- H. Kang, J. Kim, T. Kim, and S. J. Kim, "UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- M. Everingham, L. Van Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman, "The Pascal Visual Object Classes (VOC) Challenge," in International Journal of Computer Vision (IJCV), 2010.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," in International Journal of Computer Vision (IJCV), 2015.
- A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems (NIPS), 2012.
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
- R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
- S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Advances in Neural Information Processing Systems (NIPS), 2015.
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," in Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
- M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and Efficient Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in Proceedings of the European Conference on Computer Vision (ECCV), 2014.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," in Proceedings of the International Conference on Learning Representations (ICLR), 2021.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2021.
- L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, "Fully-Convolutional Siamese Networks for Object Tracking," in Proceedings of the European Conference on Computer Vision Workshop (ECCVW), 2016.
- S. Chopra, R. Hadsell, and Y. LeCun, "Learning a Similarity Metric Discriminatively, with Application to Face Verification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
- B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, "High Performance Visual Tracking with Siamese Region Proposal Network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan, and W. Hu, "Distractor-aware Siamese Networks for Visual Object Tracking," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
- H. Nam and B. Han, "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- M. Danelljan, G. Bhat, F. S. Khan, and M. Felsberg, "ATOM: Accurate Tracking by Overlap Maximization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- M. Kristan et al., "The Sixth Visual Object Tracking VOT2018 Challenge Results," in Proceedings of the European Conference on Computer Vision Workshop (ECCVW), 2018.
- P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, and P. Luo, "TransTrack: Multiple Object Tracking with Transformer," arXiv preprint arXiv:2012.15460, 2020.
- X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, and H. Lu, "Transformer Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.