
Generating Extreme Close-up Shot Dataset Based On ROI Detection For Classifying Shots Using Artificial Neural Network

  • Kang, Dongwann (Department of Computer Science and Engineering, Seoul National University of Science and Technology) ;
  • Lim, Yang-mi (Department of IT Media Engineering, Duksung Women's University)
  • Received : 2019.09.16
  • Accepted : 2019.11.11
  • Published : 2019.11.30

Abstract

This study aims to analyze movies, which contain a variety of stories, according to the size of their shots. To achieve this, a dataset must first be classified by shot size: extreme close-up shots, close-up shots, medium shots, full shots, and long shots. However, because typical video storytelling consists mainly of close-up, medium, full, and long shots, constructing a sufficiently large dataset of extreme close-up shots is not easy. To solve this, we propose an image cropping method based on region-of-interest (ROI) detection. We use face detection and saliency detection to estimate the ROI, and generate extreme close-up images by cropping the ROI from close-up images. The dataset enriched by the proposed method is then used to train an artificial neural network model that classifies shots by size. This study can help analyze the emotional changes of characters in video stories and predict how the composition of a story changes over time. As AI comes to be used more actively in entertainment, we expect it to influence the automatic adjustment and generation of characters, dialogue, and image editing.
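The ROI-based cropping step described in the abstract can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes a face detector and a saliency model have already produced a face bounding box and a saliency map, and the helper names (`estimate_roi`, `crop_extreme_closeup`) and the 0.5 crop-scale factor are assumptions chosen for the example.

```python
import numpy as np

def estimate_roi(face_box, saliency_map):
    """Pick the most salient pixel inside the detected face region."""
    x, y, w, h = face_box
    sub = saliency_map[y:y + h, x:x + w]
    iy, ix = np.unravel_index(np.argmax(sub), sub.shape)
    return x + ix, y + iy  # ROI centre in full-image coordinates

def crop_extreme_closeup(image, face_box, saliency_map, scale=0.5):
    """Crop a window `scale` times the face size, centred on the ROI
    and clamped to the image bounds, emulating an extreme close-up."""
    cx, cy = estimate_roi(face_box, saliency_map)
    _, _, w, h = face_box
    cw, ch = max(1, int(w * scale)), max(1, int(h * scale))
    H, W = image.shape[:2]
    x0 = min(max(cx - cw // 2, 0), W - cw)
    y0 = min(max(cy - ch // 2, 0), H - ch)
    return image[y0:y0 + ch, x0:x0 + cw]
```

Applied to a close-up frame, the crop keeps only the most salient part of the face (for example, the eye region), which is exactly the framing that characterizes an extreme close-up shot.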

