Story-based Information Retrieval

  • You, Eun-Soon (Graduate School of Cinematic Content, Dankook University)
  • Park, Seung-Bo (Institute of Media Content, Dankook University)
  • Received : 2013.12.05
  • Accepted : 2013.12.19
  • Published : 2013.12.31

Abstract

Video information retrieval has become a very important issue because of the explosive increase in video data driven by the growth of the Web and the content industry. Content-based video analysis using visual features has been the main approach to video information retrieval and browsing: it extracts features such as color, texture, and shape from audio-visual data (frames and shots) to represent video content and to measure similarity between videos. A movie, however, which is one of the most typical types of video data, is organized by story as well as by audio-visual data. When content-based analysis that uses only low-level audio-visual features is applied to movie retrieval, a semantic gap arises between the information people recognize as significant and the information the analysis produces. This gap exists because a movie's story line is high-level information whose internal relationships change as the movie progresses, so story-level retrieval cannot be carried out with content-based techniques alone. A formal model that can capture the relationships among movie contents and track changes in meaning is needed to retrieve story information accurately. Recently, story-based video analysis techniques that use the concept of a social network have emerged. These approaches represent a story through the relationships between the characters in a movie, but they have several problems. First, they do not express the dynamic changes in those relationships as the story develops. Second, they miss deeper information such as emotion, which reveals a character's identity and psychological state; emotion is essential to understanding a character's motivation, conflict, and resolution.
Third, they do not take into account the events and background that make up the story. This paper therefore reviews the importance and the weaknesses of previous video analysis methods, from content-based approaches to story analysis based on social networks, and proposes the elements needed for story modeling, namely character, background, and event, based on narrative structures introduced in literary theory. First, we extract a character's emotional words from the script of the movie Pretty Woman by using the hierarchical structure of WordNet, an extensive English thesaurus that encodes relationships between words (e.g., synonyms, hypernyms, hyponyms, antonyms), and we present a method for visualizing a character's emotional pattern over time. Second, a character's inner nature must be identified in order to model the character arc that depicts the character's growth and development. To this end, we analyze the amount of each character's dialogue in the script and track inner nature using social network concepts such as in-degree (incoming links) and out-degree (outgoing links); by tracing the degree, in-degree, and out-degree of the movie's character network as the story progresses, changes in a character's inner nature can be followed. Finally, the spatial background where characters meet and where events take place is an important element of the story. We use the movie script to extract significant spatial backgrounds and propose a scene map that describes the spatial arrangements and distances in the movie; important places, such as where the main characters first meet or where they stay for long periods, can be read from this map. Based on these three elements (character, event, background), we extract a variety of story-related information and evaluate the performance of the proposed method.
The extracted story information can be tracked over time to detect changes in a character's emotion or inner nature, spatial movements, and the conflicts and resolutions of the story.
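The WordNet-based emotion-word extraction described above can be illustrated with a minimal sketch. WordNet organizes words into hypernym ("is-a") hierarchies, so a word can be classified as emotional by walking upward from it and checking whether the path passes through an emotion concept. To keep the sketch self-contained, the tiny HYPERNYMS table below is a hard-coded stand-in for WordNet (a real system would query WordNet itself, e.g. via NLTK), and the dialogue line is invented:

```python
# Hard-coded stand-in for a fragment of WordNet's hypernym hierarchy.
# Each word maps to its immediate hypernym (one sense per word, for brevity).
HYPERNYMS = {
    "joy": "emotion",
    "anger": "emotion",
    "fear": "emotion",
    "emotion": "feeling",
    "feeling": "state",
    "limousine": "car",
    "car": "vehicle",
}

def is_emotion_word(word, root="emotion"):
    """Walk hypernym links upward; True if the path passes through `root`."""
    while word in HYPERNYMS:
        if word == root:
            return True
        word = HYPERNYMS[word]
    return word == root

# Invented dialogue line; collect its emotion words in order of appearance.
dialogue = "what joy and what fear in that limousine".split()
emotion_words = [w for w in dialogue if is_emotion_word(w)]
# emotion_words == ["joy", "fear"]
```

Plotting where such words occur scene by scene gives the kind of temporal emotional pattern the paper visualizes for the characters of Pretty Woman.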
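The tracking of a character's inner nature via network indices can be sketched as follows. One plausible reading, assumed here, is that a character's out-degree counts the distinct characters they speak to and the in-degree the distinct characters who speak to them, accumulated scene by scene; the (scene, speaker, listener) triples are invented for illustration, and the paper's Character-net construction is richer than this:

```python
# (scene number, speaker, listener) triples, as might be extracted from a
# script; the names and scenes are invented for illustration.
utterances = [
    (1, "Edward", "Vivian"), (1, "Vivian", "Edward"),
    (2, "Edward", "Vivian"), (2, "Edward", "Stuckey"),
    (3, "Vivian", "Edward"), (3, "Vivian", "Kit"),
]

def degree_over_time(utterances, character):
    """Cumulative (scene, (out_degree, in_degree)) pairs for `character`,
    where degrees count distinct dialogue partners in a directed network."""
    spoken_to, heard_from = set(), set()
    snapshot = {}
    for scene, speaker, listener in utterances:
        if speaker == character:
            spoken_to.add(listener)   # edge: character -> listener
        if listener == character:
            heard_from.add(speaker)   # edge: speaker -> character
        snapshot[scene] = (len(spoken_to), len(heard_from))
    return sorted(snapshot.items())

# Vivian's out-degree grows from 1 to 2 in scene 3, when she also talks to Kit.
```

Changes in these indices across the film's progression are what the proposed method uses to follow a character's development.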
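The scene map can likewise be sketched from script data. Screenplay scene headings conventionally encode the location (e.g. "INT. HOTEL LOBBY - NIGHT"), so locations can be tallied and the first co-presence of two characters located; the mini-script and character names below are invented for illustration:

```python
import re
from collections import Counter

# Invented (scene heading, characters present) pairs in script order.
scenes = [
    ("INT. HOTEL LOBBY - NIGHT", {"Edward", "Barney"}),
    ("EXT. HOLLYWOOD BLVD - NIGHT", {"Vivian", "Kit"}),
    ("EXT. HOLLYWOOD BLVD - NIGHT", {"Edward", "Vivian"}),
    ("INT. PENTHOUSE - NIGHT", {"Edward", "Vivian"}),
    ("INT. PENTHOUSE - DAY", {"Edward", "Vivian"}),
]

def location(heading):
    """Strip the INT./EXT. prefix and the time-of-day suffix from a heading."""
    return re.sub(r"^(INT\.|EXT\.)\s*|\s*-\s*\w+$", "", heading)

def first_meeting(scenes, a, b):
    """Location of the first scene in which both characters appear."""
    for heading, cast in scenes:
        if a in cast and b in cast:
            return location(heading)
    return None

# How many scenes each location hosts: a crude proxy for time spent there.
time_spent = Counter(location(heading) for heading, _ in scenes)
# first_meeting(scenes, "Edward", "Vivian") == "HOLLYWOOD BLVD"
```

From such a map one can read off the important places the paper targets, such as where the main characters first meet and where they stay for long periods.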
