Search | Korea Science

Kim, Kyung-Min;Ha, Jung-Woo;Lee, Beom-Jin;Zhang, Byoung-Tak
- Journal of KIISE
- /
- v.42 no.4
- /
- pp.451-458
- /
- 2015
Previous multimodal learning methods focus on problem-solving aspects, such as image and video search and tagging, rather than on knowledge acquisition via content modeling. In this paper, we propose the Multimodal Concept Hierarchy (MuCH), which is a content modeling method that uses a cartoon video dataset and a character-based subtitle generation method from the learned model. The MuCH model has a multimodal hypernetwork layer, in which the patterns of the words and image patches are represented, and a concept layer, in which each concept variable is represented by a probability distribution of the words and the image patches. The model can learn the characteristics of the characters as concepts from the video subtitles and scene images by using a Bayesian learning method and can also generate character-based subtitles from the learned model if text queries are provided. As an experiment, the MuCH model learned concepts from 'Pororo' cartoon videos with a total of 268 minutes in length and generated character-based subtitles. Finally, we compare the results with those of other multimodal learning models. The Experimental results indicate that given the same text query, our model generates more accurate and more character-specific subtitles than other models.
https://doi.org/10.5626/JOK.2015.42.4.451 인용 KSCI

Nan, Chang-Jun;Kim, Kyung-Min;Zhang, Byoung-Tak
- KIISE Transactions on Computing Practices
- /
- v.22 no.11
- /
- pp.619-624
- /
- 2016
Social-aware video displays not only the relationships between characters but also diverse information on topics such as economics, politics and culture as a story unfolds. Particularly, the speaking habits and behavioral patterns of people in different situations are very important for the analysis of social relationships. However, when dealing with this dynamic multi-modal data, it is difficult for a computer to analyze the drama data effectively. To solve this problem, previous studies employed the deep concept hierarchy (DCH) model to automatically construct and analyze social networks in a TV drama. Nevertheless, since location knowledge was not included, they can only analyze the social network as a whole in stories. In this research, we include location knowledge and analyze the social relations in different locations. We adopt data from approximately 4400 minutes of a TV drama Friends as our dataset. We process face recognition on the characters by using a convolutional- recursive neural networks model and utilize a bag of features model to classify scenes. Then, in different scenes, we establish the social network between the characters by using a deep concept hierarchy model and analyze the change in the social network while the stories unfold.
https://doi.org/10.5626/KTCP.2016.22.11.619 인용 KSCI