Conversation Context Annotation using Speaker Detection

Park, Seung-Bo; Kim, Yoo-Won; Jo, Geun-Sik

Journal of Korea Multimedia Society (한국멀티미디어학회논문지)

Volume 12, Issue 9, Pages 1252-1261, 2009, 1229-7771 (pISSN), 2384-0102 (eISSN)

Korea Multimedia Society (한국멀티미디어학회)

화자인식을 이용한 대화 상황정보 어노테이션

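Park, Seung-Bo (Department of Information Engineering, Inha University);
Kim, Yoo-Won (Department of Information Engineering, Inha University);
Jo, Geun-Sik (School of Computer and Information Engineering, Inha University)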

Published: 2009.09.30

Abstract

A central challenge in video search and summarization is extracting semantics from video content and annotating that content with context. Semantic information for annotation can be obtained in two ways: by extracting the identity of the objects in the video, or by extracting the context that those objects create together. Annotating objects alone does not convey enough of the meaning of a shot or scene, because it does not describe the relations and interactions between the objects. It is better to extract the objects first and then derive their relations and interactions from the conversation situation, so that the annotation captures the full meaning of the conversation. This paper studies how to detect the speaker in a video and how to compose the conversation context for annotation. We propose a method that identifies the characters on screen through face recognition, detects the speaker by analyzing mouth motion, extracts the conversation context with rules based on whether a speaker is present, how many characters (speaker and listeners) appear, and whether subtitles are present, and finally converts the scene context to an XML file and saves it.
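The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual rules, thresholds, or XML schema, so everything named here is an assumption made for illustration: the mouth-motion scores per recognized character, the `MOTION_THRESHOLD` value, the context labels (`dialogue`, `monologue`, `offscreen_speech`, `no_conversation`), and the `shot` and `character` element names. Face recognition and mouth-motion measurement are assumed to have run already and are represented only by their outputs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical threshold: minimum mouth-motion score that counts as speaking.
MOTION_THRESHOLD = 0.2


def detect_speaker(mouth_motion: Dict[str, float],
                   threshold: float = MOTION_THRESHOLD) -> Optional[str]:
    """Return the recognized character with the largest mouth motion,
    or None if nobody is visible or no score reaches the threshold."""
    if not mouth_motion:
        return None
    name, score = max(mouth_motion.items(), key=lambda item: item[1])
    return name if score >= threshold else None


def classify_context(speaker: Optional[str], num_characters: int,
                     has_subtitle: bool) -> str:
    """Illustrative rule over the three cues named in the abstract:
    speaker presence, number of characters, and subtitle presence."""
    if not has_subtitle:
        return "no_conversation"
    if speaker is None:
        return "offscreen_speech"  # subtitle shown, but no visible speaker
    if num_characters == 1:
        return "monologue"
    return "dialogue"


def annotate_shot(shot_id: int, mouth_motion: Dict[str, float],
                  has_subtitle: bool) -> ET.Element:
    """Build an XML element describing the conversation context of one shot."""
    speaker = detect_speaker(mouth_motion)
    context = classify_context(speaker, len(mouth_motion), has_subtitle)
    shot = ET.Element("shot", id=str(shot_id), context=context)
    for name in mouth_motion:
        if context == "no_conversation":
            role = "present"
        else:
            role = "speaker" if name == speaker else "listener"
        ET.SubElement(shot, "character", name=name, role=role)
    return shot


# Usage: two recognized characters, one with clear mouth motion, subtitle shown.
shot = annotate_shot(1, {"Alice": 0.7, "Bob": 0.1}, has_subtitle=True)
print(ET.tostring(shot, encoding="unicode"))
```

Keeping speaker detection, the context rule, and XML serialization in separate functions means the rule can be replaced without touching the other two steps.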
