Statistical Analysis Between Size and Balance of Text Corpus by Evaluation of the effect of Interview Sentence in Language Modeling

언어모델 인터뷰 영향 평가를 통한 텍스트 균형 및 사이즈간의 통계 분석

  • 정의정 (한국전자통신연구원 음성언어팀) ;
  • 이영직 (한국전자통신연구원 음성언어팀)
  • Published : 2002.07.01

Abstract

This paper analyzes statistically the relationship between size and balance of text corpus by evaluation of the effect of interview sentences in language model for Korean broadcast news transcription system. Our Korean broadcast news transcription system's ultimate purpose is to recognize not interview speech, but the anchor's and reporter's speech in broadcast news show. But the gathered text corpus for constructing language model consists of interview sentences a portion of the whole, $15\%$ approximately. The characteristic of interview sentence is different from the anchor's and the reporter's in one thing or another. Therefore it disturbs the anchor and reporter oriented language modeling. In this paper, we evaluate the effect of interview sentences in language model for Korean broadcast news transcription system and analyze statistically the relationship between size and balance of text corpus by making an experiment as the same procedure according to varying the size of corpus.

Keywords