N-gram 기반의 유사도를 이용한 대화체 연속 음성 언어 모델링

Spontaneous Speech Language Modeling using N-gram based Similarity

  • 발행 : 2003.06.01

초록

This paper presents our language model adaptation for Korean spontaneous speech recognition. Korean spontaneous speech is observed various characteristics of content and style such as filled pauses, word omission, and contraction as compared with the written text corpus. Our approaches focus on improving the estimation of domain-dependent n-gram models by relevance weighting out-of-domain text data, where style is represented by n-gram based tf/sup */idf similarity. In addition to relevance weighting, we use disfluencies as Predictor to the neighboring words. The best result reduces 9.7% word error rate relatively and shows that n-gram based relevance weighting reflects style difference greatly and disfluencies are good predictor also.

키워드