N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors

음소인식 오류에 강인한 N-gram 기반 음성 문서 검색

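  • 이수장 (Speech Interface Laboratory, Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST)) ;
  • 박경미 (Speech Interface Laboratory, Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST)) ;
  • 오영환 (Speech Interface Laboratory, Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST))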
  • Published : 2008.09.30

Abstract

In spoken document retrieval (SDR), subword units (typically phonemes) are used as indexing terms to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary, and it requires only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) with a 1.7 times speedup in comparison with the baseline system.

Keywords