Journal of the Phonetic Society of Korea: Malsori (MALSORI)
- No. 59
- Pages 53-68
- 2006
- 1226-1173 (pISSN)
Improvement and Evaluation of the Korean Large Vocabulary Continuous Speech Recognition Platform (ECHOS)
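- Kwon, Suk-Bong (Information and Communications University, ICU) ;
- Yun, Sung-Rack (KAIST, Department of Electrical Engineering and Computer Science) ;
- Jang, Gyu-Cheol (KAIST, Department of Electrical Engineering and Computer Science) ;
- Kim, Yong-Rae (Chungbuk National University, Department of Control and Instrumentation Engineering) ;
- Kim, Bong-Wan (Wonkwang University, Speech Information Technology and Industry Promotion Center, SiTEC) ;
- Kim, Hoi-Rin (Information and Communications University, ICU) ;
- Yoo, Chang-Dong (KAIST, Department of Electrical Engineering and Computer Science) ;
- Lee, Yong-Ju (Wonkwang University, Speech Information Technology and Industry Promotion Center, SiTEC) ;
- Kwon, Oh-Wook (Chungbuk National University, School of Electrical and Computer Engineering)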
- Published: 2006.09.30
Abstract
We report evaluation results for ECHOS, a Korean speech recognition platform. The platform has an object-oriented, reusable architecture so that researchers can easily evaluate their own algorithms. It includes all the modules needed to build a large-vocabulary speech recognizer: noise reduction, end-point detection, feature extraction, hidden Markov model (HMM)-based acoustic modeling, cross-word modeling, n-gram language modeling, n-best search, word graph generation, and Korean-specific language processing. The platform supports both lexical search trees and finite-state networks. It performs a word-dependent n-best search with a bigram in the forward search stage and rescores the lattice with a trigram in the backward stage. In an 8000-word continuous speech recognition task, the platform with a lexical tree produces 40% more word errors but takes 50% less recognition time than the HTK platform with a flat lexicon. Incorporating cross-word modeling reduces the recognition errors of ECHOS by 40%. With the number of Gaussian mixtures increased to 16, it yields word accuracy comparable to that of Julius, an earlier lexical tree-based platform.
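The contrast between a lexical tree and a flat lexicon can be illustrated with a minimal sketch. It is not taken from ECHOS: the class `Node`, the functions `build_lexical_tree` and `count_arcs`, and the four-word toy lexicon with romanized phone labels are all hypothetical. The sketch only shows that words sharing a pronunciation prefix share arcs in a tree, so the search has fewer phone arcs to expand than with one independent phone sequence per word.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a lexical (prefix) tree over phone labels."""
    children: dict = field(default_factory=dict)  # phone label -> child Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_lexical_tree(lexicon):
    """Merge the pronunciations in `lexicon` into one prefix tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse the existing arc when the prefix is already in the tree.
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_arcs(node):
    """Count the phone arcs below `node`."""
    return sum(1 + count_arcs(child) for child in node.children.values())


# Hypothetical toy lexicon; the phone labels are illustrative only.
LEXICON = {
    "sara": ["s", "a", "r", "a"],
    "saram": ["s", "a", "r", "a", "m"],
    "sarang": ["s", "a", "r", "a", "ng"],
    "san": ["s", "a", "n"],
}

# A flat lexicon keeps one separate phone sequence per word.
flat_arcs = sum(len(phones) for phones in LEXICON.values())
tree = build_lexical_tree(LEXICON)
tree_arcs = count_arcs(tree)

print(f"flat lexicon arcs: {flat_arcs}, lexical tree arcs: {tree_arcs}")
```

In this toy case the tree needs 7 arcs where the flat lexicon needs 17. The cost of the sharing is that a word's identity is known only at the node where its pronunciation ends, so word-level language model scores cannot be applied at the start of the word as they can with a flat lexicon.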