A Study on Phoneme Likely Units to Improve the Performance of Context-dependent Acoustic Models in Speech Recognition

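  • 임영춘 (Jamoba Co., Ltd.)
  • 오세진 (KVN Project Division, Korea Astronomy and Space Science Institute)
  • 김광동 (KVN Project Division, Korea Astronomy and Space Science Institute)
  • 노덕규 (KVN Project Division, Korea Astronomy and Space Science Institute)
  • 송민규 (KVN Project Division, Korea Astronomy and Space Science Institute)
  • 정현열 (School of Electronics and Information Engineering, Yeungnam University)
  • Published: 2003.07.01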

Abstract

In this paper, we carry out word, 4-continuous-digit, continuous speech, and task-independent word recognition experiments to verify the effectiveness of re-defined phoneme-likely units (PLUs) for Korean context-dependent (CD) acoustic modeling with the phonetic-decision-tree-based HM-Net (Hidden Markov Network). In the 48-PLU set, the phonemes /ㅂ/, /ㄷ/, /ㄱ/ are each split into three units (initial sound, medial sound, and final consonant), and the consonants /ㄹ/, /ㅈ/, /ㅎ/ are each split into two units (initial sound and final consonant), according to their position in the syllable, word, and sentence. To construct the CD acoustic models more effectively, we therefore re-define a 39-PLU set by merging the separated initial, medial, and final units of each of these phonemes in the 48-PLU set into a single unit. In word recognition experiments with context-independent (CI) acoustic models, the 48 PLUs give on average 7.06% higher recognition accuracy than the 39 PLUs. In speaker-independent word recognition experiments with the CD acoustic models, however, the 39 PLUs give on average 0.61% higher recognition accuracy than the 48 PLUs. In the 4-continuous-digit recognition experiments, which involve liaison phenomena, the 39 PLUs also give on average 6.55% higher recognition accuracy. In the continuous speech recognition experiments, the 39 PLUs likewise give on average 15.08% higher recognition accuracy than the 48 PLUs. Finally, in the task-independent word recognition experiments, where unknown contextual factors occur, both sets achieve lower recognition accuracy overall, but the 39 PLUs still score on average 1.17% higher than the 48 PLUs. These experiments verify that the re-defined 39 PLUs are more effective than the 48 PLUs for constructing CD acoustic models.
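The reduction from 48 to 39 units follows from the merges described in the abstract, and can be checked with a small sketch. The per-phoneme variant counts below are read from the abstract (three positional units for /ㅂ/, /ㄷ/, /ㄱ/ and two for /ㄹ/, /ㅈ/, /ㅎ/); the function name `merged_plu_count` and the dictionary layout are illustrative assumptions, not part of the paper.

```python
# Number of position-dependent units per phoneme in the 48-PLU set,
# as stated in the abstract (the data layout itself is an assumption).
variants_48 = {
    "ㅂ": 3, "ㄷ": 3, "ㄱ": 3,  # three positional units each
    "ㄹ": 2, "ㅈ": 2, "ㅎ": 2,  # two positional units each
}


def merged_plu_count(total_units, variants):
    """Return the unit count after merging each phoneme's variants into one.

    Merging k variants of a phoneme into a single unit removes k - 1 units.
    """
    removed = sum(k - 1 for k in variants.values())
    return total_units - removed


print(merged_plu_count(48, variants_48))  # 39
```

Merging removes 3 × 2 + 3 × 1 = 9 units, which accounts for the difference between the two sets.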
