Korean speech recognition based on grapheme


  • 이문학 (Department of Electronics and Computer Engineering, Hanyang University)
  • 장준혁 (Department of Electronic Engineering, Hanyang University)
  • Received : 2019.07.01
  • Accepted : 2019.09.03
  • Published : 2019.09.30

Abstract

This paper studies Korean speech recognition using grapheme units (Cho-sung [onset], Jung-sung [nucleus], Jong-sung [coda]). We build an ASR (automatic speech recognition) system without a G2P (grapheme-to-phoneme) step and show that deep-learning-based ASR systems can learn Korean pronunciation rules without it. The proposed model is shown to reduce the word error rate when sufficient training data is available.

In this paper, we propose graphemes as the output units of the acoustic model for a Korean speech recognizer. The proposed model decomposes Hangul into onset, nucleus, and coda graphemes without a G2P (grapheme-to-phoneme) step and uses them as the acoustic model's output units, showing that a deep-learning-based acoustic model can learn the Korean pronunciation rules sufficiently well without any explicit pronunciation information. We also compare the proposed model against a conventional phoneme-based model and show that the grapheme-based model performs relatively better when the database is sufficiently large.
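The onset/nucleus/coda decomposition described in the abstract follows directly from the Unicode layout of precomposed Hangul syllables. The sketch below is illustrative only (the function and variable names are ours, not the authors' code) and shows how text can be split into such grapheme units:

```python
# Jamo inventories in Unicode composition order: 19 onsets, 21 nuclei,
# and 27 codas plus the empty coda.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def to_graphemes(text):
    """Decompose Hangul syllable blocks into onset/nucleus/coda jamo."""
    units = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:            # precomposed Hangul syllable range
            idx = code - 0xAC00                  # syllable = ((onset*21)+nucleus)*28+coda
            units.append(CHOSEONG[idx // (21 * 28)])
            units.append(JUNGSEONG[(idx % (21 * 28)) // 28])
            coda = JONGSEONG[idx % 28]
            if coda:                             # skip the empty coda
                units.append(coda)
        else:
            units.append(ch)                     # pass non-Hangul characters through
    return units

print(to_graphemes("한국"))                      # → ['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ']
```

These grapheme sequences would then serve as acoustic-model output labels in place of G2P-derived phonemes.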
