Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging

  • Received : 2014.07.29
  • Accepted : 2014.09.10
  • Published : 2014.09.30

Abstract

This paper proposes a new method for determining the recognition units for Korean large vocabulary continuous speech recognition (LVCSR) by applying unsupervised segmentation and merging. In the proposed method, a text sentence is first segmented into morphemes, and position information is attached to each morpheme. Submorpheme units are then obtained by splitting the morpheme units so as to maximize posterior probability terms, which are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by repeatedly merging the submorpheme pair with the highest frequency. Experiments are conducted with a Korean LVCSR system that uses a 100k-word vocabulary and a trigram language model trained on a corpus of 300 million eojeols (word phrases). The proposed method reduces the out-of-vocabulary rate to 1.8% and yields a 14.0% relative reduction in the syllable error rate.
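The final merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`count_pairs`, `merge_pair`, `greedy_merge`), the `min_count` stopping threshold, and the tie-breaking rule are all assumptions made for illustration, and the position information attached to morphemes is ignored. The sketch only shows the general idea of repeatedly fusing the most frequent adjacent pair of units.

```python
from collections import Counter


def count_pairs(corpus):
    """Count adjacent unit pairs over all sentences in the corpus."""
    pairs = Counter()
    for units in corpus:
        pairs.update(zip(units, units[1:]))
    return pairs


def merge_pair(units, pair):
    """Fuse each non-overlapping occurrence of `pair` into a single unit."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def greedy_merge(corpus, num_merges, min_count=2):
    """Repeatedly merge the most frequent adjacent pair.

    `num_merges` and `min_count` are hypothetical stopping criteria;
    the abstract does not state how the paper decides when to stop.
    """
    corpus = [list(sentence) for sentence in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        # Ties are broken by the pair itself so the result is deterministic.
        best, freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_count:
            break
        merges.append(best)
        corpus = [merge_pair(sentence, best) for sentence in corpus]
    return corpus, merges


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
    units, merges = greedy_merge(toy, num_merges=5)
    print(merges)
    print(units)
```

In a real system the input would be the submorpheme sequences produced by the splitting step, and the number of merges would presumably be chosen to reach a target vocabulary size such as the 100k units mentioned above.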
