Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook (Brain Science Research Center, KAIST)
  • Alex Waibel (Interactive Systems Laboratories, University of Karlsruhe)
  • Published: 2002.03.01

Abstract

Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech varies widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. To reduce the out-of-vocabulary (OOV) rate, we used a morpheme-based dictionary and language model. To reduce the insertion and deletion errors caused by short morphemes, we concatenated original morpheme pairs that were short or occurred frequently. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. Using the merged morphemes as recognition units, we achieved an OOV rate of 1.7%, comparable to that of European languages with a 64k vocabulary. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. In experiments, the recognizer yielded a morpheme error rate of 21.8% for anchor speech and 31.6% for mostly noisy reporter speech.
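
The merging step and the OOV measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the actual merging criteria, so the thresholds `max_len` and `min_count`, the `+` joiner, the function names, and the romanized toy tokens are all assumptions made for illustration.

```python
from collections import Counter


def merge_pairs(sentences, max_len=1, min_count=2, joiner="+"):
    """Greedily merge adjacent morpheme pairs, left to right.

    A pair is merged when either morpheme is short (<= max_len characters)
    or the pair occurs at least min_count times in the corpus.
    Both thresholds are hypothetical, not values from the paper.
    """
    # Count every adjacent pair over the whole corpus.
    pair_counts = Counter(
        (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
    )
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent):
                a, b = sent[i], sent[i + 1]
                is_short = len(a) <= max_len or len(b) <= max_len
                is_frequent = pair_counts[(a, b)] >= min_count
                if is_short or is_frequent:
                    out.append(a + joiner + b)
                    i += 2  # both morphemes are consumed by the merge
                    continue
            out.append(sent[i])
            i += 1
        merged.append(out)
    return merged


def oov_rate(vocab, test_tokens):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    unknown = sum(1 for tok in test_tokens if tok not in vocab)
    return unknown / len(test_tokens)


# Toy corpus of morpheme sequences (romanized placeholders, not real data).
corpus = [["hakgyo", "e", "ganda"], ["hakgyo", "e", "onda"]]
units = merge_pairs(corpus)
vocab = set(units[0])
print(units)
print(oov_rate(vocab, units[1]))
```

Merging produces longer recognition units, which is the stated remedy for insertion and deletion errors on short morphemes, but it also enlarges the unit inventory, so the OOV rate has to be measured again on the merged units rather than on the original morphemes.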
