• Title/Summary/Keyword: Automatic concatenate morphemes generation

Search Result 1, Processing Time 0.017 seconds

Automatic Generation of Concatenate Morphemes for Korean LVCSR (대어휘 연속음성 인식을 위한 결합형태소 자동생성)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.407-414
    • /
    • 2002
  • In this paper, we present a method that automatically generates concatenate morpheme based language models to improve the performance of Korean large vocabulary continuous speech recognition. The focus was brought into improvement against recognition errors of monosyllable morphemes that occupy 54% of the training text corpus and more frequently mis-recognized. Knowledge-based method using POS patterns has disadvantages such as the difficulty in making rules and producing many low frequency concatenate morphemes. Proposed method automatically selects morpheme-pairs from training text data based on measures such as frequency, mutual information, and unigram log likelihood. Experiment was performed using 7M-morpheme text corpus and 20K-morpheme lexicon. The frequency measure with constraint on the number of morphemes used for concatenation produces the best result of reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3. and MER from 21.3% to 17.6%.