News Data Analysis Using Acoustic Model Output of Continuous Speech Recognition

연속음성인식의 음향모델 출력을 이용한 뉴스 데이터 분석

  • Published : 2006.10.28

Abstract

In this paper, the acoustic model output of CSR(Continuous Speech Recognition) was used to analyze news data News database used in this experiment was consisted of 2,093 articles. Due to the low efficiency of language model, conventional Korean CSR is not appropriate to the analysis of news data. This problem could be handled successfully by introducing post-processing work of recognition result of acoustic model. The acoustic model more robust than language model in Korean environment. The result of post-processing work was made into KIF(Keyword information file). When threshold of acoustic model's output level was 100, 86.9% of whole target morpheme was included in post-processing result. At the same condition, applying length information based normalization, 81.25% of whole target morpheme was recognized. The purpose of normalization was to compensate long-length morpheme. According to experiment result, 75.13% of whole target morpheme was recognized KIF(314MB) had been produced from original news data(5,040MB). The decrease rate of absolute information met was approximately 93.8%.

Keywords

News Data Analysis;Morpheme Based CSR