• Title/Summary/Keyword: Post-processing for Speech Recognition

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences / v.10 no.3 / pp.225-239 / 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique, and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, we consider monophone clustering and Gaussian Mixture Models (GMMs). For post-processing, we use likelihood ratio scoring to verify the recognition results, with filler models, anti-subword models, and N-best decoding results considered as alternative hypotheses. We also examine different methods of constructing anti-subword models. We evaluate the system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling outperforms monophone clustering. In the post-processing experiments, the anti-keyword model based on Kullback-Leibler distance and the N-best decoding method perform better than the other methods, reducing keyword recognition errors by more than 50% at a keyword rejection rate of 5%.
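The likelihood ratio scoring mentioned in this abstract can be illustrated with a minimal sketch. It assumes the recognizer already provides a log-likelihood for the keyword hypothesis and one for the alternative hypothesis (a filler model, an anti-subword model, or a competing N-best entry). The function names, the frame normalization, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
def llr_score(keyword_loglik: float, alt_loglik: float, num_frames: int) -> float:
    """Frame-normalized log-likelihood ratio between the keyword hypothesis
    and an alternative hypothesis (assumed to be scored on the same segment)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (keyword_loglik - alt_loglik) / num_frames


def accept_keyword(keyword_loglik: float, alt_loglik: float,
                   num_frames: int, threshold: float = 0.0) -> bool:
    """Accept the detected keyword only if its score clears the threshold.
    The threshold is a tuning parameter (hypothetical default of 0.0)."""
    return llr_score(keyword_loglik, alt_loglik, num_frames) >= threshold
```

Raising the threshold rejects more detections, which trades a higher keyword rejection rate for fewer false acceptances.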

A Model for Post-processing of Speech Recognition Using Syntactic Unit of Morphemes (구문형태소 단위를 이용한 음성 인식의 후처리 모델)

  • 양승원;황이규
    • Journal of Korea Society of Industrial Information Systems / v.7 no.3 / pp.74-80 / 2002
  • Much research has addressed post-processing methods that use natural language processing techniques to enhance Korean continuous speech recognition. It is very difficult to use a formal morphological analyzer to improve speech recognition, because the analysis techniques of natural language processing are designed mainly for formal written language. In this paper, we propose a speech recognition enhancement model that uses syntactic units of morphemes. This approach uses functional-word-level longest matching, which does not consider word spacing. We describe a post-processing mechanism for improving speech recognition with the proposed model, which uses phonological structure information on the relationships between predicates and auxiliary predicates or bound nouns that occur frequently in Korean sentences.

Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output (자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식)

  • Park, Chul-Ho;Bae, Jae-Chul;Bae, Keun-Sung
    • MALSORI / no.62 / pp.85-96 / 2007
  • In this paper, we carried out recognition experiments, using the speech interface, on noisy speech containing various levels of car noise and audio system output. The speech interface consists of three parts: pre-processing, an acoustic echo canceller, and post-processing. First, a high-pass filter is employed as the pre-processing part to remove some engine noise. Then, an echo canceller implemented as an FIR filter with an NLMS adaptive algorithm removes the music or speech coming from the car's audio system. Finally, an MMSE-STSA based speech enhancement method is applied to the output of the echo canceller to further remove the residual noise. For the recognition experiments, we generated test signals by adding music to the car-noise speech from the Aurora 2 database. An HTK-based continuous HMM system was built as the recognizer. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment.
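The echo canceller stage described in this abstract (an FIR filter adapted with NLMS) can be sketched in a few lines. This is a generic textbook NLMS loop, not the authors' implementation; the tap count, step size `mu`, and regularizer `eps` are assumed values chosen only for illustration.

```python
def nlms_echo_cancel(mic, ref, num_taps=8, mu=0.5, eps=1e-8):
    """Remove the echo of `ref` (audio system output) from `mic` (microphone).

    Returns the error signal e[n] = mic[n] - w . x[n], where x[n] holds the
    most recent `num_taps` reference samples and w is the adaptive FIR filter.
    """
    w = [0.0] * num_taps          # adaptive FIR coefficients
    x = [0.0] * num_taps          # reference delay line, x[0] is the newest sample
    out = []
    for d, r in zip(mic, ref):
        x = [r] + x[:-1]                                  # shift in new reference sample
        y = sum(wi * xi for wi, xi in zip(w, x))          # estimated echo
        e = d - y                                         # echo-cancelled output
        norm = sum(xi * xi for xi in x) + eps             # input energy (normalization)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]  # NLMS update
        out.append(e)
    return out
```

In a real car interface the microphone also carries the talker's speech, so the adaptation would normally be slowed or frozen during double-talk; that control logic is omitted here.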

User Adaptive Post-Processing in Speech Recognition for Mobile Devices (모바일 기기를 위한 음성인식의 사용자 적응형 후처리)

  • Kim, Young-Jin;Kim, Eun-Ju;Kim, Myung-Won
    • Journal of KIISE:Computing Practices and Letters / v.13 no.5 / pp.338-342 / 2007
  • In this paper, we propose a user-adaptive post-processing method to improve the accuracy of speaker-dependent, isolated-word speech recognition, particularly for mobile devices. Our method treats the recognition result of the basic recognizer simply as a high-level speech feature and processes it further to obtain the correct recognition result. It learns the correlation between the output of the basic recognizer and the correct final results, and uses it to correct the erroneous output of the basic recognizer. A multi-layer perceptron model is built for each frequently misrecognized word. In our experiments, the method achieved a significant improvement in recognition accuracy, with an error correction rate of 41%.

An Automatic Post-processing Method for Speech Recognition using CRFs and TBL (CRFs와 TBL을 이용한 자동화된 음성인식 후처리 방법)

  • Seon, Choong-Nyoung;Jeong, Hyoung-Il;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.37 no.9 / pp.706-711 / 2010
  • In applications of a human speech interface, reducing the recognition error rate is one of the main research issues. Many previous studies attempted to correct errors using post-processing that depends on a manually constructed corpus and correction patterns. We propose an automatically learnable post-processing method that is independent of the characteristics of both the domain and the speech recognizer. We divide the post-processing task into two steps: error detection and error correction. We treat error detection as a classification problem, to which we apply a conditional random fields (CRFs) classifier, and we apply transformation-based learning (TBL) to the error correction step. Our experimental results indicate that the proposed method corrects a speech recognizer's insertion, deletion, and substitution errors by 25.85%, 3.57%, and 7.42%, respectively.

Multiple Acoustic Cues for Stop Recognition

  • Yun, Weon-Hee
    • Proceedings of the KSPS conference / 2003.10a / pp.3-16 / 2003
  • Acoustic characteristics of stops in speech with contextual variability; possibility of stop recognition by a post-processing technique; further work: speech database, modification of the decoder, automatic segmentation of acoustic parameters.

Semantic-oriented Error Correction for Spoken Query Processing (음성 질의 처리를 위한 의미 기반 오류 수정)

  • Jeong Minwoo;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.153-156 / 2003
  • Voice input is often required in new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Post-processing of the recognition results is a popular way to improve recognition accuracy, but previous approaches to error correction were mainly lexical-oriented. We suggest a new semantic-oriented approach that corrects both semantic-level and lexical errors and is more accurate, especially for domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented approaches.

Recognition Performance Improvement of Unsupervised Limabeam Algorithm using Post Filtering Technique

  • Nguyen, Dinh Cuong;Choi, Suk-Nam;Chung, Hyun-Yeol
    • IEMEK Journal of Embedded Systems and Applications / v.8 no.4 / pp.185-194 / 2013
  • In distant-talking environments, speech recognition performance degrades significantly due to noise and reverberation. Recent work by Michael L. Seltzer shows that, in microphone array speech recognition, the word error rate can be significantly reduced by adapting the beamformer weights to generate a sequence of features that maximizes the likelihood of the correct hypothesis. One way to implement this approach, called the Likelihood Maximizing Beamforming (Limabeam) algorithm, is Unsupervised Limabeam (USL), which can improve recognition performance in any environment. Our investigation of USL showed that, because the optimization depends strongly on the transcription output of the first recognition step, the output becomes unstable, which may lower performance. To improve the recognition performance of USL, post-filtering techniques can be employed to obtain a more accurate transcription from the first step. In this work, as a post-filtering technique for the first recognition step of USL, we propose adding a Wiener filter combined with a Feature Weighted Mahalanobis Distance. We also suggest an alternative way to implement the Limabeam algorithm efficiently for a Hidden Markov Network (HM-Net) speech recognizer. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the Limabeam algorithm in the HM-Net speech recognition system and the improved performance of the proposed method.
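The abstract does not say which Wiener filter variant is used as the post-filter. As a point of reference only, the textbook per-bin Wiener gain with a maximum-likelihood SNR estimate can be sketched as follows. The function names and the gain floor are assumptions made for this sketch, and the noise power is assumed to have been estimated elsewhere (for example, from non-speech frames).

```python
def wiener_gain(noisy_power: float, noise_power: float, floor: float = 0.05) -> float:
    """Textbook Wiener gain for one frequency bin.

    With the a priori SNR estimated as max(noisy_power / noise_power - 1, 0),
    the gain SNR / (1 + SNR) reduces to 1 - noise_power / noisy_power.
    A small floor (assumed value) limits musical-noise artifacts.
    """
    if noisy_power <= 0.0:
        return floor
    return max(1.0 - noise_power / noisy_power, floor)


def enhance_magnitudes(noisy_mag, noise_power, floor: float = 0.05):
    """Apply the per-bin gain to a magnitude spectrum (one frame)."""
    return [wiener_gain(m * m, n, floor) * m for m, n in zip(noisy_mag, noise_power)]
```

Bins where the noisy power barely exceeds the noise estimate are attenuated down to the floor, while high-SNR bins pass nearly unchanged.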

A Noise Robust Speech Recognition Method Using Model Compensation Based on Speech Enhancement (음성 개선 기반의 모델 보상 기법을 이용한 강인한 잡음 음성 인식)

  • Shen, Guang-Hu;Jung, Ho-Youl;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.27 no.4 / pp.191-199 / 2008
  • In this paper, we propose MWF-PMC, a noise processing method for speech recognition in noisy environments that enhances the input speech with Mel-warped Wiener Filtering (MWF) at the pre-processing stage and compensates the recognition model with Parallel Model Combination (PMC) at the post-processing stage. PMC uses the residual noise extracted from the silence regions of the enhanced speech to compensate the clean speech model, and is thus expected to improve recognition performance in noisy environments. For the recognition experiments, we down-sampled the KLE PBW (Phoneme Balanced Words) 452-word speech data to 8 kHz and created noisy speech at 5 SNR levels (0 dB, 5 dB, 10 dB, 15 dB, and 20 dB) by adding Subway, Car, and Exhibition noise to the clean speech. The recognition results confirm the effectiveness of the proposed MWF-PMC method, which improves overall recognition performance compared with existing combined methods.
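Creating noisy test speech at a target SNR, as in the experimental setup above, amounts to scaling the noise before adding it. A minimal sketch follows, assuming the SNR is defined from the average power over the whole utterance; the paper does not state its exact level-measurement procedure, and `mix_at_snr` is a hypothetical helper name.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    SNR is computed from mean sample power over the full utterance (an assumption).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose gain g so that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Calling this once per noise type and per SNR level (0, 5, 10, 15, 20 dB) would yield a test set with the same structure as the one described.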

Adaptive Post Processing of Nonlinear Amplified Sound Signal

  • Lee, Jae-Kyu;Choi, Jong-Suk;Seok, Cheong-Gyu;Kim, Mun-Sang
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.872-876 / 2005
  • We propose real-time post-processing of a nonlinearly amplified signal to improve voice recognition in remote talk. In previous research, we found that nonlinear amplification has a unique advantage for both voice activity detection and sound localization in remote talk. However, the nonlinear amplification distorts the original signal, and as a result the subsequent stages, such as speech recognition, show less satisfactory results. To remedy this problem, we implement a linearization algorithm that recovers the voice signal's linear characteristics after localization has been done.
