• Title/Summary/Keyword: noise-robust speech recognition

On-line model compensation using noise masking effect for robust speech recognition (잡음 차폐를 이용한 온라인 모델 보상)

  • Jung Gue-Jun;Cho Hoon-Young;Oh Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.215-218
    • /
    • 2003
  • In this paper, we apply PMC (parallel model combination) to a speech recognition system online. As a representative model-based noise compensation technique, PMC compensates for environmental mismatch by combining pretrained clean speech models with noise information estimated in real time. This approach is very effective at compensating for extreme environmental mismatch, but its heavy computational cost makes it unsuitable for on-line systems. To reduce the computational cost and apply PMC online, we use the noise masking effect (the energy in a frequency band is dominated either by clean speech energy or by noise energy) in the model compensation process. Experiments on artificially produced noisy speech data confirm that the proposed technique is fast and effective for on-line model compensation.

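The noise masking idea in the abstract above can be illustrated with a minimal sketch. It assumes per-band log energies for the clean speech model mean and the noise estimate, and compares the exact log-domain sum with a maximum-based approximation in which each band keeps whichever source dominates. The function names and sample values are hypothetical, and the sketch handles means only; it is not the authors' implementation.

```python
import math

def log_add_combine(speech_log, noise_log):
    """Exact combination per band in the log-energy domain: log(exp(s) + exp(n))."""
    return [math.log(math.exp(s) + math.exp(n)) for s, n in zip(speech_log, noise_log)]

def masking_combine(speech_log, noise_log):
    """Noise-masking approximation: each band is dominated by speech or by noise."""
    return [max(s, n) for s, n in zip(speech_log, noise_log)]

# Hypothetical log energies for three frequency bands.
speech_log = [5.0, 1.0, 3.0]
noise_log = [1.0, 4.0, 3.0]

exact = log_add_combine(speech_log, noise_log)
masked = masking_combine(speech_log, noise_log)

# The approximation error per band is log(1 + exp(-|s - n|)), which never exceeds log(2).
errors = [e - m for e, m in zip(exact, masked)]
print(masked, [round(err, 4) for err in errors])
```

The maximum needs no exponentials or logarithms, which is one plausible reading of why masking makes the compensation cheaper; the largest error occurs in bands where speech and noise have equal energy.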

A Novel Integration Scheme for Audio Visual Speech Recognition

  • Pham, Than Trung;Kim, Jin-Young;Na, Seung-You
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.8
    • /
    • pp.832-842
    • /
    • 2009
  • Automatic speech recognition (ASR) has been successfully applied to many real human-computer interaction (HCI) applications; however, its performance tends to degrade significantly in noisy environments. Audio-visual speech recognition (AVSR), which uses an acoustic signal and lip motion, has recently attracted more attention because of its robustness to noise. In this paper, we describe a novel integration scheme for AVSR based on a late integration approach. First, we introduce a robust reliability measure for the audio and visual modalities that uses model-based and signal-based information. The model-based sources measure the confusability of the vocabulary, while the signal-based information is used to estimate the noise level. Second, the output probabilities of the audio and visual speech recognizers are each normalized before the final integration step, which uses the normalized output space and the estimated weights. We evaluate the proposed method on a Korean isolated word recognition system. The experimental results demonstrate the effectiveness and feasibility of the proposed system compared with conventional systems.
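
As a rough illustration of late integration, the sketch below normalizes the per-word scores of each recognizer and fuses them with a reliability weight. The log-softmax normalization, the single scalar `audio_weight`, and the sample scores are assumptions made for illustration; the abstract does not specify how the normalization or the weights are computed.

```python
import math

def log_softmax(scores):
    """Normalize raw log-likelihoods so they form log posteriors over the vocabulary."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def late_integration(audio_scores, visual_scores, audio_weight):
    """Fuse normalized scores; audio_weight in [0, 1] reflects audio reliability."""
    a = log_softmax(audio_scores)
    v = log_softmax(visual_scores)
    fused = [audio_weight * x + (1.0 - audio_weight) * y for x, y in zip(a, v)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused

# Hypothetical per-word log-likelihoods from each recognizer (three-word vocabulary).
audio = [-10.0, -12.0, -11.0]
visual = [-8.0, -5.0, -9.0]

best_clean, _ = late_integration(audio, visual, audio_weight=0.9)  # trust audio
best_noisy, _ = late_integration(audio, visual, audio_weight=0.2)  # trust video
print(best_clean, best_noisy)
```

With a high audio weight the decision follows the audio recognizer; lowering the weight, as one might when the estimated noise level is high, lets the visual recognizer decide.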

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.16 no.3
    • /
    • pp.465-470
    • /
    • 2021
  • Robust speech recognition technology is required that degrades neither recognition performance nor speech quality when recognition is performed in real environments where speech is mixed with noise. As such technology develops, applications need to achieve a stable, high recognition rate even in noisy environments whose spectrum is similar to that of human speech. This paper therefore proposes a speech enhancement algorithm that performs noise suppression based on the MMSA-STSA estimation algorithm, a short-time spectral amplitude method based on the minimum mean-square error. This is an effective nonlinear speech enhancement algorithm that works on a single-channel input and has high noise suppression performance. Moreover, it reduces the amount of speech distortion by relying on a statistical model of the speech. In the experiments, the effectiveness of the proposed MMSA-STSA-based algorithm is verified by comparing the input and output speech waveforms.

A Noise Robust Speech Recognition Method Using Model Compensation Based on Speech Enhancement (음성 개선 기반의 모델 보상 기법을 이용한 강인한 잡음 음성 인식)

  • Shen, Guang-Hu;Jung, Ho-Youl;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.4
    • /
    • pp.191-199
    • /
    • 2008
  • In this paper, we propose an MWF-PMC noise processing method for speech recognition in noisy environments. It enhances the input speech with Mel-warped Wiener Filtering (MWF) at the pre-processing stage and compensates the recognition model with PMC (Parallel Model Combination) at the post-processing stage. PMC uses the residual noise extracted from the silence regions of the enhanced speech to compensate the clean speech model, and the method is therefore expected to improve recognition performance in noisy environments. For the recognition experiments, we down-sampled the KLE PBW (Phoneme Balanced Words) 452-word speech data to 8 kHz and created noisy speech at 5 SNR levels (0 dB, 5 dB, 10 dB, 15 dB, and 20 dB) by adding Subway, Car, and Exhibition noise to the clean speech. The recognition results confirm the effectiveness of the proposed MWF-PMC method, which achieved better overall recognition performance than the existing combined methods.
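
Creating noisy test data at a target SNR, as described above, usually amounts to scaling the noise before adding it. The sketch below assumes equal-length speech and noise signals and uses a synthetic sine wave and Gaussian noise as stand-ins for the real speech and Subway/Car/Exhibition recordings; the helper names are hypothetical.

```python
import math
import random

def power(signal):
    """Average power of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise

rng = random.Random(0)
# Stand-in signals: 0.1 s at 8 kHz (a 440 Hz tone and white Gaussian noise).
speech = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(800)]
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]

measured = {}
for snr in (0, 5, 10, 15, 20):
    noisy, scaled_noise = mix_at_snr(speech, noise, snr)
    measured[snr] = 10.0 * math.log10(power(speech) / power(scaled_noise))
```

Here `measured` records the SNR actually obtained at each level, which should match the requested value up to floating-point error.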

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

  • Lee, Yoon-Jae;Ko, Han-Seok
    • Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.129-139
    • /
    • 2006
  • In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.

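The first method in the abstract above (estimate the noise energy and remove it from the noisy speech energy) might look like the following sketch. It assumes that the first few frames contain only noise and floors the result to keep the logarithm defined; `num_noise_frames`, `floor`, and the sample energies are hypothetical choices, not values from the paper.

```python
import math

def normalized_log_energy(frame_energies, num_noise_frames=3, floor=1e-3):
    """Subtract an estimated noise energy from each frame energy, then take the log."""
    # Assumption: the leading frames are noise-only, so their mean estimates the noise energy.
    noise_energy = sum(frame_energies[:num_noise_frames]) / num_noise_frames
    # Floor the difference so that noise-only frames do not produce log(0) or log(negative).
    cleaned = [max(e - noise_energy, floor * noise_energy) for e in frame_energies]
    return [math.log(e) for e in cleaned]

# Hypothetical frame energies: three noise-only frames followed by two speech frames.
energies = [2.0, 2.0, 2.0, 12.0, 52.0]
log_energies = normalized_log_energy(energies)
print([round(v, 3) for v in log_energies])
```

After subtraction the speech frames carry the log of their estimated clean energy (10 and 50 here), while the noise-only frames collapse to the floor.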

Adaptive Band Selection for Robust Speech Detection In Noisy Environments

  • Ji Mikyong;Suh Youngjoo;Kim Hoirin
    • MALSORI
    • /
    • no.50
    • /
    • pp.85-97
    • /
    • 2004
  • One of the important problems in speech recognition is accurately detecting the presence of speech in adverse environments. The speech detection problem becomes more severe when recognition systems are used over the telephone network, especially over a wireless network and in a noisy environment. In this paper, we propose a robust speech detection algorithm that detects speech boundaries accurately by selecting useful bands adaptively according to the noisy environment. We introduce so-called noise-centric bands, the bands in which noise is mainly distributed. We compare two different speech detection algorithms with the proposed algorithm and evaluate them in noisy environments. The experimental results show the excellent performance of the proposed speech detection algorithm.


Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation (잡음음성 음향모델 적응에 기반한 잡음에 강인한 음성인식)

  • Chung, Yongjoo
    • Phonetics and Speech Sciences
    • /
    • v.6 no.2
    • /
    • pp.29-34
    • /
    • 2014
  • In Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMMs) are usually trained with clean speech. However, better performance is expected when the HMMs are trained with noisy speech. In a previous study, we found that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain produces improved recognition results, but because that algorithm operated in the log-spectrum domain, it could not be used for HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between the test and training noisy speech in the cepstrum domain, and use it to adapt the mean and covariance of the noisy speech HMM trained by Multi-condition TRaining (MTR). In noisy speech recognition experiments on the Aurora 2 database, the proposed method produced a 10.6% relative improvement in Word Error Rate (WER) over the MTR method, whereas the previous MMSE estimation of the training noisy speech produced a 4.3% relative improvement, which shows the superiority of the proposed method.

Robust Speech Recognition Using Real-Time Higher Order Statistics Normalization (고차통계 정규화를 이용한 강인한 음성인식)

  • Jeong, Ju-Hyun;Song, Hwa-Jeon;Kim, Hyung-Soon
    • MALSORI
    • /
    • no.54
    • /
    • pp.63-72
    • /
    • 2005
  • The performance of a speech recognition system is degraded by the mismatch between training and test environments. Many studies have proposed ways to compensate for noise components in the cepstral domain. Recently, a higher order cepstral moment normalization method was introduced to improve recognition accuracy. In this paper, we present a real-time higher order moment normalization method with a post-processing smoothing filter that reduces the parameter estimation error in the higher order moment computation. In experiments on the Aurora2 database, the proposed algorithm achieved an error rate reduction of 44.7% compared with the baseline system.


CHMM Modeling using LMS Algorithm for Continuous Speech Recognition Improvement (연속 음성 인식 향상을 위해 LMS 알고리즘을 이용한 CHMM 모델링)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.10 no.11
    • /
    • pp.377-382
    • /
    • 2012
  • In this paper, we propose an echo-noise-robust CHMM learning model that uses an echo cancellation average estimator LMS algorithm so that it can adapt to changing echo noise. To improve continuous speech recognition performance, the CHMM models were constructed using this echo noise cancellation average estimator LMS algorithm. As a result, the SNR of the speech obtained by removing the changing environmental noise improved by 1.93 dB on average, and the recognition rate improved by 2.1%.
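
For readers unfamiliar with LMS, the sketch below shows the standard textbook LMS adaptive filter applied to echo cancellation. It is not the "average estimator" variant named in the abstract, whose details are not given here; the tap count, step size, and the simulated two-tap echo path are hypothetical.

```python
import random

def lms_cancel(reference, observed, num_taps=4, step=0.05):
    """Adapt FIR weights so the filtered reference tracks the echo in `observed`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, observed):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate  # residual after removing the estimated echo
        # LMS update: move each weight along the instantaneous error gradient.
        weights = [w + step * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights

rng = random.Random(0)
reference = [rng.choice([-1.0, 1.0]) for _ in range(2000)]
# Simulated echo path (assumption): direct gain 0.5 plus a one-sample delayed gain 0.3.
observed = [0.5 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
            for n in range(len(reference))]

errors, weights = lms_cancel(reference, observed)
print([round(w, 3) for w in weights])
```

In this noise-free simulation the weights should approach the simulated echo path and the residual error should shrink toward zero; with real speech present in `observed`, the residual would instead approximate the echo-free speech.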

A Study on Environment Parameter Compensation Method for Robust Speech Recognition (잡음에 강인한 음성 인식을 위한 환경 파라미터 보상에 관한 연구)

  • Hong, Mi-Jung;Lee, Ho-Woong
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.5 no.2 s.10
    • /
    • pp.1-10
    • /
    • 2006
  • In this paper, the VTS (Vector Taylor Series) algorithm, which was proposed by Moreno at Carnegie Mellon University in 1996, is analyzed and simulated. VTS is considered one of the robust speech recognition techniques that adopt model parameter conversion. To evaluate the performance of the VTS algorithm, we used the CMN (Cepstral Mean Normalization) technique, one of the well-known noise processing methods. The recognition rate is evaluated with white Gaussian noise and street noise as background noise. The simulation results are also analyzed and compared with those previously reported by Moreno.

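CMN, mentioned in the abstract above, is commonly implemented by subtracting the per-coefficient mean over an utterance. The sketch below assumes each frame is a list of cepstral coefficients; the sample values and the constant offset that stands in for a channel effect are hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the utterance-level mean of each cepstral coefficient from every frame."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[k] for frame in frames) / num_frames for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]

# Hypothetical 3-frame utterance with 2 cepstral coefficients per frame.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
# A fixed channel appears as a constant additive offset in the cepstral domain.
offset = [0.7, -1.5]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
print(normalized_clean)
```

Because the offset shifts every frame equally, it moves the mean by the same amount and cancels, so both utterances should normalize to the same values up to rounding.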