• Title/Summary/Keyword: Noisy Speech

Search Result 395, Processing Time 0.05 seconds

Noise Reduction Using MMSE Estimator-based Adaptive Comb Filtering (MMSE Estimator 기반의 적응 콤 필터링을 이용한 잡음 제거)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • MALSORI
    • /
    • no.60
    • /
    • pp.181-190
    • /
    • 2006
  • This paper describes a speech enhancement scheme that leads to significant improvements in recognition performance when used in the ASR front-end. The proposed approach is based on adaptive comb filtering and an MMSE-related parameter estimator. While adaptive comb filtering reduces noise components remarkably, it is rarely effective in reducing non-stationary noises. Furthermore, due to the uniformly distributed frequency response of the comb-filter, it can cause serious distortion to clean speech signals. This paper proposes an improved comb-filter that adjusts its spectral magnitude to the original speech, based on the speech absence probability and the gain modification function. In addition, we introduce the modified comb filtering-based speech enhancement scheme for ASR in mobile environments. Evaluation experiments carried out using the Aurora 2 database demonstrate that the proposed method outperforms conventional adaptive comb filtering techniques in both clean and noisy environments.

  • PDF

Preprocessing Technique for Improvement of Speech Recognition in a Car (차량에서의 음성인식율 향상을 위한 전처리 기법)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.1
    • /
    • pp.139-146
    • /
    • 2009
  • This paper addresses a modified spectral subtraction schemes which is suitable to speech recognition under low signal-to-noise ratio (SNR) noisy environment such as the automatic speech recognition (ASR) system in car. The conventional spectral subtraction schemes rely on the SNR such that attenuation is imposed on that part of the spectrum that appears to have low SNR, and accentuation is made on that part of high SNR. However, such postulation is adequate for high SNR environment, it is grossly inadequate for low SNR scenarios such as that of car environment. Proposed methods focused specifically to low SNR noisy environment by using weighting function for enhancing speech dominant region in speech spectrum. Experimental results by using voice commands for car show the superior performance of the proposed method over conventional methods.

Feature Compensation Method Based on Parallel Combined Mixture Model (병렬 결합된 혼합 모델 기반의 특징 보상 기술)

  • 김우일;이흥규;권오일;고한석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.7
    • /
    • pp.603-611
    • /
    • 2003
  • This paper proposes an effective feature compensation scheme based on speech model for achieving robust speech recognition. Conventional model-based method requires off-line training with noisy speech database and is not suitable for online adaptation. In the proposed scheme, we can relax the off-line training with noisy speech database by employing the parallel model combination technique for estimation of correction factors. Applying the model combination process over to the mixture model alone as opposed to entire HMM makes the online model combination possible. Exploiting the availability of noise model from off-line sources, we accomplish the online adaptation via MAP (Maximum A Posteriori) estimation. In addition, the online channel estimation procedure is induced within the proposed framework. For more efficient implementation, we propose a selective model combination which leads to reduction or the computational complexities. The representative experimental results indicate that the suggested algorithm is effective in realizing robust speech recognition under the combined adverse conditions of additive background noise and channel distortion.

Nose Estimation and Suppression methods based on Normalized Variance in Time-Frequency for Speech Enhancement (음성강화를 위한 시간 및 주파수 도메인의 분산정규화 기반 잡음예측 및 저감방법)

  • Lee, Soo-Jeong;Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.46 no.1
    • /
    • pp.87-94
    • /
    • 2009
  • Noise estimation and suppression are a crucial factor of many speech communication and recognition systems. In this paper, proposed algorithm is based on the ratio of variance normalized of noisy power spectrum in time-frequency domain. Our proposed algorithm tracks the threshold and controls the trade-off between residual noise and distortion. This algorithm is evaluated by the ITU-T P.835 signal distortion (SIG) and segment signal to noise ratio (SNR), and is superior to the conventional methods.

Speaker Identification Based on Vowel Classification and Vector Quantization (모음 인식과 벡터 양자화를 이용한 화자 인식)

  • Lim, Chang-Heon;Lee, Hwang-Soo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.8 no.4
    • /
    • pp.65-73
    • /
    • 1989
  • In this paper, we propose a text-independent speaker identification algorithm based on VQ(vector quantization) and vowel classification, and its performance is studied and compared with that of a conventional speaker identification algorithm using VQ. The proposed speaker identification algorithm is composed of three processes: vowel segmentation, vowel recognition and average distortion calculation. The vowel segmentation is performed automatlcally using RMS energy, BTR(Back-to-Total cavity volume Ratio)and SFBR(Signed Front-to-Back maximum area Ratio) extracted from input speech signal. If the Input speech signal Is noisy, particularity when the SNR is around 20dB, the proposed speaker identification algorithm performs better than the reference speaker identification algorithm when the correct vowel segmentation is done. The same result is obtained when we use the noisy telephone speech signal as an input, too.

  • PDF

Speech Enhancement Based on IMCRA Incorporating noise classification algorithm (잡음 환경 분류 알고리즘을 이용한 IMCRA 기반의 음성 향상 기법)

  • Song, Ji-Hyun;Park, Gyu-Seok;An, Hong-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.12
    • /
    • pp.1920-1925
    • /
    • 2012
  • In this paper, we propose a novel method to improve the performance of the improved minima controlled recursive averaging (IMCRA) in non-stationary noisy environment. The conventional IMCRA algorithm efficiently estimate the noise power by averaging past spectral power values based on a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. Since the minimum of smoothing parameter is defined as 0.85, it is difficult to obtain the robust estimates of the noise power in non-stationary noisy environments that is rapidly changed the spectral characteristics such as babble noise. For this reason, we proposed the modified IMCRA, which adaptively estimate and updata the noise power according to the noise type classified by the Gaussian mixture model (GMM). The performances of the proposed method are evaluated by perceptual evaluation of speech quality (PESQ) and composite measure under various environments and better results compared with the conventional method are obtained.

Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

  • Chung, Hoon;Lee, Sung Joo;Lee, Yun Keun
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.714-720
    • /
    • 2014
  • In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. In general, endpoint detection is dealt with using two cascaded decision processes. The first process is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second process is a heuristic-knowledge-based utterance-level speech boundary decision. To handle these two processes within a unified framework, we propose a WFST-based approach. However, a WFST-based approach has the same limitations as conventional approaches in that the utterance-level decision is based on heuristic knowledge and the decision parameters are tuned sequentially. Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% for various noisy-speech corpora collected for an endpoint detection evaluation.

User-customized Interaction using both Speech and Face Recognition (음성인식과 얼굴인식을 사용한 사용자 환경의 상호작용)

  • Kim, Sung-Ill;Oh, Se-Jin;Lee, Sang-Yong;Hwang, Seung-Gook
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2007.04a
    • /
    • pp.397-400
    • /
    • 2007
  • In this paper, we discuss the user-customized interaction for intelligent home environments. The interactive system is based upon the integrated techniques using both speech and face recognition. For essential modules, the speech recognition and synthesis were basically used for a virtual interaction between user and proposed system. In experiments, particularly, the real-time speech recognizer based on the HM-Net(Hidden Markov Network) was incorporated into the integrated system. Besides, the face identification was adopted to customize home environments for a specific user. In evaluation, the results showed that the proposed system was easy to use for intelligent home environments, even though the performance of the speech recognizer did not show a satisfactory results owing to the noisy environments.

  • PDF

Filtering of a Dissonant Frequency for Speech Enhancement

  • Kang, Sang-Ki;Baek, Seong-Joon;Lee, Ki-Yong;Sun, Koeng-Mo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.3E
    • /
    • pp.110-112
    • /
    • 2003
  • There have been numerous studies on the enhancement of the noisy speech signal. In this paper, we propose a completely new speech enhancement scheme, that is, a filtering of a dissonant frequency (especially F# in each octave of the tempered scale) based on the fundamental frequency which is developed in frequency domain. In order to evaluate the performance of the proposed enhancement scheme, subjective tests (MOS tests) were conducted. The subjective test results indicate that the proposed method provides a significant gain in audible improvement especially for speech contaminated by colored noise and speaking in a husky voice. Therefore when the filter is employed as a pre-filter for speech enhancement, the output speech quality and intelligibility is greatly enhanced.

Improved Bimodal Speech Recognition Study Based on Product Hidden Markov Model

  • Xi, Su Mei;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.13 no.3
    • /
    • pp.164-170
    • /
    • 2013
  • Recent years have been higher demands for automatic speech recognition (ASR) systems that are able to operate robustly in an acoustically noisy environment. This paper proposes an improved product hidden markov model (HMM) used for bimodal speech recognition. A two-dimensional training model is built based on dependently trained audio-HMM and visual-HMM, reflecting the asynchronous characteristics of the audio and video streams. A weight coefficient is introduced to adjust the weight of the video and audio streams automatically according to differences in the noise environment. Experimental results show that compared with other bimodal speech recognition approaches, this approach obtains better speech recognition performance.