• Title/Summary/Keyword: Noisy Speech


Noise Reduction using Spectral Subtraction in the Discrete Wavelet Transform Domain (이산 웨이브렛 변환영역에서의 스펙트럼 차감법을 이용한 잡음제거)

  • 김현기;이상운;홍재근
    • Journal of Korea Multimedia Society / v.4 no.4 / pp.306-315 / 2001
  • In noise reduction for speech recognition in noisy environments, the conventional spectral subtraction method has the disadvantage that noise and speech are difficult to distinguish, so the characteristics of the noise cannot be estimated accurately. Noise reduction in the wavelet transform domain, in turn, has the disadvantage that signal is lost in the high-frequency band. To compensate for these disadvantages, this paper proposes a spectral subtraction method in the continuous wavelet transform domain in which speech and non-speech intervals are distinguished by the standard deviation of the wavelet coefficients, and the signal is divided into three bands at different scales. Through endpoint detection and band division, the proposed method extracts the noise characteristics accurately before spectral subtraction is applied. Experiments show that the proposed method outperforms noise reduction using conventional spectral subtraction and the wavelet transform in terms of signal-to-noise ratio and Itakura-Saito distance.
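As a rough illustration of the conventional technique this abstract builds on, the following is a minimal sketch of magnitude spectral subtraction, not the paper's wavelet-domain method. It makes several assumptions that are not in the source: a plain DFT is used instead of a wavelet transform, non-speech frames are picked by the standard deviation of the time-domain samples (a stand-in for the wavelet-coefficient statistic), and the function names, the `std_threshold` parameter, and the spectral `floor` are all hypothetical.

```python
import cmath
import math
import statistics


def dft(frame):
    """Naive discrete Fourier transform (O(N^2)); fine for short frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def estimate_noise_mag(frames, std_threshold):
    """Average the magnitude spectra of frames judged to be non-speech.

    Assumption: a frame counts as non-speech when the standard deviation
    of its samples is below std_threshold.
    """
    quiet = [f for f in frames if statistics.pstdev(f) < std_threshold]
    if not quiet:
        raise ValueError("no non-speech frames found")
    mags = [[abs(v) for v in dft(f)] for f in quiet]
    return [sum(m[k] for m in mags) / len(mags) for k in range(len(mags[0]))]


def spectral_subtract(frame, noise_mag, floor=0.01):
    """Subtract the noise magnitude from each bin and keep the noisy phase."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_mag):
        mag = abs(value)
        # The spectral floor keeps the magnitude from going negative.
        new_mag = max(mag - noise, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(value)))
    return idft(cleaned)
```

A typical use would call `estimate_noise_mag` once on a set of equal-length frames and then pass the result to `spectral_subtract` for each frame to be enhanced.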


A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

  • Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2005.06a / pp.89-92 / 2005
  • Using natural speech commands to control a human-robot is an interesting topic in the field of robotics. In this paper, our main focus is on verifying the speaker who gives a command, to decide whether he or she is authorized to command the robot. Among the possible dynamic features of natural speech, the pitch period is one of the most important for characterizing speech signals, and it usually differs from person to person. However, current pitch detection techniques have not yet reached the desired level of accuracy and robustness. When the signal is noisy or there are multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach to pitch detection which, compared with standard pitch detection algorithms, not only increases accuracy but also makes performance more robust to noise. In the first level of the proposed approach, we discriminate voiced from unvoiced signals using a neural classifier that takes cepstrum sequences of speech as its input feature set. Voiced signals are then processed in the second level using a modified version of the standard AMDF-based pitch detection algorithm to determine their pitch periods precisely. The experimental results show that the accuracy of the proposed system is better than that of conventional pitch detection algorithms for speech signals in both clean and noisy environments.
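For readers unfamiliar with the AMDF mentioned in this abstract, here is a minimal sketch of the standard average magnitude difference function and a pitch-period search over a lag range. It shows only the textbook AMDF, not the paper's modified algorithm or its neural voiced/unvoiced classifier. The function names and the `min_lag`/`max_lag` parameters are assumptions chosen for illustration.

```python
def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def amdf_pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value.

    Assumes the frame is voiced and max_lag is shorter than the frame.
    """
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))
```

For a periodic frame the AMDF dips toward zero at the pitch period and its multiples, so the lag range should be limited to plausible pitch periods to avoid picking a multiple.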


Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization (음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.6 / pp.169-174 / 2013
  • In speech recognition, performance degrades because of the mismatch between the model training and recognition environments. Silence feature normalization is used as a way to reduce this mismatch. With the existing silence feature normalization method, however, the energy level of the silence interval increases at low signal-to-noise ratios, so the accuracy of voice/non-voice classification falls and recognition performance degrades. This paper proposes a robust speech detection method for noisy environments using silence feature normalization and speech energy maximization. At high signal-to-noise ratios, the proposed method maximizes the speech energy to obtain features that are less affected by noise. At low signal-to-noise ratios, it uses the cepstral feature distribution of voice and non-voice characteristics to improve recognition performance. In recognition experiments, performance improved compared to the conventional method.

Performance Improvement of Speech Recognition Using Context and Usage Pattern Information (문맥 및 사용 패턴 정보를 이용한 음성인식의 성능 개선)

  • Song, Won-Moon;Kim, Myung-Won
    • The KIPS Transactions:PartB / v.13B no.5 s.108 / pp.553-560 / 2006
  • Recent work on speech recognition has sought more reliable results in noisy environments, either by integrating diverse sources of information at the result-derivation level or by producing new results through post-processing of the prior recognition results. In this paper we propose a method that uses the user's usage patterns and context information in speech command recognition for personal mobile devices to improve recognition accuracy in a noisy environment. Sequential usage (or speech) patterns prior to the currently spoken command are used to adjust the base recognition results. For the context information, we use the relevance between the current function of the device in use and the spoken command. Our experimental results show that the proposed method achieves an error correction rate of about 50% over the base recognition system, which demonstrates its feasibility.

Speech Enhancement Using the Adaptive Noise Canceling Technique with a Recursive Time Delay Estimator (재귀적 지연추정기를 갖는 적응잡음제거 기법을 이용한 음성개선)

  • 강해동;배근성
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.7 / pp.33-41 / 1994
  • A single-channel adaptive noise canceling (ANC) technique with a recursive time delay estimator (RTDE) is presented for removing the effects of additive noise on the speech signal. While the conventional method builds the reference signal for the adaptive filter from the pitch estimated on a frame basis from the input speech, the proposed method builds the reference signal from the delay estimated recursively on a sample-by-sample basis. For the RTDE, recursion formulae for the autocorrelation function (ACF) and the average magnitude difference function (AMDF) are derived. The normalized least mean square (NLMS) and recursive least squares (RLS) algorithms are applied to adapt the filter coefficients. Experimental results with noisy speech demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio and cepstral distance when compared with the conventional method.
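The single-channel idea described here, using a delayed copy of the input as the reference for an adaptive filter, can be sketched with a plain NLMS update. This is an illustrative sketch only: it assumes the delay is already known and fixed, whereas the paper estimates it recursively per sample, and the function name and the `taps`, `mu`, and `eps` parameters are hypothetical.

```python
def nlms_enhance(noisy, delay, taps=8, mu=0.05, eps=1e-8):
    """Estimate the periodic part of `noisy` from a copy delayed by `delay` samples.

    Assumption: `delay` is a known pitch period in samples. The filter output
    (the predictable, periodic component) is returned as the enhanced signal.
    """
    weights = [0.0] * taps
    output = []
    for n in range(len(noisy)):
        # Reference vector built from the delayed input; zeros before the start.
        ref = [noisy[n - delay - k] if n - delay - k >= 0 else 0.0
               for k in range(taps)]
        estimate = sum(w * r for w, r in zip(weights, ref))
        error = noisy[n] - estimate
        # NLMS: step size normalized by the reference energy.
        norm = eps + sum(r * r for r in ref)
        weights = [w + mu * error * r / norm for w, r in zip(weights, ref)]
        output.append(estimate)
    return output
```

Because additive white noise is uncorrelated between the current sample and the delayed reference, the filter can only predict the periodic speech component, which is why the output tends to carry less noise than the input once the weights have settled.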


Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance (음성인식에서 입술 파라미터 열화에 따른 견인성 연구)

  • Kim Jinyoung;Shin Dosung;Choi Seungho
    • Proceedings of the KSPS conference / 2002.11a / pp.205-208 / 2002
  • Bimodal speech recognition based on lip reading has been studied as a representative method of speech recognition in noisy environments. There are three methods for integrating the speech and lip modalities: direct identification, separate identification, and dominant recording. In this paper we evaluate the robustness of lip reading methods under the assumption that lip parameters are estimated with errors. Lip reading experiments show that the dominant recording approach is more robust than the other methods. We also propose a measure of lip parameter degradation, which can be used to determine the weighting values of the video information.


Spectral Subtraction Using Spectral Harmonics for Robust Speech Recognition in Car Environments

  • Beh, Jounghoon;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.22 no.2E / pp.62-68 / 2003
  • This paper presents a novel noise-compensation scheme to solve the mismatch between training and testing conditions in automatic speech recognition (ASR) systems, specifically in car environments. Conventional spectral subtraction schemes rely on the signal-to-noise ratio (SNR): the parts of the spectrum that appear to have low SNR are attenuated, and the parts with high SNR are accentuated. However, these schemes are based on the postulate that the power spectrum of the noise is in general lower in magnitude than that of the speech. While this postulate is adequate for high-SNR environments, it is grossly inadequate for low-SNR scenarios such as car environments. This paper proposes an efficient spectral subtraction scheme aimed specifically at low-SNR noisy environments, which extracts the harmonics distinctively in the speech spectrum. Representative experiments confirm the superior performance of the proposed method over conventional methods. The experiments are conducted using car-noise-corrupted utterances from the Aurora2 corpus.

A Study on Design and Implementation of Embedded System for Speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.201-206 / 2004
  • This study attempted to develop a speech recognition module for a wheelchair for the physically handicapped. In the proposed module, a TMS320C32 was used as the main processor, and 12th-order mel-cepstrum features were applied in the pre-processing step to increase the recognition rate in noisy environments. DTW (Dynamic Time Warping) was used for the speaker-dependent recognition part and proved to give excellent results. To use this algorithm more effectively, the reference data was compressed to 1/12 of its size using vector quantization to reduce memory use. The various supporting techniques required (end-point detection, DMA processing, etc.) were implemented so that the speech recognition system can run in real time.
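The DTW matching step named in this abstract can be sketched as follows. This is a minimal textbook version in Python, not the embedded TMS320C32 implementation: it assumes each utterance is a sequence of feature vectors (for example, mel-cepstrum frames), uses Euclidean distance between frames, and omits vector quantization and any path constraints. The names `dtw_distance`, `recognize`, and `templates` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[rows][cols]


def recognize(query, templates):
    """Return the label of the stored template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

In a speaker-dependent setup, `templates` would map each command word to a reference utterance recorded by the user, and `recognize` would be called on each new utterance after end-point detection.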

Denoising of Speech Signal Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 잡음제거)

  • 한미경;배건성
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.27-34 / 2000
  • This paper deals with speech enhancement methods using the wavelet transform. A cycle-spinning scheme and the undecimated wavelet transform are used for denoising speech signals, and their results are compared with those of the conventional wavelet transform. We apply a soft-thresholding technique to remove additive background noise from noisy speech. The 8-tap symlet wavelet and the pyramid algorithm are used for the wavelet transform. Performance is assessed using average SNR, cepstral distance, and an informal subjective listening test. Experimental results demonstrate that both the cycle-spinning denoising (CSD) method and the undecimated wavelet denoising (UWD) method outperform the conventional wavelet denoising (CWD) method in objective performance measures as well as in the subjective listening test. The two methods also produce fewer of the "clicks" that usually appear in the neighborhood of signal discontinuities.
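Soft thresholding and cycle spinning can be illustrated with a small sketch. To stay self-contained it swaps in a one-level Haar transform in place of the 8-tap symlet and pyramid algorithm used in the paper, so it shows the structure of the technique rather than the paper's configuration. The function names and the `shifts` parameter are assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by `threshold`, keeping its sign."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the details, then invert.

    Assumes len(signal) is even.
    """
    s = math.sqrt(2.0)
    out = []
    for i in range(0, len(signal), 2):
        approx = (signal[i] + signal[i + 1]) / s
        detail = soft_threshold((signal[i] - signal[i + 1]) / s, threshold)
        out.extend([(approx + detail) / s, (approx - detail) / s])
    return out


def cycle_spin_denoise(signal, threshold, shifts=2):
    """Average the denoised results over circular shifts of the input."""
    n = len(signal)
    total = [0.0] * n
    for shift in range(shifts):
        rotated = signal[shift:] + signal[:shift]
        denoised = haar_denoise(rotated, threshold)
        # Undo the circular shift before accumulating.
        restored = denoised[n - shift:] + denoised[:n - shift]
        total = [t + r for t, r in zip(total, restored)]
    return [t / shifts for t in total]
```

Averaging over shifts reduces the dependence of the result on how the signal happens to align with the transform's block boundaries, which is the motivation for cycle spinning.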


Robust Speech Recognition by Utilizing Class Histogram Equalization (클래스 히스토그램 등화 기법에 의한 강인한 음성 인식)

  • Suh, Yung-Joo;Kim, Hor-Rin;Lee, Yun-Keun
    • MALSORI / no.60 / pp.145-164 / 2006
  • This paper proposes class histogram equalization (CHEQ) to compensate noisy acoustic features for robust speech recognition. CHEQ aims to compensate for the acoustic mismatch between training and test speech recognition environments and to reduce the limitations of conventional histogram equalization (HEQ). In contrast to HEQ, CHEQ adopts multiple class-specific distribution functions for the training and test environments and equalizes the features using their class-specific training and test distributions. Depending on how the class information is extracted, CHEQ is further divided into two forms: hard-CHEQ, based on vector quantization, and soft-CHEQ, which uses a Gaussian mixture model. Experiments on the Aurora 2 database confirmed the effectiveness of CHEQ, which produced a relative word error reduction of 61.17% over the baseline mel-cepstral features and of 19.62% over conventional HEQ.
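The difference between HEQ and a hard class-based variant can be sketched for a single feature dimension. This is an illustration under stated assumptions, not the paper's algorithm: empirical CDFs are built from raw samples, class labels for the test features are assumed to be given (the paper derives them by vector quantization or a Gaussian mixture model), and all function and parameter names are hypothetical.

```python
from bisect import bisect_right


def histogram_equalize(test_values, reference_values):
    """Map each test value to the reference value at the same empirical CDF level."""
    ref_sorted = sorted(reference_values)
    test_sorted = sorted(test_values)
    n_test, n_ref = len(test_sorted), len(ref_sorted)
    out = []
    for value in test_values:
        rank = bisect_right(test_sorted, value)   # 1 .. n_test
        cdf = (rank - 0.5) / n_test               # strictly inside (0, 1)
        index = min(int(cdf * n_ref), n_ref - 1)  # inverse reference CDF
        out.append(ref_sorted[index])
    return out


def class_histogram_equalize(test_values, test_labels, reference_by_class):
    """Equalize each class of test values against that class's reference data."""
    out = list(test_values)
    for label in set(test_labels):
        idx = [i for i, l in enumerate(test_labels) if l == label]
        equalized = histogram_equalize([test_values[i] for i in idx],
                                       reference_by_class[label])
        for i, value in zip(idx, equalized):
            out[i] = value
    return out
```

With a single class covering all the data, `class_histogram_equalize` reduces to plain `histogram_equalize`, which mirrors how CHEQ generalizes HEQ.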
