• Title/Summary/Keyword: voice detection (음성검출)

Search Result 725, Processing Time 0.026 seconds

Robust Distributed Speech Recognition under noise environment using MESS and EH-VAD (멀티밴드 스펙트럼 차감법과 엔트로피 하모닉을 이용한 잡음환경에 강인한 분산음성인식)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.1
    • /
    • pp.101-107
    • /
    • 2011
  • Background noise and channel distortion are major factors that hinder the practical use of speech recognition. Noise generally reduces the performance of speech recognition systems, and DSR (Distributed Speech Recognition) based speech recognition is likewise difficult to improve for this reason. Therefore, to improve DSR-based speech recognition in noisy environments, this paper proposes a method that detects speech regions accurately so that accurate features can be extracted. The proposed method distinguishes speech from noise by using entropy and detection of the spectral energy of speech. Speech detection based on the spectral energy of speech performs well at relatively high SNR (15 dB). However, when the noise environment varies, the threshold between speech and noise also varies, and detection performance degrades at low SNR (0 dB). The proposed method therefore uses the spectral entropy and harmonics of speech for better speech detection. The more precise speech detection also improves the performance of the AFE. Experimental results show that the proposed method achieves better recognition performance in noisy environments.
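
A minimal Python sketch of the spectral-entropy cue mentioned in this abstract, not the authors' EH-VAD. It computes a naive DFT power spectrum for one frame, normalizes it to a probability distribution, and returns its Shannon entropy. The function names and the `eps` guard are illustrative assumptions; the harmonic cue and the decision threshold are not given in the abstract and are left out.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (in nats) of the normalized power spectrum of one frame."""
    power = power_spectrum(frame)
    total = sum(power) + eps  # eps avoids division by zero on silent frames
    probs = [p / total for p in power]
    return -sum(p * math.log(p + eps) for p in probs)


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]  # one spectral peak
    impulse = [1.0] + [0.0] * 63                                    # flat spectrum
    print(spectral_entropy(tone), spectral_entropy(impulse))
```

A frame whose energy sits in a few bins gives an entropy near zero, while a flat spectrum gives the maximum, the logarithm of the number of bins. A detector built on this cue would compare the per-frame entropy against a threshold tuned for the noise condition.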

Statistical Voice Activity Detector Based on Signal Subspace Model (신호 준공간 모델에 기반한 통계적 음성 검출기)

  • Ryu, Kwang-Chun;Kim, Dong-Kook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.7
    • /
    • pp.372-378
    • /
    • 2008
  • Voice activity detectors (VAD) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) based on statistical models is derived in the discrete Fourier transform (DFT) domain. Speech or noise is then decided by comparing the value of the expression with a threshold. This paper presents a new statistical VAD method based on a signal subspace approach. Probabilistic principal component analysis (PPCA) is employed to obtain a signal subspace model that incorporates a probabilistic model of the noisy signal into the signal subspace method. The proposed approach provides a novel decision rule based on the LRT in the signal subspace domain. Experimental results show that the proposed signal subspace model based VAD method outperforms methods based on the widely used Gaussian distribution in the DFT domain.

Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector (우도비 특징 벡터를 이용한 SVM 기반의 음성 검출기)

  • Jo, Q-Haing;Kang, Sang-Ki;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.8
    • /
    • pp.397-402
    • /
    • 2007
  • In this paper, we apply a support vector machine (SVM) that incorporates an optimized nonlinear decision rule over different sets of feature vectors to improve the performance of statistical model-based voice activity detection (VAD). The conventional method performs VAD by setting up statistical models for the speech-absence and speech-presence hypotheses and comparing the geometric mean of the likelihood ratios (LRs) of the individual frequency bands extracted from the input signal with a given threshold. We propose a novel SVM-based VAD technique that treats the LRs computed in each frequency bin as the elements of a feature vector, minimizing the classification error probability instead of applying the conventional geometric-mean decision rule. In experiments, the SVM-based VAD using the proposed feature performed better than previously reported VADs in various noise environments.

Voice Activity Detection Algorithm Based on Radial Basis Function Networks with Dual Threshold (Radial Basis Function Networks를 이용한 이중 임계값 방식의 음성구간 검출기)

  • Kim Hong Ik;Park Sung Kwon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.12C
    • /
    • pp.1660-1668
    • /
    • 2004
  • This paper proposes a Voice Activity Detection (VAD) algorithm based on a Radial Basis Function (RBF) network using a dual threshold. The k-means clustering and Least Mean Square (LMS) algorithms are used to update the RBF network to the underlying speech condition. The inputs to the RBF network are three parameters from a Code Excited Linear Prediction (CELP) coder, which works stably under various background noise levels. A dual hangover threshold is applied in the RBF-VAD to reduce errors, because the threshold value involves a trade-off in the VAD decision. The experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level.

Speaker Change Detection by Removing Phonetic Information (음성학적 정보의 제거를 통한 화자변화 구간 검출)

  • Park Sun Young;Kim Hyung Soon
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.187-190
    • /
    • 2002
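  • This paper studies speaker change detection, which automatically finds the points in a speech signal at which the speaker changes. Speaker change detection should compare only the differences caused by speaker individuality in the speech signal, but in real environments speakers do not utter the same content, so information from the differing spoken content is included and degrades detection performance. We therefore improved detection performance by removing the influence of the phonetic information contained in the spoken content, so that only each speaker's individual characteristics are emphasized.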

On a Pitch Point Detection by Preserving the Phase Component of the Autocorrelation Function (자기상관함수에서 위상 성분의 보존에 의한 피치 시점 검출에 관한 연구)

  • 함명규;최성영;박종철;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2000.09a
    • /
    • pp.799-802
    • /
    • 2000
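  • In speech signal processing, accurately detecting the fundamental frequency of a speech signal reduces speaker-dependent effects in speech recognition and thus raises recognition accuracy, and in speech synthesis it makes naturalness and individuality easy to change or preserve. In addition, pitch-synchronous analysis yields accurate vocal tract parameters free of glottal effects. Because pitch detection is so important, various methods have been proposed [1]. This paper studies a method for detecting pitch points in unstable segments during speech analysis. The conventional autocorrelation function method has the advantage of emphasizing periodicity, but it has the disadvantage of not preserving the phase component. We therefore propose an algorithm that uses the autocorrelation function while preserving the phase component. In experiments, the method achieved about 98% accuracy compared with manually located pitch points. As these results indicate, an autocorrelation function with preserved phase components can be useful in speech synthesis, coding, and recognition.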
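
For orientation, the conventional autocorrelation baseline that this abstract contrasts itself with can be sketched in Python as follows. This is not the proposed phase-preserving algorithm, whose details the abstract does not give: it only estimates the pitch period as the lag that maximizes the autocorrelation. The function names and the lag search range are illustrative assumptions.

```python
import math


def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of the signal at a single lag."""
    return sum(signal[t] * signal[t + lag] for t in range(len(signal) - lag))


def estimate_pitch_period(signal, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation value."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))


if __name__ == "__main__":
    # Synthetic voiced-like signal: a 50-sample period plus its second harmonic.
    period = 50
    signal = [math.sin(2 * math.pi * t / period)
              + 0.5 * math.sin(4 * math.pi * t / period)
              for t in range(400)]
    print(estimate_pitch_period(signal, 20, 80))
```

The result is a period length only. It carries no information about where each pitch period starts, which is the limitation the abstract attributes to discarding the phase component.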

Robust Endpoint Detection for Bimodal System in Noisy Environments (잡음환경에서의 바이모달 시스템을 위한 견실한 끝점검출)

  • 오현화;권홍석;손종목;진성일;배건성
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.40 no.5
    • /
    • pp.289-297
    • /
    • 2003
  • The performance of a bimodal system is affected by the accuracy of endpoint detection on the input signal as well as by the performance of the speech recognition or lipreading system. In this paper, we propose an endpoint detection method that detects endpoints from the audio and video signals separately and uses the signal-to-noise ratio (SNR) estimated from the input audio signal to select the endpoints that are more reliable under the acoustic noise. In other words, the endpoints are taken from the audio signal at high SNR and from the video signal at low SNR. Experimental results show that the bimodal system using the proposed endpoint detector achieves satisfactory recognition rates, especially when the acoustic environment is quite noisy.
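
The selection rule described in this abstract can be illustrated with a short Python sketch. It assumes the audio and video endpoint detectors already exist and return (start, end) frame pairs. The function names and the 10 dB switching threshold are hypothetical, since the abstract does not state the value the authors used or how they estimated the SNR.

```python
import math


def estimate_snr_db(signal_power, noise_power):
    """SNR in dB from average signal and noise powers."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_endpoints(audio_endpoints, video_endpoints, snr_db,
                     snr_threshold_db=10.0):
    """Trust audio endpoints at high SNR and video endpoints at low SNR.

    snr_threshold_db is an assumed value for illustration only.
    """
    if snr_db >= snr_threshold_db:
        return audio_endpoints
    return video_endpoints


if __name__ == "__main__":
    audio, video = (10, 90), (12, 88)  # (start_frame, end_frame) pairs
    print(select_endpoints(audio, video, estimate_snr_db(100.0, 1.0)))  # 20 dB
    print(select_endpoints(audio, video, estimate_snr_db(2.0, 1.0)))    # about 3 dB
```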

Voice Activity Detection Based on Non-negative Matrix Factorization (비음수 행렬 인수분해 기반의 음성검출 알고리즘)

  • Kang, Sang-Ick;Chang, Joon-Hyuk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.8C
    • /
    • pp.661-666
    • /
    • 2010
  • In this paper, we apply a likelihood ratio test (LRT) to non-negative matrix factorization (NMF) based voice activity detection (VAD) to find the optimal threshold. In our approach, the NMF-based VAD is expressed as the Euclidean distance between the noise basis vector and the input basis vector, both of which are extracted through NMF. The optimal threshold for each noise environment depends on the distribution of the NMF results in the noise region, which is estimated by a statistical model-based VAD. According to the experimental results, the proposed approach is effective for statistical model-based VAD using the LRT.

Discriminative Weight Training for a Statistical Model-Based Voice Activity Detection (통계적 모델 기반의 음성 검출기를 위한 변별적 가중치 학습)

  • Kang, Sang-Ick;Jo, Q-Haing;Park, Seung-Seop;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.5
    • /
    • pp.194-198
    • /
    • 2007
  • In this paper, we apply discriminative weight training to statistical model-based voice activity detection (VAD). In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratios (LRs), with the weights obtained by a minimum classification error (MCE) method. This differs from previous work in that a different weight is assigned to each frequency bin, which is considered more realistic. According to the experimental results, the proposed approach is effective for statistical model-based VAD using the LR test.
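
A weighted geometric-mean decision rule of this kind can be sketched in Python as below. The per-bin likelihood ratio uses the complex Gaussian form common in statistical model-based VAD, which is an assumption here because the abstract does not write it out. The a posteriori SNR `gamma` and a priori SNR `xi` are taken as given inputs, and the MCE training that would produce the weights is not shown. All names and the threshold are illustrative.

```python
import math


def log_likelihood_ratio(gamma, xi):
    """Per-bin log LR under an assumed complex Gaussian model.

    gamma: a posteriori SNR of the bin, xi: a priori SNR of the bin.
    """
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def weighted_log_lr(gammas, xis, weights=None):
    """Log of the weighted geometric mean of the per-bin LRs.

    With weights=None every bin gets 1/K, which is the conventional rule.
    """
    k = len(gammas)
    if weights is None:
        weights = [1.0 / k] * k
    return sum(w * log_likelihood_ratio(g, x)
               for w, g, x in zip(weights, gammas, xis))


def is_speech(gammas, xis, weights=None, threshold=0.0):
    """Declare speech when the weighted log LR exceeds the threshold."""
    return weighted_log_lr(gammas, xis, weights) > threshold


if __name__ == "__main__":
    gammas, xis = [4.0, 1.0], [3.0, 0.0]
    print(weighted_log_lr(gammas, xis))              # uniform weights
    print(weighted_log_lr(gammas, xis, [1.0, 0.0]))  # all weight on bin 0
```

Working in the log domain turns the geometric mean into a weighted sum, so per-bin weights that sum to one simply shift the emphasis toward the bins that separate speech from noise best.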

Reduction of Background Noise using FFT cepstrum (FFT 켑스트럼을 사용한 배경잡음의 제거)

  • Choi, Jae-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2010.10a
    • /
    • pp.264-267
    • /
    • 2010
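  • This paper proposes a speech recognition algorithm that trains a neural network with the error backpropagation learning algorithm and detects the speech and noise segments in each frame. It also proposes a spectral subtraction method that removes noise in each frame according to the speech and noise segments detected by the neural network. In the experiments, white noise and car noise are added to the original speech to evaluate the recognition rate. Experimental results are also presented for noise removal by spectral subtraction in each frame, using the speech and noise segments detected by the recognition system.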
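
The per-frame spectral subtraction step described in this abstract can be sketched in Python as follows, assuming magnitude spectra and speech/noise labels are already available. The neural-network detector itself is not shown, and the smoothing factor, over-subtraction factor, and spectral floor are hypothetical values chosen for illustration.

```python
def update_noise_estimate(noise_mag, frame_mag, smoothing=0.9):
    """Recursively average the noise magnitude spectrum over noise-only frames."""
    return [smoothing * n + (1.0 - smoothing) * m
            for n, m in zip(noise_mag, frame_mag)]


def spectral_subtract(frame_mag, noise_mag, over_subtraction=1.0, floor=0.01):
    """Subtract the noise estimate per bin, keeping a small spectral floor."""
    return [max(m - over_subtraction * n, floor * m)
            for m, n in zip(frame_mag, noise_mag)]


def denoise(frames, speech_flags, initial_noise):
    """Update the noise estimate on noise frames and subtract it from every frame."""
    noise = list(initial_noise)
    cleaned = []
    for mag, speech in zip(frames, speech_flags):
        if not speech:
            noise = update_noise_estimate(noise, mag)
        cleaned.append(spectral_subtract(mag, noise))
    return cleaned


if __name__ == "__main__":
    frames = [[2.0, 2.0], [5.0, 1.0]]  # magnitude spectra, two bins per frame
    flags = [False, True]              # labels from some speech/noise detector
    print(denoise(frames, flags, initial_noise=[2.0, 2.0]))
```

The floor keeps bins from going negative when the noise estimate exceeds the observed magnitude, which would otherwise produce invalid magnitudes.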
