• Title/Summary/Keyword: speech parameter

An Improved Speech Absence Probability Estimation based on Environmental Noise Classification (환경잡음분류 기반의 향상된 음성부재확률 추정)

  • Son, Young-Ho;Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
    • The Journal of the Acoustical Society of Korea / v.30 no.7 / pp.383-389 / 2011
  • In this paper, we propose an improved speech absence probability estimation algorithm that applies environmental noise classification for speech enhancement. The conventional speech absence probability, which is needed to obtain the a priori probability of speech absence, was derived from the microphone input signal and the noise signal based on the estimated value of the a posteriori SNR threshold. Unlike the conventional approach with a fixed threshold and smoothing parameter, the proposed algorithm estimates the speech absence probability using a noise classification algorithm based on a Gaussian mixture model, so that the optimal parameters can be applied to each noise type. The performance of the proposed enhancement algorithm is evaluated by ITU-T P.862 PESQ (perceptual evaluation of speech quality) and a composite measure under various noise environments. The proposed algorithm is verified to yield better results than the conventional speech absence probability estimation algorithm.

Speech Parameters for the Robust Emotional Speech Recognition (감정에 강인한 음성 인식을 위한 음성 파라메터)

  • Kim, Weon-Goo
    • Journal of Institute of Control, Robotics and Systems / v.16 no.12 / pp.1137-1142 / 2010
  • This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust speech recognition system. For this purpose, the effect of emotion on a speech recognition system and the robustness of its speech parameters were studied using a speech database containing various emotions. In this study, the mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, and frequency warped mel-cepstral coefficient were used as feature parameters, and the CMS (Cepstral Mean Subtraction) method was used as a signal bias removal technique. Experimental results showed that the HMM-based speaker-independent word recognizer using the vocal tract length normalized mel-cepstral coefficient, its derivatives, and CMS as signal bias removal achieved the best performance, a word error rate of 0.78%. This corresponds to about a 50% word error reduction compared with the baseline system using the mel-cepstral coefficient, its derivatives, and CMS.
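The CMS step named in this abstract can be illustrated with a short sketch. It assumes the usual per-utterance formulation (subtract the mean of each cepstral coefficient over all frames); the function name `cepstral_mean_subtraction` and the list-of-frames layout are illustrative assumptions, not details from the paper.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of equal-length lists, one cepstral vector per frame
    (hypothetical layout chosen for this sketch).
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    # A stationary channel adds a constant to each coefficient,
    # so removing the mean removes that bias.
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

After this step every coefficient track has zero mean over the utterance, which is why a constant channel offset no longer reaches the recognizer.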

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI / no.54 / pp.27-43 / 2005
  • Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, the mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than the linear prediction coefficient (LPC), which is a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC is developing an efficient MFCC quantization at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to 8 kbps G.729, and that speech recognition performance using the proposed coder is better than that using G.729.
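The idea of exploiting interframe correlation through predictive quantization can be sketched as follows. This is a generic first-order predictive scalar quantizer, not the coder from the paper: the predictor coefficient `alpha`, the uniform step size `step`, and both function names are assumptions made for illustration, and the safety-net scheme is not modeled.

```python
def quantize(value, step):
    """Uniform scalar quantizer: snap value to the nearest multiple of step."""
    return step * round(value / step)


def predictive_quantize(track, alpha=0.8, step=0.5):
    """Quantize one coefficient track frame by frame.

    track: values of a single MFCC coefficient over consecutive frames.
    alpha, step: hypothetical predictor coefficient and quantizer step.
    Returns the values a decoder would reconstruct.
    """
    previous = 0.0  # decoder-side reconstruction of the previous frame
    reconstructed = []
    for value in track:
        prediction = alpha * previous
        # Only the prediction residual is quantized and transmitted.
        residual_q = quantize(value - prediction, step)
        previous = prediction + residual_q
        reconstructed.append(previous)
    return reconstructed
```

Because the prediction is formed from the decoder's own reconstruction, encoder and decoder stay in step and the per-frame error never exceeds half a quantizer step. A channel error would propagate through `previous`, which is the weakness a safety-net scheme is meant to limit.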

Separation of Periodic and Aperiodic Components of Pathological Speech Signal (장애음성의 주기성분과 잡음성분의 분리 방법에 관하여)

  • Jo Cheolwoo;Li Tao
    • Proceedings of the KSPS conference / 2003.10a / pp.25-28 / 2003
  • The aim of this paper is to analyze pathological voice by separating the signal into periodic and aperiodic parts. The separation is performed recursively on the residual signal of the voice signal. Starting from an initial estimate of the aperiodic part of the spectrum, the aperiodic part is determined by an extrapolation method. The periodic part is then obtained by subtracting the aperiodic part from the original spectrum. An HNR parameter is derived from this separation. The statistics of the parameter values are compared with those of jitter and shimmer for normal, benign, and malignant cases.
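The last two steps of the abstract (periodic part by subtraction, then an HNR value) can be sketched as below. The sketch assumes power spectra given as lists, the common definition of HNR as the periodic-to-aperiodic energy ratio in dB, and a floor at zero after subtraction; the function name `hnr_db` is hypothetical, and the paper's recursive estimation of the aperiodic part is not reproduced.

```python
import math


def hnr_db(original_power, aperiodic_power):
    """Harmonics-to-noise ratio in dB from a power spectrum and a noise estimate.

    original_power: power spectrum of the voice frame.
    aperiodic_power: estimated aperiodic (noise) power per bin.
    """
    # Periodic part = original minus aperiodic, floored at zero (assumption).
    periodic = [max(o - a, 0.0) for o, a in zip(original_power, aperiodic_power)]
    harmonic_energy = sum(periodic)
    noise_energy = sum(aperiodic_power)
    if harmonic_energy <= 0.0 or noise_energy <= 0.0:
        raise ValueError("both energy terms must be positive")
    return 10.0 * math.log10(harmonic_energy / noise_energy)
```

A lower value means a larger share of the energy is noise-like.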

Adoption of Support Vector Machine and Independent Component Analysis for Implementation of Speech Recognizer (음성인식기 구현을 위한 SVM과 독립성분분석 기법의 적용)

  • 박정원;김평환;김창근;허강인
    • Proceedings of the IEEK Conference / 2003.07e / pp.2164-2167 / 2003
  • In this paper, we propose an effective speech recognizer based on recognition experiments with three feature parameters (PCA, ICA, and MFCC) using an SVM (Support Vector Machine) classifier. In general, an SVM is a classification method that separates two classes by finding an arbitrary nonlinear boundary in a vector space, and it achieves high classification performance even with a small amount of training data. We compare the recognition results for each feature parameter and propose the ICA feature as the most effective parameter.

A study on Effective Feature Parameters Comparison for Speaker Recognition (화자인식에 효과적인 특징벡터에 관한 비교연구)

  • Park TaeSun;Kim Sang-Jin;Kwang Moon;Hahn Minsoo
    • Proceedings of the KSPS conference / 2003.05a / pp.145-148 / 2003
  • In this paper, we carry out a comparative study of various feature parameters for effective speaker recognition, such as LPC, LPCC, MFCC, Log Area Ratio, Reflection Coefficients, Inverse Sine, and Delta Parameter. We also apply cepstral liftering and cepstral mean subtraction to check their usefulness. Our recognition system is HMM-based and uses a 4-connected-Korean-digit speech database. The experimental results will help in selecting the most effective parameter for speaker recognition.
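Two of the listed parameters, Log Area Ratio and Inverse Sine, are simple transforms of a reflection coefficient, as the sketch below shows. The sign convention for the log area ratio varies between textbooks; the form `log((1 + k) / (1 - k))` and the unscaled arcsine used here are assumptions, and the abstract does not state which variants the authors used.

```python
import math


def log_area_ratio(k):
    """Log area ratio of a reflection coefficient k with |k| < 1.

    Uses log((1 + k) / (1 - k)); some references use the reciprocal,
    which only flips the sign.
    """
    if not -1.0 < k < 1.0:
        raise ValueError("reflection coefficient must lie in (-1, 1)")
    return math.log((1.0 + k) / (1.0 - k))


def inverse_sine(k):
    """Inverse sine (arcsine) transform of a reflection coefficient."""
    return math.asin(k)
```

Both transforms expand the scale near |k| = 1, where spectral sensitivity to the reflection coefficient is highest.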

A Study of Energy Parameter without Windowing Influence in Speech Signal (윈도우의 영향이 제거된 에너지 파라미터에 관한 연구)

  • 조태수;신동성;배명진
    • Proceedings of the IEEK Conference / 2001.06d / pp.277-280 / 2001
  • Preprocessing is a very important step in speech signal processing. It influences the compression rate in speech coding, the recognition rate in speech recognition, and so on. In this paper, we propose a method that minimizes the influence of the window by using the pitch period and start points. The proposed method is applicable to voiced-speech detection and word labeling.

Vector Quantization by N-ary Search of a Codebook (코우드북의 절충탐색에 의한 벡터양자화)

  • Lee, Chang-Young
    • Speech Sciences / v.8 no.3 / pp.143-148 / 2001
  • We propose a new scheme for VQ codebook search. The procedure lies between binary-tree search and full search, and thus might be called N-ary search of a codebook. In an experiment on 7200 frames spoken by 25 speakers, we confirmed that codewords as good as those found by full search were obtained at a moderate time cost comparable to that of binary-tree search. In speech recognition by HMM/VQ with the Bakis model, where the appearance of a specific codeword is essential in the parameter training phase, the proposed method is expected to provide an efficient training procedure.
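One way to read "between binary-tree search and full search" is a tree-structured codebook in which each node has N children instead of two. The sketch below follows that reading only; it is an assumption, not the procedure from the paper, and the node layout `(centroid, children)` and the names `sq_dist` and `nary_search` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nary_search(root, vector):
    """Descend a codebook tree, picking the nearest of N children per level.

    root: (centroid, children) tuple; a leaf has an empty children list
    and its centroid is a codeword.
    """
    node = root
    while node[1]:
        # Compare against every child at this level: N distances, not 2.
        node = min(node[1], key=lambda child: sq_dist(child[0], vector))
    return node[0]
```

With two children per node this reduces to binary-tree search, and with a single level holding every codeword it becomes full search, so N trades search time against the risk of descending into the wrong branch.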

Voice Activity Detection Based on Signal Energy and Entropy-difference in Noisy Environments (엔트로피 차와 신호의 에너지에 기반한 잡음환경에서의 음성검출)

  • Ha, Dong-Gyung;Cho, Seok-Je;Jin, Gang-Gyoo;Shin, Ok-Keun
    • Journal of Advanced Marine Engineering and Technology / v.32 no.5 / pp.768-774 / 2008
  • In many areas of speech signal processing, such as automatic speech recognition and packet-based voice communication, VAD (voice activity detection) plays an important role in the performance of the overall system. In this paper, we present a new feature parameter for VAD, which is the product of the energy of the signal and the difference between two types of entropy. To this end, we first define a Mel filter-bank based entropy and calculate its difference from the conventional entropy in the frequency domain. The difference is then multiplied by the spectral energy of the signal to yield the final feature parameter, which we call PEED (product of energy and entropy difference). Through experiments, we verified that the proposed VAD parameter is more effective than the conventional spectral entropy based parameter across various SNRs and noise environments.
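The construction of the PEED feature described in this abstract can be sketched for a single frame as follows. Several details are assumptions because the abstract does not give them: the filter bank is passed in as plain weight vectors rather than built on the Mel scale, the difference is taken as conventional entropy minus filter-bank entropy, and neither entropy is normalized. The names `entropy` and `peed` are illustrative.

```python
import math


def entropy(values):
    """Shannon entropy (natural log) of non-negative values normalized to sum 1."""
    total = sum(values)
    if total <= 0.0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0.0)


def peed(power_spectrum, filterbank):
    """Product of spectral energy and an entropy difference for one frame.

    power_spectrum: power per frequency bin.
    filterbank: list of weight vectors, one per band (stand-in for a
    Mel filter bank in this sketch).
    """
    band_energies = [
        sum(w * p for w, p in zip(weights, power_spectrum))
        for weights in filterbank
    ]
    energy = sum(power_spectrum)
    # Sign and order of the difference are assumptions of this sketch.
    return energy * (entropy(power_spectrum) - entropy(band_energies))
```

A frame-level VAD decision would then compare this value against a threshold, which the abstract does not specify.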