• Title/Summary/Keyword: MFCC


A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI
    • /
    • no.54
    • /
    • pp.27-43
    • /
    • 2005
  • Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that operate on speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, mel-frequency cepstral coefficients (MFCC) are generally known to give better speech recognition performance than linear prediction coefficients (LPC), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC, however, is developing an efficient low-bit-rate MFCC quantizer. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of MFCC. Second, we propose a safety-net scheme to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. PESQ tests show that the proposed speech coder has speech quality comparable to that of the 8 kbps G.729, while speech recognition performance with the proposed coder is better than with G.729.
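
The abstract combines two ingredients: predictive quantization that exploits interframe correlation, and a safety-net scheme for robustness to channel errors. A minimal sketch of how such a pair is commonly combined follows, assuming a first-order predictor with a hypothetical coefficient `alpha`, bounded uniform scalar quantizers, and per-frame selection of whichever mode gives lower distortion. It is not the 8.7 kbps coder's actual quantizer, whose codebooks and bit allocation are not given here.

```python
import numpy as np

def bounded_uniform_q(x, step, k):
    """Uniform scalar quantizer with 2k+1 levels; indices are clipped to [-k, k]."""
    idx = np.clip(np.round(x / step), -k, k)
    return idx * step

def quantize_mfcc_frames(frames, alpha=0.95, step_pred=0.1, step_safe=0.5, k=8):
    """Quantize a (T, D) sequence of MFCC vectors frame by frame.

    Predictive mode: quantize the residual c_t - alpha * c_hat_{t-1} with a fine step.
    Safety-net mode: quantize c_t directly (memoryless) with a coarse step.
    The mode with lower squared error is kept for each frame.
    alpha, the step sizes and k are illustrative assumptions.
    """
    frames = np.asarray(frames, dtype=float)
    recon = np.zeros_like(frames)
    modes = []
    prev = np.zeros(frames.shape[1])  # decoder state: previous reconstruction
    for t, c in enumerate(frames):
        pred = alpha * prev
        cand_pred = pred + bounded_uniform_q(c - pred, step_pred, k)
        cand_safe = bounded_uniform_q(c, step_safe, k)
        # A real coder would transmit this choice as a 1-bit mode flag.
        if np.sum((c - cand_safe) ** 2) < np.sum((c - cand_pred) ** 2):
            recon[t], mode = cand_safe, "safety"
        else:
            recon[t], mode = cand_pred, "predictive"
        modes.append(mode)
        prev = recon[t]
    return recon, modes

# Four steady frames followed by an abrupt change.
frames = np.array([[1.03, -0.97]] * 4 + [[3.0, 3.0]])
recon, modes = quantize_mfcc_frames(frames)
print(modes)
```

In this toy run, the first frame and the abrupt change at frame 4 fall back to the memoryless safety-net quantizer, because the bounded predictive residual cannot reach them, while frame 1 uses the finer predictive mode. Since a safety-net frame does not depend on the previous reconstruction, a channel error cannot propagate past it.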

Extraction of MFCC feature parameters based on the PCA-optimized filter bank and Korean connected 4-digit telephone speech recognition (PCA-optimized 필터뱅크 기반의 MFCC 특징파라미터 추출 및 한국어 4연숫자 전화음성에 대한 인식실험)

  • 정성윤;김민성;손종목;배건성
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.279-283
    • /
    • 2004
  • In general, triangular filters are used in the filter bank when MFCC feature parameters are extracted from the spectrum of the speech signal. To improve the recognition rate, Lee et al. proposed a different approach that uses filter shapes optimized to the spectrum of the training speech data, obtaining the optimized filter coefficients with principal component analysis (PCA). In this paper, using a large 4-digit telephone speech database, we compute MFCCs based on the PCA-optimized filter bank and compare their recognition performance with that of conventional MFCCs and of MFCCs based on a direct weighted filter bank. Experimental results show that MFCCs based on the PCA-optimized filter bank give a slight improvement in recognition rate over conventional MFCCs but fail to outperform MFCCs based on direct weighted filter bank analysis. The experimental results are discussed along with our findings.
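
For reference, the conventional pipeline that the PCA-optimized filter bank modifies can be sketched as follows: Hamming window, power spectrum, triangular filters equally spaced on the mel scale, log, and DCT. The filter count (24), cepstral order (13), FFT size, and the 2595·log10(1 + f/700) mel formula are assumptions chosen for illustration. In a PCA-optimized or direct weighted filter bank, the rows of the matrix returned by `triangular_mel_filterbank` would be replaced by data-derived weights; how Lee et al. derive them is not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_mel_filterbank(n_filters, n_fft, sr):
    """(n_filters, n_fft//2 + 1) matrix of triangular filters equally spaced in mel."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # peak of 1 at center
    return fb

def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; row k gives cepstral coefficient k."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc_frame(frame, sr, n_filters=24, n_ceps=13, n_fft=512):
    """MFCCs of a single time-domain frame."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)) ** 2
    fb = triangular_mel_filterbank(n_filters, n_fft, sr)
    log_energies = np.log(fb @ power + 1e-10)  # small constant avoids log(0)
    return dct_ii_matrix(n_ceps, n_filters) @ log_energies

sr = 8000  # telephone-band sampling rate
frame = np.sin(2 * np.pi * 1000 * np.arange(256) / sr)
print(mfcc_frame(frame, sr).shape)
```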

Features for Figure Speech Recognition in Noise Environment (잡음환경에서의 숫자음 인식을 위한 특징파라메타)

  • Lee, Jae-Ki;Koh, Si-Young;Lee, Kwang-Suk;Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.473-476
    • /
    • 2005
  • This paper proposes various feature parameters that are robust to noise. MFCC (Mel Frequency Cepstral Coefficient), the feature parameter used in conventional speech recognition, performs well. To obtain more robust performance in noise, we transform the MFCC feature space with PCA (Principal Component Analysis) and ICA (Independent Component Analysis) and compare the resulting parameters with conventional MFCCs. The results show that the feature parameters transformed by ICA perform better than both the PCA-transformed parameters and conventional MFCCs.
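
A minimal PCA transform of the MFCC feature space is sketched below, assuming the MFCC vectors are already collected in an (N, D) array. Projecting onto the leading eigenvectors of the sample covariance decorrelates the coefficients. The ICA transform, which the abstract reports as best, would need an additional algorithm such as FastICA and is not shown.

```python
import numpy as np

def fit_pca(features, n_components):
    """Return the feature mean and a (D, n_components) projection basis."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / (len(features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def pca_transform(features, mean, basis):
    """Project (N, D) features onto the principal axes."""
    return (features - mean) @ basis

# Synthetic stand-in for correlated MFCC vectors.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                             [0.0, 1.0, 0.3],
                                             [0.0, 0.0, 0.2]])
mean, basis = fit_pca(mfcc, 2)
print(np.cov(pca_transform(mfcc, mean, basis), rowvar=False).round(6))
```

The transformed covariance is diagonal, with variances in decreasing order.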

Utilization of Phase Information for Speech Recognition (음성 인식에서 위상 정보의 활용)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.9
    • /
    • pp.993-1000
    • /
    • 2015
  • Mel-Frequency Cepstral Coefficients (MFCC) are one of the notable feature vectors for speech signal processing. An evident drawback of MFCC is that phase information is lost when the magnitude of the Fourier transform is taken. In this paper, we consider a method of utilizing phase information by treating the magnitudes of the real and imaginary components of the FFT separately. Applying this method to speech recognition with FVQ/HMM, we find that the speech recognition error rate decreases compared with conventional MFCC. Numerical analysis also shows that the optimal number of MFCC components is 12, consisting of 6 components each from the real and imaginary parts of the FFT.
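
A sketch of one way to keep the real and imaginary parts separate follows. It assumes that the squared magnitudes of Re(X) and Im(X) each pass through the same triangular mel filter bank and are log-compressed and DCT-transformed, and that 6 coefficients from each path are concatenated into the 12-dimensional vector the abstract reports as optimal. The filter count, FFT size, and the use of squared rather than plain magnitudes are assumptions; the paper's exact processing may differ.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        l, c, r = edges[i:i + 3]
        fb[i] = np.maximum(0.0, np.minimum((freqs - l) / (c - l), (r - freqs) / (r - c)))
    return fb

def cepstrum(band_energies, n_ceps):
    """Log compression followed by an unnormalized DCT-II; keep n_ceps terms."""
    n = len(band_energies)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return basis @ np.log(band_energies + 1e-10)

def split_phase_mfcc(frame, sr, n_filters=20, n_each=6, n_fft=512):
    """6 cepstra from Re(X) plus 6 from Im(X), concatenated (12 in total)."""
    spec = np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)
    fb = mel_filterbank(n_filters, n_fft, sr)
    real_part = cepstrum(fb @ spec.real ** 2, n_each)  # squared magnitude of Re(X)
    imag_part = cepstrum(fb @ spec.imag ** 2, n_each)  # squared magnitude of Im(X)
    return np.concatenate([real_part, imag_part])

sr = 8000
t = np.arange(256) / sr
print(split_phase_mfcc(np.sin(2 * np.pi * 440 * t), sr).shape)
```

Conventional MFCC keeps only Re(X)² + Im(X)², so splitting the two terms retains information about how energy is divided between them, which depends on phase.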

Speech/Mixed Content Signal Classification Based on GMM Using MFCC (MFCC를 이용한 GMM 기반의 음성/혼합 신호 분류)

  • Kim, Ji-Eun;Lee, In-Sung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.2
    • /
    • pp.185-192
    • /
    • 2013
  • This paper proposes a method to improve the performance of speech/mixed content signal classification for the MPEG USAC (Unified Speech and Audio Coding) standard using MFCC features and a GMM probability model. The Gaussian mixture model (GMM) is used for effective pattern recognition, and its optimal parameters are estimated with the expectation-maximization (EM) algorithm. The proposed classification algorithm consists of two main parts: the first extracts the optimal GMM parameters, and the second distinguishes between speech and mixed content signals using MFCC feature parameters. The proposed classification algorithm performs better than the conventional USAC implementation.
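
The two parts described above, EM estimation of GMM parameters and a likelihood-based speech/mixed decision, can be sketched with NumPy as follows. This assumes diagonal covariances, a fixed number of EM iterations, and one GMM per class compared by average log-likelihood over a segment of MFCC frames. It is an illustration of the technique, not the classifier in the USAC reference software.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Per-sample log p(x) under a diagonal GMM, plus log responsibilities (N, K)."""
    diff = X[:, None, :] - means[None, :, :]                     # (N, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None])
    log_weighted = log_comp + np.log(weights)[None]
    m = log_weighted.max(axis=1, keepdims=True)                  # log-sum-exp trick
    log_px = m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1))
    return log_px, log_weighted - log_px[:, None]

def fit_gmm(X, n_components, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with the EM algorithm."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        resp = np.exp(gmm_log_likelihood(X, weights, means, variances)[1])  # E-step
        nk = resp.sum(axis=0) + 1e-10                                       # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def classify(X, speech_gmm, mixed_gmm):
    """Label a segment of feature frames by comparing average log-likelihoods."""
    ll_speech = gmm_log_likelihood(X, *speech_gmm)[0].mean()
    ll_mixed = gmm_log_likelihood(X, *mixed_gmm)[0].mean()
    return "speech" if ll_speech > ll_mixed else "mixed"

# Synthetic 2-D stand-ins for MFCC frames of each class.
rng = np.random.default_rng(1)
speech_gmm = fit_gmm(rng.normal(0.0, 1.0, size=(300, 2)), 2)
mixed_gmm = fit_gmm(rng.normal(3.0, 1.0, size=(300, 2)), 2)
print(classify(rng.normal(3.0, 1.0, size=(50, 2)), speech_gmm, mixed_gmm))
```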

Comparison of feature parameters for emotion recognition using speech signal (음성 신호를 사용한 감정인식의 특징 파라메터 비교)

  • 김원구
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.5
    • /
    • pp.371-377
    • /
    • 2003
  • This paper compares feature parameters for emotion recognition from speech signals. A corpus of emotional speech, recorded and classified by emotion through subjective evaluation, was used to compute statistical feature vectors, such as the average, standard deviation, and maximum of pitch and energy, and phonetic features such as MFCC parameters. To evaluate the feature parameters, a speaker- and context-independent emotion recognition system was constructed. In the experiments, pitch and energy parameters and their derivatives were used as prosodic information, and MFCC parameters and their derivatives were used as phonetic information. Experiments with a vector quantization (VQ) based emotion recognition system showed that MFCC parameters and their derivatives gave better performance than the pitch and energy parameters.
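
A minimal VQ-based recognizer of the kind described can be sketched as follows: append regression-based delta (derivative) coefficients to each MFCC frame, train one k-means codebook per emotion, and label an utterance with the emotion whose codebook gives the lowest average distortion. The ±2-frame delta window, the codebook size, and the number of Lloyd iterations are assumptions; the prosodic pitch and energy statistics are omitted.

```python
import numpy as np

def deltas(feats, n=2):
    """Regression-based delta coefficients over +/- n frames; feats is (T, D)."""
    T = len(feats)
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    num = sum(k * (padded[n + k:n + k + T] - padded[n - k:n - k + T])
              for k in range(1, n + 1))
    return num / (2 * sum(k * k for k in range(1, n + 1)))

def train_codebook(vectors, size, n_iter=20, seed=0):
    """Plain k-means (Lloyd) codebook."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(n_iter):
        nearest = ((vectors[:, None, :] - codebook[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(size):
            members = vectors[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return ((vectors[:, None, :] - codebook[None]) ** 2).sum(axis=2).min(axis=1).mean()

def recognize(feats, codebooks):
    """Return the label whose codebook quantizes [MFCC, delta-MFCC] best."""
    x = np.hstack([feats, deltas(feats)])
    return min(codebooks, key=lambda label: distortion(x, codebooks[label]))

# Synthetic stand-ins for per-emotion MFCC frames.
rng = np.random.default_rng(0)
train = {"neutral": rng.normal(0.0, 0.3, size=(200, 4)),
         "angry": rng.normal(2.0, 0.3, size=(200, 4))}
codebooks = {lab: train_codebook(np.hstack([f, deltas(f)]), 8) for lab, f in train.items()}
print(recognize(rng.normal(2.0, 0.3, size=(60, 4)), codebooks))
```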

Speaker Independent Recognition Algorithm based on Parameter Extraction by MFCC applied Wiener Filter Method (위너필터법이 적용된 MFCC의 파라미터 추출에 기초한 화자독립 인식알고리즘)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.6
    • /
    • pp.1149-1154
    • /
    • 2017
  • To obtain good recognition performance from a speech recognition system under background noise, it is very important to select appropriate speech feature parameters. The feature parameter used in this paper is the Mel frequency cepstral coefficient (MFCC), which reflects human auditory characteristics, combined with the Wiener filter method. That is, the proposed method extracts feature parameters from the clean speech signal estimated after background noise is removed. The proposed method performs speaker recognition by feeding the proposed modified MFCC feature parameters into a multi-layer perceptron network. In the experiments, speaker-independent recognition was performed using 14th-order MFCC feature parameters. For speech corrupted by additive white noise, the average speaker-independent recognition rate is 94.48%, which is an effective result. Compared with existing methods, the proposed speaker recognition performs better by using the modified MFCC feature parameters.
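
A sketch of Wiener filtering applied to the power spectrum before the mel filter bank follows. It assumes that the first few frames contain only noise, estimates the a priori SNR with the simple maximum-likelihood rule max(P/N - 1, 0), and applies a gain floor; these choices are illustrative and not necessarily those of the paper. The enhanced power spectra would then pass through the usual mel filter bank, log, and DCT to give the 14th-order MFCCs fed to the multi-layer perceptron.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Wiener gain SNR/(1+SNR) with a maximum-likelihood a priori SNR estimate."""
    snr_prior = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return np.maximum(snr_prior / (1.0 + snr_prior), floor)

def denoise_power_spectra(frames, n_noise_frames=5, n_fft=256):
    """frames: (T, L) time-domain frames; the first n_noise_frames are assumed noise-only."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1)) ** 2
    noise_power = power[:n_noise_frames].mean(axis=0)  # per-bin noise estimate
    gain = wiener_gain(power, noise_power)
    # The gain scales the spectral amplitude, so the power scales by gain**2.
    return gain ** 2 * power

rng = np.random.default_rng(0)
L = 256
frames = 0.1 * rng.normal(size=(20, L))                   # white noise throughout
frames[10:] += np.sin(2 * np.pi * 32 * np.arange(L) / L)  # tone in FFT bin 32
enhanced = denoise_power_spectra(frames)
print(enhanced.shape)
```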

Whale Sound Reconstruction using MFCC and L2-norm Minimization (MFCC와 L2-norm 최소화를 이용한 고래소리의 재생)

  • Chong, Ui-Pil;Jeon, Seo-Yun;Hong, Jeong-Pil;Jo, Se-Hyung
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.19 no.4
    • /
    • pp.147-152
    • /
    • 2018
  • Underwater transient signals are complex, variable, and nonlinear, which makes them difficult to model accurately with reference patterns. We analyze one type of underwater transient signal, whale sounds, using MFCC (Mel-Frequency Cepstral Coefficients), and synthesize them from the MFCCs using weighted $L_2$-norm minimization. The whales in these experiments are humpback, right, blue, gray, and minke whales. Twentieth-order MFCCs are extracted from the original signals using MATLAB and reconstructed using weighted $L_2$-norm minimization with the inverse MFCC. Finally, we found that the optimal weighting factor for reconstructing whale sounds is 3 to 4.
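
One way to read "weighted $L_2$-norm minimization with the inverse MFCC" is as a two-stage inversion, sketched here under explicit assumptions. An inverse DCT recovers log mel energies, which are exponentiated to give mel energies $m$. Because the mel filter bank $F$ has far fewer rows than spectral bins, the power spectrum $p$ is then chosen as the minimum weighted-norm solution of $Fp = m$, namely $p = W^{-1}F^\top (F W^{-1} F^\top)^{-1} m$ for a diagonal weight matrix $W$. The weight vector is a hypothetical input and does not model the paper's weighting factor of 3 to 4; phase reconstruction is not covered.

```python
import numpy as np

def idct_ortho(c, n_out):
    """Invert an orthonormal DCT-II; truncated coefficients are treated as zero."""
    k = np.arange(len(c))[:, None]
    n = np.arange(n_out)[None, :]
    basis = np.sqrt(2.0 / n_out) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_out))
    basis[0] /= np.sqrt(2.0)
    return basis.T @ c

def weighted_min_norm(F, m, w):
    """Solve min_p sum_j w_j * p_j**2 subject to F @ p = m, with all w_j > 0."""
    w_inv = 1.0 / w
    gram = (F * w_inv) @ F.T          # F W^-1 F^T
    return w_inv * (F.T @ np.linalg.solve(gram, m))

# Hypothetical sizes: 5 mel bands, 12 spectral bins.
rng = np.random.default_rng(0)
F = rng.uniform(0.0, 1.0, size=(5, 12))        # stand-in for a mel filter bank
m = F @ rng.uniform(0.5, 2.0, size=12)         # mel energies of some spectrum
p = weighted_min_norm(F, m, w=np.linspace(1.0, 4.0, 12))
print(np.allclose(F @ p, m))
```

The solution satisfies the mel-energy constraints exactly but can contain negative bins, so a practical system would clip or otherwise constrain it before resynthesis.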

Analysis of Feature Parameter Variation for Korean Digit Telephone Speech according to Channel Distortion and Recognition Experiment (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석 및 인식실험)

  • Jung Sung-Yun;Son Jong-Mok;Kim Min-Sung;Bae Keun-Sung
    • MALSORI
    • /
    • no.43
    • /
    • pp.179-188
    • /
    • 2002
  • Improving the recognition performance of connected-digit telephone speech remains an open problem. As a basic study toward it, this paper analyzes how the feature parameters of Korean digit telephone speech vary with channel distortion. MFCC is used as the feature parameter for both analysis and recognition. To analyze the effect of telephone channel distortion, which differs from call to call, MFCCs are first obtained from the connected-digit telephone speech for each phoneme in the Korean digits. Then CMN, RTCN, and RASTA are applied to the MFCCs as channel compensation techniques. Using the feature parameters MFCC, MFCC+CMN, MFCC+RTCN, and MFCC+RASTA, the variances of phonemes are analyzed and recognition experiments are performed for each case. The experimental results are discussed along with our findings.
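
Two of the channel compensation techniques named above are simple enough to sketch. CMN subtracts the per-utterance mean of each cepstral coefficient, which cancels a fixed linear channel (approximately a constant additive offset in the cepstral domain). RASTA band-pass filters each coefficient trajectory over time. The RASTA coefficients below follow the commonly cited numerator 0.1(2 + z^-1 - z^-3 - 2z^-4) with a pole at 0.98, implemented causally. RTCN is not shown because its definition is not given here.

```python
import numpy as np

def cmn(feats):
    """Cepstral mean normalization over a (T, D) utterance."""
    return feats - feats.mean(axis=0, keepdims=True)

def rasta_filter(feats, pole=0.98):
    """Causal RASTA IIR filter applied along time to each coefficient."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # numerator taps sum to zero
    out = np.zeros_like(feats, dtype=float)
    for t in range(len(feats)):
        fir = sum(b[i] * feats[t - i] for i in range(5) if t - i >= 0)
        out[t] = fir + (pole * out[t - 1] if t > 0 else 0.0)
    return out

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 3))
channel_offset = np.array([1.5, -2.0, 0.7])  # hypothetical fixed channel effect
print(np.allclose(cmn(mfcc + channel_offset), cmn(mfcc)))
```

Because the RASTA numerator taps sum to zero, a constant (DC) component in a coefficient trajectory decays away, which is how RASTA suppresses slowly varying channel effects.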

Voice Recognition Based on Adaptive MFCC and Neural Network (적응 MFCC와 Neural Network 기반의 음성인식법)

  • Bae, Hyun-Soo;Lee, Suk-Gyu
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.5 no.2
    • /
    • pp.57-66
    • /
    • 2010
  • In this paper, we propose an enhanced voice recognition algorithm using adaptive MFCC (Mel Frequency Cepstral Coefficients) and a neural network. Although extracting voice data from the raw data is very important for enhancing the voice recognition ratio, conventional algorithms tend to degrade the voice data when they eliminate noise within a specific frequency band. Unlike conventional MFCC, the proposed algorithm imposes larger weights on specified frequency regions and uses a non-overlapping filterbank to enhance the recognition ratio without degrading the voice data. Simulation results show that the proposed algorithm performs better than conventional MFCC because it is robust to environmental variation.
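
A sketch of a weighted, non-overlapping filterbank in place of the overlapping triangular one follows. The rectangular band shape, the mel spacing of the band edges, and the per-band weights are hypothetical: the abstract does not say which frequency regions are emphasized or how the weights adapt. Each FFT bin contributes to at most one band, and a band's weight scales its energy before the log and DCT.

```python
import numpy as np

def mel_band_edges(n_bands, n_fft, sr):
    """Integer FFT-bin edges equally spaced on the mel scale (hypothetical helper)."""
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2.0) / 700.0)
    hz = 700.0 * (10.0 ** (np.linspace(0.0, mel_max, n_bands + 1) / 2595.0) - 1.0)
    return np.round(hz / sr * n_fft).astype(int)

def weighted_nonoverlap_filterbank(n_bins, band_edges, band_weights):
    """Rectangular bands; band b covers bins [edge_b, edge_{b+1}) with weight w_b."""
    fb = np.zeros((len(band_edges) - 1, n_bins))
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        fb[b, lo:hi] = band_weights[b]
    return fb

def weighted_cepstra(frame, sr, band_weights, n_ceps=12, n_fft=512):
    """Cepstra from the weighted non-overlapping band energies of one frame."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)) ** 2
    edges = mel_band_edges(len(band_weights), n_fft, sr)
    fb = weighted_nonoverlap_filterbank(len(power), edges, band_weights)
    log_e = np.log(fb @ power + 1e-10)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(len(log_e))[None, :]
    return np.cos(np.pi * k * (n + 0.5) / len(log_e)) @ log_e  # unnormalized DCT-II

sr = 8000
weights = np.ones(16)
weights[4:8] = 2.0  # hypothetical emphasis on bands 4 to 7
frame = np.sin(2 * np.pi * 700 * np.arange(256) / sr)
print(weighted_cepstra(frame, sr, weights).shape)
```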