A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계

  • 이길호 (광주과학기술원 정보통신공학과) ;
  • 윤재삼 (광주과학기술원 정보통신공학과) ;
  • 오유리 (광주과학기술원 정보통신공학과) ;
  • 김홍국 (광주과학기술원 정보통신공학과)
  • Published : 2005.06.01

Abstract

Existing standard speech coders can provide speech communication of high quality while they degrade the performance of speech recognition systems that use the reconstructed speech by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized to speech quality rather than to the performance of speech recognition. For example, mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than linear prediction coefficient (LPC) that is a typical parameter set in speech coding. In this paper, we propose a speech coder using MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. However, the main drawback of using MFCC is to develop the efficient MFCC quantization with a low-bit rate. First, we explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel error. As a result, we propose a 8.7 kbps MFCC-based CELP coder. It is shown from a PESQ test that the proposed speech coder has a comparable speech quality to 8 kbps G.729 while it is shown that the performance of speech recognition using the proposed speech coder is better than that using G.729.

Keywords