Preprocessing Technique for Improvement of Speech Recognition in a Car

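  • Hyun-Tae Kim (김현태; Department of Multimedia Engineering, Dong-Eui University) ;
  • Jang-Sik Park (박장식; Department of Digital Information Electronics, Dong-Eui Institute of Technology)
  • Published: 2009.01.28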

Abstract

This paper proposes a modified spectral subtraction scheme suited to speech recognition in low signal-to-noise ratio (SNR) environments, such as an automatic speech recognition (ASR) system in a car. Conventional spectral subtraction schemes rely on the SNR: they attenuate the parts of the spectrum that appear to have low SNR and accentuate the parts with high SNR. Although this assumption is adequate in high-SNR environments, it is grossly inadequate in low-SNR scenarios such as the car environment. The proposed method targets low-SNR noisy environments by using a weighting function that enhances the speech-dominant regions of the speech spectrum, which prevents the speech regions from being over-subtracted unnecessarily. Experiments on voice commands for cars show that the proposed method outperforms conventional methods.
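
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the weighting function, so `speech_dominance_weight` (a logistic function of the per-bin SNR estimate) and the parameters `threshold_db`, `slope_db`, `alpha`, and `beta` are hypothetical choices made only for illustration. The baseline is the commonly used power spectral subtraction with an over-subtraction factor and a spectral floor; the weight simply reduces the subtraction in bins that look speech-dominant.

```python
import numpy as np


def speech_dominance_weight(noisy_power, noise_power, threshold_db=3.0, slope_db=2.0):
    """Hypothetical per-bin weight in (0, 1); near 1 where speech dominates.

    Assumption: a logistic function of the per-bin SNR estimate. The abstract
    does not define the weighting function, so this is only a stand-in.
    """
    eps = 1e-12  # avoids log(0) and division by zero
    snr_db = 10.0 * np.log10(np.maximum(noisy_power, eps) / np.maximum(noise_power, eps))
    return 1.0 / (1.0 + np.exp(-(snr_db - threshold_db) / slope_db))


def weighted_spectral_subtraction(noisy_power, noise_power, alpha=4.0, beta=0.01, weight=None):
    """Power spectral subtraction with over-subtraction factor and spectral floor.

    With weight=None this is the conventional scheme. With a weight, the
    effective over-subtraction factor shrinks to alpha * (1 - weight), so
    speech-dominant bins are subtracted less (an illustrative assumption).
    """
    if weight is None:
        weight = np.zeros_like(noisy_power)
    alpha_eff = alpha * (1.0 - weight)
    cleaned = noisy_power - alpha_eff * noise_power
    # The spectral floor keeps every bin at or above beta * noise_power.
    return np.maximum(cleaned, beta * noise_power)


# Toy frame: a weak tone centred on FFT bin 32, buried in unit-variance white noise.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
tone = 0.5 * np.sin(2 * np.pi * 32 * t / n)
noisy_power = np.abs(np.fft.rfft(tone + rng.normal(0.0, 1.0, n))) ** 2
# Expected |FFT|^2 per bin for unit-variance white noise and an n-point FFT.
noise_power = np.full_like(noisy_power, float(n))

plain = weighted_spectral_subtraction(noisy_power, noise_power)
w = speech_dominance_weight(noisy_power, noise_power)
weighted = weighted_spectral_subtraction(noisy_power, noise_power, weight=w)
print("bin 32 power, plain vs weighted:", plain[32], weighted[32])
```

In this toy frame the tone bin keeps more of its energy under the weighted rule than under plain over-subtraction, while bins with a low SNR estimate are treated almost identically by both. A practical front end would also need framing, windowing, and a noise estimate taken from non-speech frames, none of which is specified in the abstract.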
