A Spectral Compensation Method for Noise Robust Speech Recognition

잡음에 강인한 음성인식을 위한 스펙트럼 보상 방법

  • 조정호 (Department of Digital Electronics, Dong Seoul University)
  • Received : 2012.01.25
  • Accepted : 2012.05.30
  • Published : 2012.06.25

Abstract

One obstacle to deploying speech recognition systems in real-world conditions is the performance degradation caused by acoustic distortion, the most important source of which is additive noise. This paper describes a spectral compensation technique for noise-robust speech recognition, based on a spectral peak enhancement scheme followed by an efficient noise subtraction scheme. The proposed method emphasizes the formant structure and compensates for the spectral tilt of the speech spectrum while preserving broad-bandwidth spectral components. Recognition experiments were conducted using speech corrupted by white Gaussian noise, car noise, babble noise, or subway noise. Compared with the case without spectral compensation, the new technique slightly reduced the average error rate in high signal-to-noise ratio (SNR) environments and cut the average error rate by half in a low-SNR (10 dB) environment.
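
To make the 10 dB test condition concrete, the following is a minimal sketch of how a clean signal can be corrupted by white Gaussian noise at a target SNR. It is an illustration only and is not taken from the paper: the function names `add_white_noise` and `measure_snr_db`, the 8 kHz sampling rate, and the 200 Hz tone standing in for speech are all assumptions.

```python
import math
import random


def add_white_noise(signal, target_snr_db, seed=0):
    """Add white Gaussian noise to `signal`, scaled to the requested SNR in dB.

    Hypothetical helper for illustration; returns (noisy_signal, scaled_noise).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose the gain so that 10*log10(p_signal / (gain^2 * p_noise)) == target_snr_db.
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(signal, scaled)]
    return noisy, scaled


def measure_snr_db(signal, noise):
    """Return the SNR in dB as the ratio of average signal power to average noise power."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


# Assumed stand-in for speech: one second of a 200 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(fs)]
noisy, noise = add_white_noise(tone, 10.0)
measured = measure_snr_db(tone, noise)
```

Because the noise is rescaled from its measured power, `measured` matches the requested 10 dB up to floating-point error. Colored noises such as car, babble, or subway noise could be mixed the same way by replacing the generated samples with a recorded noise segment of equal length.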
