Noisy Speech Recognition Based on Noise-Adapted HMMs Using Speech Feature Compensation

  • Chung, Yong-Joo (Department of Electronics Engineering, Keimyung University)
  • Received : 2014.03.08
  • Accepted : 2014.05.02
  • Published : 2014.04.30

Abstract

Vector Taylor series (VTS) based methods usually employ clean-speech hidden Markov models (HMMs) when compensating speech feature vectors or adapting the parameters of trained HMMs. It is well known that noisy-speech HMMs trained with multi-condition training (MTR) or the multi-model-based speech recognition (MMSR) framework outperform clean-speech HMMs in noisy speech recognition. In this paper, we propose a method that uses noise-adapted HMMs in VTS-based speech feature compensation. We derive a novel mathematical relation between the training and test noisy speech feature vectors in the log-spectrum domain and use the VTS to estimate the statistics of the test noisy speech. An iterative EM algorithm estimates the training noisy speech from the test noisy speech, along with the noise parameters. Applied to noise-adapted HMMs trained with MTR and MMSR, the proposed method achieved a significant relative reduction in word error rate in noisy speech recognition experiments on the Aurora 2 database.
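The log-spectral quantities mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard additive-noise model per log-spectral bin, y = x + log(1 + exp(n - x)), and a first-order Taylor expansion around the speech and noise means. The function names are hypothetical. The train-to-test mapping in `train_to_test_noisy` is derived here from that additive assumption alone; it is not necessarily the relation derived in the paper.

```python
import math


def noisy_log_spectrum(x, n):
    """Log-spectral value of clean speech x plus additive noise n.

    Both arguments are log-spectral values for a single frequency bin.
    Assumes speech and noise add in the linear spectral domain.
    """
    return x + math.log1p(math.exp(n - x))


def vts_first_order(mu_x, var_x, mu_n, var_n):
    """First-order VTS approximation of the noisy-speech mean and variance.

    The mismatch function is linearized around (mu_x, mu_n); speech and
    noise are assumed independent.
    """
    mu_y = noisy_log_spectrum(mu_x, mu_n)
    # Partial derivative dy/dx at the expansion point; dy/dn equals 1 - g.
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n
    return mu_y, var_y


def train_to_test_noisy(y_train, n_train, n_test):
    """Illustrative assumption, not the paper's formula.

    Removes the training noise from the training noisy speech in the
    linear domain and adds the test noise, then returns to the log domain.
    Requires exp(y_train) > exp(n_train), which holds under the model.
    """
    return math.log(math.exp(y_train) - math.exp(n_train) + math.exp(n_test))
```

In this sketch, `vts_first_order` plays the role of estimating noisy-speech statistics from a Gaussian component, while the noise parameters it takes as input are the quantities an EM procedure would re-estimate at each iteration.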
