Online Blind Channel Normalization Using BPF-Based Modulation Frequency Filtering

  • Received : 2015.11.18
  • Accepted : 2016.08.25
  • Published : 2016.12.01

Abstract

We propose a new bandpass filter (BPF)-based online channel normalization method that dynamically suppresses channel distortion when the speech and channel noise components are unknown. The method performs channel normalization with an adaptive modulation frequency filter, whereas conventional modulation filtering methods apply the same filter form to every utterance. We normalize only the two mel-frequency cepstral coefficients that have large dynamic ranges (C0 and C1), which reduces the computational complexity and improves the channel normalization accuracy. Additionally, to update the filter weights dynamically, we normalize the learning rates using the dimensional power of each frame. Our speech recognition experiments show that the proposed BPF-based blind channel normalization method effectively removes channel distortion, and that replacing batch processing with online channel normalization causes only a minor decline in accuracy.
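The general idea of modulation frequency filtering applied only to C0 and C1 can be illustrated with a minimal sketch. This is not the adaptive method proposed in the paper: it uses a fixed windowed-sinc bandpass filter in place of the adaptive one, and the names `bandpass_fir` and `OnlineCepstralBandpass`, the 100 Hz frame rate, the 1 to 16 Hz passband, and the 101-tap length are all assumptions made for illustration. It relies on the standard observation that a stationary channel appears as a roughly constant offset in the cepstral domain, so a filter with zero gain at 0 Hz removes it.

```python
import numpy as np


def bandpass_fir(num_taps=101, low_hz=1.0, high_hz=16.0, frame_rate_hz=100.0):
    """Windowed-sinc bandpass FIR over the modulation-frequency axis.

    All default values are illustrative assumptions, not values from the paper.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2.0

    def lowpass(cutoff_hz):
        fc = cutoff_hz / frame_rate_hz  # cutoff in cycles per frame
        return 2.0 * fc * np.sinc(2.0 * fc * n)

    # Bandpass = difference of two lowpass responses, tapered by a Hamming window.
    taps = (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)
    # Subtract the mean so the taps sum to zero (zero gain at 0 Hz).
    return taps - taps.mean()


class OnlineCepstralBandpass:
    """Causal, frame-by-frame filtering of the first `num_coeffs` cepstral coefficients."""

    def __init__(self, taps, num_coeffs=2):
        self.taps = np.asarray(taps, dtype=float)
        self.num_coeffs = num_coeffs
        # history[0] holds the newest frame, history[-1] the oldest.
        self.history = np.zeros((len(self.taps), num_coeffs))

    def process(self, frame):
        """Filter C0..C(num_coeffs-1) of one frame; pass the other coefficients through."""
        frame = np.asarray(frame, dtype=float).copy()
        self.history = np.roll(self.history, 1, axis=0)
        self.history[0] = frame[: self.num_coeffs]
        # y[t] = sum_k taps[k] * x[t - k], computed per coefficient.
        frame[: self.num_coeffs] = self.taps @ self.history
        return frame
```

In this sketch, once the filter history has filled (101 frames with the defaults), a constant offset added to C0 and C1 no longer affects the output, while the higher-order coefficients are left untouched. A symmetric FIR filter of this length also delays the filtered coefficients by 50 frames, which a real online system would need to account for.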
