Recognition Performance Improvement of Unsupervised Limabeam Algorithm using Post Filtering Technique

  • Nguyen, Dinh Cuong (Department of Information and Communication Engineering, Yeungnam University)
  • Choi, Suk-Nam (Department of Information and Communication Engineering, Yeungnam University)
  • Chung, Hyun-Yeol (Department of Information and Communication, Yeungnam University)
  • Received : 2013.02.19
  • Accepted : 2013.04.18
  • Published : 2013.08.31

Abstract

In distant-talking environments, speech recognition performance degrades significantly due to noise and reverberation. Recent work by Michael L. Seltzer shows that, in microphone array speech recognition, the word error rate can be significantly reduced by adapting the beamformer weights to generate a sequence of features that maximizes the likelihood of the correct hypothesis. This approach is called the Likelihood Maximizing Beamforming (Limabeam) algorithm. One way to implement it is Unsupervised Limabeam (USL), which can improve recognition performance in any environment. Our investigation of USL shows that, because the optimization depends strongly on the transcription produced by the first recognition step, the output becomes unstable, which may lower recognition performance. To improve the recognition performance of USL, post-filtering techniques can be employed to obtain a more accurate transcription from the first step. In this work, as a post-filtering technique for the first recognition step of USL, we propose adding a Wiener filter combined with the Feature Weighted Mahalanobis Distance. We also suggest an alternative way to implement the Limabeam algorithm efficiently for a Hidden Markov Network (HM-Net) speech recognizer. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the Limabeam algorithm in an HM-Net speech recognition system and the improved performance of the proposed method.
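
As a rough illustration of the beamformer weights that Limabeam adapts, the sketch below implements a plain filter-and-sum beamformer in Python with NumPy. It is a minimal sketch, not the authors' implementation: the function names `filter_and_sum` and `delay_and_sum_filters` are hypothetical, the channels are assumed to be already time-aligned, and the likelihood-based optimization of the filter taps against the recognizer is not shown.

```python
import numpy as np

def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR filter and sum the results.

    channels: shape (M, N), one row per microphone (assumed already time-aligned).
    filters:  shape (M, P), one P-tap FIR filter per microphone.
    Returns the beamformer output, an array of length N.
    """
    channels = np.asarray(channels, dtype=float)
    filters = np.asarray(filters, dtype=float)
    n_samples = channels.shape[1]
    out = np.zeros(n_samples)
    for x, h in zip(channels, filters):
        # Causal FIR filtering: full convolution truncated to the input length.
        out += np.convolve(x, h)[:n_samples]
    return out

def delay_and_sum_filters(n_mics, n_taps):
    """Filters equivalent to delay-and-sum: a unit impulse scaled by 1/M per channel."""
    h = np.zeros((n_mics, n_taps))
    h[:, 0] = 1.0 / n_mics
    return h
```

With the delay-and-sum filters, identical channels pass through unchanged. A likelihood-maximizing scheme would start from filters like these and adjust the taps so that the features extracted from the output score higher under the recognizer's acoustic models.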

Acknowledgement

Supported by : Yeungnam University

References

  1. M.F. Font, "Multi-microphone signal processing for automatic speech recognition in meeting rooms," Master's thesis, Berkeley, California, 2005.
  2. P.J. Moreno, "Speech recognition in noisy environments," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 1996.
  3. F.H. Liu, "Environmental adaptation for robust speech recognition," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 1994.
  4. A. Acero, "Acoustical and environmental robustness in automatic speech recognition," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 1990.
  5. L.J. Griffiths, C.W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Transactions on Antennas and Propagation, Vol. AP-30, No. 1, pp.27-34, 1982.
  6. M. Seltzer, "Microphone Array Processing for Robust Speech Recognition," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 2003.
  7. P. Placeway, S. Chen, M. Eskenazi, U. Jain, V. Parikh, B. Raj, M. Ravishankar, R. Rosenfeld, K. Seymore, M. Siegler, R. Stern, E. Thayer, "The 1996 hub-4 sphinx-3 system," Proceedings on the DARPA Speech Recognition workshop, Vol. 1, pp.243-252, 1997.
  8. S.J. Oh, C.J. Hwang, H.Y. Jung, H.Y. Chung, "A study on statistical language models for large vocabulary continuous speech recognition system," Proceedings on ICSP, Vol. 1, pp.113-119, 1999.
  9. W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, "Numerical Recipes in C: The Art of Scientific Computing," New York: Cambridge University Press, 1998.
  10. L. Rabiner, B.H. Juang, "Fundamentals of Speech Recognition," New Jersey: Prentice Hall, 1993.
  11. A.J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, Vol. 13, No. 2, pp.260-269, 1967. https://doi.org/10.1109/TIT.1967.1054010
  12. P. Moreno, B. Raj, R.M. Stern, "A Unified Approach for Robust Speech Recognition," Proceedings of Eurospeech, Vol. 1, pp.481-485, 1995.
  13. R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms," Proceedings on International Conference of Acoustics, Speech, and Signal Processing, Vol. 5, pp.2578-2581, 1988.
  14. I.A. McCowan, "Robust speech recognition using microphone arrays," Doctoral dissertation, Queensland University of Technology, Australia, 2001.
  15. C.H. Knapp, C. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 24, No. 4, pp.320-327, 1976. https://doi.org/10.1109/TASSP.1976.1162830
  16. D.C. Nguyen, H.Y. Chung, "Performance Improvement of Microphone Array Speech Recognition using Feature Weighted Mahalanobis Distance," The Journal of the Acoustical Society of Korea, Vol. 29, No. 1E, pp.45-53, 2010.
  17. N.D. Cuong, S. Guanghu, J.H. Youl, C.H. Yeol, "Performance improvement of speech recognition system using microphone array," Proceedings on IEEE International Conference of Research, Innovation and Vision for the Future, pp.91-95, 2008.
  18. L. Brayda, C. Wellekens, M. Omologo, "N-Best Parallel Maximum Likelihood Beamformers for Robust Speech Recognition," Proceedings of European Signal Processing Conference, 2006.
  19. K. Bahram, B. Hamidreza, R. Farbod, "Improvement in speech recognition using phone-based filter and sum parameter optimization," IEICE Electronics Express, Vol. 6, No. 8, pp.437-442, 2009. https://doi.org/10.1587/elex.6.437