HMM-Based Bandwidth Extension Using Baum-Welch Re-Estimation Algorithm

An HMM-Based Bandwidth Extension Method Using Baum-Welch Training

  • Published : 2007.08.31

Abstract

This paper improves the statistical bandwidth extension (BWE) system based on the Hidden Markov Model (HMM). First, the existing HMM training method for BWE, originally proposed by Jax, is analyzed in comparison with the general Baum-Welch training method. Based on this analysis, a new HMM-based BWE method is proposed that trains the HMM with the Baum-Welch re-estimation algorithm instead of Jax's method. The analysis shows that the Baum-Welch re-estimation algorithm is a generalized form of Jax's training method, and that it is more flexible and adaptive in modeling the statistical characteristics of the training data. It therefore yields a model that fits the training data better, which in turn results in an enhanced BWE system. According to the experimental results, the new method outperforms Jax's BWE system in all cases. Under the given test conditions, the RMS log spectral distortion (LSD) improved by 0.31 dB to 0.8 dB, with an average improvement of 0.52 dB.
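The abstract does not spell out the training equations, so the following is only a generic sketch of one Baum-Welch re-estimation step for the HMM transition matrix, in its standard textbook form with scaled forward and backward variables. It is not the authors' implementation. The function names and the `B_obs` layout (one row per frame, one column per state, holding the observation likelihood for that state) are assumptions made for illustration.

```python
import numpy as np

def forward_backward(pi, A, B_obs):
    """Scaled forward-backward pass.

    pi    : (N,)   initial state probabilities
    A     : (N, N) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B_obs : (T, N) likelihood of the observation at frame t under state i (assumed layout)
    """
    T, N = B_obs.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    scale = np.zeros(T)

    # Forward pass, normalizing each frame so alpha[t] sums to 1.
    alpha[0] = pi * B_obs[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B_obs[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scale factors.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B_obs[t + 1] * beta[t + 1])) / scale[t + 1]
    return alpha, beta, scale

def reestimate_transitions(pi, A, B_obs):
    """One Baum-Welch update of A from soft (posterior) state occupancies."""
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    B_obs = np.asarray(B_obs, dtype=float)
    alpha, beta, scale = forward_backward(pi, A, B_obs)
    T, N = B_obs.shape

    # Accumulate expected transition counts xi_t(i, j) over all frame pairs.
    xi_sum = np.zeros((N, N))
    for t in range(T - 1):
        xi_sum += (alpha[t][:, None] * A
                   * (B_obs[t + 1] * beta[t + 1])[None, :]) / scale[t + 1]

    # gamma_t(i) is the posterior probability of state i at frame t.
    gamma = alpha * beta
    return xi_sum / gamma[:-1].sum(axis=0)[:, None]
```

Each row of the returned matrix sums to one because the expected transition counts out of a state are divided by the expected number of visits to that state. In a full training loop the same posteriors would also update the initial probabilities and the observation densities, which this sketch leaves out.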

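This paper addresses improvements to the HMM-based statistical bandwidth extension (BWE) method. To this end, we first compare Jax's existing training method for the HMM with the general Baum-Welch training method and examine the limitations and problems of Jax's method. On this basis, we propose a new HMM-based BWE method that uses Baum-Welch training. The Baum-Welch training method can be regarded as a generalized form of Jax's training method, and it is an algorithm with more flexible and adaptive learning capability. It therefore allows more accurate HMM modeling of the training data, and using the improved HMM model improved the performance of the BWE system. According to the experimental results, the proposed method outperforms Jax's existing method in every experimental case. Under the given test conditions, the root-mean-square (RMS) log spectral distortion (LSD) improved by 0.52 dB on average, ranging from a minimum of 0.31 dB to a maximum of 0.8 dB.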
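The reported scores are RMS log spectral distortion values in dB. The abstract does not give the exact definition used (frequency range, envelope representation, or how frames are averaged), so the following is a minimal sketch of one common form: the RMS of the dB difference over frequency bins in each frame, then the mean over frames. The function name `rms_lsd_db`, its arguments, and the `eps` guard are assumptions for illustration.

```python
import numpy as np

def rms_lsd_db(ref_power, est_power, eps=1e-12):
    """RMS log spectral distortion in dB between two sets of power spectra.

    ref_power, est_power : arrays of shape (frames, bins) holding the power
    spectrum (or spectral envelope) of the reference wideband signal and of
    the bandwidth-extended signal over the band being evaluated.
    eps guards against taking the logarithm of zero (an assumed safeguard).
    """
    ref_power = np.asarray(ref_power, dtype=float)
    est_power = np.asarray(est_power, dtype=float)

    # Per-bin difference of the two log power spectra, in dB.
    diff_db = 10.0 * np.log10((ref_power + eps) / (est_power + eps))

    # RMS over frequency bins for each frame, then average over frames.
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=-1))
    return float(np.mean(per_frame))
```

Under this definition, an estimate that is off by exactly 1 dB in every bin scores 1 dB, and a perfect estimate scores 0 dB.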
