Performance Comparison of GMM and HMM Approaches for Bandwidth Extension of Speech Signals

  • Published: 2008.04.30

Abstract

This paper analyzes the relationship between two representative statistical methods for bandwidth extension (BWE), the Gaussian Mixture Model (GMM) method and the Hidden Markov Model (HMM) method, and compares their performance. Unlike the GMM method, the HMM method is a system with memory: it models the dependency between adjacent speech frames and exploits it for BWE. It can therefore be expected to estimate the frame-to-frame spectral transitions of the original signal more accurately. To verify this, a dynamic measure that approximates the first-order time derivative of the spectral function was introduced in addition to a static measure. The results show that the two methods perform similarly on the static measure, whereas on the dynamic measure the HMM method clearly outperforms the GMM method. Moreover, this difference increases in proportion to the number of HMM states. This suggests that the HMM method is the more appropriate choice, at least for the 'blind BWE' problem. Nevertheless, because the GMM method is relatively simple and matches the HMM method on the static measure, it can be a preferable alternative in applications where static performance and algorithmic complexity are critical.
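The structural difference between the two estimators can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: each GMM component (or HMM state) is assumed to carry a single diagonal Gaussian over narrowband features plus one wideband centroid, and the MMSE estimate is taken as the posterior-weighted sum of those centroids. The GMM version uses only the current frame's posterior; the HMM version obtains its posterior from the normalized forward recursion over all frames seen so far. The function names and all parameters are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density p(x | mean, var) for one feature vector."""
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var))


def observation_likelihoods(X, means, variances):
    """B[t, i] = p(x_t | component or state i) for every frame t."""
    T, K = X.shape[0], means.shape[0]
    B = np.empty((T, K))
    for t in range(T):
        for i in range(K):
            B[t, i] = gaussian_pdf(X[t], means[i], variances[i])
    return B


def gmm_estimate(X, weights, means, variances, centroids):
    """Memoryless estimate: frame t uses only its own posterior p(i | x_t)."""
    B = observation_likelihoods(X, means, variances)
    post = weights * B
    post /= post.sum(axis=1, keepdims=True)
    return post @ centroids


def hmm_estimate(X, initial, transition, means, variances, centroids):
    """Estimate with memory: posteriors p(S_t = i | x_1..x_t) via the forward recursion."""
    B = observation_likelihoods(X, means, variances)
    T, K = B.shape
    alpha = np.empty((T, K))
    alpha[0] = initial * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        # alpha_t(j) is proportional to sum_i alpha_{t-1}(i) * A[i, j] * b_j(x_t)
        alpha[t] = (alpha[t - 1] @ transition) * B[t]
        alpha[t] /= alpha[t].sum()
    return alpha @ centroids
```

Setting every row of the transition matrix equal to the mixture weights removes the memory (the previous state then carries no information about the next one), and the HMM estimate coincides with the GMM estimate frame by frame. A transition matrix with strong self-transitions instead makes each frame's estimate depend on the preceding frames, which is the property the dynamic measure is meant to reward.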
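The abstract does not spell out the two measures, so the following is only a plausible sketch with assumed definitions. The static measure is taken here as the frame-averaged RMS log-spectral distance in dB. The dynamic measure applies the same distance to the frame-to-frame difference of the log spectra, a backward-difference approximation of the first-order time derivative. The function names and the 1e-12 power floor are hypothetical choices.

```python
import numpy as np


def log_spectra_db(frames_power):
    """Power spectra (frames x bins) in dB, floored to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(frames_power, 1e-12))


def static_distortion_db(ref_power, est_power):
    """Mean over frames of the per-frame RMS log-spectral distance (dB)."""
    diff = log_spectra_db(ref_power) - log_spectra_db(est_power)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))


def dynamic_distortion_db(ref_power, est_power):
    """Same distance applied to the frame-to-frame change of the log spectra.

    The time derivative is approximated by the backward difference S_t - S_{t-1};
    this is an assumed definition, not necessarily the one used in the paper.
    """
    d_ref = np.diff(log_spectra_db(ref_power), axis=0)
    d_est = np.diff(log_spectra_db(est_power), axis=0)
    return float(np.mean(np.sqrt(np.mean((d_ref - d_est) ** 2, axis=1))))
```

Scaling the reference power by a constant factor of 2 (about 3.01 dB) yields a static distortion of about 3.01 dB but a dynamic distortion of zero, because a constant offset leaves the spectral transitions unchanged. Conversely, an estimate that gets the average spectral shape right but blurs the transitions would score well on the static measure and poorly on the dynamic one, so the two measures capture different aspects of estimation quality.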
