A Study on the Speaker Adaptation in CDHMM

  • Kim, Gwang-Tae (Dept. of Electronic and Electrical Engineering, Sangmyung University)
  • Kim Gwang-Tae (School of Electronic and Electrical Engineering, Sangju University)
  • Published: 2002.03.01

Abstract

A speaker adaptation algorithm is proposed that improves a CDHMM speech recognizer by varying the number of observation density functions in each state. The proposed method allows each state to use an observation density with more than one mixture, so that speech characteristics are represented in more detail. The number of mixtures in each state is determined by one of two methods: from the number of adaptation frames assigned to the state, or from the determinant of the variance. MAP parameters are then extracted for every mixture determined by either method. In addition, the state segmentation of the adaptation speech is made more precise by using a speaker-independent model, trained on a sufficiently large database, as a priori knowledge. The state duration distribution is also adapted, so that duration information reflecting the speaker's utterance habits and speaking rate is captured. The proposed methods achieve significantly higher recognition rates than the conventional method, which uses one mixture in each state.
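A minimal sketch of the per-state procedure the abstract describes: pick the number of mixtures for a state, divide its adaptation frames among those mixtures, and compute a MAP estimate of each mixture mean. The abstract names the two selection criteria (frame count and determinant of the variance) but not their constants, so `frames_per_mixture`, `max_mixtures`, and the log-determinant `thresholds` are hypothetical. Splitting the frames with k-means and using the conjugate-prior MAP mean update (interpolating the speaker-independent mean with the data through a prior weight `tau`) are also assumptions, as are all function names; this is not the author's implementation.

```python
import math


def mixtures_by_frame_count(n_frames, frames_per_mixture=20, max_mixtures=4):
    """Rule 1 (hypothetical constants): one mixture per `frames_per_mixture`
    adaptation frames assigned to the state, clamped to [1, max_mixtures]."""
    return max(1, min(max_mixtures, n_frames // frames_per_mixture))


def mixtures_by_log_det(frames, thresholds=(0.0, 2.0, 4.0)):
    """Rule 2 (hypothetical thresholds): a state whose frames are more spread
    out (larger log-determinant of the diagonal variance) gets more mixtures."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    log_det = sum(math.log(max(v, 1e-6)) for v in var)
    return 1 + sum(1 for t in thresholds if log_det > t)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_frames(frames, k, iters=10):
    """Plain k-means (an assumption), used only to divide a state's frames among k mixtures."""
    k = min(k, len(frames))
    step = max(1, len(frames) // k)
    centers = [list(frames[i * step]) for i in range(k)]  # deterministic init
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda j: sq_dist(f, centers[j]))
            groups[nearest].append(f)
        for j, g in enumerate(groups):
            if g:  # an empty group keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return [g for g in groups if g]


def map_mean(prior_mean, frames, tau=10.0):
    """MAP mean with a conjugate prior centred on the SI mean:
    (tau * mu_SI + sum_t x_t) / (tau + n). With no data it returns the prior."""
    n = len(frames)
    return [(tau * m + sum(f[d] for f in frames)) / (tau + n)
            for d, m in enumerate(prior_mean)]


def adapt_state(si_mean, frames, rule="frames", tau=10.0):
    """Choose the mixture count, split the frames, then MAP-adapt each mixture."""
    if not frames:  # no adaptation data: keep the speaker-independent mean
        return [1.0], [list(si_mean)]
    if rule == "frames":
        k = mixtures_by_frame_count(len(frames))
    else:
        k = mixtures_by_log_det(frames)
    groups = split_frames(frames, k)
    weights = [len(g) / len(frames) for g in groups]
    means = [map_mean(si_mean, g, tau) for g in groups]
    return weights, means


if __name__ == "__main__":
    demo = [[1.0, 1.0]] * 20 + [[5.0, 5.0]] * 20
    # Two mixtures, weights 0.5 each, means about 0.667 and 3.333 per dimension.
    print(adapt_state([0.0, 0.0], demo))
```

A larger `tau` keeps a sparsely observed mixture close to the speaker-independent mean; as a mixture collects more frames, its estimate approaches the sample mean of those frames.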

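In this paper, a speaker adaptation algorithm that varies the number of observation density functions per state is proposed to improve the recognition performance of a CDHMM speech recognizer. The proposed method allows each state of the CDHMM to have two or more observation probability density functions (mixtures), so that the diversity of pronunciation characteristics can be reflected. The number of mixtures is determined in one of two ways: according to the number of adaptation-speech frames belonging to each state, or according to a determinant computed from the feature vectors. Once the mixtures of the observation density are fixed by either method, MAP parameters are extracted from each subdivided mixture, giving precise parameters for the speaker-adapted model. In addition, when the adaptation speech is segmented into states, the existing speaker-independent model is used as prior information; this reduces the influence of initial state-segmentation errors in ML estimation and remedies a weakness of the conventional state segmentation method. The state duration distribution is also adapted to the speaker, so that speaker-specific speech characteristics such as speaking rate and pronunciation patterns are absorbed. Experiments to verify the proposed methods confirmed that they achieve higher recognition rates than the conventional method.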
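The abstract's other two steps, segmenting the adaptation speech with the speaker-independent (SI) model as prior knowledge and adapting the state durations, can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the SI model is taken to be a left-to-right chain with one diagonal Gaussian per state, segmentation is a Viterbi forced alignment scored with those SI Gaussians (transition probabilities ignored), and the duration update is a hypothetical MAP-style interpolation with prior weight `tau`. All function names are illustrative.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def forced_align(frames, means, variances):
    """Viterbi segmentation into a strict left-to-right chain of states,
    scored with the speaker-independent Gaussians. Transition probabilities
    are ignored (an assumption); every state must own at least one frame.
    Returns the state index assigned to each frame."""
    T, S = len(frames), len(means)
    if T < S:
        raise ValueError("need at least one frame per state")
    NEG = float("-inf")
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_gauss(frames[0], means[0], variances[0])
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best == NEG:
                continue  # state s is unreachable at frame t
            score[t][s] = best + log_gauss(frames[t], means[s], variances[s])
            back[t][s] = prev
    path = [S - 1]  # the last frame must end in the last state
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def state_durations(path, n_states):
    """Number of frames spent in each state along one alignment."""
    counts = [0] * n_states
    for s in path:
        counts[s] += 1
    return counts


def adapt_durations(si_mean_durations, observed, tau=5.0):
    """Hypothetical MAP-style update of each state's mean duration:
    (tau * d_SI + sum of observed durations) / (tau + number of utterances)."""
    n = len(observed)
    return [(tau * d0 + sum(u[s] for u in observed)) / (tau + n)
            for s, d0 in enumerate(si_mean_durations)]
```

Because every frame is scored against the SI Gaussians, this segmentation does not rely on an arbitrary initial split of the adaptation speech; the per-state frame counts it yields are what `adapt_durations` pools across utterances.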
