Improvements in Speaker Adaptation Using Weighted Training


  • 장규철 (Department of Electrical Engineering and Computer Science, KAIST)
  • 우수영 (Department of Electrical Engineering and Computer Science, KAIST)
  • 진민호 (Department of Electrical Engineering and Computer Science, KAIST)
  • 박용규 (Department of Electrical Engineering and Computer Science, KAIST)
  • 유창동 (Department of Electrical Engineering and Computer Science, KAIST)
  • Published : 2003.04.01

Abstract

Regardless of the distribution of the adaptation data in the testing environment, the model-based adaptation methods reported in the literature so far incorporate the adaptation data indiscriminately when reducing the mismatch between the training and testing environments. When the amount of data is small and parameter tying is extensive, adaptation based on outlier data can be detrimental to recognizer performance: the distribution of the adaptation data plays a critical role in adaptation performance. In order to maximize the recognition rate in the testing environment using only a small amount of adaptation data, supervised weighted training is applied to the structural maximum a posteriori (SMAP) algorithm. We evaluate the performance of the proposed weighted SMAP (WSMAP) and SMAP on the TIDIGITS corpus. The proposed WSMAP performs better for small amounts of adaptation data. The general idea of incorporating the distribution of the adaptation data is applicable to other adaptation algorithms.
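The core idea above is to downweight outlier adaptation frames so they do not dominate a MAP-style mean update when data are scarce. The following is a minimal hypothetical sketch of that idea, not the paper's exact WSMAP formulation: the function name, the likelihood-based frame weighting, and the prior weight `tau` are all assumptions introduced for illustration.

```python
import numpy as np

def weighted_map_mean(x, prior_mean, prior_var, tau=10.0):
    """Weighted MAP update of a single Gaussian mean (illustrative sketch).

    Each adaptation frame x[t] is weighted by its likelihood under the
    prior (speaker-independent) Gaussian, so outlier frames contribute
    little to the adapted mean. `tau` controls how strongly the prior
    is trusted relative to the weighted adaptation data.
    """
    x = np.asarray(x, dtype=float)
    # Per-frame log-likelihood under the prior Gaussian (up to a constant).
    log_w = -0.5 * (x - prior_mean) ** 2 / prior_var
    w = np.exp(log_w - log_w.max())   # shift for numerical stability
    w = w / w.sum()                   # normalize weights to sum to 1
    # MAP interpolation between the prior mean and the weighted sample mean.
    n = len(x)
    weighted_mean = np.sum(w * x)
    return (tau * prior_mean + n * weighted_mean) / (tau + n)

# Example: three frames near the prior mean, plus one outlier frame.
data = [0.9, 1.1, 1.0, 8.0]
adapted = weighted_map_mean(data, prior_mean=1.0, prior_var=1.0)
```

With equal weights the outlier frame (8.0) would pull the adapted mean noticeably away from the prior; the likelihood-based weighting keeps the update close to the well-supported frames, which mirrors the motivation for weighting the adaptation data by their distribution.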


References

  1. F. H. Liu, A. Acero, and R. Stern, "Efficient joint compensation of speech for the effects of additive noise and linear filtering," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I-257–I-260.
  2. Q. Huo and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Processing, vol. 2.
  3. S. Furui, "Unsupervised speaker adaptation method based on hierarchical spectral clustering," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37.
  4. C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous-density hidden Markov models," Comput. Speech Lang., vol. 9.
  5. J. I. Takahashi and S. Sagayama, "Vector-field smoothed Bayesian learning for incremental speaker adaptation," Proc. ICASSP-95.
  6. O. Siohan, C. Chesta, and C. H. Lee, "Hidden Markov model adaptation using maximum a posteriori linear regression," Proc. Workshop Robust Methods for Speech Recognition in Adverse Conditions.
  7. K. Shinoda and C. H. Lee, "Structural MAP speaker adaptation using hierarchical priors," Proc. IEEE Workshop Speech Recognition and Understanding.
  8. C. H. Lee, "On stochastic feature and model compensation approaches to robust speech recognition," Speech Commun., vol. 25.
  9. L. M. Arslan and J. H. L. Hansen, "Selective training for hidden Markov models with applications to speech classification," IEEE Trans. Speech and Audio Processing, vol. 7, no. 1.
  10. L. E. Baum and J. A. Eagon, "An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology," Bull. Amer. Math. Soc., vol. 73.
  11. B. H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Processing, vol. 40.
  12. R. G. Leonard, "A database for speaker-independent digit recognition," Proc. ICASSP, vol. 3.