Text Independent Speaker Verification Using Dominant State Information of HMM-UBM

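Korean title: HMM-UBM의 주 상태 정보를 이용한 음성 기반 문맥 독립 화자 검증 (Speech-Based Text-Independent Speaker Verification Using Dominant State Information of HMM-UBM)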

  • Received : 2014.12.24
  • Accepted : 2015.01.29
  • Published : 2015.03.31

Abstract

We present a speaker verification method that extracts i-vectors based on the dominant state information of a Hidden Markov Model (HMM) - Universal Background Model (UBM). An ergodic HMM is used to estimate the UBM so that the various characteristics of an individual speaker can be classified effectively. Unlike a Gaussian Mixture Model (GMM)-UBM based speaker verification system, the proposed system obtains an i-vector for each HMM state. Among these, the i-vector used as the feature is the one extracted from the state that carries the dominant state information. Experiments on the National Institute of Standards and Technology (NIST) 2008 Speaker Recognition Evaluation (SRE) database validate the proposed system, which attains a 12% improvement in equal error rate (EER).

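This paper proposes an i-vector extraction technique based on the dominant state information of a Hidden Markov Model (HMM) - Universal Background Model (UBM). An ergodic HMM is used to estimate the UBM, which allows the diverse characteristics present even within the speech of a single speaker to be classified into HMM states. The proposed method extracts as many i-vectors as there are HMM states, and the dominant state information is used to select one of them. To validate the proposed method, experiments were conducted on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) database, and a 12% improvement in Equal Error Rate (EER) was confirmed.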
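
The selection step described in the abstract can be illustrated with a small sketch. The abstract does not state how the dominant state is determined, so the code below assumes it is the HMM state with the largest accumulated frame-level posterior (occupancy) over the utterance. The function names `dominant_state` and `select_ivector`, the toy posteriors, and the two-dimensional i-vectors are all hypothetical and serve only as an illustration; they are not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def dominant_state(posteriors: Sequence[Sequence[float]]) -> int:
    """Return the index of the HMM state with the largest total occupancy.

    posteriors[t][s] is the posterior probability of state s at frame t.
    ASSUMPTION: "dominant" means highest accumulated posterior; the
    abstract does not give the exact criterion.
    """
    if not posteriors:
        raise ValueError("at least one frame is required")
    num_states = len(posteriors[0])
    occupancy = [0.0] * num_states
    for frame in posteriors:
        if len(frame) != num_states:
            raise ValueError("every frame must cover the same states")
        for state, prob in enumerate(frame):
            occupancy[state] += prob
    # Ties resolve to the lowest state index (behaviour of max()).
    return max(range(num_states), key=lambda s: occupancy[s])


def select_ivector(
    posteriors: Sequence[Sequence[float]],
    state_ivectors: Sequence[List[float]],
) -> Tuple[int, List[float]]:
    """Pick the i-vector that belongs to the dominant state.

    state_ivectors[s] is the (hypothetical) i-vector extracted for state s.
    """
    state = dominant_state(posteriors)
    if len(state_ivectors) != len(posteriors[0]):
        raise ValueError("one i-vector per HMM state is required")
    return state, state_ivectors[state]


# Toy utterance: 4 frames, 3 HMM states (values are made up).
toy_posteriors = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.4, 0.1],
]
# One made-up 2-dimensional i-vector per state.
toy_ivectors = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.9]]

chosen_state, chosen_ivector = select_ivector(toy_posteriors, toy_ivectors)
print(chosen_state, chosen_ivector)
```

In a real system the per-state posteriors would come from decoding the utterance with the ergodic HMM-UBM, and the selected i-vector would then be passed on to whatever scoring back end the verification system uses.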
