Sequential Speaker Classification Using Quantized Generic Speaker Models

양자화 된 범용 화자모델을 이용한 연속적 화자분류

  • Kwon, Soon-Il (권순일), Division of Systems Technology, Korea Institute of Science and Technology
  • Published: 2007.01.25


In sequential speaker classification, the lack of prior information about the speakers makes model initialization difficult. To address this, a predetermined set of generic models, called Sample Speaker Models, was previously proposed. This approach allows accurate speaker modeling without any initial speaker data. However, it still requires an optimal method for sampling the models from a generic model pool. To solve this problem, the Speaker Quantization method, motivated by vector quantization, is proposed. In experiments on Switchboard telephone conversations, the new approach outperformed random sampling with a 25% relative improvement in error rate.
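The idea of quantizing a pool of generic speaker models into a small representative set can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given in this abstract. It assumes that each speaker model is simplified to a single diagonal Gaussian, that models are compared with a symmetric Kullback-Leibler divergence, and that a k-medoids loop stands in for the vector-quantization codebook design. The names `sym_kl_diag_gauss` and `quantize_speaker_pool` are hypothetical.

```python
import random


def sym_kl_diag_gauss(a, b):
    """Symmetric KL divergence between two diagonal Gaussians.

    Each model is a (means, variances) pair of equal-length sequences.
    Assumption: one Gaussian per speaker, a simplification for illustration.
    """
    (mu_a, var_a), (mu_b, var_b) = a, b
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        d2 = (ma - mb) ** 2
        # KL(a||b) + KL(b||a) per dimension; the log-variance terms cancel.
        total += 0.5 * (va / vb + vb / va - 2.0) + 0.5 * d2 * (1.0 / va + 1.0 / vb)
    return total


def quantize_speaker_pool(pool, k, n_iter=20, seed=0):
    """Pick k representative models (medoids) from the pool.

    Returns the sorted indices of the selected models. This is a plain
    k-medoids loop used as a stand-in for codebook design.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(pool)), k)
    for _ in range(n_iter):
        # Assignment step: attach every model to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, model in enumerate(pool):
            nearest = min(medoids, key=lambda m: sym_kl_diag_gauss(model, pool[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total divergence in its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # can happen only when two medoids are identical
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(sym_kl_diag_gauss(pool[c], pool[j]) for j in members),
            )
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Toy pool: two well-separated groups of one-dimensional speaker models.
pool = [([m], [1.0]) for m in (0.0, 0.1, 0.2, 10.0, 10.1, 10.2)]
codebook = quantize_speaker_pool(pool, k=2)
print(codebook)
```

In this sketch the selected medoids play the role of the sampled models. Random sampling, by contrast, would return the initial `rng.sample` draw unchanged, which may place several representatives in the same region of the pool.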

