A Speaker Pruning Method for Real-Time Speaker Identification System

  • Received : 2014.10.14
  • Reviewed : 2015.01.15
  • Published : 2015.04.30

Abstract

GMM (Gaussian Mixture Model) based speaker identification systems using ML (Maximum Likelihood) and WMR (Weighting Model Rank) are known to achieve very high performance. However, such systems are not well suited to real-time processing in practical environments because of their high computational cost. In this paper, we propose a new speaker-pruning algorithm that effectively reduces this cost. The algorithm first selects the 20% of speaker models with the highest likelihood on a portion of the input speech, then applies MWMR (Modified Weighting Model Rank) to only these selected models to identify the speaker. To verify the effectiveness of the proposed algorithm, we performed speaker identification experiments on the TIMIT database. The proposed method reduces processing time by more than 60% compared with the conventional GMM-based system without pruning, while maintaining recognition accuracy.
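The two-stage idea in the abstract can be illustrated with a minimal sketch. Several parts are assumptions rather than the paper's method: the abstract does not define the MWMR scoring rule, so the second stage below simply sums frame log-likelihoods (a plain ML decision) as a stand-in; the `prefix_ratio` parameter, the function names, and the single-Gaussian toy speaker model are all hypothetical.

```python
import math
from typing import Callable, Sequence


def prune_and_identify(
    frames: Sequence[float],
    speakers: Sequence[int],
    frame_loglik: Callable[[int, float], float],
    keep_ratio: float = 0.2,     # fraction of speaker models kept (20% in the abstract)
    prefix_ratio: float = 0.25,  # assumed: share of the input used for pruning
) -> int:
    """Return the identified speaker after likelihood-based pruning."""
    n_prefix = max(1, int(len(frames) * prefix_ratio))
    prefix, rest = frames[:n_prefix], frames[n_prefix:]

    # Stage 1: score every speaker model on the first part of the input only.
    totals = {s: sum(frame_loglik(s, x) for x in prefix) for s in speakers}

    # Keep the top-scoring fraction of speaker models.
    n_keep = max(1, round(len(speakers) * keep_ratio))
    survivors = sorted(speakers, key=lambda s: totals[s], reverse=True)[:n_keep]

    # Stage 2: score the remaining frames for the survivors only.
    # Stand-in for MWMR: summed log-likelihood over all frames.
    for s in survivors:
        totals[s] += sum(frame_loglik(s, x) for x in rest)

    return max(survivors, key=lambda s: totals[s])


def toy_frame_loglik(speaker: int, x: float) -> float:
    """Toy model (assumption): one unit-variance 1-D Gaussian per speaker,
    with mean equal to the speaker index. A real system would use a GMM."""
    mean = float(speaker)
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


frames = [3.1, 2.9, 3.0, 3.2, 2.8, 3.0, 3.1, 2.9]
print(prune_and_identify(frames, list(range(10)), toy_frame_loglik))  # prints 3
```

In this toy setting, 10 speakers and 8 frames require 80 frame-likelihood evaluations without pruning, while the pruned version needs 10 × 2 + 2 × 6 = 32. The actual saving in a real system depends on the model sizes and on how much of the input is used for pruning, neither of which the abstract specifies.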
