Histogram Equalization Using Centroids of Fuzzy C-Means of Background Speakers' Utterances for Majority Voting Based Speaker Identification

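  • Myung-Jae Kim (School of Computer Science, University of Seoul) ;
  • Il-Ho Yang (School of Computer Science, University of Seoul) ;
  • Ha-Jin Yu (School of Computer Science, University of Seoul)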
  • Received : 2013.10.23
  • Accepted : 2013.11.22
  • Published : 2014.01.31

Abstract

In previous work, we proposed a histogram equalization method that uses a supplement set composed of the Fuzzy C-Means centroids of background speakers' utterances. The performance of that method depends on the size of the supplement set, but the best size is difficult to determine at recognition time. To address this problem, this paper proposes histogram equalization using a supplement set for majority-voting-based speaker identification. The proposed method applies histogram equalization with supplement sets of various sizes and identifies each test utterance by majority voting over the resulting decisions. We compare the proposed method with conventional feature normalization methods, namely CMN (Cepstral Mean Normalization), MVN (Mean and Variance Normalization), and HEQ (Histogram Equalization), and with the histogram equalization method using a supplement set.
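The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the equalization is rank-based and targets a standard normal distribution, the supplement sets are passed in as precomputed arrays (for example, cluster centroids), and `classify` is a hypothetical callable that maps normalized features to a speaker label. The function names are likewise invented for this example.

```python
from collections import Counter
from statistics import NormalDist

import numpy as np


def heq_with_supplement(features, supplement):
    """Rank-based histogram equalization toward N(0, 1) (assumed target).

    features:   (n_frames, n_dims) array from one test utterance.
    supplement: (n_supp, n_dims) array, e.g. precomputed centroids.
    The supplement rows only help estimate the empirical CDF and are
    dropped from the returned array.
    """
    pooled = np.vstack([features, supplement])
    n = pooled.shape[0]
    # Rank of every value within its own column (0 .. n-1).
    ranks = pooled.argsort(axis=0).argsort(axis=0)
    # Empirical CDF values, kept strictly inside (0, 1).
    cdf = (ranks + 0.5) / n
    inverse_normal = np.vectorize(NormalDist().inv_cdf)
    return inverse_normal(cdf[: len(features)])


def majority_vote(decisions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(decisions).most_common(1)[0][0]


def identify_by_voting(features, supplements, classify):
    """Classify once per supplement set, then vote over the decisions.

    `classify` is a hypothetical scoring function supplied by the caller.
    """
    decisions = [
        classify(heq_with_supplement(features, s)) for s in supplements
    ]
    return majority_vote(decisions)
```

Voting over several supplement set sizes avoids committing to a single size at recognition time, which is the difficulty the abstract points out.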
