Statistical Extraction of Speech Features Using Independent Component Analysis and Its Application to Speaker Identification

  • Jang, Gil-Jin (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Oh, Yung-Hwan (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Published : 2002.12.01

Abstract

We apply independent component analysis (ICA) to extract an optimal basis for the problem of finding efficient features for representing the speech signals of a given speaker. The speech segments are assumed to be generated by a linear combination of basis functions; the distribution of a speaker's speech segments is therefore modeled by adapting the basis functions so that the source components are statistically independent. The learned basis functions are oriented and localized in both space and frequency, bearing a resemblance to Gabor wavelets. These features are speaker-dependent characteristics. To assess their efficiency, we performed speaker identification experiments and compared the results with those obtained using the conventional Fourier basis. The results show that the proposed features are more efficient than conventional Fourier-based features in that they yield a higher speaker identification rate.
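The generative assumption stated in the abstract (each segment is a linear combination of basis functions whose coefficients are statistically independent) can be sketched on toy data. This is a minimal illustration, not the authors' implementation: the function name `learn_ica_basis`, the whitening step, the tanh nonlinearity, the learning rate, the iteration count, and the synthetic Laplacian sources and mixing matrix are all assumptions made for the example. It uses a natural-gradient maximum-likelihood update, which is one common way to fit such a model.

```python
import numpy as np

def learn_ica_basis(X, n_iter=800, lr=0.1):
    """Fit x = A s with independent s (hypothetical helper, toy settings).

    X has shape (dim, n_frames); each column is one segment.
    Returns (A_hat, unmix): columns of A_hat are the learned basis
    functions, and unmix @ x gives the source coefficients.
    """
    X = X - X.mean(axis=1, keepdims=True)
    dim, n = X.shape

    # Whiten the data so the iteration starts from decorrelated inputs.
    eigval, eigvec = np.linalg.eigh(X @ X.T / n)
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    Z = V @ X

    W = np.eye(dim)
    I = np.eye(dim)
    for _ in range(n_iter):
        U = W @ Z
        # Natural-gradient update; tanh is an assumed score function
        # suited to super-Gaussian (sparse) sources.
        W += lr * (I - np.tanh(U) @ U.T / n) @ W

    unmix = W @ V
    return np.linalg.inv(unmix), unmix

# Toy data: sparse (Laplacian) sources mixed by a fixed, assumed matrix.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.5, 0.0, -0.3],
                   [0.2, 1.0, 0.4, 0.0],
                   [0.0, -0.6, 1.0, 0.3],
                   [0.5, 0.0, -0.2, 1.0]])
S = rng.laplace(size=(4, 20000))
X = A_true @ S

A_hat, unmix = learn_ica_basis(X)

# If the sources were recovered, unmix @ A_true is close to a scaled
# permutation matrix: one dominant entry per row.
P = np.abs(unmix @ A_true)
P /= P.max(axis=1, keepdims=True)
```

In a speaker identification setting along the lines the abstract describes, `X` would hold short speech segments from one speaker rather than synthetic mixtures, and the coefficients `unmix @ x` would play the role that Fourier coefficients play in the conventional features.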
