Parts-Based Feature Extraction of Spectrum of Speech Signal Using Non-Negative Matrix Factorization

  • Park, Jeong-Won (Department of Electronic Engineering, Dong-A University) ;
  • Kim, Chang-Keun (Department of Electronic Engineering, Dong-A University) ;
  • Lee, Kwang-Seok (Department of Electronic Engineering, Jinju National University) ;
  • Koh, Si-Young (School of Electronic Information and Communication Engineering, Kyungil University) ;
  • Hur, Kang-In (Department of Electronic Engineering, Dong-A University)
  • Published : 2003.12.01

Abstract

In this paper, we propose a new speech feature parameter obtained through parts-based feature extraction from the speech spectrum using Non-Negative Matrix Factorization (NMF). NMF effectively reduces the dimensionality of multi-dimensional data through matrix factorization under non-negativity constraints, and the dimensionally reduced data represent parts-based features of the input data. For speech feature extraction, we applied Mel-scaled filter bank outputs as the inputs to NMF, then used the outputs of NMF as the inputs to a speech recognizer. The recognition experiments confirmed that the proposed feature parameter achieves better recognition performance than the commonly used mel-frequency cepstral coefficients (MFCC).
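
The factorization step described in the abstract can be illustrated with a minimal sketch of NMF using the multiplicative update rules of Lee and Seung for the squared-error objective. This is not the authors' implementation: the toy matrix `toy_v` merely stands in for Mel-scaled filter bank outputs (rows as bands, columns as frames), and the rank, iteration count, helper names, and the reading of the columns of `h` as per-frame features are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the objective being reduced)."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (bands x frames) into W (bands x rank)
    and H (rank x frames) with multiplicative updates.

    Every update multiplies a non-negative entry by a non-negative ratio,
    so W and H stay non-negative throughout.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Hypothetical stand-in for Mel filter bank outputs: 6 bands x 8 frames.
data_rng = random.Random(1)
toy_v = [[data_rng.random() for _ in range(8)] for _ in range(6)]

# Assumed rank of 3: each column of h is a 3-dimensional code for one frame.
w, h = nmf(toy_v, rank=3)
print("reconstruction error:", frobenius_error(toy_v, w, h))
```

In this sketch the columns of `w` play the role of non-negative spectral parts, and each frame is reduced from 6 filter bank values to 3 encoding coefficients. A practical system would more likely use a vectorized numerical library and real filter bank energies rather than pure-Python lists.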
