Dimensionality Reduction in Speech Recognition by Principal Component Analysis

  • Chang-Young Lee (Department of Industrial and Management Engineering, Dongseo University)
  • Received : 2013.07.03
  • Accepted : 2013.09.23
  • Published : 2013.09.30


In this paper, we investigate a method of reducing the computational cost of speech recognition by reducing the dimensionality of MFCC feature vectors. Eigendecomposition of the covariance matrix of the feature vectors yields a linear transformation that orders the transformed components by variance. The first component has the largest variance and is therefore taken to be the most important for pattern classification. Excluding the least-variance components thus offers a way to reduce the computational cost without degrading recognition performance. Experimental results show that the number of MFCC components can be reduced by about half without a significant adverse effect on the recognition error rate.
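
The transformation described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function name `pca_reduce`, the choice of 12 coefficients per frame, the 500 synthetic frames, and the use of NumPy are all assumptions made for the example. Random correlated data stands in for real MFCC vectors.

```python
import numpy as np


def pca_reduce(features, n_keep):
    """Project feature vectors onto the n_keep largest-variance principal axes.

    features: array of shape (n_frames, n_dims), one feature vector per row.
    Returns the reduced vectors and all eigenvalues in descending order.
    """
    # Center the data so the covariance matrix describes variance about the mean.
    centered = features - features.mean(axis=0)
    # Covariance matrix of the components (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the first component carries the largest variance.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Keep only the leading components; the least-variance ones are dropped.
    reduced = centered @ eigvecs[:, :n_keep]
    return reduced, eigvals


# Hypothetical stand-in for MFCC frames: 500 frames of 12 correlated coefficients.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(12, 12))
frames = rng.normal(size=(500, 12)) @ mixing

# Keep about half of the components, as the abstract suggests.
reduced, eigvals = pca_reduce(frames, n_keep=6)
retained = eigvals[:6].sum() / eigvals.sum()
print("reduced shape:", reduced.shape)
print("fraction of total variance retained:", retained)
```

The fraction of variance retained depends entirely on the data, so the number printed for this synthetic input says nothing about real speech features. Whether half of the components suffice for recognition is the experimental question the paper addresses.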

