Dimensionality Reduction in Speech Recognition by Principal Component Analysis

  • Chang-Young Lee (Department of Industrial and Management Engineering, Dongseo University)
  • Received : 2013.07.03
  • Accepted : 2013.09.23
  • Published : 2013.09.30

Abstract

In this paper, we investigate a method of reducing the computational cost of speech recognition by reducing the dimensionality of MFCC feature vectors. Eigendecomposition of the covariance matrix of the feature vectors yields a linear transformation that orders the transformed components by variance. The first component has the largest variance and is therefore expected to contribute most to pattern classification. Excluding the least-variance components should thus lower the computational cost without degrading recognition performance. Experimental results show that the number of MFCC components can be reduced by about half without a significant adverse effect on the recognition error rate.
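
The variance-ordering step described in the abstract can be sketched as follows. This is a minimal illustration in Python with NumPy, not the paper's implementation: the function name `pca_reduce`, the 12-dimensional synthetic frames standing in for MFCC vectors, and the choice of keeping 6 components are all assumptions made for the example.

```python
import numpy as np


def pca_reduce(features, n_keep):
    """Project feature vectors onto the n_keep highest-variance principal axes.

    features: array of shape (n_frames, n_dims), one feature vector per frame.
    Returns (reduced, eigvals, mean, axes); eigvals are sorted in descending order.
    """
    mean = features.mean(axis=0)
    centered = features - mean

    # Covariance matrix of the feature dimensions (n_dims x n_dims).
    cov = np.cov(centered, rowvar=False)

    # eigh handles symmetric matrices and returns eigenvalues in ascending
    # order, so reverse them to put the largest variance first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Keep only the leading axes and drop the least-variance components.
    axes = eigvecs[:, :n_keep]
    reduced = centered @ axes
    return reduced, eigvals, mean, axes


# Synthetic stand-in for MFCC frames (hypothetical data, not from the paper):
# independent sources with decreasing spread, mixed by a random matrix.
rng = np.random.default_rng(0)
n_frames, n_dims = 500, 12
scales = np.linspace(3.0, 0.1, n_dims)
mixing = rng.normal(size=(n_dims, n_dims))
frames = (rng.normal(size=(n_frames, n_dims)) * scales) @ mixing

reduced, eigvals, mean, axes = pca_reduce(frames, n_keep=6)

# Fraction of the total variance carried by the retained components.
retained = eigvals[:6].sum() / eigvals.sum()
print(reduced.shape, round(float(retained), 3))
```

Halving the dimension here mirrors the "about half" figure only in spirit. How many components can be dropped without hurting the error rate depends on the data and the recognizer, which is what the paper evaluates experimentally.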
