The Effect of FIR Filtering and Spectral Tilt on Speech Recognition with MFCC

  • Chang-Young Lee (Division of Information System Engineering, Dongseo University)
  • Received : 2010.06.17
  • Accepted : 2010.08.05
  • Published : 2010.08.31

Abstract

To improve feature vector classification and thereby reduce the recognition error rate in speaker-independent speech recognition, we study the effect of tilting the Fourier magnitude spectrum during MFCC extraction. In parallel, we investigate how applying an FIR filter to the speech signal affects recognition. The proposed methods are evaluated in two independent ways: with the Fisher discriminant objective function, and with a speech recognition test using hidden Markov models with fuzzy vector quantization. With an appropriate choice of the tilt factor, the experiments show a relative reduction of about 10% in the recognition error rate compared with the conventional method.
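
The abstract does not give the filter coefficients, the exact form of the tilt, or the multi-dimensional Fisher objective, so the following Python sketch is only an illustration under stated assumptions. It assumes a first-order FIR filter of the common pre-emphasis form y[n] = x[n] - a·x[n-1], a hypothetical linear tilt that scales bin k of the magnitude spectrum by 1 + tilt·k/(K-1), and a simplified two-class, one-dimensional Fisher ratio. The function names and the parameters `a` and `tilt` are not taken from the paper.

```python
import cmath
import math


def fir_preemphasis(signal, a=0.97):
    """First-order FIR filter y[n] = x[n] - a * x[n-1] (assumed pre-emphasis form)."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - a * signal[n - 1])
    return out


def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT (naive O(N^2) version, adequate for a demo)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def tilt_spectrum(mags, tilt):
    """Hypothetical linear tilt: bin k is scaled by 1 + tilt * k / (K - 1).

    tilt = 0 leaves the spectrum unchanged; the paper's actual tilt function
    may differ from this form.
    """
    last = max(len(mags) - 1, 1)
    return [m * (1.0 + tilt * k / last) for k, m in enumerate(mags)]


def fisher_ratio(class_a, class_b):
    """Two-class, 1-D Fisher criterion: squared mean gap over summed variances."""
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    var_a = sum((x - mean_a) ** 2 for x in class_a) / len(class_a)
    var_b = sum((x - mean_b) ** 2 for x in class_b) / len(class_b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b)


# Usage: filter a short synthetic frame, then tilt its magnitude spectrum.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
filtered = fir_preemphasis(frame, a=0.97)
tilted = tilt_spectrum(magnitude_spectrum(filtered), tilt=0.5)
```

In a full front end the tilted magnitudes would then pass through the mel filter bank, logarithm, and DCT to give the MFCC. A larger Fisher ratio for a given feature indicates better class separation, which is the sense in which such an objective can rank candidate tilt factors before a full recognition test is run.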
