Voice Recognition-Based on Adaptive MFCC and Deep Learning for Embedded Systems

Voice Recognition Based on Adaptive MFCC and Deep Learning, Usable in Embedded Systems (translated from the Korean title)

  • Bae, Hyun Soo (Department of Electrical Engineering, Yeungnam University) ;
  • Lee, Ho Jin (Department of Electrical Engineering, Yeungnam University) ;
  • Lee, Suk Gyu (Department of Electrical Engineering, Yeungnam University)
  • Received : 2016.07.03
  • Accepted : 2016.08.30
  • Published : 2016.10.01


This paper proposes a novel voice recognition method for embedded systems based on an adaptive MFCC and deep learning. To improve the recognition ratio of the proposed voice recognizer, ambient noise mixed into the voice signal has to be removed. However, noise filtering can damage the voice data and thereby lower the recognition ratio. In this paper, a filter is designed for the frequency range of the voice signal, and weights are imposed to reduce data deterioration. In addition, a deep learning algorithm, which does not require a database in the recognition algorithm, has been adapted for embedded systems, which inherently have limited memory. The experimental results suggest that both the proposed deep learning recognizer and an HMM voice recognizer achieve a higher recognition ratio in a noisy environment with the proposed adaptive MFCC algorithm than with conventional MFCC algorithms.
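The abstract does not give the filter design or the weights, so the following is only a minimal sketch of the general idea of weighting mel filter-bank bands around the voice frequency range before computing cepstral coefficients. The band edges (300 to 3400 Hz), the outside-band weight (0.2), the choice to apply the weights to the log energies, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters bands equally spaced on the mel scale."""
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(n_filters)]


def band_weights(centers_hz, voice_lo=300.0, voice_hi=3400.0, outside=0.2):
    """Hypothetical weights: 1.0 inside the assumed voice band, `outside` elsewhere."""
    return [1.0 if voice_lo <= c <= voice_hi else outside for c in centers_hz]


def weighted_log_energies(filter_energies, weights):
    """Take the log of each band energy, then scale it by the band weight (assumed scheme)."""
    return [w * math.log(e + 1e-10) for e, w in zip(filter_energies, weights)]


def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II; returns the first n_coeffs cepstral coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


# Usage with placeholder band energies (real ones would come from a power spectrum).
centers = mel_center_frequencies(20, 0.0, 8000.0)
weights = band_weights(centers)
energies = [1.0 + i for i in range(20)]  # illustrative values only
cepstrum = dct_ii(weighted_log_energies(energies, weights), 13)
```

In this sketch the weights reduce the contribution of bands outside the assumed voice range instead of removing them, which is one simple way to limit noise without discarding signal content outright.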


Grant : BK21 Plus

Supported by : Yeungnam University

