HMM-Based Automatic Speech Recognition using EMG Signal

  • Lee Ki-Seung (Department of Electronic Engineering, Konkuk University)
  • Published : 2006.06.01

Abstract

It is known that there is a strong relationship between human voices and the movements of the articulatory facial muscles. In this paper, we exploit this relationship to implement an automatic speech recognition scheme that uses surface electromyogram (EMG) signals alone. The EMG signals were acquired from three articulatory facial muscles. As a preliminary study, 10 Korean digits were used as the recognition vocabulary. Various feature parameters, including filter bank outputs, linear predictive coefficients, and cepstral coefficients, were evaluated to find the parameters appropriate for EMG-based speech recognition. The sequence of EMG signals for each word is modeled within a hidden Markov model (HMM) framework. A continuous word recognition approach was investigated in this work; hence, the model for each word was obtained by concatenating subword models, and embedded re-estimation techniques were employed in the training stage. The findings indicate that such a system can recognize speech with an accuracy of up to 90% when mel-filter bank outputs are used as the feature parameters for recognition.
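The sketch below illustrates the general pipeline described in the abstract: frame-level filter-bank features are extracted from a surface-EMG channel, one HMM is trained per word, and recognition selects the word model with the highest log-likelihood. It is not the authors' implementation: the paper builds word models by concatenating subword HMMs with embedded re-estimation, whereas this sketch substitutes whole-word Gaussian HMMs from the hmmlearn library, and the sampling rate, frame sizes, and band edges are illustrative assumptions rather than values reported in the paper.

```python
# Minimal sketch (assumed parameters, not the paper's): filter-bank features
# from one EMG channel plus per-word Gaussian HMMs for word classification.
import numpy as np
from hmmlearn.hmm import GaussianHMM

FS = 2000          # assumed EMG sampling rate [Hz]
FRAME = 200        # 100 ms analysis frame
HOP = 100          # 50 ms hop
N_BANDS = 8        # number of filter-bank channels (assumption)

def filterbank_features(emg):
    """Log energies of N_BANDS linearly spaced bands for each frame."""
    edges = np.linspace(0, FS / 2, N_BANDS + 1)
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / FS)
    feats = []
    for start in range(0, len(emg) - FRAME + 1, HOP):
        frame = emg[start:start + FRAME] * np.hamming(FRAME)
        spec = np.abs(np.fft.rfft(frame)) ** 2
        bands = [spec[(freqs >= lo) & (freqs < hi)].sum()
                 for lo, hi in zip(edges[:-1], edges[1:])]
        feats.append(np.log(np.asarray(bands) + 1e-10))
    return np.vstack(feats)

def train_word_models(utterances, n_states=5):
    """utterances: dict word -> list of 1-D EMG arrays. Returns word -> HMM."""
    models = {}
    for word, signals in utterances.items():
        feats = [filterbank_features(s) for s in signals]
        X = np.vstack(feats)
        lengths = [f.shape[0] for f in feats]
        m = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=20, random_state=0)
        m.fit(X, lengths)          # Baum-Welch re-estimation per word
        models[word] = m
    return models

def recognize(models, emg):
    """Return the word whose HMM assigns the highest log-likelihood."""
    feats = filterbank_features(emg)
    return max(models, key=lambda w: models[w].score(feats))
```

A simplified whole-word topology is used here only because it keeps the example self-contained; reproducing the paper's continuous recognition would require subword model concatenation and embedded training over connected-digit strings.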
