Improved Melody Recognition Performance of a Cochlear Implant Speech Processing Strategy Using Instantaneous Frequency Encoding Based on Teager Energy Operator

  • Choi, Sung-Jin (Department of Biomedical Engineering, College of Health Science, Yonsei University)
  • Ryu, Sang-Baek (Department of Biomedical Engineering, College of Health Science, Yonsei University)
  • Kim, Kyung-Hwan (Department of Biomedical Engineering, College of Health Science, Yonsei University)
  • Received : 2010.09.01
  • Accepted : 2010.11.21
  • Published : 2010.12.31

Abstract

We present a speech processing strategy that incorporates instantaneous frequency (IF) encoding to enhance the melody recognition performance of cochlear implants. To extract the IF from incoming sound, we propose using a Teager energy operator (TEO), which has the advantage of a lower computational load. Time-frequency analysis verified that the TEO-based method properly encodes the IF of the input sound, which is crucial for melody recognition. A similar benefit could be obtained with a Hilbert transform (HT), but at a much higher computational cost. The melody recognition performance of the proposed strategy was compared with that of a conventional strategy based on envelope extraction and with that of HT-based IF encoding. Hearing tests were performed on normal-hearing subjects using acoustic simulation and a musical contour identification task. No significant difference in melody recognition performance was observed between the TEO-based and HT-based IF encodings, and both were superior to the conventional strategy. However, the TEO-based strategy has the advantage of being approximately 35% faster than the HT-based strategy.
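To make the idea of TEO-based IF extraction concrete, the following is a minimal sketch in Python. The abstract does not state which energy separation algorithm the authors used, so the choice of DESA-2 here is an assumption, as are the function names `teager_energy` and `desa2_instantaneous_frequency` and the test tone parameters. The discrete operator is taken as Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and the frequency estimate as Ω(n) ≈ ½·arccos(1 − Ψ[y(n)] / (2·Ψ[x(n)])) with y(n) = x(n+1) − x(n−1).

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The result is two samples shorter than x (no value at either edge).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2_instantaneous_frequency(x, fs):
    """Estimate instantaneous frequency in Hz using DESA-2 (assumed variant).

    Only meaningful for narrowband signals with frequency below fs / 4.
    Returns one estimate per interior sample (len(x) - 4 values);
    samples where the signal energy is not positive yield NaN.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_y = teager_energy(y)
    # Drop the edge values of psi_x so it lines up with psi_y.
    psi_x = teager_energy(x)[1:-1]

    freqs = []
    for px, py in zip(psi_x, psi_y):
        if px <= 0.0:
            freqs.append(float("nan"))
            continue
        # Clamp to the valid acos domain to absorb rounding error.
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(arg)  # radians per sample
        freqs.append(omega * fs / (2.0 * math.pi))
    return freqs


# Hypothetical usage: a pure 440 Hz tone sampled at 16 kHz.
fs = 16000.0
tone = [math.cos(2.0 * math.pi * 440.0 * n / fs) for n in range(200)]
if_hz = desa2_instantaneous_frequency(tone, fs)
```

For a pure sinusoid the estimate is exact up to rounding, so `if_hz` should stay close to 440 Hz throughout. Each output sample needs only a handful of multiplications and one `acos`, with no transform over the whole signal, which is consistent with the lower computational load the abstract attributes to the TEO relative to the HT. In a real processor the operator would presumably be applied per band-pass channel, since the method assumes a narrowband input.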
