A review of speech perception: The first step for convergence on speech engineering

A review article on speech perception: The first step toward convergence with speech engineering

  • Received: 2017.10.30
  • Accepted: 2017.12.20
  • Published: 2017.12.28


People observe many events in their environment and have no difficulty perceiving them, speech included. As with the perception of biological motion, two main theoretical camps have debated speech perception. The purpose of this review article is to briefly describe speech perception and to compare these two theories. Motor theorists claim that speech perception is special to humans because we both produce and perceive articulatory events, which are processed by innate neuromotor commands. Direct perception theorists, in contrast, claim that speech perception does not differ from nonspeech perception because perceivers only need to detect information directly, as with all other kinds of events. Grasping the fundamental idea of how humans perceive articulatory events is important for convergence with speech engineering. This basic review of speech perception is therefore expected to be useful for AI, voice recognition technology, speech recognition systems, and related fields.


Convergence;Event perception;Speech perception;Direct perception theory;Motor theory




Supported by: Dankook University