A review of speech perception: The first step for convergence on speech engineering

말소리지각에 대한 종설: 음성공학과의 융복합을 위한 첫 단계

  • Received : 2017.10.30
  • Accepted : 2017.12.20
  • Published : 2017.12.28

Abstract

People observe many events in their environment and have no difficulty perceiving them, speech included. As with the perception of biological motion, two main theoretical camps have debated how speech is perceived. The purpose of this review article is to briefly describe speech perception and compare these two theories. Motor theorists claim that speech perception is special to humans because we both produce and perceive articulatory events, which are processed through innate neuromotor commands. Direct perception theorists, by contrast, claim that speech perception is no different from nonspeech perception because we only need to detect information directly, as with all other kinds of events. Grasping the fundamental idea of how humans perceive articulatory events is important for convergence with speech engineering. This basic review of speech perception is therefore expected to be useful for AI, voice recognition technology, speech recognition systems, and related applications.

Keywords

Convergence; Event perception; Speech perception; Direct perception theory; Motor theory

Acknowledgement

Supported by: Dankook University