A Lip-reading Algorithm Using Optical Flow and Properties of Articulatory Phonation

  • Lee, Mi Ae (Dept. of Information & Communication Engineering, Kyung-Sung University)
  • Received : 2018.03.09
  • Accepted : 2018.06.13
  • Published : 2018.07.31

Abstract

Language is an essential tool for verbal and emotional communication among human beings, enabling them to engage in social interactions. Although a majority of hearing-impaired people can speak, they are unable to receive feedback on their pronunciation. This results in impaired communication owing to incorrect pronunciation, which causes difficulties in their social interactions. If hearing-impaired people could receive continuous feedback on their pronunciation and phonation through lip-reading training, they could communicate more effectively with people without hearing disabilities, anytime and anywhere, without the use of sign language. In this study, the mouth area is detected from videos of learners speaking monosyllabic words. The grayscale information of the detected mouth area is used to estimate a velocity vector with optical flow, and this information is quantified into feature values for classifying vowels. Subsequently, a system is proposed that classifies monosyllables by algebraic computation over the geometric feature values of the lips, using the characteristics of articulatory phonation. Finally, the system provides feedback by comparing the experimental results with the information obtained from the sample categories.
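Below is a minimal Python/OpenCV sketch of the pipeline the abstract describes; it is not the authors' implementation. The mouth area is located with a Haar cascade (the cascade file and all parameters are illustrative assumptions), and dense Farneback optical flow stands in for whichever optical-flow method the system actually uses; simple per-frame flow statistics serve as the quantified feature values.

```python
# Illustrative sketch only: mouth detection, dense optical flow on the
# grayscale mouth region, and per-frame flow statistics as feature values.
import cv2
import numpy as np

def mouth_flow_features(video_path, cascade_path="haarcascade_smile.xml"):
    # Haar cascade for the mouth region (an assumed choice of detector).
    cascade = cv2.CascadeClassifier(cascade_path)
    cap = cv2.VideoCapture(video_path)
    roi_box, prev_roi, features = None, None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if roi_box is None:
            # Detect the mouth once, then reuse the same box for all frames.
            boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=8)
            if len(boxes) == 0:
                continue
            roi_box = boxes[0]
        x, y, w, h = roi_box
        roi = cv2.resize(gray[y:y + h, x:x + w], (64, 32))
        if prev_roi is not None:
            # Dense flow: one (dx, dy) velocity vector per pixel of the ROI.
            flow = cv2.calcOpticalFlowFarneback(prev_roi, roi, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            dx, dy = flow[..., 0], flow[..., 1]
            # Quantify the flow field: mean horizontal motion, mean vertical
            # motion, and overall motion energy for this frame pair.
            features.append([dx.mean(), dy.mean(), np.hypot(dx, dy).mean()])
        prev_roi = roi
    cap.release()
    return np.array(features)
```

In practice, the per-frame feature triples would be pooled over the whole utterance before being matched against the vowel categories.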
