Lip-reading System based on Bayesian Classifier

  • Kim, Seong-Woo (Department of Computer & Communication Engineering, Daegu University);
  • Cha, Kyung-Ae (School of ICT Convergence, Daegu University);
  • Park, Se-Hyun (School of ICT Convergence, Daegu University)
  • Received: 2020.06.15
  • Accepted: 2020.08.03
  • Published: 2020.08.31

Abstract

Pronunciation recognition systems that rely only on visual information, without audio, can be applied to a variety of customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels from lip shapes in facial images. We extract feature vectors from the lip region of each facial image and feed them to the designed machine learning model. In our experiments, the system achieves a recognition rate of 94% for the vowel 'ㅏ' (romanized 'a') and an average recognition rate of approximately 84%, which is higher than that of a CNN evaluated for comparison. These results suggest that Bayesian classification using features derived from lip-region landmarks can be more efficient when only a small training set is available. The approach may therefore be suited to applications on resource-limited hardware such as mobile devices.
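
The abstract does not specify the exact features or the form of the class-conditional densities. The following is a minimal sketch under two stated assumptions: (1) the mouth is described by Dlib's 68-point face landmark layout, in which points 48 to 67 cover the outer and inner lips, and (2) the Bayesian classifier is a Gaussian naive Bayes model that picks the vowel c maximizing log P(c) + Σ_j log N(x_j; μ_cj, σ²_cj). The names `mouth_features` and `GaussianNaiveBayes`, the two ratio features, and the toy training values are hypothetical illustrations, not the authors' method or data.

```python
import math
from collections import defaultdict


def mouth_features(mouth):
    """Hypothetical scale-invariant lip-shape features.

    ASSUMPTION: `mouth` holds the 20 (x, y) mouth points of Dlib's
    68-point layout (points 48..67), so index i here is point 48 + i.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    width = dist(mouth[0], mouth[6])           # lip corners: points 48 and 54
    outer_height = dist(mouth[3], mouth[9])    # outer lip centers: points 51 and 57
    inner_height = dist(mouth[14], mouth[18])  # inner lip centers: points 62 and 66
    # Dividing by mouth width removes dependence on face size and camera distance.
    return [outer_height / width, inner_height / width]


class GaussianNaiveBayes:
    """Gaussian naive Bayes: one independent normal per (class, feature)."""

    def __init__(self, eps=1e-9):
        self.eps = eps  # variance floor so a constant feature cannot give zero variance

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.classes = sorted(groups)
        self.log_priors, self.means, self.variances = {}, {}, {}
        for c in self.classes:
            rows = groups[c]
            n_c, d = len(rows), len(rows[0])
            mean = [sum(r[j] for r in rows) / n_c for j in range(d)]
            var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n_c + self.eps
                   for j in range(d)]
            self.log_priors[c] = math.log(n_c / len(X))  # class frequency as prior
            self.means[c], self.variances[c] = mean, var
        return self

    def log_posterior(self, x, c):
        """Unnormalized log posterior: log P(c) + sum_j log N(x_j; mean_cj, var_cj).

        log P(x) is omitted because it is the same for every class.
        """
        total = self.log_priors[c]
        for xj, m, v in zip(x, self.means[c], self.variances[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        return total

    def predict(self, x):
        # Maximum a posteriori decision: the class with the largest posterior.
        return max(self.classes, key=lambda c: self.log_posterior(x, c))


if __name__ == "__main__":
    # Toy feature vectors invented for illustration; NOT data from the paper.
    X = [[0.60, 0.45], [0.65, 0.50], [0.58, 0.42],   # 'a' (ㅏ)
         [0.25, 0.08], [0.22, 0.06], [0.28, 0.10],   # 'i' (ㅣ)
         [0.70, 0.15], [0.75, 0.18], [0.68, 0.12]]   # 'u' (ㅜ)
    y = ["a"] * 3 + ["i"] * 3 + ["u"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict([0.62, 0.47]))  # prints "a"
```

Gaussian naive Bayes is used here only because it is a common Bayesian classifier with very few parameters per class (one mean and one variance per feature), which is consistent with the abstract's point about small training sets; the classifier and features in the paper itself may differ.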
