Avatar's Lip Synchronization in Talking Involved Virtual Reality

  • Received : 2020.02.12
  • Accepted : 2020.07.28
  • Published : 2020.09.01

Abstract

Having a virtual talking face along with a virtual body increases immersion in VR applications. As virtual reality (VR) techniques develop, applications involving talking avatars are multiplying, including multi-user social networking and educational applications. Because consumer-grade VR lacks the sensory information needed for full face and body motion capture, most VR applications do not show a talking face synchronized with the body. We propose a novel method, targeted at VR applications, for a talking face synchronized with audio, combined with upper-body inverse kinematics. We implement this by mirroring the user's own avatar in a single-user environment and by visualizing a synchronized conversational partner in a multi-user environment. Through a user study, we found that a realistic, audio-synced talking avatar is more influential than an unsynchronized talking avatar or an invisible one.
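The abstract describes driving a talking face from the audio signal. The exact mapping is not specified there, so the sketch below shows only the simplest plausible building block: converting the loudness of an audio frame into a mouth-open blendshape weight in [0, 1]. The function name, the decibel floor, and the single-blendshape model are all illustrative assumptions, not the paper's method.

```python
import math

def amplitude_to_viseme_weight(samples, floor_db=-60.0):
    """Map one audio frame's RMS loudness to a mouth-open blendshape
    weight in [0, 1]. This is a hypothetical stand-in for audio-synced
    lip animation; floor_db sets the loudness treated as silence."""
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return 0.0
    db = 20.0 * math.log10(rms)           # loudness in dB (full scale = 0 dB)
    weight = (db - floor_db) / -floor_db  # normalize: floor_db -> 0, 0 dB -> 1
    return max(0.0, min(1.0, weight))     # clamp into valid blendshape range

# Silence keeps the mouth closed; louder frames open it further.
print(amplitude_to_viseme_weight([0.0] * 256))  # 0.0
print(amplitude_to_viseme_weight([0.5] * 256))
print(amplitude_to_viseme_weight([1.0] * 256))  # 1.0
```

A real system would smooth this weight over successive frames and blend it with phoneme-specific visemes rather than a single mouth-open shape.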
