Funding Information
This paper presents the results of research conducted in 2024 with funding from the government.
References
- T. Afouras, J. S. Chung, A. Senior, O. Vinyals, A. Zisserman, "Deep audio-visual speech recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 12, pp. 8717-8727, 2022.
- S. Petridis, M. Pantic, "Deep complementary bottleneck features for visual speech recognition," ICASSP, pp. 2304-2308, 2016.
- M. Wand, J. Koutník, J. Schmidhuber, "Lipreading with long short-term memory," ICASSP, pp. 6115-6119, 2016.
- S. O. Kim, K. H. Lee, "Design and implementation of speechreading system using the face feature on the Korean 8 vowels," Korea Society of Computer and Information Winter Conference, pp. 135-140, 2008.
- M. A. Lee, "A lip-reading algorithm using optical flow and properties of articulatory phonation," Journal of Korea Multimedia Society, Vol. 21, No. 7, pp. 745-754, 2018. https://doi.org/10.9717/KMMS.2018.21.7.745
- J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia, S. Zafeiriou, "RetinaFace: Single-shot multi-level face localisation in the wild," CVPR, pp. 5203-5212, 2020.
- A. Graves, S. Fernández, F. Gomez, J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," ICML, pp. 369-376, 2006.
- T. Stafylakis, G. Tzimiropoulos, "Combining residual networks with LSTMs for lipreading," Interspeech, pp. 3652-3656, 2017.
- K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," CVPR, pp. 770-778, 2016.
- J. S. Chung, A. Zisserman, "Lip reading in the wild," ACCV, pp. 87-103, 2016.
- J. S. Chung, A. Senior, O. Vinyals, A. Zisserman, "Lip reading sentences in the wild," CVPR, 2017.
- J. S. Chung, A. Zisserman, "Lip reading in profile," BMVC, pp. 155.1-155.11, 2017.
- T. Afouras, J. S. Chung, A. Zisserman, "LRS3-TED: a large-scale dataset for visual speech recognition," In arXiv preprint arXiv:1809.00496, 2018.
- W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, "SSD: Single shot multibox detector," ECCV, pp. 21-37, 2016.
- J. Yuan, M. Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, Vol. 123, No. 5, pp. 3878-3882, 2008.
- J. S. Chung, A. Zisserman, "Out of time: automated lip sync in the wild," In Workshop on Multi-view Lip-reading, ACCV, pp. 251-263, 2016.
- Y. M. Assael, B. Shillingford, S. Whiteson, N. de Freitas, "LipNet: End-to-end sentence-level lipreading," arXiv preprint arXiv:1611.01599, 2016.
- B. Shillingford, Y. Assael, M. W. Hoffman, T. Paine, C. Hughes, U. Prabhu, H. Liao, H. Sak, K. Rao, L. Bennett, M. Mulville, B. Coppin, B. Laurie, A. Senior, N. de Freitas, "Large-scale visual speech recognition," Interspeech, pp. 4134-4139, 2019.
- X. Zhang, F. Cheng, S. Wang, "Spatio-temporal fusion based convolutional sequence learning for lip reading," ICCV, pp. 713-722, 2019.
- A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, R. Pang, "Conformer: Convolution-augmented transformer for speech recognition," Interspeech, pp. 5036-5040, 2020.
- P. Ma, S. Petridis, M. Pantic, "End-to-end audio-visual speech recognition with conformers," ICASSP, pp. 7613-7617, 2021.
- H. Dinkel, S. Wang, X. Xu, M. Wu, K. Yu, "Voice activity detection in the wild: A data-driven approach using teacher-student training," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, pp. 1542-1555, 2021. https://doi.org/10.1109/TASLP.2021.3073596
- M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, M. Sonderegger, "Montreal Forced Aligner: Trainable text-speech alignment using Kaldi," Interspeech, pp. 498-502, 2017.
- S. S. Yoon, T. Y. Chun, D.-J. Jung, H. S. Song, "A study on data preprocessing for lip-reading of national defense data," KIMST Annual Conference, pp. 367-368, 2022.