Funding Information
This paper presents the results of research conducted in 2024 with funding from the government.
References
- T. Afouras, J. S. Chung, A. Senior, O. Vinyals, A. Zisserman, "Deep audio-visual speech recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 12, pp. 8717-8727, 2022.
- S. Petridis, M. Pantic, "Deep complementary bottleneck features for visual speech recognition," ICASSP, pp. 2304-2308, 2016.
- M. Wand, J. Koutník, J. Schmidhuber, "Lipreading with long short-term memory," ICASSP, pp. 6115-6119, 2016.
- S. O. Kim, K. H. Lee, "Design and implementation of speechreading system using the face feature on the Korean 8 vowels," Korea Society of Computer and Information Winter Conference, pp. 135-140, 2008.
- M. A. Lee, "A lip-reading algorithm using optical flow and properties of articulatory phonation," Journal of Korea Multimedia Society, Vol. 21, No. 7, pp. 745-754, 2018. https://doi.org/10.9717/KMMS.2018.21.7.745
- J. Deng, J. Guo, Y. Zhou, J. Yu, I. Kotsia, S. Zafeiriou, "RetinaFace: Single-shot multi-level face localisation in the wild," CVPR, pp. 5203-5212, 2020.
- A. Graves, S. Fernández, F. Gomez, J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," ICML, pp. 369-376, 2006.
- T. Stafylakis, G. Tzimiropoulos, "Combining residual networks with LSTMs for lipreading," Interspeech, pp. 3652-3656, 2017.
- K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition," CVPR, pp. 770-778, 2016.
- J. S. Chung, A. Zisserman, "Lip reading in the wild," ACCV, pp. 87-103, 2016.
- J. S. Chung, A. Senior, O. Vinyals, A. Zisserman, "Lip reading sentences in the wild," CVPR, 2017.
- J. S. Chung, A. Zisserman, "Lip reading in profile," BMVC, pp. 155.1-155.11, 2017.
- T. Afouras, J. S. Chung, A. Zisserman, "LRS3-TED: a large-scale dataset for visual speech recognition," In arXiv preprint arXiv:1809.00496, 2018.
- W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, "SSD: Single shot multibox detector," ECCV, pp. 21-37, 2016.
- J. Yuan, M. Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, Vol. 123, No. 5, pp. 3878-3882, 2008.
- J. S. Chung, A. Zisserman, "Out of time: automated lip sync in the wild," In Workshop on Multi-view Lip-reading, ACCV, pp. 251-263, 2016.
- Y. M. Assael, B. Shillingford, S. Whiteson, N. de Freitas, "LipNet: End-to-end sentence-level lipreading," arXiv preprint arXiv:1611.01599, 2016.
- B. Shillingford, Y. Assael, M. W. Hoffman, T. Paine, C. Hughes, U. Prabhu, H. Liao, H. Sak, K. Rao, L. Bennett, M. Mulville, B. Coppin, B. Laurie, A. Senior, N. de Freitas, "Large-scale visual speech recognition," Interspeech, pp. 4134-4139, 2019.
- X. Zhang, F. Cheng, S. Wang, "Spatio-temporal fusion based convolutional sequence learning for lip reading," ICCV, pp. 713-722, 2019.
- A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, R. Pang, "Conformer: Convolution-augmented transformer for speech recognition," Interspeech, pp. 5036-5040, 2020.
- P. Ma, S. Petridis, M. Pantic, "End-to-end audio-visual speech recognition with conformers," ICASSP, pp. 7613-7617, 2021.
- H. Dinkel, S. Wang, X. Xu, M. Wu, K. Yu, "Voice activity detection in the wild: A data-driven approach using teacher-student training," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, pp. 1542-1555, 2021. https://doi.org/10.1109/TASLP.2021.3073596
- M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, M. Sonderegger, "Montreal Forced Aligner: Trainable text-speech alignment using Kaldi," Interspeech, pp. 498-502, 2017.
- S. S. Yoon, T. Y. Chun, D.-J. Jung, H. S. Song, "A study on data preprocessing for lip-reading of national defense data," KIMST Annual Conference, pp. 367-368, 2022.