Funding
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education in 2023 (No. NRF-2021R1F1A1049202).
References
- N. C. Camgoz, S. Hadfield, O. Koller, H. Ney, and R. Bowden, "Neural sign language translation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- N. C. Camgoz, O. Koller, S. Hadfield, and R. Bowden, "Sign language transformers: Joint end-to-end sign language recognition and translation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
- K. Yin and J. Read, "Better sign language translation with STMC-transformer," arXiv preprint arXiv:2004.00588, 2020.
- H. Zhou, W. Zhou, W. Qi, J. Pu, and H. Li, "Improving sign language translation with monolingual data by sign back-translation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
- Y. Chen, F. Wei, X. Sun, Z. Wu, and S. Lin, "A simple multi-modality transfer learning baseline for sign language translation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
- Y. Chen, R. Zuo, F. Wei, Y. Wu, S. Liu, and B. Mak, "Two-stream network for sign language recognition and translation," Advances in Neural Information Processing Systems, Vol.35, pp.17043-17056, 2022.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
- H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan et al., "CvT: Introducing convolutions to vision transformers," Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
- A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," Proceedings of the 23rd International Conference on Machine Learning, 2006.
- S. Xie, C. Sun, J. Huang, Z. Tu, and K. Murphy, "Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification," Proceedings of the European Conference on Computer Vision (ECCV), 2018.
- W. Kay et al., "The kinetics human action video dataset," arXiv preprint arXiv:1705.06950, 2017.
- D. Li, C. R. Opazo, X. Yu, and H. Li, "Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison," Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
- Y. Liu et al., "Multilingual denoising pre-training for neural machine translation," Transactions of the Association for Computational Linguistics, Vol.8, pp.726-742, 2020. https://doi.org/10.1162/tacl_a_00343
- K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, "BLEU: A method for automatic evaluation of machine translation," Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.
- Y. Wang et al., "InternVideo: General video foundation models via generative and discriminative learning," arXiv preprint arXiv:2212.03191, 2022.
- A. J. Piergiovanni, W. Kuo, and A. Angelova, "Rethinking video ViTs: Sparse video tubes for joint image and video learning," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
- G. Bertasius, H. Wang, and L. Torresani, "Is space-time attention all you need for video understanding?," Proceedings of the International Conference on Machine Learning (ICML), 2021.
- A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lucic, and C. Schmid, "ViViT: A video vision transformer," Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
- M. Lewis et al., "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," arXiv preprint arXiv:1910.13461, 2019.
- J. Guo et al., "CMT: Convolutional neural networks meet vision transformers," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
- J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009.