References
- KIM, Youngmin, et al. Keypoint based sign language translation without glosses. arXiv preprint arXiv:2204.10511, 2022.
- CAO, Zhe, et al. Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. p. 7291-7299.
- BAHDANAU, Dzmitry; CHO, Kyunghyun; BENGIO, Yoshua. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- LU, Kevin, et al. Frozen pretrained transformers as universal computation engines. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2022. p. 7628-7636.
- KIPF, Thomas N.; WELLING, Max. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- SIMONYAN, Karen; ZISSERMAN, Andrew. Two-stream convolutional networks for action recognition in videos. Advances in neural information processing systems, 2014, 27.
- DONAHUE, Jeffrey, et al. Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 2625-2634.
- CAMGOZ, Necati Cihan, et al. Neural sign language translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. p. 7784-7793.
- OTHMAN, Achraf; JEMNI, Mohamed. English-ASL gloss parallel corpus 2012: ASLG-PC12. In: sign-lang@LREC 2012. European Language Resources Association (ELRA), 2012. p. 151-154.
- KO, Sang-Ki, et al. Neural sign language translation based on human keypoint estimation. Applied sciences, 2019, 9.13: 2683.
- YANG, Seunghan, et al. The Korean sign language dataset for action recognition. In: International conference on multimedia modeling. Cham: Springer International Publishing, 2019. p. 532-542.
- DOSOVITSKIY, Alexey, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.