References
- A. Vaswani et al., "Attention is all you need," in Adv. Neural Inf. Process. Syst., 2017, vol. 30, pp. 5998-6008. DOI: 10.48550/arXiv.1706.03762
- A. Dosovitskiy et al., "An image is worth 16×16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020. DOI: 10.48550/arXiv.2010.11929
- Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012-10022. DOI: 10.1109/ICCV48922.2021.00986
- C.-F. Chen, Q. Fan, and R. Panda, "CrossViT: Cross-attention multi-scale vision transformer for image classification," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 357-366. DOI: 10.48550/arXiv.2103.14899
- R. Raksasat, S. Teerapittayanon, S. Itthipuripat, K. Praditpornsilpa, A. Petchlorlian, T. Chotibut, and I. Chatnuntawech, "Attentive pairwise interaction network for AI-assisted clock drawing test assessment of early visuospatial deficits," Sci. Rep., vol. 13, no. 1, p. 18113, 2023. DOI: 10.1038/s41598-023-44723-1
- S. Chen et al., "Automatic dementia screening and scoring by applying deep learning on clock-drawing tests," Sci. Rep., vol. 10, no. 1, p. 20854, 2020. DOI: 10.1038/s41598-020-74710-9
- J. Yao et al., "Extended vision transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework," IEEE Trans. Geosci. Remote Sens., 2023. DOI: 10.1109/TGRS.2023.3284671
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014. DOI: 10.48550/arXiv.1409.1556
- K. He et al., "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778. DOI: 10.1109/CVPR.2016.90
- H. Inoue, "Data augmentation by pairing samples for images classification," arXiv preprint arXiv:1801.02929, 2018. DOI: 10.48550/arXiv.1801.02929
- D. Yarats, I. Kostrikov, and R. Fergus, "Image augmentation is all you need: Regularizing deep reinforcement learning from pixels," in Int. Conf. Learn. Represent., 2021. DOI: 10.48550/arXiv.2004.13649
- H. Zhu, W. Ke, D. Li, J. Liu, L. Tian, and Y. Shan, "Dual cross-attention learning for fine-grained visual categorization and object re-identification," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 4692-4702. DOI: 10.48550/arXiv.2205.02151
- X. Peng et al., "Optical remote sensing image change detection based on attention mechanism and image difference," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 9, pp. 7426-7440, Sep. 2021. DOI: 10.1109/TGRS.2020.3033009
- S. Mehta and M. Rastegari, "MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer," arXiv preprint arXiv:2110.02178, 2021. DOI: 10.48550/arXiv.2110.02178
- M. Dehghani et al., "Patch n' Pack: NaViT, a vision transformer for any aspect ratio and resolution," arXiv preprint arXiv:2307.06304, 2023. DOI: 10.48550/arXiv.2307.06304
- K. Xu, P. Deng, and H. Huang, "Vision transformer: An excellent teacher for guiding small networks in remote sensing image scene classification," IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1-15, 2022. DOI: 10.1109/TGRS.2022.3152566
- T. Stegmuller, B. Bozorgtabar, A. Spahr, and J.-P. Thiran, "ScoreNet: Learning non-uniform attention and augmentation for transformer-based histopathological image classification," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2023, pp. 6170-6179.
- W. Wang et al., "Pyramid vision transformer: A versatile backbone for dense prediction without convolutions," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 568-578. DOI: 10.1109/ICCV48922.2021.00061
- S. Amini et al., "An AI-assisted online tool for cognitive impairment detection using images from the clock drawing test," medRxiv, 2021. DOI: 10.1101/2021.03.06.21253047
- Q. Chen, J. Fan, and W. Chen, "An improved image enhancement framework based on multiple attention mechanism," Displays, vol. 70, Art. no. 102091, 2021. DOI: 10.1016/j.displa.2021.102091