Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2020R1F1A1068080).