Disease Diagnosis on Fundus Images: A Cross-Dataset Study

  • Van-Nguyen Pham (Dept. of Electrical and Computer Engineering, Sungkyunkwan University)
  • Sun Xiaoying (Dept. of Electrical and Computer Engineering, Sungkyunkwan University)
  • Hyunseung Choo (Dept. of Electrical and Computer Engineering, Sungkyunkwan University)
  • Published: 2024.10.31


This paper compares five deep learning models, namely ResNet50, DenseNet121, Vision Transformer (ViT), Swin Transformer (SwinT), and CoatNet, on multi-label classification of fundus images for ocular diseases. The models were trained on the Ocular Disease Recognition (ODIR) dataset and evaluated on a separate dataset, the Retinal Fundus Multi-disease Image Dataset (RFMiD), over five disease classes: diabetic retinopathy, glaucoma, cataract, age-related macular degeneration, and myopia. Performance was measured per class using the area under the receiver operating characteristic curve (AUC-ROC). CoatNet achieved the best AUC-ROC scores for diabetic retinopathy, glaucoma, cataract, and myopia, while ViT outperformed CoatNet for age-related macular degeneration. CoatNet also had the highest AUC-ROC averaged across the five classes, which points to the effectiveness of hybrid convolution-attention architectures in medical image classification. These findings suggest that CoatNet may be a promising model for multi-label classification of fundus images in cross-dataset scenarios.
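The per-class AUC-ROC evaluation described above can be illustrated with a minimal sketch. It assumes each model outputs one score per disease class for every image, and that the "average" is an unweighted mean over the five classes; the abstract does not state the averaging scheme, so that is an assumption. The function names, the label order, and the toy data are hypothetical and are not taken from the paper.

```python
# Hypothetical label order; only the five class names come from the abstract.
CLASSES = ["diabetic retinopathy", "glaucoma", "cataract", "AMD", "myopia"]


def binary_auc(labels, scores):
    """AUC-ROC for one class as the fraction of (positive, negative) pairs
    ranked correctly; tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_class_auc(y_true, y_score, class_names=CLASSES):
    """Return ({class name: AUC}, unweighted mean AUC over the classes)."""
    aucs = {}
    for j, name in enumerate(class_names):
        labels = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        aucs[name] = binary_auc(labels, scores)
    mean_auc = sum(aucs.values()) / len(aucs)
    return aucs, mean_auc


# Toy multi-label data, made up for illustration (not results from the paper).
# Each row is one image; each column is one disease class.
y_true = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
y_score = [
    [0.9, 0.2, 0.1, 0.3, 0.8],
    [0.4, 0.7, 0.2, 0.1, 0.3],
    [0.3, 0.1, 0.9, 0.2, 0.6],
    [0.2, 0.3, 0.4, 0.8, 0.7],
]

aucs, mean_auc = per_class_auc(y_true, y_score)
for name, value in aucs.items():
    print(f"{name}: {value:.2f}")
print(f"mean AUC-ROC: {mean_auc:.2f}")
```

Because each class is scored independently, an image may carry several positive labels at once, which is what distinguishes this multi-label setting from single-label classification. The pairwise loop is quadratic in the number of samples and is meant only to make the definition explicit; a library routine would normally be used on datasets the size of ODIR or RFMiD.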



This work was supported in part by the BK21 FOUR Project (50%) and by the Korea government (MSIT), IITP, Korea, under the ICT Creative Consilience program (RS-2020-II201821, 25%) and Development of Brain Disease (Stroke) (RS-2024-00459512, 25%).

