Performance Evaluation of Vision Transformer-based Pneumonia Detection Model using Chest X-ray Images

  • Junyong Chang (Department of Radiological Science, Konyang University) ;
  • Youngeun Choi (Department of Medical Science, Konyang University) ;
  • Seungwan Lee (Department of Radiological Science, Konyang University)
  • Received: 2024.09.27
  • Accepted: 2024.10.31
  • Published: 2024.10.31

Abstract

Various artificial neural network architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been extensively studied and serve as the backbone of numerous artificial intelligence models. Among these, the transformer architecture has proven its performance in natural language processing and remains an area of active research. Recently, the transformer has been adapted for image processing through modifications of its internal structure, leading to the development of the Vision Transformer (ViT), whose accuracy and performance in vision tasks have been demonstrated in a variety of studies. In this study, we developed a ViT-based model for pneumonia detection in chest X-ray images and quantitatively evaluated its training efficiency and performance. ViT-based models with different architectures were constructed by varying the number of encoder blocks, and different patch sizes were applied during network training. To validate the developed models, their performance was compared with that of conventional CNN-based models: VGGNet, GoogLeNet, and ResNet. The results showed that the training efficiency and accuracy of the ViT-based models depended on the number of encoder blocks and the patch size, with F1 scores ranging from 0.875 to 0.919. The ViT-based model trained with 32 × 32 patches was more training-efficient than the CNN-based models, and all of the ViT-based models designed in this study detected pneumonia more accurately than VGGNet. In conclusion, the ViT-based models developed in this study can potentially be used for pneumonia detection in chest X-ray images, and this work could improve the clinical applicability of ViT-based models.
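The abstract does not include an implementation, but the design it describes maps onto a standard encoder-only ViT. The PyTorch sketch below is a hypothetical illustration of such a classifier, assuming standard components; the patch size and encoder depth, the two factors varied in the study, are exposed as parameters, while all other hyperparameters (input size, embedding dimension, head count) are illustrative placeholders, not values from the paper.

```python
# Minimal ViT-style pneumonia classifier sketch (hypothetical, not the authors' code).
import torch
import torch.nn as nn

class ViTClassifier(nn.Module):
    def __init__(self, image_size=224, patch_size=32, num_blocks=8,
                 embed_dim=256, num_heads=8, num_classes=2):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding: split the X-ray into non-overlapping patches and
        # project each patch to an embed_dim-dimensional token.
        self.patch_embed = nn.Conv2d(1, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # Stack of transformer encoder blocks (self-attention + MLP);
        # num_blocks is the depth that the study varies.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_blocks)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                 # x: (B, 1, H, W) grayscale chest X-ray
        x = self.patch_embed(x)           # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, N, D) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])         # classify from the [CLS] token

# e.g., the 32 x 32-patch variant highlighted in the abstract:
model = ViTClassifier(patch_size=32, num_blocks=8)
logits = model(torch.randn(1, 1, 224, 224))  # (1, 2): normal vs. pneumonia
```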
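For reference, the F1 score used as the evaluation metric is the conventional harmonic mean of precision and recall; the definitions below are standard, not restated in the abstract:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

A reported F1 of 0.919 thus means the harmonic mean of the model's precision and recall on the pneumonia class was 0.919.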

Funding

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the government of Korea (Ministry of Science and ICT) (Grant No. RS-2023-00211810).
