Assessment and Analysis of Fidelity and Diversity for GAN-based Medical Image Generative Model

  • Jang, Yoojin (Graduate School of Artificial Intelligence, UNIST) ;
  • Yoo, Jaejun (Graduate School of Artificial Intelligence, UNIST) ;
  • Hong, Helen (Dept. of Software Convergence, Seoul Women's University)
  • Received : 2022.03.08
  • Accepted : 2022.05.23
  • Published : 2022.06.01

Abstract

With recent advances in medical imaging, many studies on medical image generation have been proposed, and accurately evaluating the quality and diversity of generated medical images has become crucial. Generated medical images are currently evaluated with visual Turing tests by experts, feature distribution visualization, and quantitative scores such as the Inception Score (IS) and Fréchet Inception Distance (FID). However, few methods quantitatively evaluate medical images in terms of both fidelity and diversity. In this paper, we train two generative models, DCGAN and PGGAN, on a chest CT dataset of non-small cell lung cancer patients, generate images, and evaluate the performance of the two models in terms of fidelity and diversity. Performance is evaluated quantitatively with IS and FID, which are one-dimensional score-based methods, and with Precision and Recall and Improved Precision and Recall, which are two-dimensional score-based methods. The characteristics and limitations of each evaluation method in medical imaging are also analyzed.
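To make the two-dimensional evaluation concrete, the following is a minimal sketch of a k-nearest-neighbour precision/recall estimate in the spirit of Improved Precision and Recall. It is not the authors' implementation. The function names, the choice k=3, and the toy 4-dimensional random vectors are assumptions made for illustration; in practice the inputs would be feature embeddings of real and generated CT images from a pretrained network.

```python
import math
import random


def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour within the same set."""
    radii = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        radii.append(dists[k])  # dists[0] is the zero distance from p to itself
    return radii


def coverage(support, queries, k):
    """Fraction of queries lying inside at least one k-NN ball centred on a support point."""
    radii = knn_radii(support, k)
    hits = sum(
        any(math.dist(q, s) <= r for s, r in zip(support, radii))
        for q in queries
    )
    return hits / len(queries)


def improved_precision_recall(real, fake, k=3):
    """Return (precision, recall) for two lists of feature vectors (hypothetical helper)."""
    precision = coverage(real, fake, k)  # fidelity: generated samples on the real manifold
    recall = coverage(fake, real, k)     # diversity: real samples on the generated manifold
    return precision, recall


# Toy data (assumption): random 4-D vectors stand in for image features.
rng = random.Random(0)
real = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(100)]
# Simulated mode collapse: every generated sample is a tiny perturbation of one real sample.
collapsed = [[x + rng.gauss(0, 1e-3) for x in real[0]] for _ in range(100)]

print("identical sets:", improved_precision_recall(real, real))
print("collapsed generator:", improved_precision_recall(real, collapsed))
```

In this toy setting the collapsed generator keeps high precision, because its samples sit next to a real one, while recall falls close to zero, because almost no real sample is covered. A single score such as FID cannot separate these two effects, which is the motivation for reporting fidelity and diversity separately.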

Acknowledgement

This research was supported by a research grant from Seoul Women's University (2022-0167).
