Review of Statistical Methods for Evaluating the Performance of Survival or Other Time-to-Event Prediction Models (from Conventional to Deep Learning Approaches)

  • Seo Young Park (Department of Statistics and Data Science, Korea National Open University) ;
  • Ji Eun Park (Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center) ;
  • Hyungjin Kim (Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital) ;
  • Seong Ho Park (Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center)
  • Received : 2021.03.20
  • Accepted : 2021.05.17
  • Published : 2021.10.01

Abstract

The recent introduction of various high-dimensional modeling methods, such as radiomics and deep learning, has greatly diversified the modeling approaches available for survival prediction (or, more generally, time-to-event prediction). Because these approaches are new and their outputs unfamiliar, some researchers and practitioners may be unsure how to evaluate the performance of such models. Those who want to develop these models or apply them in practice need the methodological literacy to critically appraise a performance evaluation and, ideally, the ability to conduct one. This article provides intuitive, conceptual, and practical explanations of the statistical methods for evaluating the performance of survival prediction models, with minimal use of mathematical description. It covers methods ranging from conventional to deep learning approaches, with emphasis on recent modeling approaches. The review includes straightforward explanations of C indices (Harrell's C index, etc.), time-dependent receiver operating characteristic curve analysis, calibration plots and other methods for evaluating calibration performance, and the Brier score.
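As a rough illustration of the first measure named above, the following minimal Python sketch computes a pairwise concordance index in the spirit of Harrell's C index. It is not taken from the article: the function name `harrell_c_index`, the toy data, and the handling of ties (pairs with equal observed times are skipped, tied risk scores count as half concordant) are assumptions made here for illustration only.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Pairwise concordance between predicted risk and observed survival.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : model output where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: skip pairs with tied times
        # Identify which subject of the pair has the shorter observed time.
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied predictions count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five subjects, two of them censored.
times = [2, 4, 6, 8, 10]
events = [1, 0, 1, 1, 0]
risks = [0.9, 0.3, 0.7, 0.8, 0.1]

c_index = harrell_c_index(times, events, risks)
print(f"{c_index:.3f}")  # 6 concordant out of 7 comparable pairs -> 0.857
```

In this toy example, pairs whose earlier subject is censored are excluded, which is why only 7 of the 10 possible pairs contribute. Established packages handle ties and censoring more carefully and should be preferred for real analyses.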
