Index of union and other accuracy measures

  • Received : 2020.05.06
  • Accepted : 2020.06.13
  • Published : 2020.08.31

Abstract

Most classification accuracy measures used to find an optimal threshold fall into two types: those expressed with cumulative distribution functions and probability density functions, and those based on the ROC curve and AUC. Unal (2017) proposed the index of union (IU) as an accuracy measure that combines the two types by considering both the cumulative distribution functions and the AUC. In this study, ten accuracy measures (including IU) are divided into six categories, and the advantages of IU are studied by comparing it with the measures in each category. The optimal thresholds of these measures are obtained under various normal mixture distributions; the type I and type II errors and the error sums corresponding to each threshold are then calculated. The properties and characteristics of the IU statistic are explored by comparing the discriminative power of the accuracy measures on the basis of these error values. As the mean difference between the two distributions increases, the type I error and the error sum of the IU statistic converge to those of the second-category accuracy measures, which have the best classification accuracy. Therefore, IU can be used as an accuracy measure to evaluate the discriminative power of a model.
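As a rough illustration of the procedure summarized above, the following minimal sketch finds an IU-optimal threshold for one two-component normal setting and reports the resulting errors. It assumes the form of the statistic usually attributed to Unal (2017), IU(c) = |Se(c) - AUC| + |Sp(c) - AUC|, minimized over the threshold c. The function name `iu_threshold`, the grid search, and the chosen distributions (negatives N(0, 1), positives N(mu, sigma^2)) are assumptions made for illustration only and are not the settings used in the study.

```python
from statistics import NormalDist


def iu_threshold(mu, sigma=1.0, grid_size=2001):
    """Grid-search the threshold c minimizing |Se(c) - AUC| + |Sp(c) - AUC|.

    Assumed setting (hypothetical, for illustration): negatives ~ N(0, 1),
    positives ~ N(mu, sigma^2) with mu >= 0; scores above c are classified
    as positive.
    """
    neg = NormalDist(0.0, 1.0)
    pos = NormalDist(mu, sigma)
    # Binormal AUC: P(X_pos > X_neg) = Phi(mu / sqrt(1 + sigma^2))
    auc = NormalDist().cdf(mu / (1.0 + sigma ** 2) ** 0.5)

    lo, hi = -4.0, mu + 4.0 * sigma
    step = (hi - lo) / (grid_size - 1)

    best_c, best_iu = lo, float("inf")
    for i in range(grid_size):
        c = lo + i * step
        sp = neg.cdf(c)          # specificity: negatives at or below c
        se = 1.0 - pos.cdf(c)    # sensitivity: positives above c
        iu = abs(se - auc) + abs(sp - auc)
        if iu < best_iu:
            best_c, best_iu = c, iu

    type1 = 1.0 - neg.cdf(best_c)   # type I error: negatives classified positive
    type2 = pos.cdf(best_c)         # type II error: positives classified negative
    return best_c, type1, type2, type1 + type2


if __name__ == "__main__":
    for mu in (1.0, 2.0, 3.0):
        c, a, b, total = iu_threshold(mu)
        print(f"mu={mu:.1f}  c={c:.3f}  typeI={a:.4f}  typeII={b:.4f}  sum={total:.4f}")
```

In this equal-variance case the two distributions are symmetric about mu/2, so the grid search should settle near the midpoint of the two means, where the type I and type II errors are equal. Unequal variances (sigma different from 1) are where the thresholds of different measures would be expected to diverge.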

References

  1. Brasil, P. (2010). Diagnostic test accuracy evaluation for medical professionals, Package DiagnosisMed in R.
  2. Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition, 30, 1145-1159. https://doi.org/10.1016/S0031-3203(96)00142-2
  3. Cantor, S. B., Sun, C. C., Tortolero-Luna, G., Richards-Kortum, R., and Follen, M. (1999). A comparison of C/B ratios from studies using receiver operating characteristic curve analysis, Journal of Clinical Epidemiology, 52, 885-892. https://doi.org/10.1016/S0895-4356(99)00075-X
  4. Centor, R. M. (1991). Signal detectability: The use of ROC curves and their analyses, Medical Decision Making, 11, 102-106. https://doi.org/10.1177/0272989X9101100205
  5. Connell, F. A. and Koepsell, T. D. (1985). Measures of gain in certainty from a diagnostic test, American Journal of Epidemiology, 121, 744-753. https://doi.org/10.1093/aje/121.5.744
  6. Egan, J. P. (1975). Signal detection theory and ROC analysis, Academic Press, New York.
  7. Engelmann, B., Hayden, E., and Tasche, D. (2003). Testing rating accuracy, Risk, 16, 82-86.
  8. Fawcett, T. (2006). An introduction to ROC analysis, Pattern Recognition Letters, 27, 861-874. https://doi.org/10.1016/j.patrec.2005.10.010
  9. Fawcett, T. and Provost, F. (1997). Adaptive fraud detection, Data Mining and Knowledge Discovery, 1, 291-316. https://doi.org/10.1023/A:1009700419189
  10. Freeman, E. A. and Moisen, G. G. (2008). A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa, Ecological Modelling, 217, 48-58. https://doi.org/10.1016/j.ecolmodel.2008.05.015
  11. Greiner, M. and Gardner, I. A. (2000). Epidemiologic issues in the validation of veterinary diagnostic tests, Preventive Veterinary Medicine, 45, 3-22. https://doi.org/10.1016/S0167-5877(00)00114-8
  12. Hanley, J. A. and McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29-36. https://doi.org/10.1148/radiology.143.1.7063747
  13. Hong, C. S. and Lee, S. J. (2018). TROC curve and accuracy measures, Journal of the Korean Data & Information Science Society, 29, 861-872. https://doi.org/10.7465/jkdi.2018.29.4.861
  14. Hong, C. S., Joo, J. S., and Choi, J. S. (2010). Optimal thresholds from mixture distributions, The Korean Journal of Applied Statistics, 23(1), 13-28. https://doi.org/10.5351/KJAS.2010.23.1.013
  15. Hong, C. S., Lin, M. H., Hong, S. W., and Kim, G. C. (2011). Classification accuracy measures with minimum error rate for normal mixture, Journal of the Korean Data & Information Science Society, 22, 619-630.
  16. Hsieh, F. and Turnbull, B. W. (1996). Nonparametric and semiparametric estimation of the receiver operating characteristic curve, The Annals of Statistics, 24, 25-40. https://doi.org/10.1214/aos/1033066197
  17. Krzanowski, W. J. and Hand, D. J. (2009). ROC Curves for Continuous Data, Chapman & Hall/CRC, Boca Raton.
  18. Lambert, J. and Lipkovich, I. (2008). A macro for getting more out of your ROC curve, SAS Global Forum, 231.
  19. Liu, C., White, M., and Newell, G. (2009). Measuring the accuracy of species distribution models: a review. In Proceedings 18th World IMACS/MODSIM Congress, 4241-4247.
  20. Metz, C. E. and Kronman, H. B. (1980). Statistical significance tests for binormal ROC curves, Journal of Mathematical Psychology, 22, 218-243. https://doi.org/10.1016/0022-2496(80)90020-6
  21. Moses, L. E., Shapiro, D., and Littenberg, B. (1993). Combining independent studies of a diagnostic test into a summary ROC curve: Data-analytic approaches and some additional considerations, Statistics in Medicine, 12, 1293-1316. https://doi.org/10.1002/sim.4780121403
  22. Pepe, M. S. (2000). Receiver operating characteristic methodology, Journal of the American Statistical Association, 95, 308-311. https://doi.org/10.1080/01621459.2000.10473930
  23. Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford University Press, Oxford.
  24. Perkins, N. J. and Schisterman, E. F. (2006). The inconsistency of "optimal" cutpoints obtained using two criteria based on the receiver operating characteristic curve, American Journal of Epidemiology, 163, 670-675. https://doi.org/10.1093/aje/kwj063
  25. Provost, F. and Fawcett, T. (2001). Robust classification for imprecise environments, Machine Learning, 42, 203-231. https://doi.org/10.1023/A:1007601015854
  26. Spackman, K. A. (1989). Signal detection theory: valuable tools for evaluating inductive learning, Proceedings of the Sixth International Workshop on Machine Learning, San Mateo, 160-163.
  27. Tasche, D. (2006). Validation of internal rating systems and PD estimates, The Analytics of Risk Model Validation, 169-196.
  28. Unal, I. (2017). Defining an optimal cut-point value in ROC analysis: an alternative approach, Computational & Mathematical Methods in Medicine, 2017, 1-14. https://doi.org/10.1155/2017/3762651
  29. Vuk, M. and Curk, T. (2006). ROC curve, lift chart and calibration plot, Metodoloski Zvezki, 3, 89-108.
  30. Yoo, H. S. and Hong, C. S. (2011). Optimal criterion of classification accuracy measures for normal mixture, Communications for Statistical Applications and Methods, 18, 343-355. https://doi.org/10.5351/CKSS.2011.18.3.343
  31. Youden, W. J. (1950). Index for rating diagnostic tests, Cancer, 3, 32-35. https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3
  32. Zweig, M. and Campbell, G. (1993). Receiver-operating characteristics (ROC) plots: A fundamental evaluation tool in clinical medicine, Clinical Chemistry, 39, 561-577. https://doi.org/10.1093/clinchem/39.4.561