ROC Curve Fitting with Normal Mixtures

ROC Analysis Using Normal Mixture Distributions

  • Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University)
  • Lee, Won-Yong (Research Institute of Applied Statistics, Sungkyunkwan University)
  • Received : 2010.12
  • Accepted : 2011.02
  • Published : 2011.04.30

Abstract

Many studies have considered the distribution functions of scores and appropriate covariates in order to improve the accuracy of diagnostic tests, including the ROC curve, which expresses the relationship between sensitivity and specificity. In these studies, ROC analysis was carried out with regression models that include covariates, under the assumption that the distribution function is known or estimable. In this work, we consider the general situation in which both the distribution function and the effects of covariates are unknown. For the ROC analysis, mixtures of normal distributions are used to estimate the distribution function fitted to credit evaluation data consisting of a score random variable and two sub-populations. The AUC measure is used to compare the resulting ROC curve with the classical nonparametric, empirical ROC curve. We conclude that the ROC curve based on normal mixtures fits the classical empirical curve better than those obtained by other methods.

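Many studies have incorporated distribution functions and covariates into the ROC curve, which expresses the relationship between the sensitivity and specificity of a score variable, in order to make diagnosis more accurate. These studies used regression methods that account for covariates, and carried out the ROC analysis either by assuming a normal distribution or by estimating the distribution function of the residuals. This study addresses the general situation in which the distribution function is not given and the covariates affecting the diagnosis are unknown. To estimate a distribution function suited to credit evaluation data, which consist of a score random variable and two sub-populations, the ROC analysis uses a normal mixture distribution formed by mixing several normal distributions. The AUC statistic is used to assess agreement with the classical nonparametric, empirical ROC curve, and the ROC curve based on the normal mixture proposed in this study is shown to agree with it better than ROC curves obtained by other methods.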
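The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the mixtures are estimated, so a plain EM algorithm is assumed here, along with two components per sub-population and synthetic scores in place of the credit evaluation data. The sketch also assumes that low scores indicate default, so the ROC curve is traced by the points (F_n(c), F_d(c)) as the threshold c varies, where F_d and F_n are the fitted score distribution functions of the default and non-default sub-populations. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def draw_mixture(rng, n, comps):
    """Draw n synthetic scores from a normal mixture of (weight, mean, sd) triples."""
    weights = [w for w, _, _ in comps]
    picks = rng.choices(comps, weights=weights, k=n)
    return [rng.gauss(m, s) for _, m, s in picks]


def fit_normal_mixture(xs, k=2, iters=100, seed=0):
    """Fit a k-component 1-D normal mixture by plain EM (assumed estimation method)."""
    rng = random.Random(seed)
    n = len(xs)
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    weights = [1.0 / k] * k
    means = rng.sample(xs, k)  # start each mean at a randomly chosen observation
    sds = [sd_all] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each observation.
        dists = [NormalDist(m, s) for m, s in zip(means, sds)]
        resp = []
        for x in xs:
            dens = [w * d.pdf(x) for w, d in zip(weights, dists)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([v / total for v in dens])
        # M-step: responsibility-weighted updates of weights, means and sds.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3 * sd_all)  # floor avoids a collapsed component
    return list(zip(weights, means, sds))


def mixture_cdf(x, comps):
    """Distribution function of a fitted normal mixture at x."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in comps)


def mixture_auc(comps_d, comps_n, lo, hi, steps=1000):
    """Trapezoidal area under the curve (F_n(c), F_d(c)) for thresholds c in [lo, hi]."""
    pts = [(0.0, 0.0)]
    for i in range(steps + 1):
        c = lo + (hi - lo) * i / steps
        pts.append((mixture_cdf(c, comps_n), mixture_cdf(c, comps_d)))
    pts.append((1.0, 1.0))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def empirical_auc(defaults, non_defaults):
    """Mann-Whitney estimate of P(default score < non-default score), ties count 1/2."""
    wins = sum((d < g) + 0.5 * (d == g) for d in defaults for g in non_defaults)
    return wins / (len(defaults) * len(non_defaults))


# Synthetic scores only; the mixture parameters below are made up for illustration.
rng = random.Random(2011)
defaults = draw_mixture(rng, 300, [(0.6, 40.0, 8.0), (0.4, 55.0, 5.0)])
non_defaults = draw_mixture(rng, 300, [(0.5, 60.0, 7.0), (0.5, 75.0, 6.0)])

fit_d = fit_normal_mixture(defaults)
fit_n = fit_normal_mixture(non_defaults)
lo = min(defaults + non_defaults) - 50.0
hi = max(defaults + non_defaults) + 50.0
auc_mix = mixture_auc(fit_d, fit_n, lo, hi)
auc_emp = empirical_auc(defaults, non_defaults)
print(f"AUC from normal mixtures: {auc_mix:.4f}")
print(f"AUC from empirical ROC:   {auc_emp:.4f}")
```

A small gap between the two printed AUC values suggests that the mixture-based ROC curve tracks the empirical one closely, which is the kind of agreement the abstract reports. The number of components is a modelling choice that would normally be checked against the data rather than fixed at two.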

Cited by

  1. ROC Function Estimation vol.24, pp.6, 2011, https://doi.org/10.5351/KJAS.2011.24.6.987
  2. Alternative Optimal Threshold Criteria: MFR vol.27, pp.5, 2014, https://doi.org/10.5351/KJAS.2014.27.5.773