Optimal Thresholds from Mixture Distributions

  • Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University) ;
  • Joo, Jae-Seon (Statistics and Panel Center, Korean Women's Development Institute) ;
  • Choi, Jin-Soo (Research Institute of Applied Statistics, Sungkyunkwan University)
  • Received : 2009.09
  • Accepted : 2009.11
  • Published : 2010.02.28

Abstract

In credit evaluation studies that assume a mixture distribution for scores, we discuss methods for estimating the threshold that minimizes two errors: predicting a default borrower as non-default and predicting a non-default borrower as default. First, we propose estimating a threshold with statistical hypothesis tests, namely the most powerful test and the generalized likelihood ratio test, applied to probability density functions defined by the score random variable and a parameter space consisting of only two elements, the default and non-default states. Second, we estimate optimal thresholds that maximize two classification accuracy measures used with ROC and CAP curves, the accuracy and the true rate, and express them as equations in these probability density functions. The three kinds of optimal thresholds, based on hypothesis testing, the accuracy, and the true rate, are obtained for normally distributed scores with various means and variances. For each optimal threshold, the sum of the type I and type II errors is computed and compared, and the efficiency of the methods is discussed.
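
As a rough illustration of how a threshold on a two-component normal mixture can be chosen, the sketch below searches a grid for the threshold that minimizes the sum of the two misclassification rates and for the one that maximizes accuracy. It is not the paper's procedure: the means, standard deviations, the default proportion `P_DEFAULT`, the convention that scores at or below the threshold are classified as default, and all function names are assumptions made for this example, and the true rate criterion is not covered.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def error_sum(c, mu_d, sd_d, mu_n, sd_n):
    """Sum of the two misclassification rates at threshold c.

    Assumed convention: scores <= c are classified as default.
    """
    missed_default = 1.0 - normal_cdf(c, mu_d, sd_d)  # default predicted as non-default
    false_default = normal_cdf(c, mu_n, sd_n)         # non-default predicted as default
    return missed_default + false_default


def accuracy(c, p_default, mu_d, sd_d, mu_n, sd_n):
    """Overall proportion classified correctly when a share p_default defaults."""
    correct_default = p_default * normal_cdf(c, mu_d, sd_d)
    correct_normal = (1.0 - p_default) * (1.0 - normal_cdf(c, mu_n, sd_n))
    return correct_default + correct_normal


def grid_search(func, lo, hi, steps=20000, maximize=False):
    """Return (threshold, value) that is best for func on an even grid."""
    best_c, best_v = lo, func(lo)
    for i in range(1, steps + 1):
        c = lo + (hi - lo) * i / steps
        v = func(c)
        better = v > best_v if maximize else v < best_v
        if better:
            best_c, best_v = c, v
    return best_c, best_v


# Hypothetical parameters chosen only for illustration.
MU_D, SD_D = 0.0, 1.0   # score distribution of default borrowers
MU_N, SD_N = 2.0, 1.0   # score distribution of non-default borrowers
P_DEFAULT = 0.1         # assumed proportion of defaults in the mixture

c_err, err = grid_search(
    lambda c: error_sum(c, MU_D, SD_D, MU_N, SD_N), -4.0, 6.0)
c_acc, acc = grid_search(
    lambda c: accuracy(c, P_DEFAULT, MU_D, SD_D, MU_N, SD_N),
    -4.0, 6.0, maximize=True)

print(f"threshold minimizing error sum: {c_err:.3f} (error sum {err:.4f})")
print(f"threshold maximizing accuracy:  {c_acc:.3f} (accuracy {acc:.4f})")
```

With equal variances the error-sum threshold falls where the two densities cross, midway between the two means, while the accuracy threshold shifts toward the smaller class because each density is weighted by its mixing proportion. The two criteria therefore generally give different thresholds unless the classes are balanced.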
