• Title/Summary/Keyword: 최적분류점 (optimal threshold)

Partial AUC and optimal thresholds (부분 AUC와 최적분류점들)

  • Hong, Chong Sun; Cho, Hyun Su
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.187-198 / 2019
  • Extensive literature exists on estimating optimal thresholds from various accuracy measures using receiver operating characteristic (ROC) and cumulative accuracy profile (CAP) curves. This paper proposes an alternative measure representing a specific partial area under the ROC and CAP curves. The relationship between the ROC and CAP functions is examined using differential equations of the newly defined partial areas under the curves. In addition, the relationship with the optimal thresholds under various accuracy measures for the ROC and CAP functions is derived. Before finding the optimal thresholds, we assume that the mixture distribution is composed of two component distributions, taken to be normal distributions with various parameters. The corresponding type I and type II errors are also explored and discussed under various conditions for the accuracy measures.

Optimal threshold using the correlation coefficient for the confusion matrix (혼동행렬의 상관계수를 이용한 최적분류점)

  • Hong, Chong Sun; Oh, Se Hyeon; Choi, Ye Won
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.77-91 / 2022
  • Optimal threshold estimation is used to discriminate between the components of a mixture distribution in fields such as biostatistics and credit evaluation. Various well-known accuracy measures exist for examining discriminant power. Recently, the Matthews correlation coefficient and the F1 statistic have been studied for estimating optimal thresholds. In this study, we explore whether these accuracy measures are appropriate for finding an optimal threshold that discriminates the mixture distribution. We find that some accuracy measures that depend on the sample sizes are not appropriate when the two sample sizes differ greatly. Moreover, we propose an alternative method for finding the optimal threshold using a correlation coefficient defined on the ratios of the confusion matrix, and discuss its usefulness.
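The sample-size dependence noted in the abstract above can be seen with the standard definitions of the Matthews correlation coefficient and the F1 statistic. The minimal sketch below assumes hypothetical score distributions N(0, 1) and N(2, 1), a fixed threshold of 1.0, and expected (not sampled) confusion-matrix counts; it keeps sensitivity and specificity fixed and changes only the two group sizes. It is not the correlation-coefficient method proposed in the paper, and the helper names are illustrative.

```python
import math
from statistics import NormalDist

def expected_counts(neg, pos, n_neg, n_pos, threshold):
    """Expected confusion-matrix counts (tp, fp, tn, fn) when scores above `threshold` are called positive."""
    tpr = 1.0 - pos.cdf(threshold)   # sensitivity
    fpr = 1.0 - neg.cdf(threshold)   # 1 - specificity
    tp, fp = n_pos * tpr, n_neg * fpr
    return tp, fp, n_neg - fp, n_pos - tp

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, fp, tn, fn):
    """F1 statistic: harmonic mean of precision and recall (tn is not used)."""
    return 2 * tp / (2 * tp + fp + fn)

neg, pos = NormalDist(0, 1), NormalDist(2, 1)   # hypothetical score distributions
for n_neg, n_pos in [(1000, 1000), (9900, 100)]:
    counts = expected_counts(neg, pos, n_neg, n_pos, threshold=1.0)
    print(n_neg, n_pos, round(mcc(*counts), 3), round(f1(*counts), 3))
```

With balanced groups both measures are high, but with 9900 versus 100 observations they drop sharply even though TPR and FPR are unchanged, because precision depends on the group sizes.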

Optimal Thresholds from Mixture Distributions (혼합분포에서 최적분류점)

  • Hong, Chong-Sun; Joo, Jae-Seon; Choi, Jin-Soo
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.13-28 / 2010
  • Assuming a mixture distribution for credit evaluation studies, we discuss methods for estimating thresholds that minimize the errors of predicting default borrowers as non-defaults or non-defaults as defaults. To estimate a threshold, we propose a method based on statistical hypothesis tests (the most powerful test and the generalized likelihood ratio test) for probability density functions defined on the score random variable, with a parameter space consisting of only two elements: the default and non-default states. Other optimal thresholds, which maximize the classification accuracy measures of accuracy and true rate for the ROC and CAP curves, are obtained as solutions of equations involving these probability density functions. Three kinds of optimal thresholds, based on hypothesis testing, accuracy and true rate, are obtained from normal random samples with various means and variances. The sums of the type I and type II errors corresponding to each optimal threshold are obtained and compared. Finally, we discuss their efficiency and draw conclusions.
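For two simple hypotheses, the most powerful (Neyman-Pearson) test rejects when the likelihood ratio f1(x)/f0(x) exceeds a constant k, so a threshold of this kind solves log f1(x) - log f0(x) = log k. A minimal sketch, assuming both score distributions are normal and group 1 has the larger mean: k = 1 gives the point where the two densities cross, and k = π0/π1 gives the threshold that maximizes total accuracy for mixing proportions π0 and π1. The parameter values, the function names and the rule for choosing between the two roots in the unequal-variance case are illustrative assumptions, not the paper's procedure; the generalized likelihood ratio test is not shown.

```python
import math
from statistics import NormalDist

def lr_threshold(mu0, s0, mu1, s1, k=1.0):
    """Solve log f1(x) - log f0(x) = log k for normal densities f0 = N(mu0, s0), f1 = N(mu1, s1).

    Assumes mu0 < mu1, so scores above the returned threshold are assigned to group 1.
    """
    # Expanding the log-densities gives a*x**2 + b*x + c = 0 with these coefficients.
    a = 1 / (2 * s0**2) - 1 / (2 * s1**2)
    b = mu1 / s1**2 - mu0 / s0**2
    c = mu0**2 / (2 * s0**2) - mu1**2 / (2 * s1**2) + math.log(s0 / s1) - math.log(k)
    if abs(a) < 1e-12:                     # equal variances: the equation is linear
        return -c / b
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("the likelihood ratio never equals k")
    roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    # Prefer a root between the two means; otherwise take the one nearest their midpoint.
    between = [r for r in roots if mu0 <= r <= mu1]
    mid = (mu0 + mu1) / 2
    return between[0] if between else min(roots, key=lambda r: abs(r - mid))

def error_rates(mu0, s0, mu1, s1, t):
    """Type I error P0(X > t), type II error P1(X <= t), and their sum."""
    type1 = 1 - NormalDist(mu0, s0).cdf(t)
    type2 = NormalDist(mu1, s1).cdf(t)
    return type1, type2, type1 + type2

# Hypothetical settings: (mu0, s0, mu1, s1) and k.
cases = [((0, 1, 2, 1), 1.0),          # equal variances: crossing at the midpoint
         ((0, 1, 2, 2), 1.0),          # unequal variances: quadratic case
         ((0, 1, 2, 1), 0.9 / 0.1)]    # 90% group 0: accuracy-maximizing threshold
for params, k in cases:
    t = lr_threshold(*params, k=k)
    print(params, round(t, 4), [round(e, 4) for e in error_rates(*params, t)])
```

Raising k (a rarer group 1) pushes the threshold toward group 1's mean, trading a smaller type I error for a larger type II error.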

Optimal Thresholds from Non-Normal Mixture (비정규 혼합분포에서의 최적분류점)

  • Hong, Chong-Sun; Joo, Jae-Seon
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.943-953 / 2010
  • Many methods exist for estimating optimal thresholds from a mixture distribution of the score random variable in credit evaluation. Most of this research assumes normal distributions. In this paper, we extend the setting to non-normal distributions, such as the Weibull, logistic and gamma distributions, and estimate optimal thresholds using a hypothesis-test method and other methods that maximize the total accuracy and the true rate. The type I and type II errors are obtained and compared along with their sums. Finally, we discuss their efficiency and draw conclusions for non-normal distributions.
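When the components are not normal, the threshold equations rarely have a closed form, and one simple numerical route is to evaluate the criterion on a grid of candidate thresholds. The sketch below maximizes total accuracy p0·F0(t) + (1 - p0)·(1 - F1(t)) for Weibull and for logistic components using their closed-form CDFs. The parameter values, the equal mixing proportion, the function names and the grid search itself are assumptions for illustration rather than the paper's estimation method; the gamma case is left out because its CDF needs an incomplete gamma function outside the standard library.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull CDF: 1 - exp(-(x/scale)**shape) for x > 0, else 0."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-((x / scale) ** shape))

def logistic_cdf(x, loc, scale):
    """Logistic CDF: 1 / (1 + exp(-(x - loc)/scale))."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

def best_threshold(cdf0, cdf1, p0, lo, hi, steps=20000):
    """Grid search for the t maximizing total accuracy.

    Group 0 is assumed to have lower scores, so x > t is classified as group 1:
    accuracy(t) = p0 * F0(t) + (1 - p0) * (1 - F1(t)).
    """
    best_t, best_acc = lo, -1.0
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        acc = p0 * cdf0(t) + (1 - p0) * (1 - cdf1(t))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical components: two Weibulls (shape 2, scales 1 and 2) and two logistics.
w_t, w_acc = best_threshold(lambda x: weibull_cdf(x, 2.0, 1.0),
                            lambda x: weibull_cdf(x, 2.0, 2.0), p0=0.5, lo=0.0, hi=5.0)
l_t, l_acc = best_threshold(lambda x: logistic_cdf(x, 0.0, 1.0),
                            lambda x: logistic_cdf(x, 2.0, 1.0), p0=0.5, lo=-5.0, hi=7.0)
print(round(w_t, 3), round(w_acc, 3))
print(round(l_t, 3), round(l_acc, 3))
```

For these Weibull components the maximizer should lie near sqrt(4 ln 4 / 3) ≈ 1.36, where the two weighted densities cross, and for the equal-scale logistic components near the midpoint 1.0.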

Odds curve and optimal threshold (오즈 곡선과 최적분류점)

  • Hong, Chong Sun; Oh, Tae Gyu; Oh, Se Hyeon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.807-822 / 2021
  • Various accuracy measures that can be explained on the odds curve are discussed, and an alternative accuracy measure, the maximum square, is proposed based on the characteristics of the odds curve. Thresholds corresponding to these accuracy measures are obtained for various probability distribution functions and an illustrative example, and their characteristics are discussed by comparing many kinds of threshold statistics. We conclude that optimal thresholds can be explored from the odds curve, as from the ROC curve, and that the maximum square can serve as a good accuracy measure for improving the performance of binary classification models.

Alternative Optimal Threshold Criteria: MFR (대안적인 분류기준: 오분류율곱)

  • Hong, Chong Sun; Kim, Hyomin Alex; Kim, Dong Kyu
    • The Korean Journal of Applied Statistics / v.27 no.5 / pp.773-786 / 2014
  • We propose the multiplication of false rates (MFR), a classification accuracy criterion defined as the area of a rectangle obtained from the ROC curve. The optimal threshold obtained using MFR is compared with those from other criteria in terms of classification performance. Optimal thresholds for various distribution functions are also found, and some properties and advantages of MFR are discussed by comparing the FNR and FPR at the optimal thresholds. Based on a general cost function, the cost ratios of the optimal thresholds are computed for various classification criteria, and these cost ratios are examined on cost curves to explore the advantages of MFR. Furthermore, the definition of MFR is extended to multi-dimensional ROC analysis, and the relations among classification criteria are discussed.

AROC Curve and Optimal Threshold (AROC 곡선과 최적분류점)

  • Hong, Chong-Sun; Lee, Hee-Jung
    • The Korean Journal of Applied Statistics / v.24 no.1 / pp.185-191 / 2011
  • In credit evaluation studies that assume mixture distributions, the ROC curve is a useful tool for exploring the power to discriminate between default and non-default borrowers. The AROC curve is an adjusted ROC curve whose points can be identified with the corresponding scores; it is analyzed mathematically in this work. We obtain patterns of this curve by applying normal distributions. Moreover, the relationship between the AROC curve and many classification accuracy statistics is explored to find the optimal threshold. When the two distributions have equal variances, we find that the local minimum of the AROC curve occurs at the optimal threshold that maximizes certain classification accuracies.

Optimal Threshold from ROC and CAP Curves (ROC와 CAP 곡선에서의 최적 분류점)

  • Hong, Chong-Sun; Choi, Jin-Soo
    • The Korean Journal of Applied Statistics / v.22 no.5 / pp.911-921 / 2009
  • Receiver Operating Characteristic (ROC) and Cumulative Accuracy Profile (CAP) curves are two methods used to assess the discriminatory power of different credit-rating approaches. The point of optimal classification accuracy on an ROC curve and the point of maximal profit on a CAP curve can be found using iso-performance tangent lines, which are based on the standard notion of accuracy. In this paper, we offer an alternative accuracy measure called the true rate. Using this rate, one can obtain alternative optimal threshold points on both ROC and CAP curves. For most real populations of borrowers, the number of defaults is much smaller than the number of non-defaults, and in such cases the true rate may be more efficient than the accuracy rate in terms of cost functions. Moreover, we show that the alternative scores of optimal classification accuracy and maximal profit are identical, and that this single score coincides with the score corresponding to the Kolmogorov-Smirnov statistic used to test the homogeneity of the distribution functions of the defaults and non-defaults.
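The Kolmogorov-Smirnov statistic mentioned in the abstract above is the largest vertical gap between the empirical score distribution functions of the defaults and the non-defaults, and the score where that gap occurs is a candidate threshold. A minimal sketch, assuming defaults tend to have lower scores (so the signed gap F_d(t) - F_nd(t) is used) and using a tiny made-up sample; the function names `ecdf` and `ks_threshold` are hypothetical.

```python
from bisect import bisect_right

def ecdf(sorted_scores, t):
    """Empirical CDF: fraction of scores <= t."""
    return bisect_right(sorted_scores, t) / len(sorted_scores)

def ks_threshold(defaults, non_defaults):
    """Return (KS statistic, score at which it is attained).

    Candidate thresholds are the observed scores; with defaults assumed
    lower-scoring, the signed gap F_d(t) - F_nd(t) is maximized.
    """
    d, nd = sorted(defaults), sorted(non_defaults)
    best_gap, best_t = -1.0, None
    for t in sorted(set(d + nd)):
        gap = ecdf(d, t) - ecdf(nd, t)
        if gap > best_gap:
            best_gap, best_t = gap, t
    return best_gap, best_t

defaults = [310, 350, 420, 455, 480, 530]                 # made-up scores
non_defaults = [400, 470, 500, 560, 590, 620, 650, 700]
ks, t = ks_threshold(defaults, non_defaults)
print(ks, t)   # → 0.625 530
```

If scores at or below t are classified as default, then TPR = F_d(t) and FPR = F_nd(t), so the same t also maximizes TPR - FPR (Youden's index).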

Index of union and other accuracy measures (Index of Union와 다른 정확도 측도들)

  • Hong, Chong Sun; Choi, So Yeon; Lim, Dong Hui
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.395-407 / 2020
  • Most classification accuracy measures for optimal thresholds fall into two types: one is expressed with cumulative distribution functions and probability density functions, and the other is based on the ROC curve and AUC. Unal (2017) proposed the index of union (IU) as an accuracy measure that considers both types together. In this study, ten kinds of accuracy measures (including IU) are divided into six categories, and the advantages of IU are studied by comparing the measures belonging to each category. The optimal thresholds of these measures are obtained for various normal mixture distributions; the type I and type II errors, as well as the error sums corresponding to each threshold, are then calculated. The properties and characteristics of the IU statistic are explored by comparing its discriminative power with that of other accuracy measures based on these error values. As the mean difference between the two distributions increases, the type I error and error sum of the IU statistic converge to those of the best accuracy measures in the second category. Therefore, IU could serve as an accuracy measure for evaluating the discriminant power of a model.
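As we understand Unal (2017), the index of union selects the cut-point c that minimizes IU(c) = |Se(c) - AUC| + |Sp(c) - AUC|. The minimal sketch below assumes a binormal model with a hypothetical negative class N(0, 1) and positive class N(2, 1) or N(2, 2), uses a simple grid search, and computes Youden's index on the same grid only for comparison; the function names are illustrative and this is not the paper's code.

```python
import math
from statistics import NormalDist

def binormal_auc(neg, pos):
    """AUC = P(X_pos > X_neg) for independent normal scores."""
    diff = pos.mean - neg.mean
    return NormalDist().cdf(diff / math.sqrt(neg.stdev**2 + pos.stdev**2))

def iu_threshold(neg, pos, lo, hi, steps=10000):
    """Cut-point minimizing IU(c) = |Se(c) - AUC| + |Sp(c) - AUC| on a grid.

    Scores above c are classified positive, so Se(c) = 1 - F_pos(c) and Sp(c) = F_neg(c).
    """
    auc = binormal_auc(neg, pos)
    best_c, best_iu = lo, math.inf
    for i in range(steps + 1):
        c = lo + (hi - lo) * i / steps
        se, sp = 1 - pos.cdf(c), neg.cdf(c)
        iu = abs(se - auc) + abs(sp - auc)
        if iu < best_iu:
            best_c, best_iu = c, iu
    return best_c, best_iu

def youden_threshold(neg, pos, lo, hi, steps=10000):
    """Cut-point maximizing Se(c) + Sp(c) - 1 on the same grid, for comparison."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: (1 - pos.cdf(c)) + neg.cdf(c) - 1)

neg = NormalDist(0, 1)  # hypothetical negative-class scores
for pos in (NormalDist(2, 1), NormalDist(2, 2)):
    c_iu, _ = iu_threshold(neg, pos, -4.0, 8.0)
    print(pos, round(c_iu, 3), round(youden_threshold(neg, pos, -4.0, 8.0), 3))
```

In the equal-variance case both criteria land near the midpoint 1.0; with the unequal variances used here they differ (roughly 0.89 for IU versus 1.24 for Youden's index).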

Extraction of Optimal Interest Points for Shape-based Image Classification (모양 기반 이미지 분류를 위한 최적의 우세점 추출)

  • 조성택; 엄기현
    • Journal of KIISE:Databases / v.30 no.4 / pp.362-371 / 2003
  • In this paper, we propose an optimal interest point extraction method that supports shape-based image classification and indexing for image databases by applying a dynamic threshold that reflects the characteristics of the shape contour. The threshold is determined dynamically while the algorithm runs, by comparing the contour length ratio of the original shape and the approximated polygon. Because the algorithm considers the characteristics of the shape contour, it can minimize the number of interest points. For a contour of n points, the proposed algorithm extracts m optimal interest points at an average computational cost of O(n log n). Experiments were performed on 70 synthetic shapes of 7 different contour types and on 1100 fish shapes. The method achieves an average optimization ratio of up to 0.92, a 14% improvement over the fixed-threshold method. After normalization, the shape features extracted by the proposed method can be used for shape-based image classification, indexing and similarity search.
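The dynamic threshold in the abstract above is driven by a contour length ratio between the original shape and its approximating polygon. The sketch below computes only such a ratio for a closed contour and a chosen subset of its points; the contour, the selected indices and the function names are hypothetical, and the paper's threshold rule and O(n log n) extraction algorithm are not reproduced.

```python
import math

def perimeter(points):
    """Length of the closed polyline through `points` (the last point joins the first)."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def length_ratio(contour, kept_indices):
    """Perimeter of the polygon through the kept contour points over the full contour perimeter."""
    polygon = [contour[i] for i in sorted(kept_indices)]
    return perimeter(polygon) / perimeter(contour)

# Hypothetical contour: a unit square sampled at its corners and edge midpoints.
contour = [(0, 0), (0.5, 0), (1, 0), (1, 0.5), (1, 1), (0.5, 1), (0, 1), (0, 0.5)]
print(length_ratio(contour, [0, 2, 4, 6]))  # keeping only the corners: 1.0
print(length_ratio(contour, [0, 4]))        # a two-point "polygon" along the diagonal
```

A ratio close to 1 means the polygon preserves the contour length well; dropping a corner point shortens the polygon and lowers the ratio, which is the kind of signal a length-based threshold can react to.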