• Title/Summary/Keyword: Accuracy Statistics

Odds curve and optimal threshold (오즈 곡선과 최적분류점)

  • Hong, Chong Sun; Oh, Tae Gyu; Oh, Se Hyeon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.807-822 / 2021
  • Various accuracy measures that can be explained on the odds curve are discussed, and an alternative accuracy measure, the maximum square, is proposed based on the characteristics of the odds curve. Thresholds corresponding to these accuracy measures are obtained for various probability distribution functions and for an illustrative example. Their characteristics are discussed by comparing the many kinds of statistics used to determine thresholds. We conclude that optimal thresholds can be explored from the odds curve, as from the ROC curve, and that the maximum square measure can serve as a good accuracy measure that can improve the performance of a binary classification model.

Assessing the Accuracy of Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook; Kim, Bu-Yang
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.163-168 / 2009
  • Given the specific mean shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed. The accuracy of outlier tests is investigated using subset curvatures. These subset curvatures appear to be reliable indicators of the adequacy of the linearization-based test. We also consider obtaining graphical summaries of the uncertainty in parameter estimates through confidence curves. The results are applied to the problem of assessing the accuracy of outlier tests.

Alternative accuracy for multiple ROC analysis

  • Hong, Chong Sun; Wu, Zhi Qiang
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1521-1530 / 2014
  • ROC analysis is considered for multiple class diagnosis. There exist many criteria for finding optimal thresholds and measuring the accuracy of diagnostic tests in k-dimensional ROC analysis. In this paper, we propose a diagnostic accuracy measure called the correct classification simple rate, which is defined as the sum of the true rates for each classification distribution and is expressed as a function of the sum of sequential true rates for two consecutive distributions. This measure does not weight accuracy across categories by category prevalence and is comparable across populations for multiple class diagnosis. It is found that this accuracy measure is not only related to the Kolmogorov-Smirnov statistic but can also be represented as a linear function of some optimal threshold criteria. Given these facts, the suggested measure could be applied to tests that compare multiple distributions.

SUPPORT VECTOR MACHINE USING K-MEANS CLUSTERING

  • Lee, S.J.; Park, C.; Jhun, M.; Koo, J.Y.
    • Journal of the Korean Statistical Society / v.36 no.1 / pp.175-182 / 2007
  • The support vector machine has been successful in many applications because of its flexibility and high accuracy. However, when a training data set is large or imbalanced, the support vector machine may suffer from significant computational problems or a loss of accuracy in predicting minority classes. We propose a modified version of the support vector machine using K-means clustering that exploits the information in class labels during the clustering process. For large data sets, our method can save computation time by reducing the number of data points without a significant loss of accuracy. Moreover, our method can deal with imbalanced data sets effectively by alleviating the influence of the dominant class.

Index of union and other accuracy measures (Index of Union와 다른 정확도 측도들)

  • Hong, Chong Sun; Choi, So Yeon; Lim, Dong Hui
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.395-407 / 2020
  • Most classification accuracy measures for the optimal threshold fall into two types: one is expressed with cumulative distribution functions and probability density functions, and the other is based on the ROC curve and AUC. Unal (2017) proposed the index of union (IU) as an accuracy measure that takes both types into account. In this study, ten kinds of accuracy measures (including IU) are divided into six categories, and the advantages of the IU are studied by comparing the measures belonging to each category. The optimal thresholds of these measures are obtained for various normal mixture distributions; subsequently, the type I and type II errors as well as the error sums corresponding to each threshold are calculated. The properties and characteristics of the IU statistic are explored by comparing its discriminative power with that of other accuracy measures based on error values. The type I error and error sum of the IU statistic converge to those of the best accuracy measures of the second category as the mean difference between the two distributions increases. Therefore, IU could be an accuracy measure for evaluating the discriminant power of a model.
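
As a rough illustration of the IU criterion mentioned in this abstract, the sketch below uses the commonly quoted form IU(c) = |Se(c) - AUC| + |Sp(c) - AUC| and picks the threshold that minimizes it. The two populations N(0, 1) and N(2, 1), the grid, and the function name `index_of_union` are assumptions chosen for illustration; they are not the settings used in the paper.

```python
from statistics import NormalDist

def index_of_union(c, neg, pos, auc):
    """IU(c) = |Se(c) - AUC| + |Sp(c) - AUC|; smaller is better."""
    sensitivity = 1.0 - pos.cdf(c)   # positives score above the threshold
    specificity = neg.cdf(c)         # negatives score at or below it
    return abs(sensitivity - auc) + abs(specificity - auc)

# Assumed illustrative populations (not taken from the paper).
neg = NormalDist(0.0, 1.0)
pos = NormalDist(2.0, 1.0)

# For two normal scores, AUC = Phi((mu1 - mu0) / sqrt(sd0^2 + sd1^2)).
auc = NormalDist().cdf((pos.mean - neg.mean) / (neg.variance + pos.variance) ** 0.5)

# Grid search over candidate thresholds from -2 to 4 in steps of 0.01.
grid = [-2.0 + 0.01 * i for i in range(601)]
iu_threshold = min(grid, key=lambda c: index_of_union(c, neg, pos, auc))

# Error rates at the chosen threshold.
false_positive_rate = 1.0 - neg.cdf(iu_threshold)
false_negative_rate = pos.cdf(iu_threshold)
print(iu_threshold, false_positive_rate, false_negative_rate)
```

With equal variances the two error rates are symmetric, so the minimizing threshold sits midway between the two means; unequal variances would break that symmetry.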

Optimal threshold using the correlation coefficient for the confusion matrix (혼동행렬의 상관계수를 이용한 최적분류점)

  • Hong, Chong Sun; Oh, Se Hyeon; Choi, Ye Won
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.77-91 / 2022
  • Optimal threshold estimation is considered in order to discriminate the mixture distribution in the fields of biostatistics and credit evaluation. There exist various well-known accuracy measures that examine discriminant power. Recently, the Matthews correlation coefficient and the F1 statistic were studied for estimating optimal thresholds. In this study, we explore whether these accuracy measures are appropriate for finding the optimal threshold that discriminates the mixture distribution. It is found that some accuracy measures that depend on the sample size are not appropriate when the two sample sizes differ greatly. Moreover, an alternative method for finding the optimal threshold is proposed using the correlation coefficient that defines the ratio of the confusion matrix, and the usefulness and utility of this method are also discussed.
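
To make the sample-size point concrete, the sketch below computes F1 and the Matthews correlation coefficient from their standard confusion-matrix definitions for two made-up tables that share the same sensitivity and specificity (both 0.8) but differ in class balance. The counts are invented for illustration, and the abstract does not say which measures the paper judges inappropriate, so this shows only how the standard formulas behave.

```python
from math import sqrt

def f1_score(tp, fp, fn, tn):
    """F1 = 2TP / (2TP + FP + FN); note that TN does not enter."""
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient of a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts: 50 positives in both tables, sensitivity = specificity = 0.8.
balanced = dict(tp=40, fn=10, fp=10, tn=40)      # 50 negatives
imbalanced = dict(tp=40, fn=10, fp=100, tn=400)  # 500 negatives

for name, table in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, round(f1_score(**table), 3), round(mcc(**table), 3))
```

Both values drop in the imbalanced table even though the per-class rates are unchanged, which is the kind of dependence on sample sizes the abstract refers to.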

A study on sensitivity of representativeness indicator in survey sampling (표본 추출법에서 R-지수의 민감도에 관한 연구)

  • Lee, Yujin; Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.69-82 / 2017
  • The R-indicator (representativeness indicator) is used to check the representativeness of samples when non-response occurs. Representativeness is related to the accuracy of a parameter estimator, and accuracy is related to the bias of the estimator. Hence, an unbiased estimator yields high accuracy, and a high value of the R-indicator guarantees accurate parameter estimation with a small bias. The R-indicator is calculated from propensity scores obtained by logit or probit modeling. In this paper, we use simulation studies to investigate the degree of relation between the R-indicator and different non-response rates across strata. We also analyze modified Korea Economic Census data as a real data example.
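
A minimal sketch of the calculation, assuming the usual definition R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of the estimated response propensities. The propensities are taken as given here (the logit or probit fit is not shown), and the unweighted population standard deviation is used as a simplifying assumption; survey practice would normally apply design weights.

```python
from statistics import pstdev

def r_indicator(propensities):
    """R = 1 - 2 * SD of response propensities (unweighted, assumed form)."""
    return 1.0 - 2.0 * pstdev(propensities)

# Hypothetical propensities: identical response probabilities in every
# stratum give R = 1, while strongly differing strata pull R down.
uniform_response = [0.5] * 10
uneven_response = [0.2, 0.8] * 5

print(r_indicator(uniform_response))
print(r_indicator(uneven_response))
```

Because a propensity lies in [0, 1], its standard deviation is at most 0.5, so R stays between 0 and 1.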

Partial AUC and optimal thresholds (부분 AUC와 최적분류점들)

  • Hong, Chong Sun; Cho, Hyun Su
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.187-198 / 2019
  • Extensive literature exists on how to estimate optimal thresholds based on various accuracy measures using receiver operating characteristic (ROC) and cumulative accuracy profile (CAP) curves. This paper proposes an alternative measure to represent the specific partial area under the ROC and CAP curves. The relationship between the ROC and CAP functions is examined using differential equations of the newly defined partial areas under the curves. In addition, the relationship with the optimal thresholds under various accuracy measures for the ROC and CAP functions is derived. Before finding the optimal thresholds, we assume that the two distribution functions composing the mixture distribution are various normal distributions. The corresponding type I and type II errors are also explored and discussed under various conditions for the accuracy measures.

Fixed-accuracy confidence interval estimation of P(X > c) for a two-parameter gamma population

  • Zhuang, Yan; Hu, Jun; Zou, Yixuan
    • Communications for Statistical Applications and Methods / v.27 no.6 / pp.625-639 / 2020
  • The gamma distribution is a flexible right-skewed distribution widely used in many areas, and estimating the probability that a random variable exceeds a specified value is of great interest in survival and reliability analysis. This study therefore develops a fixed-accuracy confidence interval for P(X > c), where X follows a gamma distribution, Γ(α, β), and c is a preassigned positive constant, through: 1) a purely sequential procedure with known shape parameter α and unknown rate parameter β; and 2) a nonparametric purely sequential procedure with both shape and rate parameters unknown. Both procedures enjoy appealing asymptotic first-order efficiency and asymptotic consistency properties. Extensive simulations validate the theoretical findings. Three real-life data examples from health studies and a steel manufacturing study are discussed to illustrate the practical applicability of both procedures.
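
The quantity being estimated can be sketched for the simplest case. The code below is not the sequential procedure from the paper: it only evaluates P(X > c) in closed form for a positive integer shape (the Erlang case) and forms a plug-in point estimate using the maximum likelihood estimate of the rate when the shape is known. The function names and the tiny sample are hypothetical.

```python
from math import exp, factorial

def erlang_survival(c, shape, rate):
    """P(X > c) for X ~ Gamma(shape, rate) with a positive integer shape."""
    x = rate * c
    return exp(-x) * sum(x ** k / factorial(k) for k in range(shape))

def plug_in_estimate(sample, c, shape):
    """Point estimate of P(X > c) with known shape and estimated rate."""
    rate_hat = shape * len(sample) / sum(sample)  # MLE: shape / sample mean
    return erlang_survival(c, shape, rate_hat)

# Hypothetical data with known shape 2; the sample mean is 1, so rate_hat = 2.
sample = [1.0, 1.0]
print(plug_in_estimate(sample, c=1.0, shape=2))
```

A fixed-accuracy interval would additionally need a stopping rule that keeps sampling until the interval around this estimate is narrow enough, which is the part the paper develops.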

Optimal Threshold from ROC and CAP Curves (ROC와 CAP 곡선에서의 최적 분류점)

  • Hong, Chong-Sun; Choi, Jin-Soo
    • The Korean Journal of Applied Statistics / v.22 no.5 / pp.911-921 / 2009
  • Receiver Operating Characteristic (ROC) and Cumulative Accuracy Profile (CAP) curves are two methods used to assess the discriminatory power of different credit-rating approaches. The points of optimal classification accuracy on an ROC curve and of maximal profit on a CAP curve can be found by using iso-performance tangent lines, which are based on the standard notion of accuracy. In this paper, we offer an alternative accuracy measure called the true rate. Using this rate, one can obtain alternative optimal threshold points on both ROC and CAP curves. For most real populations of borrowers, the number of defaults is much smaller than that of non-defaults, and in such cases the true rate may be more efficient than the accuracy rate in terms of cost functions. Moreover, it is shown that the alternative scores of optimal classification accuracy and of maximal profit are identical, and that this single score coincides with the score corresponding to the Kolmogorov-Smirnov statistic used to test the homogeneity of the distribution functions of defaults and non-defaults.
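
The coincidence with the Kolmogorov-Smirnov score can be checked numerically. The sketch below assumes the true rate is the sum of the true positive and true negative rates (the abstract does not define it) and uses simulated credit scores with made-up parameters, where a borrower is flagged as a default when the score is at or below the threshold. Fractions keep the arithmetic exact so that ties are resolved identically for both criteria.

```python
import random
from fractions import Fraction

random.seed(0)
# Simulated scores (assumed): defaults are rarer and tend to score lower.
non_defaults = [random.gauss(1.0, 1.0) for _ in range(900)]
defaults = [random.gauss(-0.5, 1.0) for _ in range(100)]

def ecdf(sample, c):
    """Empirical distribution function, as an exact fraction."""
    return Fraction(sum(x <= c for x in sample), len(sample))

def true_rate(c):
    tpr = ecdf(defaults, c)          # defaults correctly flagged
    tnr = 1 - ecdf(non_defaults, c)  # non-defaults correctly passed
    return tpr + tnr

def ks_gap(c):
    """Vertical distance between the two empirical distribution functions."""
    return abs(ecdf(defaults, c) - ecdf(non_defaults, c))

candidates = sorted(set(defaults + non_defaults))
best_true_rate = max(candidates, key=true_rate)
best_ks = max(candidates, key=ks_gap)
print(best_true_rate, best_ks, float(ks_gap(best_ks)))
```

Under this assumed definition the true rate equals one plus the signed gap between the two distribution functions, so its maximizer is the score at which the Kolmogorov-Smirnov statistic is attained.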