• Title/Summary/Keyword: misclassification rate

Search results: 68

Single-Layer Neural Networks with Double Rejection Mechanisms for Character Recognition (단층 신경망과 이중 기각 방법을 이용한 문자인식)

  • 임준호;채수익
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.3 / pp.522-532 / 1995
  • Multilayer neural networks trained with the backpropagation algorithm are widely used for pattern classification problems. In many real applications, it is more important to reduce the misclassification rate than to increase the rate of successful classification, but multilayer perceptrons (MLPs) have the drawbacks of slow learning and false convergence to local minima. In this paper, we propose a new method for character recognition problems that uses a single-layer network with double rejection mechanisms and guarantees a very low misclassification rate. Compared with MLPs, it learns faster and requires a simpler hardware architecture. We also introduce a new coding scheme to reduce the misclassification rate. We prepared two databases, one with 135,000 digit patterns and the other with 117,000 letter patterns, and applied the proposed method to printed character recognition; the results show that the method reduces the misclassification rate significantly without sacrificing the correct recognition rate.
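
A minimal sketch of how a rejection mechanism of this kind can work: the classifier refuses to decide when the network output looks unreliable. The two-stage rule below (an absolute confidence threshold followed by a top-two margin check) and its thresholds are illustrative assumptions, not the paper's exact mechanisms or coding scheme.

```python
import numpy as np

def classify_with_double_rejection(activations, t_conf=0.7, t_margin=0.2):
    """Classify from single-layer output activations, rejecting unreliable cases.

    Illustrative two-stage rule (assumed, not the paper's exact mechanisms):
      1st rejection: the largest activation is below an absolute threshold.
      2nd rejection: the gap between the two largest activations is too small.
    Returns the predicted class index, or None for a rejected pattern.
    """
    order = np.argsort(activations)[::-1]
    top, second = activations[order[0]], activations[order[1]]
    if top < t_conf:              # first rejection: low absolute confidence
        return None
    if top - second < t_margin:   # second rejection: ambiguous top-two classes
        return None
    return int(order[0])

# Example: output activations of a hypothetical 10-class digit recognizer
print(classify_with_double_rejection(
    np.array([0.05, 0.10, 0.82, 0.10, 0.00, 0.10, 0.05, 0.02, 0.03, 0.04])))
```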

Evaluating Predictive Ability of Classification Models with Ordered Multiple Categories

  • Oong-Hyun Sung
    • Communications for Statistical Applications and Methods / v.6 no.2 / pp.383-395 / 1999
  • This study is concerned with evaluating the predictive ability of classification models with ordered multiple categories. When the categories can be ordered or ranked, the spread of misclassification should be taken into account, so the performance of a classification model should be evaluated with the loss rate, since the apparent error rate cannot measure that spread. Because the apparent loss rate is known to underestimate the true loss rate, the bootstrap method is used to estimate the true loss rate. This study therefore suggests a method for evaluating the predictive power of classification models using the loss rate together with the bootstrap estimate of the true loss rate.
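
A small sketch of the two quantities the abstract contrasts, under assumed definitions: a loss rate that weights each error by how many ordered categories apart the prediction lands, and a simple bootstrap (optimism-corrected) estimate of the true loss rate. The toy nearest-mean classifier and the exact loss normalization are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                                    # number of ordered categories 0..K-1

def fit(x, y):
    """Toy classifier: store the per-category mean of a single feature."""
    return np.array([x[y == k].mean() for k in range(K)])

def predict(centres, x):
    return np.abs(x[:, None] - centres[None, :]).argmin(axis=1)

def loss_rate(y_true, y_pred):
    """Assumed loss rate: errors weighted by how many categories apart the
    prediction lands, scaled to [0, 1] (the plain error rate ignores this spread)."""
    return np.abs(y_true - y_pred).mean() / (K - 1)

# synthetic data with ordered categories
y = rng.integers(0, K, 200)
x = y + rng.normal(0, 0.8, size=200)

centres = fit(x, y)
apparent = loss_rate(y, predict(centres, x))         # resubstitution (optimistic)

# bootstrap optimism correction: apparent loss + average optimism over resamples
optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))
    c_b = fit(x[idx], y[idx])
    optimism.append(loss_rate(y, predict(c_b, x)) -
                    loss_rate(y[idx], predict(c_b, x[idx])))
true_loss_estimate = apparent + np.mean(optimism)
print(f"apparent loss rate: {apparent:.3f}  bootstrap estimate: {true_loss_estimate:.3f}")
```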

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.151-165 / 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been asserted to perform better than cross-validation estimators for small samples. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we also find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
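
The comparison described here can be reproduced in miniature as follows. The sketch uses a 1-nearest-neighbour rule, whose apparent (resubstitution) error is essentially zero, and contrasts a 5-fold cross-validation estimate with a .632 bootstrap estimate of the misclassification rate; the data, fold count, and number of bootstrap resamples are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_nn_predict(X_tr, y_tr, X_te):
    """1-nearest-neighbour rule: so adaptive that its apparent error is ~0."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=-1)
    return y_tr[d.argmin(axis=1)]

def err(y, y_hat):
    return (y != y_hat).mean()

# small synthetic two-class sample
n = 40
X = rng.normal(size=(n, 2))
y = (X[:, 0] + rng.normal(0, 1.5, n) > 0).astype(int)

# 5-fold cross-validation estimate of the misclassification rate
folds = np.array_split(rng.permutation(n), 5)
cv = np.mean([err(y[f], one_nn_predict(np.delete(X, f, axis=0), np.delete(y, f), X[f]))
              for f in folds])

# .632 bootstrap estimate: mix of apparent error and out-of-bag error
apparent = err(y, one_nn_predict(X, y, X))           # essentially 0 for 1-NN
oob_errors = []
for _ in range(100):
    idx = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), idx)
    if oob.size:
        oob_errors.append(err(y[oob], one_nn_predict(X[idx], y[idx], X[oob])))
b632 = 0.368 * apparent + 0.632 * np.mean(oob_errors)

print(f"cross-validation: {cv:.3f}   .632 bootstrap: {b632:.3f}")
```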

Analyzing Customer Management Data by Data Mining: Case Study on Churn Prediction Models for Insurance Company in Korea

  • Cho, Mee-Hye;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1007-1018 / 2008
  • The purpose of this case study is to demonstrate database-marketing management. First, we explore the original variables in the insurance customers' data, modify them if necessary, and go through a variable selection process before the analysis. Then, we develop churn prediction models using logistic regression, neural network, and SVM analyses. We also compare these three data mining models in terms of the misclassification rate.
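
A minimal sketch of such a three-model comparison with scikit-learn, scoring each model by misclassification rate on a held-out split. The file name, column names, and model settings are hypothetical placeholders, not the study's data or tuning.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

df = pd.read_csv("churn.csv")                    # hypothetical customer table
X = df.drop(columns=["churned"])                 # hypothetical target column
y = df["churned"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "neural network":      MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000),
    "SVM":                 SVC(kernel="rbf"),
}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_tr, y_tr)
    misclassification_rate = 1.0 - pipe.score(X_te, y_te)
    print(f"{name:>20s}: misclassification rate = {misclassification_rate:.3f}")
```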

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.495-509 / 2015
  • We study the classification problem in which the proportions of the two groups differ significantly, known as the unbalanced classification problem. It is usually more difficult to classify observations accurately in unbalanced data than in balanced data. If we apply ordinary classification methods to unbalanced data, most observations are likely to be classified into the larger group, because doing so minimizes the overall misclassification loss. However, misclassifying the smaller group as the larger one is what causes the bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of the different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC, and AUC (area under the curve) for the performance comparison.
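
A short sketch of the sampling techniques and the G-mean criterion mentioned above. The down-/up-sampling helpers simply rebalance the two classes by index resampling, and the G-mean is the geometric mean of sensitivity and specificity; any classifier can then be trained on the resampled data and scored with these functions.

```python
import numpy as np

def downsample(X, y, rng):
    """Down sampling: randomly drop majority-class rows to the minority size."""
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    major, minor = (idx0, idx1) if len(idx0) > len(idx1) else (idx1, idx0)
    keep = np.concatenate([minor, rng.choice(major, size=len(minor), replace=False)])
    return X[keep], y[keep]

def upsample(X, y, rng):
    """Up sampling: resample minority-class rows with replacement to the majority size."""
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    major, minor = (idx0, idx1) if len(idx0) > len(idx1) else (idx1, idx0)
    keep = np.concatenate([major, rng.choice(minor, size=len(major), replace=True)])
    return X[keep], y[keep]

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity; unlike the plain
    misclassification rate, it is not dominated by the majority class."""
    sensitivity = (y_pred[y_true == 1] == 1).mean()
    specificity = (y_pred[y_true == 0] == 0).mean()
    return np.sqrt(sensitivity * specificity)

# Example: rebalance a roughly 95:5 sample before fitting any classifier
rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 2))
y = (rng.random(1000) < 0.05).astype(int)
X_dn, y_dn = downsample(X, y, rng)
X_up, y_up = upsample(X, y, rng)
print(np.bincount(y), np.bincount(y_dn), np.bincount(y_up))
```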

Undecided inference using logistic regression for credit evaluation (신용평가에서 로지스틱 회귀를 이용한 미결정자 추론)

  • Hong, Chong-Sun;Jung, Min-Sub
    • Journal of the Korean Data and Information Science Society / v.22 no.2 / pp.149-157 / 2011
  • Undecided inference can be regarded as a missing data problem such as MAR and MNAR. Under the MAR assumption, undecided inference makes use of a logistic regression model: the probability of default for the undecided group is obtained with the regression coefficient vector estimated from the decided group and compared with the probability of default for the decided group. Under the MNAR assumption, undecided inference makes use of a logistic regression model with an additional feature random vector. Simulation results based on two kinds of real data are obtained and compared. We find that, under the MAR assumption, the misclassification rates are not much different from those of the raw data. Under the MNAR assumption, however, the misclassification rates are lower than those under the MAR assumption, and they decrease as the proportion of the undecided group increases.
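
Under the MAR assumption the inference reduces to fitting a logistic regression on the decided group and scoring the undecided group with the fitted coefficients, roughly as sketched below. The synthetic data and variable names are illustrative only; the MNAR variant with an additional feature vector is not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# synthetic stand-ins: decided applicants (with observed default status)
# and undecided applicants (status unobserved)
X_decided = rng.normal(size=(500, 3))
default = (X_decided @ np.array([1.2, -0.8, 0.5]) + rng.logistic(size=500) > 0).astype(int)
X_undecided = rng.normal(size=(100, 3))

# fit on the decided group, then score the undecided group with its coefficients
model = LogisticRegression().fit(X_decided, default)
p_default_undecided = model.predict_proba(X_undecided)[:, 1]
p_default_decided = model.predict_proba(X_decided)[:, 1]

print("mean P(default): decided %.3f, undecided %.3f"
      % (p_default_decided.mean(), p_default_undecided.mean()))
```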

Bivariate ROC Curve and Optimal Classification Function

  • Hong, C.S.;Jeong, J.A.
    • Communications for Statistical Applications and Methods / v.19 no.4 / pp.629-638 / 2012
  • We propose methods for obtaining optimal thresholds and classification functions by using various cutoff criteria based on the bivariate ROC curve, which represents bivariate cumulative distribution functions. The false positive rate and false negative rate are calculated with these classification functions for bivariate normal distributions.
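
A small sketch of the evaluation step described above: generate bivariate normal scores for the two groups, apply a classification function, and read off the false positive and false negative rates at a threshold. The linear combining function and threshold here are arbitrary illustrations, not the optimal ones derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
cov = [[1.0, 0.3], [0.3, 1.0]]
negatives = rng.multivariate_normal([0.0, 0.0], cov, size=5000)   # non-diseased group
positives = rng.multivariate_normal([1.0, 1.0], cov, size=5000)   # diseased group

def classification_function(x):
    """Illustrative linear combining function of the two markers."""
    return x[:, 0] + x[:, 1]

threshold = 1.0
fpr = (classification_function(negatives) > threshold).mean()     # false positive rate
fnr = (classification_function(positives) <= threshold).mean()    # false negative rate
print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```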

Analyzing the Effect of Lexical and Conceptual Information in Spam-mail Filtering System

  • Kang Sin-Jae;Kim Jong-Wan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.2 / pp.105-109 / 2006
  • In this paper, we constructed a two-phase spam-mail filtering system based on lexical and conceptual information. Two kinds of information can distinguish spam mail from ham (non-spam) mail: the definite information, consisting of the sender's information, URLs, and a spam keyword list, and the less definite information, consisting of the word list and concept codes extracted from the mail body. We first classified the spam mail using the definite information and then used the less definite information; in the second phase, the lexical information and concept codes contained in the email body were used for SVM learning. According to our results, the ham misclassification rate was reduced when more lexical information was used as features, and the spam misclassification rate was reduced when the concept codes were also included as features.
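
A compact sketch of the two-phase idea: a rule-based first phase on the "definite" information (sender, URL, keyword lists) and an SVM over lexical features of the body for whatever passes the first phase. The blacklists, training sentences, and labels are toy placeholders, and concept codes would enter as additional features in the same pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Phase 1: 'definite' information (toy blacklists, illustrative only)
BLACKLISTED_SENDERS = {"spam@example.com"}
SPAM_KEYWORDS = {"free prize", "click here"}

def phase1_is_spam(sender, body):
    return sender in BLACKLISTED_SENDERS or any(k in body.lower() for k in SPAM_KEYWORDS)

# Phase 2: SVM over lexical features of the mail body; toy training data below
train_bodies = ["cheap pills buy now", "meeting moved to 3pm",
                "win money fast", "lunch tomorrow?"]
train_labels = [1, 0, 1, 0]                     # 1 = spam, 0 = ham
svm = make_pipeline(CountVectorizer(), LinearSVC()).fit(train_bodies, train_labels)

def classify(sender, body):
    if phase1_is_spam(sender, body):            # definite information first
        return "spam"
    return "spam" if svm.predict([body])[0] == 1 else "ham"

print(classify("friend@example.com", "are we still on for lunch tomorrow?"))
```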

Comparison of Multiway Discretization Algorithms for Data Mining

  • Kim, Jeong-Suk;Jang, Young-Mi;Na, Jong-Hwa
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.801-813 / 2005
  • Discretization algorithms for continuous data have been actively studied in the area of data mining. Such discretizations are very important in data analysis, especially for efficient model selection in data mining. In this paper, we introduce the principles of several multiway discretization algorithms, including the KEX, 1R, and CN4 algorithms, and investigate their efficiency through a numerical study. For various underlying distributions, we compare these algorithms in terms of the misclassification rate.
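
As an illustration of how such discretizations are compared, the sketch below scores a multiway discretization by the misclassification rate obtained when each interval predicts its majority class. The equal-frequency discretizer is a simple stand-in for KEX, 1R, or CN4, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

def equal_frequency_cuts(x, k):
    """Simple multiway discretization: cut points at the k-quantiles of x."""
    return np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])

def misclassification_rate(x, y, cuts):
    """Score a discretization: each interval predicts its majority class."""
    bins = np.digitize(x, cuts)
    errors = 0
    for b in np.unique(bins):
        labels = y[bins == b]
        errors += (labels != np.bincount(labels).argmax()).sum()
    return errors / len(y)

# toy data: the class depends on which region of x the point falls in
x = rng.normal(size=1000)
y = (np.abs(x) > 1).astype(int)

for k in (2, 3, 4, 6):
    cuts = equal_frequency_cuts(x, k)
    print(f"{k}-way discretization: misclassification rate = "
          f"{misclassification_rate(x, y, cuts):.3f}")
```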

Comparison of Binary Discretization Algorithms for Data Mining

  • Na, Jong-Hwa;Kim, Jeong-Mi;Cho, Wan-Sup
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.769-780 / 2005
  • Recently, discretization algorithms for continuous data have been actively studied, but few articles compare the efficiency of these algorithms. In this paper, we introduce the principles of several binary discretization algorithms, including C4.5, CART, and QUEST, and investigate their efficiency through a numerical study. For various underlying distributions, we compare these algorithms in terms of the misclassification rate and MSE. Real data examples are also included.
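
The binary case can be illustrated by an exhaustive search for a single cut point, in the spirit of CART. The sketch below scores candidate cuts by the resulting misclassification rate (CART itself typically uses the Gini index), on toy data generated from a noisy threshold rule.

```python
import numpy as np

def best_binary_split(x, y):
    """Exhaustive search for one binary cut point on x, scored by the
    resulting misclassification rate when each side predicts its majority class."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_cut, best_err = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                              # no valid cut between equal values
        cut = (xs[i] + xs[i - 1]) / 2
        left, right = ys[:i], ys[i:]
        err = ((left != np.bincount(left).argmax()).sum() +
               (right != np.bincount(right).argmax()).sum()) / len(ys)
        if err < best_err:
            best_cut, best_err = cut, err
    return best_cut, best_err

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 300)
y = (x > 6.2).astype(int) ^ (rng.random(300) < 0.05)   # noisy threshold rule
cut, err = best_binary_split(x, y)
print(f"best cut at {cut:.2f}, misclassification rate {err:.3f}")
```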
