• Title/Summary/Keyword: misclassification cost


Design and Performance Measurement of a Genetic Algorithm-based Group Classification Method : The Case of Bond Rating (유전 알고리듬 기반 집단분류기법의 개발과 성과평가 : 채권등급 평가를 중심으로)

  • Min, Jae-H.; Jeong, Chul-Woo
    • Journal of the Korean Operations Research and Management Science Society / v.32 no.1 / pp.61-75 / 2007
  • The purpose of this paper is to develop a new group classification method based on a genetic algorithm and to compare its prediction performance with those of existing methods in the area of bond rating. To serve this purpose, we conduct various experiments with pilot and general models. Specifically, we first conduct experiments employing two pilot models: one searching for the cluster center of each group, and the other searching for both the cluster center and the attribute weights in order to maximize classification accuracy. The results from the pilot experiments show that the latter achieves a higher classification accuracy ratio than the former, which provides the rationale for searching for both the cluster center of each group and the attribute weights to improve classification accuracy. With this lesson in mind, we design two generalized models employing genetic algorithms: one to maximize the classification accuracy and the other to minimize the total misclassification cost. We compare the performance of these two models with those of existing statistical and artificial intelligence models such as MDA, ANN, and decision trees, and conclude that the genetic algorithm-based group classification method proposed in this paper significantly outperforms the other methods in terms of classification accuracy ratio as well as misclassification cost.
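
To make the search procedure concrete, here is a minimal illustrative sketch (not the authors' code) of a genetic algorithm whose genome encodes both the group cluster centers and the attribute weights, with a fitness that either maximizes accuracy or minimizes a total misclassification cost; all names (`classify`, `fitness`, `evolve`) and GA settings are assumptions.

```python
import numpy as np

def classify(X, centers, weights):
    """Assign each row of X to the group with the nearest weighted center."""
    d = np.stack([np.sqrt((((X - c) ** 2) * weights).sum(axis=1)) for c in centers])
    return d.argmin(axis=0)

def fitness(genome, X, y, n_groups, cost=None):
    """Decode a genome into cluster centers plus attribute weights and score it.
    y: integer group labels 0..n_groups-1."""
    n_attr = X.shape[1]
    centers = genome[:n_groups * n_attr].reshape(n_groups, n_attr)
    weights = np.abs(genome[n_groups * n_attr:])     # attribute weights >= 0
    pred = classify(X, centers, weights)
    if cost is None:                                 # model 1: maximize accuracy
        return (pred == y).mean()
    return -cost[y, pred].sum()                      # model 2: minimize total cost

def evolve(X, y, n_groups, cost=None, pop=60, gens=200, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = n_groups * X.shape[1] + X.shape[1]
    P = rng.normal(size=(pop, n_genes))
    for _ in range(gens):
        f = np.array([fitness(g, X, y, n_groups, cost) for g in P])
        parents = P[np.argsort(f)[-(pop // 2):]]     # keep the fitter half
        kids = parents[rng.integers(len(parents), size=pop - len(parents))]
        P = np.vstack([parents, kids + rng.normal(scale=0.1, size=kids.shape)])
    f = np.array([fitness(g, X, y, n_groups, cost) for g in P])
    return P[f.argmax()]
```

Passing a `cost` matrix indexed by (true, predicted) group switches the fitness from the accuracy-maximizing variant to the cost-minimizing one, mirroring the two generalized models described above.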

Empirical Bayesian Misclassification Analysis on Categorical Data (범주형 자료에서 경험적 베이지안 오분류 분석)

  • 임한승; 홍종선; 서문섭
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.39-57 / 2001
  • Categorical data sometimes contains misclassification errors. If such data are analyzed as observed, the estimated cell probabilities can be biased and the standard Pearson X² tests may have inflated type I error rates. Conversely, if we treat well-classified data as misclassified, we may spend considerable cost and time adjusting for misclassification. Asking whether categorical data are misclassified is therefore a necessary and important step before analysis. In this paper, we consider a two-dimensional contingency table in which one of the two variables is misclassified and the marginal sums of the well-classified variable are fixed, and we explore how to partition those marginal sums into the cells via the Bound and Collapse concepts of Sebastiani and Ramoni (1997). The double sampling scheme (Tenenbein 1970) is used to obtain information on the misclassification. We propose test statistics for detecting misclassification and examine their behavior through simulation studies.
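
As a concrete illustration of the double-sampling idea cited above (Tenenbein 1970), the sketch below estimates true category proportions by combining a large error-prone sample with a small validation subsample measured by both the fallible and the true classifier. This is a simple moment-style estimator written for illustration only; it is not the estimator or the test statistics developed in the paper, and all names are assumptions.

```python
import numpy as np

def double_sampling_estimate(fallible_all, fallible_sub, true_sub):
    """Double-sampling estimate of the true category proportions.

    fallible_all : fallible classifications for the full sample
    fallible_sub, true_sub : paired fallible/true classifications for the
        validation subsample, measured by both devices.
    """
    cats = np.unique(np.concatenate([fallible_all, true_sub]))
    # P(true = i | fallible = j), estimated from the validation subsample
    cond = np.zeros((len(cats), len(cats)))
    for j, cj in enumerate(cats):
        mask = fallible_sub == cj
        for i, ci in enumerate(cats):
            cond[i, j] = (true_sub[mask] == ci).mean() if mask.any() else 0.0
    # fallible marginal proportions from the cheap full sample
    p_fallible = np.array([(fallible_all == cj).mean() for cj in cats])
    return cond @ p_fallible   # corrected estimate of the true proportions
```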


Cost-Sensitive Case Based Reasoning using Genetic Algorithm: Application to Diagnose for Diabetes

  • Park, Yoon-Joo; Kim, Byung-Chun
    • Proceedings of the Korea Intelligent Information Systems Society Conference / 2006.06a / pp.327-335 / 2006
  • Case-Based Reasoning (CBR) has come to be considered an appropriate technique for diagnosis, prognosis, and prescription in medicine. However, conventional CBR has a limitation in that it cannot incorporate asymmetric misclassification costs. It assumes that the costs of type 1 and type 2 errors are the same, so it cannot be adjusted according to the cost of each error type. This problem is a major disincentive to applying conventional CBR to many real-world cases in which different types of error carry different costs. Medical diagnosis is an important example. In this paper we suggest a new knowledge extraction technique called Cost-Sensitive Case-Based Reasoning (CSCBR) that can incorporate unequal misclassification costs. The main idea involves a dynamic adaptation of the optimal classification boundary point and the number of neighbors that minimize the total misclassification cost according to the error costs. Our technique uses a genetic algorithm (GA) to find these two feature vectors of CSCBR. We apply this new method to diabetes datasets and compare the results with those of the cost-sensitive methods C5.0 and CART. The results show that the proposed technique outperforms the other methods and overcomes the limitation of conventional CBR.
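
The following sketch shows the core idea of cost-sensitive nearest-neighbor classification under asymmetric error costs. Where the paper evolves the classification boundary point and the number of neighbors with a GA, this illustration swaps in a plain grid search, and all names and cost parameters are assumptions.

```python
import numpy as np
from itertools import product

def knn_scores(X_train, y_train, X, k):
    """Fraction of positive labels among the k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def total_cost(y_true, y_pred, c_fp, c_fn):
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return c_fp * fp + c_fn * fn

def tune(X_tr, y_tr, X_val, y_val, c_fp, c_fn):
    """Grid-search stand-in for the paper's GA search over (k, boundary)."""
    best = (None, None, np.inf)
    for k, thr in product(range(1, 16, 2), np.linspace(0.05, 0.95, 19)):
        pred = (knn_scores(X_tr, y_tr, X_val, k) >= thr).astype(int)
        cost = total_cost(y_val, pred, c_fp, c_fn)
        if cost < best[2]:
            best = (k, thr, cost)
    return best
```

Raising `c_fn` relative to `c_fp` pushes the selected threshold `thr` down, trading extra false alarms for fewer missed positives, which is the behavior CSCBR is designed to control.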


A Rectifying Inspection Plan Giving LTPD Protection for Destructive Testing (파괴검사시(破壞檢査時)의 계수선별형(計數選別型) LTPD 보증(保證)샘플링 검사방식(檢査方式))

  • Yu, Mun-Chan
    • Journal of Korean Society for Quality Management / v.15 no.1 / pp.68-75 / 1987
  • A rectifying inspection plan is considered for the case of destructive testing. Screening inspection of rejected lots is performed by some nondestructive testing which is prone to misclassification errors. Apparent defectives found in the screening process are replaced with apparently good items. The plan provides LTPD protection on each individual lot while minimizing the sum of the cost of testing and the cost due to the producer's risk at process average quality. A brief discussion of average outgoing quality is also given.
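
For intuition, here is an illustrative search for a single-sampling plan (n, c) that guarantees LTPD protection while minimizing a simple testing-plus-producer's-risk cost at the process average. The cost model and parameters are stand-ins, not the paper's formulation, which additionally accounts for misclassification in the nondestructive screening.

```python
from scipy.stats import binom

def find_ltpd_plan(ltpd, p_avg, c_test, c_risk, beta=0.10, n_max=300):
    """Cheapest (n, c): accept the lot when sample defectives <= c, subject
    to the LTPD constraint P(accept | p = ltpd) <= beta."""
    best = None
    for n in range(1, n_max + 1):
        feasible = [c for c in range(n + 1) if binom.cdf(c, n, ltpd) <= beta]
        if not feasible:
            continue
        c = max(feasible)                        # largest c keeps rejections rare
        p_reject = 1.0 - binom.cdf(c, n, p_avg)  # producer's risk at process avg
        cost = c_test * n + c_risk * p_reject
        if best is None or cost < best[2]:
            best = (n, c, cost)
    return best

# illustrative call: find_ltpd_plan(ltpd=0.05, p_avg=0.01, c_test=1.0, c_risk=500.0)
```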


Combined Use of Techniques for Resolving Data Imbalance Using a Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Yeong-Sik; Kim, Jong-U; Heo, Jun
    • Proceedings of the Korea Intelligent Information Systems Society Conference / 2007.05a / pp.309-320 / 2007
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that some classes have far more instances than others. It causes low prediction accuracy on the minority class because classifiers tend to assign instances to the major classes and ignore the minor class in order to reduce the overall misclassification rate. To solve the data imbalance problem, a number of techniques have been proposed based on resampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of using these previously proposed techniques in combination, and suggest a combination method using a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy on the minority class, we determine the combination ratio using the F-value of the minority class as the fitness function of the genetic algorithm. To compare the performance with those of single techniques and of matrix-style combinations with random percentages, we performed experiments on four public datasets that have been widely used to compare methods for the data imbalance problem. The experimental results confirm the usefulness of the proposed method.
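
A minimal sketch of the fitness idea described above: a combination ratio blends minority oversampling with majority undersampling, and the GA fitness is the minority-class F-value on validation data. The paper combines more techniques (including threshold and cost adjustment) and searches the ratio with a GA; the blend encoding, the label coding, and the base classifier here are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def resample_mix(X, y, ratio, rng):
    """ratio in [0, 1]: 1.0 oversamples the minority up to the majority size,
    0.0 undersamples the majority down to the minority size (assumed encoding,
    with the minority coded as class 1)."""
    X_min, X_maj = X[y == 1], X[y == 0]
    n = max(int(ratio * len(X_maj) + (1 - ratio) * len(X_min)), 1)
    over = rng.choice(len(X_min), size=n, replace=True)
    under = rng.choice(len(X_maj), size=n, replace=n > len(X_maj))
    return np.vstack([X_min[over], X_maj[under]]), np.r_[np.ones(n), np.zeros(n)]

def fitness(ratio, X_tr, y_tr, X_val, y_val, seed=0):
    """GA fitness: F-value of the minority class on held-out data."""
    X_bal, y_bal = resample_mix(X_tr, y_tr, ratio, np.random.default_rng(seed))
    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    return f1_score(y_val, clf.predict(X_val), pos_label=1)
```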


Study on the fast nearest-neighbor searching classifier using distance approximation (거리 근사를 이용하는 고속 최근 이웃 탐색 분류기에 관한 연구)

  • 이일완; 채수익
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.2 / pp.71-79 / 1997
  • In this paper, we propose a new nearest-neighbor classifier with reduced computational complexity in the search process. In the proposed classifier, the classes are divided into two sets: reference and non-reference sets. It reduces the computational requirement by approximating the distance between the input and a class using the information of the distances among the classes; it calculates only the distances between the input and the reference classes. We convert a given classifier into its corresponding RCC (reduced computational complexity) classifier, which trades a small increase in misclassification probability for a large reduction in computation. We designed RCC classifiers for the recognition of digits from the NIST database and obtained an RCC classifier with a 60% reduction in computational complexity at the cost of a 0.5% increase in misclassification probability.
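
The sketch below illustrates the general trick of exploiting precomputable inter-class distances: exact distances are computed only to a set of reference class centroids, and the triangle inequality bounds the rest. The paper's RCC classifier uses the approximation directly and accepts a small error, whereas this illustration falls back to an exact distance when the bound cannot rule a class out; all names are assumptions.

```python
import numpy as np

def rcc_classify(x, centroids, ref_idx):
    """Nearest-centroid search with exact distances only to reference classes."""
    d_ref = {r: np.linalg.norm(x - centroids[r]) for r in ref_idx}
    best_j, best_d = min(d_ref.items(), key=lambda kv: kv[1])
    for j, c in enumerate(centroids):
        if j in d_ref:
            continue
        # lower bound on d(x, c_j) from reference distances (triangle inequality);
        # the inter-centroid distances could be precomputed once offline
        bound = max(abs(d_ref[r] - np.linalg.norm(centroids[r] - c)) for r in ref_idx)
        if bound < best_d:                # the bound cannot exclude class j
            d = np.linalg.norm(x - c)     # fall back to an exact distance
            if d < best_d:
                best_j, best_d = j, d
    return best_j
```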


Modifying linearly non-separable support vector machine binary classifier to account for the centroid mean vector

  • Mubarak Al-Shukeili; Ronald Wesonga
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.245-258 / 2023
  • This study proposes a modification to the objective function of the support vector machine for the linearly non-separable case of a binary classifier yi ∈ {-1, 1}. The modification takes into account the position of each data item xi relative to its corresponding class centroid. The resulting optimization function involves the centroid mean vector and the spread of the data in addition to the support vectors, and it is minimized by the choice of hyperplane β. Theoretical assumptions have been tested to derive an optimal separating hyperplane that yields the minimal misclassification rate. The proposed method has been evaluated using simulation studies and real-life COVID-19 patient hospitalization outcome data. Results show that the proposed method performs better than the classical linear SVM classifier as the sample size increases, and is preferred in the presence of correlations among predictors and of extreme values.
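
Since the abstract does not reproduce the paper's exact objective, the sketch below only gestures at the idea: a standard hinge-loss SVM trained by sub-gradient descent, plus an assumed penalty tying each point's score to its class centroid's score, standing in for the centroid-mean-vector and spread terms. The penalty form and the parameters lam, mu, lr are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def modified_svm(X, y, lam=1e-2, mu=1e-2, lr=1e-3, epochs=200):
    """Sub-gradient descent on hinge loss + L2 + an assumed centroid penalty.
    y must be coded in {-1, 1} as in the abstract."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    cent = {c: X[y == c].mean(axis=0) for c in (-1, 1)}   # class centroids
    for _ in range(epochs):
        for i in np.random.permutation(n):
            xi, yi = X[i], y[i]
            g_w, g_b = lam * w, 0.0
            if yi * (xi @ w + b) < 1:          # hinge-loss subgradient
                g_w -= yi * xi
                g_b -= yi
            # assumed centroid term: keep xi's score near its centroid's score
            diff = (xi - cent[yi]) @ w
            g_w += mu * diff * (xi - cent[yi])
            w -= lr * g_w
            b -= lr * g_b
    return w, b
```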

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk; Kim, Ji-Hun; Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • These days, malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, it becomes more important to handle these malicious attacks and hacks appropriately, and there is substantial interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal conditions, they cannot handle new or unknown patterns of network attacks. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can proactively respond to unknown threats. Researchers have long adopted and tested various kinds of artificial intelligence techniques, such as artificial neural networks, decision trees, and support vector machines, to detect intrusions on the network. However, most studies have applied these techniques singly, even though combining them may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model is designed to combine the prediction results of four different binary classification models, which may be complementary to each other: logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). As a tool for finding the optimal combining weights, genetic algorithms (GA) are used. Our proposed model is built in two steps. In the first step, the optimal integrated model, the one with the lowest prediction error (i.e., misclassification rate), is generated. In the second step, it explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. To calculate the total misclassification cost of an intrusion detection system, we need to understand its asymmetric error cost scheme. Generally, there are two common forms of error in intrusion detection. The first is the False-Positive Error (FPE), in which normal activity is wrongly judged to be an intrusion, which may result in unnecessary corrective action. The second is the False-Negative Error (FNE), in which malicious activity is misjudged as normal. Compared to FPE, FNE is more fatal; thus, the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results from our model with those from single techniques to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed GA-based model outperformed all the other comparative models in detecting network intrusions from the accuracy perspective. They also showed that the proposed model outperformed all the other comparative models in terms of total misclassification cost. Consequently, we expect that our study may contribute to building cost-effective intelligent intrusion detection systems.
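
A compact sketch of the two-step design described above: step 1 finds combining weights over the four base classifiers' outputs that minimize the classification error rate, and step 2 picks the detection threshold that minimizes total misclassification cost under asymmetric FPE/FNE costs. Random search stands in for the paper's GA here, and the probability-averaging ensemble, the function names, and the cost arguments are assumptions.

```python
import numpy as np

def total_cost(y, yhat, c_fp, c_fn):
    fp = np.sum((yhat == 1) & (y == 0))   # false alarms (FPE)
    fn = np.sum((yhat == 0) & (y == 1))   # missed intrusions (FNE)
    return c_fp * fp + c_fn * fn

def combine(probs, w):
    """Weighted average of base-classifier intrusion probabilities.
    probs: (n_samples, 4) array of LOGIT/DT/ANN/SVM outputs."""
    w = np.abs(w) / np.abs(w).sum()
    return probs @ w

def fit_two_step(probs, y, c_fp, c_fn, rng=np.random.default_rng(0)):
    """Step 1: random search (a stand-in for the paper's GA) for combining
    weights minimizing the error rate. Step 2: cost-minimizing threshold."""
    best_w, best_err = None, np.inf
    for _ in range(2000):
        w = rng.random(probs.shape[1])
        err = np.mean((combine(probs, w) >= 0.5).astype(int) != y)
        if err < best_err:
            best_w, best_err = w, err
    score = combine(probs, best_w)
    thr = min(np.unique(score),
              key=lambda t: total_cost(y, (score >= t).astype(int), c_fp, c_fn))
    return best_w, thr
```

Because FNE is the more costly error, setting `c_fn` well above `c_fp` typically drives the chosen threshold below 0.5, flagging more traffic as suspicious.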

An Economic Design of Rectifying Inspection Plans Based on a Correlated Variable (대용품질특성치를 이용한 계수선별형 샘플링 검사방식의 경제적 설계)

  • Bai, D.S.; Lee, K.T.; Choi, I.S.
    • Journal of Korean Institute of Industrial Engineers / v.23 no.4 / pp.793-802 / 1997
  • A sampling plan is presented for situations where sampling inspection is based on the quality characteristic of interest and items in rejected lots are screened based on a correlated variable. A cost model is constructed which involves the costs of misclassification errors, sampling and screening inspections. A method of finding optimal values of sample size, acceptance number and cutoff value on the correlated variable is presented, and numerical studies are given.
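
For illustration, the sketch below evaluates one plausible per-item cost of screening on a correlated variable: (X, Z) are taken to be standard bivariate normal with correlation rho, an item is defective when X exceeds an upper quantile, and rejection is by a cutoff on Z. The distributional setup, the Monte Carlo evaluation, and the cost parameters are assumptions, not the paper's cost model, which also covers sampling inspection and the choice of sample size and acceptance number.

```python
import numpy as np
from scipy.stats import norm

def expected_screening_cost(cutoff, rho, p_defect, c_reject_good, c_accept_bad,
                            c_screen, n_mc=200_000, seed=0):
    """Per-item expected cost when items with Z > cutoff are rejected."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_mc)                                   # true quality
    z = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n_mc)   # surrogate
    defective = x > norm.ppf(1 - p_defect)              # upper-tail defects
    p_accept_bad = np.mean(defective & (z <= cutoff))   # defect slips through
    p_reject_good = np.mean(~defective & (z > cutoff))  # good item discarded
    return c_screen + c_accept_bad * p_accept_bad + c_reject_good * p_reject_good

# sweep cutoffs to locate the economic optimum (illustrative parameters)
cutoffs = np.linspace(-1, 3, 81)
costs = [expected_screening_cost(t, rho=0.8, p_defect=0.05,
                                 c_reject_good=1.0, c_accept_bad=20.0,
                                 c_screen=0.05) for t in cutoffs]
print("best cutoff:", cutoffs[int(np.argmin(costs))])
```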


Optimum Screening Procedures Using Prior Information

  • Kim, Sang-Boo
    • Journal of Korean Society for Quality Management / v.22 no.1 / pp.142-151 / 1994
  • Optimum screening procedures using prior information are presented. An optimal cutoff value on the screening variable X minimizing the expected total cost is obtained for the normal model; it is assumed that a continuous screening variable X given a dichotomous performance variable T is normally distributed and that costs are incurred by screening inspection and misclassification errors. Methods for finding optimal cutoff values based on the prior distributions for unknown parameters are presented.
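
The normal model described above admits a direct one-dimensional search: with X | T ~ N(μ_T, σ²) and a prior P(T = good), the expected total cost of accepting items whose X exceeds a cutoff can be written down and minimized numerically. The sketch below is illustrative only; the cost terms and parameter values are assumptions, and the paper further incorporates prior distributions on the unknown parameters.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def expected_total_cost(cutoff, p_good, mu_good, mu_bad, sigma,
                        c_insp, c_accept_bad, c_reject_good):
    """Expected cost when items with screening variable X >= cutoff are
    accepted; X | T ~ N(mu_T, sigma^2), P(T = good) = p_good."""
    p_accept_bad = (1 - p_good) * norm.sf(cutoff, mu_bad, sigma)
    p_reject_good = p_good * norm.cdf(cutoff, mu_good, sigma)
    return c_insp + c_accept_bad * p_accept_bad + c_reject_good * p_reject_good

# one-dimensional search for the optimal cutoff (illustrative parameters)
res = minimize_scalar(expected_total_cost, bounds=(-5, 5), method="bounded",
                      args=(0.9, 1.0, -1.0, 1.0, 0.1, 10.0, 1.0))
print("optimal cutoff:", res.x)
```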
