• Title/Abstract/Keywords: classifiers

Search Result 743, Processing Time 0.034 seconds

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction (신용카드 불법현금융통 적발을 위한 축소된 앙상블 모형)

  • Lee, Hwa-Kyung; Han, Sang-Bum; Jhee, Won-Chul
    • Journal of Intelligence and Information Systems / v.16 no.1 / pp.93-116 / 2010
  • An ensemble approach is applied to detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card usage in Far East nations that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent by using an ensemble of many classifiers. It is generally accepted that an ensemble of classifiers is more accurate than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research reveals that it may be better to combine selected classifiers rather than all of the classifiers at hand. For effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. Diversity in an ensemble manifests itself as disagreement or ambiguity among members. The data imbalance intrinsic to FDM affects our approach to ICA detection in two ways. First, we suggest a training procedure with over-sampling methods to obtain diverse training data sets. Second, we use variants of accuracy and diversity measures that focus on the fraud class. We also calculate the diversity measure dynamically in the Forward Addition and Backward Elimination procedures. In our experiments, Neural Networks, Decision Trees, and Logit Regressions serve as the base models for ensemble members, and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that the reduced-size ensemble is, on average over the data sets tested, as accurate as the non-pruned version, which provides benefits in application efficiency and reduced ensemble complexity.
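
The pruning idea in this abstract can be illustrated with a minimal sketch of greedy forward addition over binary predictions. This is not the authors' procedure: the selection criterion here is plain validation accuracy of the majority vote, and the function names (`majority_vote`, `disagreement`, `forward_addition`) are hypothetical. The fraud-focused accuracy and diversity measures from the paper are not reproduced.

```python
import numpy as np

def majority_vote(preds):
    """Majority vote over binary (0/1) member predictions.
    Ties are resolved toward class 1 (an assumption of this sketch)."""
    preds = np.asarray(preds)
    return (preds.mean(axis=0) >= 0.5).astype(int)

def disagreement(p_a, p_b):
    """Pairwise diversity: fraction of samples on which two members differ."""
    return float(np.mean(np.asarray(p_a) != np.asarray(p_b)))

def forward_addition(preds, y, max_size):
    """Greedily add the member that most improves the accuracy of the vote.
    Stops when no candidate improves accuracy or max_size is reached."""
    preds = np.asarray(preds)
    y = np.asarray(y)
    selected = []
    remaining = list(range(len(preds)))
    best_acc = -1.0
    while remaining and len(selected) < max_size:
        scores = [
            (float(np.mean(majority_vote(preds[selected + [i]]) == y)), i)
            for i in remaining
        ]
        # Highest accuracy wins; ties go to the lowest member index.
        acc, best_i = max(scores, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        best_acc = acc
        selected.append(best_i)
        remaining.remove(best_i)
    return selected, best_acc
```

Backward elimination would run the same loop in reverse, starting from the full ensemble and dropping the member whose removal hurts the vote least.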

Bayesian Network-Based Analysis on Clinical Data of Infertility Patients (베이지안 망에 기초한 불임환자 임상데이터의 분석)

  • Jung, Yong-Gyu; Kim, In-Cheol
    • The KIPS Transactions: Part B / v.9B no.5 / pp.625-634 / 2002
  • In this paper, we conducted various experiments with Bayesian networks to analyze clinical data of infertility patients. Through these experiments, we tried to identify inter-dependencies among the important factors that play a key role in clinical pregnancy, and to compare three kinds of Bayesian network classifiers (NBN, BAN, and GBN) in terms of classification performance. The experiments showed that the most important features for clinical pregnancy (Clin) are indication (IND), stimulation, age of female partner (FA), number of ova (ICT), and use of Wallace (ETM), and they revealed inter-dependencies among these features. We also confirmed that BAN and GBN, which are more general Bayesian network classifiers that permit inter-dependencies among features, perform better than NBN. Comparing Bayesian classifiers, which are based on probabilistic representation and reasoning, with other classifiers such as decision trees and k-nearest neighbor methods, we found that the former perform better than the latter owing to inherent characteristics of the clinical domain. Finally, we suggested a feature reduction method that removes all features except those within the Markov blanket of the class node, and investigated experimentally whether such feature reduction can improve the performance of Bayesian classifiers.
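
The Markov-blanket feature reduction mentioned at the end of this abstract can be sketched as follows. The Markov blanket of a node in a DAG consists of its parents, its children, and the other parents of its children. The graph below is a hypothetical toy structure with made-up node names, not the network learned in the paper, and `markov_blanket` is an illustrative helper.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents.
    Blanket = parents + children + the children's other parents.
    """
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    blanket = set(parents.get(node, set())) | children | co_parents
    blanket.discard(node)
    return blanket

# Hypothetical toy DAG: A -> C, C -> X, B -> X, C -> Y, Y -> Z, U isolated.
toy_parents = {
    "A": set(), "B": set(), "U": set(),
    "C": {"A"},
    "X": {"C", "B"},
    "Y": {"C"},
    "Z": {"Y"},
}

# Features kept when "C" plays the role of the class node.
kept = markov_blanket(toy_parents, "C")
```

In this toy graph, `Z` and `U` fall outside the blanket of `C` and would be the features removed.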

Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance (한국어 학습 모델별 한국어 쓰기 답안지 점수 구간 예측 성능 비교)

  • Cho, Heeryon; Im, Hyeonyeol; Yi, Yumi; Cha, Junwoo
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.133-140 / 2022
  • We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set of 304 essays, covering the criteria for choosing a job ('job'), conditions of a happy life ('happ'), the relationship between money and happiness ('econ'), and the definition of success ('succ'). These essays were labeled with four letter grades (A, B, C, and D), and a total of eleven essay score range prediction experiments were conducted (i.e., five for predicting the score range of 'job' essays, five for predicting the score range of 'happ' essays, and one for predicting the score range of mixed-topic essays). Three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data. In addition, two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were evaluated. Experimental results show that the deep learning-based Korean language models performed better than the two traditional classifiers, with KR-BERT performing best at 55.83% overall average prediction accuracy. A close second was KcBERT (55.77%), followed by KoBERT (54.91%). The naive Bayes and logistic regression classifiers achieved 52.52% and 50.28%, respectively. Owing to the scarcity of training data and the imbalance in class distribution, overall prediction performance was not high for any classifier. Moreover, the classifiers' vocabulary did not explicitly capture the error features that would help in correctly grading the Korean essays. By overcoming these two limitations, we expect the score range prediction performance to improve.

GA-Based Construction of Fuzzy Classifiers Using Information Granules

  • Kim Do-Wan; Lee Ho-Jae; Park Jin-Bae; Joo Young-Hoon
    • International Journal of Control, Automation, and Systems / v.4 no.2 / pp.187-196 / 2006
  • A new GA-based methodology using information granules is proposed for the construction of fuzzy classifiers. The proposed scheme consists of three steps: selection of information granules, construction of the associated fuzzy sets, and tuning of the fuzzy rules. First, the genetic algorithm (GA) is applied to develop adequate information granules. The fuzzy sets are then constructed by analyzing the developed information granules, and an interpretable fuzzy classifier is designed from the constructed fuzzy sets. Finally, the GA is used to tune the fuzzy rules, which can improve classification performance on misclassified data (e.g., data with unusual patterns or data on the class boundaries). To show the effectiveness of the proposed method, an example, the classification of the Iris data, is provided.

An Improved Domain-Knowledge-based Reinforcement Learning Algorithm

  • Jang, Si-Young; Suh, Il-Hong
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2003.10a / pp.1309-1314 / 2003
  • If an agent can learn using previous knowledge, it is expected to learn faster when interacting with its environment. In this paper, we present an improved reinforcement learning algorithm that uses domain knowledge represented by problem-independent features and their classifiers. Here, neural networks are employed as the knowledge classifiers. To show the validity of the proposed algorithm, computer simulations are presented for navigation problems of a mobile robot and a micro aerial vehicle (MAV).

An Improvement of LVQ3 Learning Using SVM (SVM을 이용한 LVQ3 학습의 성능개선)

  • 김상운
    • Proceedings of the IEEK Conference / 2001.06c / pp.9-12 / 2001
  • Learning vector quantization (LVQ) is a supervised learning technique that uses class information to move the vector quantizer slightly, so as to improve the quality of the classifier decision regions. In this paper, we propose a method that uses support vector machines (SVM) to select the initial codebook vectors for learning vector quantization (LVQ3). The method is tested on artificial and real design data sets and compared with the conventional condensed nearest neighbor (CNN) method and its modifications (mCNN). The experiments show that the proposed method performs better than the conventional ones and can therefore be used efficiently for designing nonparametric classifiers.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi; Rhee, Kyung-Hyune
    • Journal of Korea Multimedia Society / v.21 no.5 / pp.617-625 / 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining sensitive information such as bank account information, credit card details, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches (random forest, rotation forest, gradient boosted machine, and extreme gradient boosting) against single classifiers (decision tree, classification and regression tree, and credal decision tree) for phishing website detection. Area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used to evaluate the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
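
For reference, the AUC metric used in this comparison can be computed directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain O(n²) illustration with a hypothetical function name and toy scores, not the evaluation code from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` are 1 (positive, e.g. phishing) or 0 (negative).
    Each positive/negative pair scores 1 if the positive is ranked
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: one positive (0.35) is ranked below one negative (0.4).
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Because AUC depends only on the ordering of the scores, it lets ensembles and single classifiers be compared without fixing a decision threshold.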

Face Recognition Using Fuzzy Fusion and Wavelet Decomposition Method

  • Kwak, Keun-Chang; Min, Jun-Oh; Chun, Myung-Geun; Witold Pedrycz
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.364-367 / 2003
  • In this study, we develop a method for recognizing face images by combining wavelet decomposition, the fisherface method, and the fuzzy integral. The proposed approach comprises four main stages. The first stage applies wavelet decomposition, which yields four subimages. The second stage applies the fisherface method to these four subimage sets. The last two stages are concerned with generating the degrees of fuzzy membership and aggregating the individual classifiers by means of the fuzzy integral. The experimental results obtained for the CNU and Yale face databases show that the presented approach yields better classification performance than other classifiers.
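
The first stage, a single-level 2D wavelet decomposition into four subimages, can be sketched with NumPy. The abstract does not say which wavelet was used; the Haar wavelet here is an assumption chosen for brevity, and `haar_dwt2` is a hypothetical helper. Sub-band naming conventions also vary between libraries.

```python
import numpy as np

def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform.

    Returns four half-size subimages: the approximation and three
    detail bands. Height and width must both be even.
    """
    img = np.asarray(img, dtype=float)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a = img[0::2, 0::2]  # top-left
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0         # local average (low-pass)
    row_detail = (a + b - c - d) / 2.0     # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0     # left column minus right column
    diag_detail = (a - b - c + d) / 2.0    # diagonal difference
    return approx, row_detail, col_detail, diag_detail
```

Each returned subimage has half the height and width of the input, and in the pipeline described above each would be passed to its own fisherface classifier.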

A Comparative Study on Classification Methods of Sleep Stages by Using EEG

  • Kim, Jinwoo
    • Journal of Korea Multimedia Society / v.17 no.2 / pp.113-123 / 2014
  • Electrophysiological recordings are considered a reliable method of assessing a person's alertness. Sleep medicine is asked to offer objective methods for measuring daytime alertness, tiredness, and sleepiness. Because EEG signals are non-stationary, conventional frequency analysis is not highly successful in recognizing the alertness level. In this paper, EEG signals are analyzed using the wavelet transform as well as the discrete wavelet transform, and classified using statistical classifiers such as Euclidean and Mahalanobis distance classifiers and a promising method, the SVM (Support Vector Machine). In simulation, the average accuracies of Linear Discriminant Analysis (LDA)-Quadratic, k-Nearest Neighbors (k-NN)-Euclidean, and Linear SVM were 48%, 34.2%, and 86%, respectively. The experimental results show that the SVM classification method offers better performance for reliable classification of EEG signals than the other classification methods.
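
The difference between the Euclidean and Mahalanobis distance classifiers mentioned in this abstract can be shown with a minimal minimum-distance classifier. The class means, covariance matrix, and function name below are hypothetical toy values for illustration, not EEG features from the paper.

```python
import numpy as np

def min_distance_classify(x, means, cov=None):
    """Assign x to the class whose mean is nearest.

    With cov=None the squared Euclidean distance is used. Otherwise the
    squared Mahalanobis distance d^T cov^{-1} d is used, with one
    covariance matrix shared by all classes (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    diffs = means - x  # shape: (n_classes, n_features)
    if cov is None:
        d2 = (diffs ** 2).sum(axis=1)
    else:
        # Solve cov @ z = d for every class instead of inverting cov.
        solved = np.linalg.solve(np.asarray(cov, dtype=float), diffs.T).T
        d2 = (diffs * solved).sum(axis=1)
    return int(d2.argmin())

# Toy setup: the second feature has a much larger variance than the first.
toy_means = [[0.0, 0.0], [4.0, 3.0]]
toy_cov = [[1.0, 0.0], [0.0, 100.0]]
x = [1.0, 3.0]

euclid_label = min_distance_classify(x, toy_means)           # nearer to class 1
mahal_label = min_distance_classify(x, toy_means, toy_cov)   # nearer to class 0
```

The two rules disagree on this point because the Mahalanobis distance discounts differences along the high-variance feature.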

Ensemble Learning for Underwater Target Classification (수중 표적 식별을 위한 앙상블 학습)

  • Seok, Jongwon
    • Journal of Korea Multimedia Society / v.18 no.11 / pp.1261-1267 / 2015
  • The problem of underwater target detection and classification has attracted a substantial amount of attention and has been studied by many researchers for both military and non-military purposes. The problem is complicated by varying environmental conditions. In this paper, we study classifier ensemble methods for active sonar target classification to improve classification performance. In general, classifier ensemble methods are useful for classifiers whose variance is relatively large, such as decision trees and neural networks. Bagging, random selection of samples, random subspace, and rotation forest are selected as the classifier ensemble methods. Classification tests were carried out using the four ensemble methods, each based on 31 neural network classifiers, and their performances were compared.
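
Bagging, the first of the four ensemble methods listed, can be sketched as bootstrap resampling followed by a majority vote. The paper uses neural networks as the base classifiers; the nearest-centroid learner below is a deliberately simple stand-in, and all names (`NearestCentroid`, `bagging_fit`, `bagging_predict`) are hypothetical. Only the member count of 31 is taken from the abstract.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in base learner: predict the class of the nearest centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared distance from every sample to every centroid.
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d2.argmin(axis=1)]

def bagging_fit(X, y, n_members=31, seed=0):
    """Train each member on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)
        members.append(NearestCentroid().fit(X[idx], y[idx]))
    return members

def bagging_predict(members, X):
    """Combine member predictions by majority vote, sample by sample."""
    votes = np.stack([m.predict(X) for m in members])  # (n_members, n_samples)
    result = []
    for column in votes.T:
        labels, counts = np.unique(column, return_counts=True)
        result.append(labels[counts.argmax()])
    return np.array(result)
```

An odd member count such as 31 avoids tied votes in two-class problems. A random subspace variant would additionally train each member on a random subset of the feature columns.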