• Title/Summary/Keyword: Single classifiers

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society / v.19 no.1 / pp.133-140 / 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient method for classifying Korean sentiment using word2vec and ensemble methods, both of which have recently been studied extensively. For 200,000 Korean movie review texts, we generated a POS-based BOW feature, a word2vec feature, and an integrated feature combining the two representations. For sentiment classification we used single classifiers (Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine) and ensemble classifiers (Adaptive Boosting, Bagging, Gradient Boosting, and Random Forest). The integrated feature representation, composed of a BOW feature including adjectives and adverbs and the word2vec feature, showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, achieved the highest performance, while the ensemble classifiers performed similarly to or slightly worse than the single classifier.
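The integrated feature described above can be pictured as a bag-of-words count vector concatenated with an averaged word-embedding vector. The following is a minimal sketch under that assumption; the function names, the toy vocabulary, and the two-dimensional embeddings are hypothetical stand-ins for the paper's POS-based BOW and trained word2vec model.

```python
def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    return [tokens.count(word) for word in vocabulary]

def mean_embedding(tokens, embeddings, dim):
    """Average the embedding vectors of all tokens that have one."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def integrated_features(tokens, vocabulary, embeddings, dim):
    """Concatenate the BOW counts with the averaged embedding."""
    return bow_vector(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)

# Hypothetical toy data; a real system would use a trained word2vec model.
vocabulary = ["good", "bad", "really"]
embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0], "really": [0.5, 0.5]}
tokens = ["good", "movie", "really", "good"]
features = integrated_features(tokens, vocabulary, embeddings, dim=2)
```

The resulting vector could then be passed to any of the single or ensemble classifiers named in the abstract.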

Aircraft Classification with Fusion of HRRP and JEM Based on the Confidence of a Classifier (구분기 신뢰도에 기반한 HRRP 및 JEM 융합 항공기 식별)

  • Kim, Si-Ho;Lee, Sang-In;Chae, Dae-Young
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.28 no.3 / pp.217-224 / 2017
  • In this paper, we propose a fusion classification method for aircraft that combines HRRP and JEM classifiers, which have complementary properties. The fusion method is based on each classifier's confidence in its classification result, with the aim of improving performance over a single classifier in various situations. The confidence is defined as the posterior probability estimated from the classifier's classification performance, and it depends on the aspect angle and the certainty of the classification result. Classification tests on simulation data verify that the proposed method performs well by fusing the classifiers effectively.
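One simple way to picture confidence-based fusion is to add up each classifier's confidence per candidate label and pick the label with the largest total. This is only an illustrative sketch: the function name `fuse_by_confidence`, the labels, and the confidence values are hypothetical, and the paper's own confidence estimate (which depends on aspect angle) is not reproduced here.

```python
def fuse_by_confidence(results):
    """Sum confidence per label across classifiers and return the best label.

    `results` is a list of (label, confidence) pairs, one per classifier.
    """
    scores = {}
    for label, confidence in results:
        scores[label] = scores.get(label, 0.0) + confidence
    return max(scores, key=scores.get)

# Hypothetical outputs from an HRRP-based and a JEM-based classifier.
hrrp_result = ("aircraft_A", 0.55)
jem_result = ("aircraft_B", 0.90)
fused = fuse_by_confidence([hrrp_result, jem_result])
```

With two classifiers this reduces to trusting the more confident one when they disagree.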

Combined Use of Techniques for Resolving Data Imbalance Using a Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Yeong-Sik;Kim, Jong-U;Heo, Jun
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.05a / pp.309-320 / 2007
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that one class has substantially more or fewer instances than the other classes. It lowers the prediction accuracy for the minority class because classifiers tend to assign instances to the majority classes and ignore the minority class in order to reduce the overall misclassification rate. To solve the data imbalance problem, a number of techniques have been proposed based on resampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining these previously proposed techniques and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy for the minority class, we use the F-value of the minority class as the fitness function of the genetic algorithm when determining the combination ratio. To compare performance against the single techniques and against a matrix-style combination with random percentages, we performed experiments on four public datasets that are commonly used to compare methods for the data imbalance problem. The experimental results demonstrate the usefulness of the proposed method.
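The fitness function mentioned above, the F-value of the minority class, can be computed from predictions as in the sketch below. The function name, the `beta` parameter, and the toy labels are assumptions made for illustration; the genetic algorithm that would call this function is not shown.

```python
def minority_f_value(y_true, y_pred, minority_label, beta=1.0):
    """F-measure of the minority class, usable as a GA fitness score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority_label and p == minority_label)
    fp = sum(1 for t, p in pairs if t != minority_label and p == minority_label)
    fn = sum(1 for t, p in pairs if t == minority_label and p != minority_label)
    if tp == 0:
        return 0.0  # no correct minority predictions, so the score is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical imbalanced labels: class 1 is the minority.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
fitness = minority_f_value(y_true, y_pred, minority_label=1)
```

Note that plain accuracy on this toy data is 75% even though only half of the minority instances are found, which is why a minority-class measure is the more informative fitness signal.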

Ensemble learning of Regional Experts (지역 전문가의 앙상블 학습)

  • Lee, Byung-Woo;Yang, Ji-Hoon;Kim, Seon-Ho
    • Journal of KIISE:Computing Practices and Letters / v.15 no.2 / pp.135-139 / 2009
  • We present a new ensemble learning method that employs a set of regional experts, each of which learns to handle a subset of the training data. We split the training data and generate experts for different regions of the feature space. When classifying a data instance, we apply weighted voting among the experts whose regions include the instance. We used ten datasets to compare the performance of our new ensemble method with that of single classifiers and of other ensemble methods such as Bagging and AdaBoost. We used SMO, Naive Bayes, and C4.5 as base learning algorithms. We found that the performance of our method is comparable to that of AdaBoost and Bagging when the base learner is C4.5. In the remaining cases, our method outperformed the benchmark methods.
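The voting step described above can be sketched as follows, assuming a one-dimensional feature space in which each expert's region is an interval. The tuple layout, the weights, and the constant stub predictors are hypothetical; the abstract does not say how regions or weights are derived.

```python
def regional_vote(x, experts):
    """Weighted vote among experts whose region contains x.

    `experts` is a list of (low, high, weight, predict) tuples, where
    [low, high] is the expert's region and `predict` maps x to a label.
    """
    scores = {}
    for low, high, weight, predict in experts:
        if low <= x <= high:
            label = predict(x)
            scores[label] = scores.get(label, 0.0) + weight
    if not scores:
        return None  # no expert covers this point
    return max(scores, key=scores.get)

# Hypothetical experts with overlapping regions and constant predictions.
experts = [
    (0.0, 5.0, 0.6, lambda x: "neg"),
    (3.0, 10.0, 0.9, lambda x: "pos"),
    (4.0, 6.0, 0.2, lambda x: "neg"),
]
label = regional_vote(4.5, experts)
```

At `x = 4.5` all three experts vote, and the single heavily weighted expert outvotes the two lighter ones.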

Fuzzy Classifier System for Edge Detection

  • Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.3 no.1 / pp.52-57 / 2003
  • In this paper, we propose a Fuzzy Classifier System (FCS) to find a set of fuzzy rules that can carry out edge detection. Holland's classifier system can evaluate the usefulness of rules represented by classifiers through repeated learning. The FCS enables the classifier system to carry out a mapping from continuous inputs to outputs. The FCS applies machine learning methods to the concept of fuzzy logic: the antecedent and consequent of a classifier are the same as those of a fuzzy rule. The FCS in this paper is of the Michigan style, in which a single fuzzy if-then rule is coded as an individual. The average gray levels of each group of neighboring pixels are represented as fuzzy sets. Fuzzy if-then rules then decide whether a pixel is an edge pixel. Depending on the average gray levels, a number of fuzzy rules can be activated, and each rule produces an output. These outputs are aggregated and defuzzified to obtain the new gray value of the pixel. To evaluate this edge detection, we compare the new gray level of a pixel with the gray level obtained by another edge detection method, such as Sobel edge detection. This comparison provides a reinforcement signal for the FCS, which learns by reinforcement learning. The FCS also employs genetic algorithms to create new rules and modify existing rules when the performance of the system needs to be improved.

A Study on the Effectiveness of Bigrams in Text Categorization (바이그램이 문서범주화 성능에 미치는 영향에 관한 연구)

  • Lee, Chan-Do;Choi, Joon-Young
    • Journal of Information Technology Applications and Management / v.12 no.2 / pp.15-27 / 2005
  • Text categorization systems generally use single words (unigrams) as features. A deceptively simple algorithm for improving text categorization is investigated here, based on an idea previously shown not to work: identifying useful word pairs (bigrams) made up of adjacent unigrams. The bigrams it found, while small in number, can substantially raise the quality of feature sets. The algorithm was tested on two pre-classified datasets, Reuters-21578 for English and Korea-web for Korean. The results show that the algorithm was successful in extracting high-quality bigrams and increased the quality of the overall features. To find out the role of bigrams, we trained Naïve Bayes classifiers using both unigrams and bigrams as features. The results show that recall values were higher than those obtained with unigrams alone. Break-even points and F1 values improved in most documents, especially when documents were classified along the large classes. In Reuters-21578, break-even points increased by 2.1%, with the highest at 18.8%, and F1 improved by 1.5%, with the highest at 3.2%. In Korea-web, break-even points increased by 1.0%, with the highest at 4.5%, and F1 improved by 0.4%, with the highest at 4.2%. We can conclude that text classification using unigrams and bigrams together is more efficient than using only unigrams.
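Building a feature set from unigrams plus a small number of adjacent-word bigrams can be sketched as below. The frequency threshold `min_bigram_count` is an assumed stand-in for the paper's bigram selection criterion, which the abstract does not describe; the function name and sample sentence are likewise hypothetical.

```python
from collections import Counter

def unigram_bigram_features(tokens, min_bigram_count=2):
    """Count unigrams, then add adjacent-word bigrams seen often enough.

    The frequency threshold is an assumption for illustration only.
    """
    features = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    for (first, second), count in bigram_counts.items():
        if count >= min_bigram_count:
            features[first + " " + second] = count
    return features

tokens = "interest rate rises as interest rate policy shifts".split()
features = unigram_bigram_features(tokens)
```

Only the repeated pair "interest rate" survives the threshold here, which mirrors the abstract's point that the useful bigrams are few compared with the unigrams.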

Recognition of Handwritten Numerals using Hybrid Features And Combined Classifier (복합 특징과 결합 인식기에 의한 필기체 숫자인식)

  • 박중조;송영기;김경민
    • Journal of the Korea Institute of Information and Communication Engineering / v.5 no.1 / pp.14-22 / 2001
  • Off-line handwritten numeral recognition is a very difficult task, and it is hard to achieve high recognition rates using a single feature and a single classifier, since handwritten numerals contain many pattern variations that depend mostly on individual writing styles. In this paper, we propose a handwritten numeral recognition system using hybrid features and a combined classifier. To improve the recognition rate, we select mutually helpful features (directional features, a crossing point feature, and mesh features) and construct three new hybrid feature sets from them. These hybrid feature sets capture the local and global characteristics of input numeral images. We implement the combined classifier by combining three neural network classifiers, using the fuzzy integral for multiple network fusion. To verify the performance of the proposed recognition system, we performed experiments with the unconstrained handwritten numeral database of Concordia University, Canada. Our method achieved a recognition rate of 97.85%.

Bankruptcy prediction using ensemble SVM model (앙상블 SVM 모형을 이용한 기업 부도 예측)

  • Choi, Ha Na;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1113-1125 / 2013
  • Corporate bankruptcy prediction has been an important topic in the accounting and finance field for a long time. Several data mining techniques have been used for bankruptcy prediction. However, applying a single model to real classification problems has many limitations. This study proposes an ensemble SVM (support vector machine) model that assembles different SVM models, each with a different kernel function. Our ensemble model is built and evaluated using a v-fold cross-validation approach. The k top-performing models are recruited into the ensemble. Classification is then carried out by majority vote of the ensemble. In this paper, we investigate the performance of the ensemble SVM classifier in terms of accuracy, error rate, sensitivity, specificity, ROC curve, and AUC, and compare it with single SVM classifiers on a financial ratios dataset and a simulation dataset. The results confirmed the advantages of our method: it is robust while providing good performance.
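The selection and voting steps described above (keep the k best models by cross-validation score, then take a majority vote) can be sketched as follows. The scores, the kernel names in the comments, and the constant stub predictors are hypothetical placeholders for trained SVMs.

```python
from collections import Counter

def top_k_majority_vote(models, x, k):
    """Majority vote among the k models with the best validation score.

    `models` is a list of (cv_score, predict) pairs.
    """
    top = sorted(models, key=lambda m: m[0], reverse=True)[:k]
    votes = Counter(predict(x) for _, predict in top)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for SVMs with different kernels.
models = [
    (0.81, lambda x: "bankrupt"),  # e.g. a linear-kernel model
    (0.79, lambda x: "healthy"),   # e.g. an RBF-kernel model
    (0.77, lambda x: "healthy"),   # e.g. a polynomial-kernel model
    (0.60, lambda x: "bankrupt"),  # weakest model, excluded when k=3
]
prediction = top_k_majority_vote(models, x=None, k=3)
```

Using an odd k avoids ties in a two-class problem such as bankrupt versus healthy.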

Tracing the breeding farm of domesticated pig using feature selection (Sus scrofa)

  • Kwon, Taehyung;Yoon, Joon;Heo, Jaeyoung;Lee, Wonseok;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / v.30 no.11 / pp.1540-1549 / 2017
  • Objective: Increasing food safety demands in the animal product market have created a need for a system to trace the food distribution process, from the manufacturer to the retailer, and genetic traceability is an effective method to trace the origin of animal products. In this study, we successfully achieved the farm tracing of 6,018 multi-breed pigs, using single nucleotide polymorphism (SNP) markers strictly selected through least absolute shrinkage and selection operator (LASSO) feature selection. Methods: We performed farm tracing of domesticated pigs (Sus scrofa) from SNP markers and selected the most relevant features for accurate prediction. Considering the multi-breed composition of our data, we performed feature selection using LASSO penalization on 4,002 SNPs that are shared between breeds, which also include 179 SNPs with small between-breed differences. The 100 highest-scored features were extracted from iterative simulations and then evaluated using machine-learning-based classifiers. Results: We selected 1,341 SNPs from over 45,000 SNPs through iterative LASSO feature selection, to minimize between-breed differences. We subsequently selected the 100 highest-scored SNPs from iterative scoring, and observed high statistical measures in the classification of breeding farms by cross-validation using only these SNPs. Conclusion: The study represents a successful application of LASSO feature selection on multi-breed pig SNP data to trace farm information, which provides a valuable method and a basis for further research on genetic traceability.

Multiple-Classifier Combination based on Image Degradation Model for Low-Quality Image Recognition (저화질 영상 인식을 위한 화질 저하 모델 기반 다중 인식기 결합)

  • Ryu, Sang-Jin;Kim, In-Jung
    • The KIPS Transactions:PartB / v.17B no.3 / pp.233-238 / 2010
  • In this paper, we propose a multiple-classifier combination method based on image degradation modeling to improve recognition performance on low-quality images. Using an image degradation model, it generates a set of classifiers, each specialized for a specific image quality. During recognition, it combines the results of the recognizers by weighted averaging to decide the final result. The weight of each recognizer is decided dynamically from the estimated quality of the input image: a large weight is assigned to the recognizer specialized for the estimated quality, and small weights to the other recognizers. As a result, the method can adapt effectively to variation in image quality. Moreover, being a multiple-classifier system, it shows more reliable performance than a single-classifier system on low-quality images. In the experiments, the proposed method achieved a higher recognition rate than both multiple-classifier combination systems that do not consider image quality and single-classifier systems that do.
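The quality-dependent weighted averaging described above might look like the following sketch. The exponential weighting, the `sharpness` parameter, the quality scale, and all numeric values are assumptions chosen for illustration; the abstract only states that the recognizer matching the estimated quality receives a large weight and the others small weights.

```python
import math

def quality_weights(estimated_quality, specialist_qualities, sharpness=10.0):
    """Give more weight to recognizers specialized near the estimated quality."""
    raw = [math.exp(-sharpness * abs(estimated_quality - q))
           for q in specialist_qualities]
    total = sum(raw)
    return [r / total for r in raw]

def combine_scores(score_lists, weights):
    """Weighted average of per-class scores from several recognizers."""
    n_classes = len(score_lists[0])
    return [sum(w * scores[c] for w, scores in zip(weights, score_lists))
            for c in range(n_classes)]

# Hypothetical recognizers specialized for low, medium, and high quality.
specialist_qualities = [0.2, 0.5, 0.9]
score_lists = [[0.1, 0.9], [0.6, 0.4], [0.8, 0.2]]
weights = quality_weights(0.25, specialist_qualities)
combined = combine_scores(score_lists, weights)
predicted = combined.index(max(combined))
```

Because the input is estimated to be low quality, the low-quality specialist dominates the average and its preferred class wins.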