• Title/Summary/Keyword: bayesian classification


Accelerating the EM Algorithm through Selective Sampling for Naive Bayes Text Classifier (나이브베이즈 문서분류시스템을 위한 선택적샘플링 기반 EM 가속 알고리즘)

  • Chang Jae-Young;Kim Han-Joon
    • The KIPS Transactions: Part D / v.13D no.3 s.106 / pp.369-376 / 2006
  • This paper presents a new method for significantly improving a conventional Bayesian statistical text classifier by incorporating an accelerated EM (Expectation Maximization) algorithm. The EM algorithm suffers from slow convergence and performance degradation during its iterative process, especially when real online textual documents do not follow EM's assumptions. In this study, we propose a new accelerated EM algorithm with uncertainty-based selective sampling, which is simple yet converges quickly and allows a more accurate classification model to be estimated for the Naive Bayesian text classifier. Experiments on the popular Reuters-21578 document collection showed that the proposed algorithm effectively improves classification accuracy.

A Study on Sex Classification of a Name using Naive Bayesian (나이브 베이지안을 사용한 성명에 대한 성별 구분 연구)

  • Lim, Myung-Jae;Jung, Jin-Pyo;Kim, Myung-Gwan
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.6 / pp.155-159 / 2013
  • This article employs a Naive Bayesian classifier to build a system that can distinguish the sex of a name. Unlike foreign names, in Korean names, the pronoun referring to a person shows discordance with sex. Drawing on the characteristics of Korean names, however, the study distinguishes names frequently used for men from those frequently used for women. Because the data also include names whose sex is rather ambiguous, such as proper nouns, the accuracy is somewhat low. The experiment shows 84% accuracy for Korean men and 88% for Korean women, for a total accuracy of 86%. For foreign names, accuracy is 80% for men and 84% for women, for a total accuracy of 83%.
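As a rough illustration of the kind of classifier this abstract describes, here is a minimal sketch of a Naive Bayes name classifier. The abstract does not state which features, smoothing, or training data the authors used, so everything here is an assumption: the per-character features, the add-one (Laplace) smoothing, the `train` and `predict` helpers, and the six-name toy list `TOY_NAMES` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (given name, label). Not from the paper.
TOY_NAMES = [
    ("민준", "M"), ("서준", "M"), ("준호", "M"),
    ("서연", "F"), ("지연", "F"), ("수연", "F"),
]


def train(names_with_labels):
    """Count how often each character occurs in names of each class."""
    class_counts = Counter()
    char_counts = defaultdict(Counter)
    vocab = set()
    for name, label in names_with_labels:
        class_counts[label] += 1
        for ch in name:
            char_counts[label][ch] += 1
            vocab.add(ch)
    return class_counts, char_counts, vocab


def predict(name, model):
    """Return the label with the highest log posterior under Naive Bayes."""
    class_counts, char_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total)
        # Add-one smoothing so unseen characters do not zero out the product.
        denom = sum(char_counts[label].values()) + len(vocab)
        for ch in name:
            score += math.log((char_counts[label][ch] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_NAMES)
print(predict("현준", model))
print(predict("미연", model))
```

With this toy data the character 준 appears only in names labeled "M" and 연 only in names labeled "F", so those characters dominate the decision; a realistic system would need a far larger name list and held-out evaluation.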

Large-Scale Text Classification with Deep Neural Networks (깊은 신경망 기반 대용량 텍스트 데이터 분류 기술)

  • Jo, Hwiyeol;Kim, Jin-Hwa;Kim, Kyung-Min;Chang, Jeong-Ho;Eom, Jae-Hong;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.322-327 / 2017
  • The classification problem in the field of Natural Language Processing has been studied for a long time. Continuing our previous research, which classified large-scale text using Convolutional Neural Networks (CNN), we implemented Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). The experimental results showed that classification performance ranked, in increasing order, as Multinomial Naïve Bayesian Classifier < Support Vector Machine (SVM) < LSTM < CNN < GRU. The results can be interpreted as follows. First, CNN performed better than LSTM, so the text classification problem might be related more to feature extraction than to natural language understanding. Second, judging from the results, the GRU showed better performance in feature extraction than LSTM. Finally, the fact that the GRU performed better than CNN implies that text classification algorithms should consider both feature extraction and sequential information. We present the results of fine-tuning deep neural networks to provide future researchers with some intuition regarding natural language processing.

Moving Object Classification through Fusion of Shape and Motion Information (형상 정보와 모션 정보 융합을 통한 움직이는 물체 인식)

  • Kim Jung-Ho;Ko Han-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.5 s.311 / pp.38-47 / 2006
  • Conventional classification methods use a single classifier based on a shape or motion feature. However, this approach is weak if used naively, since classification performance is highly sensitive to the accuracy with which the moving region is detected. The detection accuracy, in turn, depends on the condition of the image background. In this paper, we propose to resolve this drawback and thus strengthen classification reliability by employing Bayesian decision fusion to optimally combine the decisions of three classifiers. The first classifier is based on shape information obtained from Fourier descriptors, while the second is based on shape information obtained from image gradients. The third classifier uses motion information. Our experimental results on classifying humans and vehicles with a static camera in various directions confirm a significant improvement and indicate the superiority of the proposed decision fusion method over the conventional Majority Voting and Weighted Average Score approaches.
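The abstract does not spell out the fusion rule, so the following is only a sketch of one common form of Bayesian decision fusion, under the assumption that the three classifiers err independently given the true class. The `bayes_fuse` and `majority_vote` helpers, the confusion matrices, and the priors are all hypothetical values chosen for illustration; none of them come from the paper.

```python
from collections import Counter


def bayes_fuse(decisions, confusions, priors):
    """Posterior over classes given each classifier's hard decision.

    confusions[i][true_class][decided_class] is P(decided | true) for
    classifier i. Classifiers are assumed conditionally independent.
    """
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for decided, confusion in zip(decisions, confusions):
            p *= confusion[cls][decided]
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


def majority_vote(decisions):
    """Baseline: the label chosen by the most classifiers."""
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical reliabilities: one strong classifier and two weak ones.
strong = {"human": {"human": 0.95, "vehicle": 0.05},
          "vehicle": {"human": 0.05, "vehicle": 0.95}}
weak = {"human": {"human": 0.6, "vehicle": 0.4},
        "vehicle": {"human": 0.4, "vehicle": 0.6}}

decisions = ["human", "vehicle", "vehicle"]
priors = {"human": 0.5, "vehicle": 0.5}

posterior = bayes_fuse(decisions, [strong, weak, weak], priors)
fused_label = max(posterior, key=posterior.get)
voted_label = majority_vote(decisions)
print(fused_label, voted_label, posterior)
```

In this toy setup the two weak classifiers outvote the strong one under majority voting, while the fused posterior follows the more reliable classifier. This shows how weighting decisions by estimated reliability can differ from simple voting, though it says nothing about the paper's actual results.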

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.1-5 / 2008
  • Generally, the analytical tools of data mining fall into two learning types: supervised and unsupervised learning algorithms. Classification and prediction are the main analysis tools for supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We compare the popular classification algorithms LDA, QDA, kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. We use almost all of the classification data sets in the UCI machine learning repository for our experiments. Based on our results, we are able to select proper algorithms for given classification data sets.

Bayesian Inference for Multinomial Group Testing

  • Heo, Tae-Young;Kim, Jong-Min
    • Communications for Statistical Applications and Methods / v.14 no.1 / pp.81-92 / 2007
  • This paper considers trinomial group testing concerned with the classification of N given units into one of k disjoint categories. We propose Bayesian inference for estimating individual category proportions using the trinomial group testing model proposed by Bar-Lev et al. (2005). We compare the relative efficiency (RE), based on the mean squared error (MSE), of the MLE and Bayes estimators under various prior information. The impact of different prior specifications on the estimates is also investigated using selected prior distributions. The impact of different priors on the Bayes estimates is modest when the sample size and group size are large.

Classification of Gene Expression Data by Ensemble of Bayesian Networks (앙상블 베이지안망에 의한 유전자발현데이터 분류)

  • 황규백;장정호;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.434-436 / 2003
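  • Gene expression data obtained with DNA chip technology measure the expression levels of thousands of genes in biological tissues or cells, and are useful for tasks such as classifying cancer types based on gene expression patterns. In this paper, we apply the Bayesian network, a kind of probabilistic graphical model, to the classification of expression data, and construct an ensemble of Bayesian networks to improve classification performance. Experiments were conducted on gene expression data extracted from real cancer tissue. The ensemble of Bayesian networks achieved higher classification accuracy than a single Bayesian network and performed comparably to a naive Bayes classifier, neural networks, and support vector machines (SVM).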


A Study on Document Filtering Using Naive Bayesian Classifier (베이지안 분류기를 이용한 문서 필터링)

  • Lim Soo-Yeon;Son Ki-Jun
    • The Journal of the Korea Contents Association / v.5 no.3 / pp.227-235 / 2005
  • Document filtering is the task of deciding whether a document is relevant to a specified topic. As the Internet and the Web become widespread and the number of documents delivered by e-mail grows explosively, the importance of text filtering increases as well. In this paper, we treat the document filtering problem as a binary document classification problem and propose a news filtering system based on the Bayesian classifier. To perform filtering, we conduct an experiment to find out how many training documents are needed and how accurate the relevance checks must be.


An Attribute Weighting Approach for Naive Bayesian based on Very Fast Decision Tree (Very Fast Decision Tree 기반 Naive Bayesian 알고리즘의 Weight 부여 기법)

  • Kim, Se-Jun;Yoo, Seung-Eon;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.139-140 / 2018
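  • In this paper, we propose a new technique that assigns weights to data attributes in order to improve the classification accuracy of the Naive Bayesian (NB) algorithm, one of the supervised machine learning algorithms. A method that assigns weights using the depth of a Decision Tree (DT) has been proposed previously, but because building the DT incurs a large overhead, it is difficult to apply to real-time data analysis or in resource-constrained environments. To address this, we propose a weighting technique based on the Very Fast Decision Tree (VFDT) algorithm, which builds a DT quickly using a minimal amount of data, thereby improving the accuracy of NB with little overhead.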


A Three-Step Preprocessing Algorithm for Enhanced Classification of E-Mail Recommendation System (이메일 추천 시스템의 분류 향상을 위한 3단계 전처리 알고리즘)

  • Jeong Ok-Ran;Cho Dong-Sub
    • The Transactions of the Korean Institute of Electrical Engineers D / v.54 no.4 / pp.251-258 / 2005
  • The results of automatic document classification may differ significantly according to the characteristics of the documents being classified, as well as the classifier's performance. This research identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that can minimize their atypical characteristics. In the first stage, an uncertainty-based sampling algorithm using Mean Absolute Deviation (MAD) is used to select the learning documents for rule generation at classification time. In the subsequent stage, a method that assigns weighted values by attribute is applied to increase the discriminating capability of terms that appear in the title of the e-mail document. In the third and last stage, classification accuracy for each category is increased by using the dynamic threshold of the Naive Bayesian presumptive algorithm. We also implemented an E-Mail Recommendation System using the three-step preprocessing algorithm, which enables users to classify mail directly and optimally by recommending the applicable category when a mail arrives.