• Title/Summary/Keyword: Naive Bayesian Network

Search Result 41, Processing Time 0.022 seconds

Data Mining Using Reversible Jump MCMC and Bayesian Network Learning (Reversible Jump MCMC와 베이지안망 학습에 의한 데이터마이닝)

  • 하선영;장병탁
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10b
    • /
    • pp.90-92
    • /
    • 2000
  • 데이터마이닝 문제는 데이터를 그 속성들에 따라 분류하여 예측하는 것뿐만 아니라 분류된 속성들간의 연관성에 대해 잘 설명할 수 있어야 한다. 일반적으로 변수들간의 연관성을 잘 설명할 수 있으면서도 높은 예측력을 가지는 방법으로는 베이지안 네트웍 분류자(Bayesian network classifier)가 있다. 그러나 이것은 데이터 마이닝과 같은 대용량 데이터에서는 성능이 떨어지는 단점이 있다. 이에 이 논문에서는 최근 RBF 신경망이 입력변수 선정문제에 성공적으로 적용된 Reversible Jump Markov Chain Monte Carlo 방법을 이용하여 최적의 입력변수들만을 선택하여 베이지안 네트웍을 학습하는 Selective BN Augmented Naive-Bayes Classifier를 새로운 방안으로 제안하고 이를 실제 데이터마이닝 문제에 적용한 결과를 제시한다.

  • PDF

Classification of Gene Expression Data by Ensemble of Bayesian Networks (앙상블 베이지안망에 의한 유전자발현데이터 분류)

  • 황규백;장정호;장병탁
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04c
    • /
    • pp.434-436
    • /
    • 2003
  • DNA칩 기술로 얻어지는 유전자발현데이터(gene expression data)는 생채 조직이나 세포의 수천개에 달하는 유전자의 발현량(expression level)을 측정한 것으로, 유전자발현양상(gene expression pattern)에 기반한 암 종류의 분류 등에 유용하다. 본 논문에서는 확률그래프모델(probabilistic graphical model)의 하나인 베이지안망(Bayesian network)을 발현데이터의 분류에 적응하며, 분류 성능을 높이기 위해 베이지안망의 앙상블(ensemble of Bayesian networks)을 구성한다. 실험은 실제 암 조직에서 추출된 유전자발현데이터에 대해 행해졌다 실험 결과, 앙상블 베이지안망의 분류 정확도는 단일 베이지안망보다 높았으며, naive Bayes 분류기, 신경망, support vector machine(SVM) 등과 대등한 성능을 보였다.

  • PDF

A Hierarchical CPV Solar Generation Tracking System based on Modular Bayesian Network (베이지안 네트워크 기반 계층적 CPV 태양광 추적 시스템)

  • Park, Susang;Yang, Kyon-Mo;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.41 no.7
    • /
    • pp.481-491
    • /
    • 2014
  • The power production using renewable energy is more important because of a limited amount of fossil fuel and the problem of global warming. A concentrative photovoltaic system comes into the spotlight with high energy production, since the rate of power production using solar energy is proliferated. These systems, however, need to sophisticated tracking methods to give the high power production. In this paper, we propose a hierarchical tracking system using modular Bayesian networks and a naive Bayes classifier. The Bayesian networks can respond flexibly in uncertain situations and can be designed by domain knowledge even when the data are not enough. Bayesian network modules infer the weather states which are classified into nine classes. Then, naive Bayes classifier selects the most effective method considering inferred weather states and the system makes a decision using the rules. We collected real weather data for the experiments and the average accuracy of the proposed method is 93.9%. In addition, comparing the photovoltaic efficiency with the pinhole camera system results in improved performance of about 16.58%.

Generation and Selection of Nominal Virtual Examples for Improving the Classifier Performance (분류기 성능 향상을 위한 범주 속성 가상예제의 생성과 선별)

  • Lee, Yu-Jung;Kang, Byoung-Ho;Kang, Jae-Ho;Ryu, Kwang-Ryel
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.12
    • /
    • pp.1052-1061
    • /
    • 2006
  • This paper presents a method of using virtual examples to improve the classification accuracy for data with nominal attributes. Most of the previous researches on virtual examples focused on data with numeric attributes, and they used domain-specific knowledge to generate useful virtual examples for a particularly targeted learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naive Bayesian network constructed from the given training set. A sampled example is considered useful if it contributes to the increment of the network's conditional likelihood when added to the training set. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that the virtual examples collected this way.can help various learning algorithms to derive classifiers of improved accuracy.

IDS Model using Improved Bayesian Network to improve the Intrusion Detection Rate (베이지안 네트워크 개선을 통한 탐지율 향상의 IDS 모델)

  • Choi, Bomin;Lee, Jungsik;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.5
    • /
    • pp.495-503
    • /
    • 2014
  • In recent days, a study of the intrusion detection system collecting and analyzing network data, packet or logs, has been actively performed to response the network threats in computer security fields. In particular, Bayesian network has advantage of the inference functionality which can infer with only some of provided data, so studies of the intrusion system based on Bayesian network have been conducted in the prior. However, there were some limitations to calculate high detection performance because it didn't consider the problems as like complexity of the relation among network packets or continuos input data processing. Therefore, in this paper we proposed two methodologies based on K-menas clustering to improve detection rate by reforming the problems of prior models. At first, it can be improved by sophisticatedly setting interval range of nodes based on K-means clustering. And for the second, it can be improved by calculating robust CPT through applying weighted-leaning based on K-means clustering, too. We conducted the experiments to prove performance of our proposed methodologies by comparing K_WTAN_EM applied to proposed two methodologies with prior models. As the results of experiment, the detection rate of proposed model is higher about 7.78% than existing NBN(Naive Bayesian Network) IDS model, and is higher about 5.24% than TAN(Tree Augmented Bayesian Network) IDS mode and then we could prove excellence our proposing ideas.

Forecasting of Various Air Pollutant Parameters in Bangalore Using Naïve Bayesian

  • Shivkumar M;Sudhindra K R;Pranesha T S;Chate D M;Beig G
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.3
    • /
    • pp.196-200
    • /
    • 2024
  • Weather forecasting is considered to be of utmost important among various important sectors such as flood management and hydro-electricity generation. Although there are various numerical methods for weather forecasting but majority of them are reported to be Mechanistic computationally demanding due to their complexities. Therefore, it is necessary to develop and build models for accurately predicting the weather conditions which are faster as well as efficient in comparison to the prevalent meteorological models. The study has been undertaken to forecast various atmospheric parameters in the city of Bangalore using Naïve Bayes algorithms. The individual parameters analyzed in the study consisted of wind speed (WS), wind direction (WD), relative humidity (RH), solar radiation (SR), black carbon (BC), radiative forcing (RF), air temperature (AT), bar pressure (BP), PM10 and PM2.5 of the Bangalore city collected from Air Quality Monitoring Station for a period of 5 years from January 2015 to May 2019. The study concluded that Naive Bayes is an easy and efficient classifier that is centered on Bayes theorem, is quite efficient in forecasting the various air pollution parameters of the city of Bangalore.

Performance Evaluation on the Learning Algorithm for Automatic Classification of Q&A Documents (고객 질의 문서 자동 분류를 위한 학습 알고리즘 성능 평가)

  • Choi Jung-Min;Lee Byoung-Soo
    • The KIPS Transactions:PartD
    • /
    • v.13D no.1 s.104
    • /
    • pp.133-138
    • /
    • 2006
  • Electric commerce of surpassing the traditional one appeared before the public and has currently led the change in the management of enterprises. To establish and maintain good relations with customers, electric commerce has various channels for customers that understand what they want to and suggest it to them. The bulletin board and e-mail among em are inbound information that enterprises can directly listen to customers' opinions and are different from other channels in characters. Enterprises can effectively manage the bulletin board and e-mail by understanding customers' ideas as many as possible and provide them with optimum answers. It is one of the important factors to improve the reliability of the notice board and e-mail as well as the whole electric commerce. Therefore this thesis researches into methods to classify various kinds of documents automatically in electric commerce; they are possible to solve existing problems of the bulletin board and e-mail, to operate effectively and to manage systematically. Moreover, it researches what the most suitable algorithm is in the automatic classification of Q&A documents by experiment the classifying performance of Naive Bayesian, TFIDF, Neural Network, k-NN

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence
    • /
    • v.17 no.2
    • /
    • pp.163-170
    • /
    • 2019
  • Over the past decade, the development of the Web explosively increased the data. Feature selection step is an important step in extracting valuable data from a large amount of data. This study proposes a novel opinion mining model based on combining feature selection (FS) methods with Word embedding to vector (Word2vec) and BOW (Bag-of-words). FS methods adopted for this study are CFS (Correlation based FS) and IG (Information Gain). To select an optimal FS method, a number of classifiers ranging from LR (logistic regression), NN (neural network), NBN (naive Bayesian network) to RF (random forest), RS (random subspace), ST (stacking). Empirical results with electronics and kitchen datasets showed that LR and ST classifiers combined with IG applied to BOW features yield best performance in opinion mining. Results with laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features represents best performance in opinion mining.

Development of newly recruited privates on-the-job Training Achievements Group Classification Model (신병 주특기교육 성취집단 예측모형 개발)

  • Kwak, Ki-Hyo;Suh, Yong-Moo
    • Journal of the military operations research society of Korea
    • /
    • v.33 no.2
    • /
    • pp.101-113
    • /
    • 2007
  • The period of military personnel service will be phased down by 2014 according to 'The law of National Defense Reformation' issued by the Ministry of National Defense. For this reason, the ROK army provides discrimination education to 'newly recruited privates' for more effective individual performance in the on-the-job training. For the training to be more effective, it would be essential to predict the degree of achievements by new privates in the training. Thus, we used data mining techniques to develop a classification model which classifies the new privates into one of two achievements groups, so that different skills of education are applied to each group. The target variable for this model is a binary variable, whose value can be either 'a group of general control' or 'a group of special control'. We developed four pure classification models using Neural Network, Decision Tree, Support Vector Machine and Naive Bayesian. We also built four hybrid models, each of which combines k-means clustering algorithm with one of these four mining technique. Experimental results demonstrated that the highest performance model was the hybrid model of k-means and Neural Network. We expect that various military education programs could be supported by these classification models for better educational performance.

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil ;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi ;Asiah, Lokman
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.1
    • /
    • pp.53-63
    • /
    • 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of these models are based on unbalanced datasets, which favours the majority class. However, it is imperative to correctly identify cancer patients who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. The validation data was used to test several ways of balancing the classifiers. The final prediction model was built on the one that did the best overall.