• Title/Summary/Keyword: Naive Bayes algorithm

Search Result 75, Processing Time 0.022 seconds

An Efficient Algorithm for NaiveBayes with Matrix Transposition (행렬 전치를 이용한 효율적인 NaiveBayes 알고리즘)

  • Lee, Jae-Moon
    • The KIPS Transactions:PartB
    • /
    • v.11B no.1
    • /
    • pp.117-124
    • /
    • 2004
  • This paper proposes an efficient algorithm of NaiveBayes without loss of its accuracy. The proposed method uses the transposition of category vectors, and minimizes the computation of the probability of NaiveBayes. The proposed method was implemented on the existing framework of the text categorization, so called, AI::Categorizer and it was compared with the conventional NaiveBayes with the well-known data, Router-21578. The comparisons show that the proposed method outperforms NaiveBayes about two times with respect to the executing time.

Naive Bayes Learning Algorithm based on Map-Reduce Programming Model (Map-Reduce 프로그래밍 모델 기반의 나이브 베이스 학습 알고리즘)

  • Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.10a
    • /
    • pp.208-209
    • /
    • 2011
  • In this paper, we introduce a Naive Bayes learning algorithm for learning and reasoning in Map-Reduce model based environment. For this purpose, we use Apache Mahout to execute Distributed Naive Bayes on University of California, Irvine (UCI) benchmark data sets. From the experimental results, we see that Apache Mahout' s Distributed Naive Bayes algorithm is comparable to WEKA' s Naive Bayes algorithm in terms of performance. These results indicates that in the future Big Data environment, Map-Reduce model based systems such as Apache Mahout can be promising for machine learning usage.

  • PDF

Development of Incident Detection Algorithm Using Naive Bayes Classification (나이브 베이즈 분류기를 이용한 돌발상황 검지 알고리즘 개발)

  • Kang, Sunggwan;Kwon, Bongkyung;Kwon, Cheolwoo;Park, Sangmin;Yun, Ilsoo
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.17 no.6
    • /
    • pp.25-39
    • /
    • 2018
  • The purpose of this study is to develop an efficient incident detection algorithm by applying machine learning, which is being widely used in the transport sector. As a first step, network of the target site was constructed with micro-simulation model. Secondly, data has been collected under various incident scenarios produced with combination of variables that are expected to affect the incident situation. And, detection results from both McMaster algorithm, a well known incident detection algorithm, and the Naive Bayes algorithm, developed in this study, were compared. As a result of comparison, Naive Bayes algorithm showed less negative effect and better detect rate (DR) than the McMaster algorithm. However, as DR increases, so did false alarm rate (FAR). Also, while McMaster algorithm detected in four cycles, Naive Bayes algorithm determine the situation with just one cycle, which increases DR but also seems to have increased FAR. Consequently it has been identified that the Naive Bayes algorithm has a great potential in traffic incident detection.

Breast Cancer Diagnosis using Naive Bayes Analysis Techniques (Naive Bayes 분석기법을 이용한 유방암 진단)

  • Park, Na-Young;Kim, Jang-Il;Jung, Yong-Gyu
    • Journal of Service Research and Studies
    • /
    • v.3 no.1
    • /
    • pp.87-93
    • /
    • 2013
  • Breast cancer is known as a disease that occurs in a lot of developed countries. However, in recent years, the incidence of Korea's modern woman is increased steadily. As well known, breast cancer usually occurs in women over 50. In the case of Korea, however, the incidence of 40s with young women is increased steadily than the West. Therefore, it is a very urgent task to build a manual to the accurate diagnosis of breast cancer in adult women in Korea. In this paper, we show how using data mining techniques to predict breast cancer. Data mining refers to the process of finding regular patterns or relationships among variables within the database. To this, sophisticated analysis using the model, you will find useful information that is easily revealed. In this paper, through experiments Deicion Tree Naive Bayes analysis techniques were compared using analysis techniques to diagnose breast cancer. Two algorithms was analyzed by applying C4.5 algorithm. Deicison Tree classification accuracy was fairly good. Naive Bayes classification method showed better accuracy compared to the Decision Tree method.

  • PDF

Detection of Malicious Code using Association Rule Mining and Naive Bayes classification (연관규칙 마이닝과 나이브베이즈 분류를 이용한 악성코드 탐지)

  • Ju, Yeongji;Kim, Byeongsik;Shin, Juhyun
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.11
    • /
    • pp.1759-1767
    • /
    • 2017
  • Although Open API has been invigorated by advancements in the software industry, diverse types of malicious code have also increased. Thus, many studies have been carried out to discriminate the behaviors of malicious code based on API data, and to determine whether malicious code is included in a specific executable file. Existing methods detect malicious code by analyzing signature data, which requires a long time to detect mutated malicious code and has a high false detection rate. Accordingly, in this paper, we propose a method that analyzes and detects malicious code using association rule mining and an Naive Bayes classification. The proposed method reduces the false detection rate by mining the rules of malicious and normal code APIs in the PE file and grouping patterns using the DHP(Direct Hashing and Pruning) algorithm, and classifies malicious and normal files using the Naive Bayes.

An Active Learning-based Method for Composing Training Document Set in Bayesian Text Classification Systems (베이지언 문서분류시스템을 위한 능동적 학습 기반의 학습문서집합 구성방법)

  • 김제욱;김한준;이상구
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.12
    • /
    • pp.966-978
    • /
    • 2002
  • There are two important problems in improving text classification systems based on machine learning approach. The first one, called "selection problem", is how to select a minimum number of informative documents from a given document collection. The second one, called "composition problem", is how to reorganize selected training documents so that they can fit an adopted learning method. The former problem is addressed in "active learning" algorithms, and the latter is discussed in "boosting" algorithms. This paper proposes a new learning method, called AdaBUS, which proactively solves the above problems in the context of Naive Bayes classification systems. The proposed method constructs more accurate classification hypothesis by increasing the valiance in "weak" hypotheses that determine the final classification hypothesis. Consequently, the proposed algorithm yields perturbation effect makes the boosting algorithm work properly. Through the empirical experiment using the Routers-21578 document collection, we show that the AdaBUS algorithm more significantly improves the Naive Bayes-based classification system than other conventional learning methodson system than other conventional learning methods

Identifying the Optimal Machine Learning Algorithm for Breast Cancer Prediction

  • ByungJoo Kim
    • International journal of advanced smart convergence
    • /
    • v.13 no.3
    • /
    • pp.80-88
    • /
    • 2024
  • Breast cancer remains a significant global health burden, necessitating accurate and timely detection for improved patient outcomes. Machine learning techniques have demonstrated remarkable potential in assisting breast cancer diagnosis by learning complex patterns from multi-modal patient data. This study comprehensively evaluates several popular machine learning models, including logistic regression, decision trees, random forests, support vector machines (SVMs), naive Bayes, k-nearest neighbors (KNN), XGBoost, and ensemble methods for breast cancer prediction using the Wisconsin Breast Cancer Dataset (WBCD). Through rigorous benchmarking across metrics like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), we identify the naive Bayes classifier as the top-performing model, achieving an accuracy of 0.974, F1-score of 0.979, and highest AUC of 0.988. Other strong performers include logistic regression, random forests, and XGBoost, with AUC values exceeding 0.95. Our findings showcase the significant potential of machine learning, particularly the robust naive Bayes algorithm, to provide highly accurate and reliable breast cancer screening from fine needle aspirate (FNA) samples, ultimately enabling earlier intervention and optimized treatment strategies.

Propositionalized Attribute Taxonomy Guided Naive Bayes Learning Algorithm (명제화된 어트리뷰트 택소노미를 이용하는 나이브 베이스 학습 알고리즘)

  • Kang, Dae-Ki;Cha, Kyung-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.12
    • /
    • pp.2357-2364
    • /
    • 2008
  • In this paper, we consider the problem of exploiting a taxonomy of propositionalized attributes in order to generate compact and robust classifiers. We introduce Propositionalized Attribute Taxonomy guided Naive Bayes Learner (PAT-NBL), an inductive learning algorithm that exploits a taxonomy of propositionalized attributes as prior knowledge to generate compact and accurate classifiers. PAT-NBL uses top-down and bottom-up search to find a locally optimal cut that corresponds to the instance space from propositionalized attribute taxonomy and data. Our experimental results on University of California-Irvine (UCI) repository data set, show that the proposed algorithm can generate a classifier that is sometimes comparably compact and accurate to those produced by standard Naive Bayes learners.

Text-independent Speaker Identification Using Soft Bag-of-Words Feature Representation

  • Jiang, Shuangshuang;Frigui, Hichem;Calhoun, Aaron W.
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.240-248
    • /
    • 2014
  • We present a robust speaker identification algorithm that uses novel features based on soft bag-of-word representation and a simple Naive Bayes classifier. The bag-of-words (BoW) based histogram feature descriptor is typically constructed by summarizing and identifying representative prototypes from low-level spectral features extracted from training data. In this paper, we define a generalization of the standard BoW. In particular, we define three types of BoW that are based on crisp voting, fuzzy memberships, and possibilistic memberships. We analyze our mapping with three common classifiers: Naive Bayes classifier (NB); K-nearest neighbor classifier (KNN); and support vector machines (SVM). The proposed algorithms are evaluated using large datasets that simulate medical crises. We show that the proposed soft bag-of-words feature representation approach achieves a significant improvement when compared to the state-of-art methods.

Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification (자연어 처리 기반 『상한론(傷寒論)』 변병진단체계(辨病診斷體系) 분류를 위한 기계학습 모델 선정)

  • Young-Nam Kim
    • 대한상한금궤의학회지
    • /
    • v.14 no.1
    • /
    • pp.41-50
    • /
    • 2022
  • Objective : The purpose of this study is to explore the most suitable machine learning model algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods : A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』, 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. Data were pretreated using a twitter Korean tokenizer and trained by logistic regression, ridge regression, lasso regression, naive bayes classifier, decision tree, and random forest algorithms. The accuracy of the models were compared. Results : As a result of machine learning, ridge regression and naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, and decision tree showed an accuracy of 0.745, while lasso regression showed an accuracy of 0.608. Conclusions : Ridge regression and naive Bayes classifier are suitable NLP machine learning models for the Shanghanlun diagnostic system classification.

  • PDF