• Title/Summary/Keyword: naive bayes

Data mining Algorithms for the Development of Sasang Type Diagnosis (사상체질 진단검사를 위한 데이터마이닝 알고리즘 연구)

  • Hong, Jin-Woo;Kim, Young-In;Park, So-Jung;Kim, Byoung-Chul;Eom, Il-Kyu;Hwang, Min-Woo;Shin, Sang-Woo;Kim, Byung-Joo;Kwon, Young-Kyu;Chae, Han
    • Journal of Physiology & Pathology in Korean Medicine / v.23 no.6 / pp.1234-1240 / 2009
  • This study compared the effectiveness and validity of various data-mining algorithms for the Sasang type diagnostic test. We compared the sensitivity and specificity indices of nine attribute selection and eleven class classification algorithms, using 31 data sets characterizing Sasang typology and the 10-fold validation methods implemented in the Waikato Environment for Knowledge Analysis (WEKA). The highest classification validity scores were as follows: 69.9 as the Percentage Correctly Predicted (PCP) index with the Naive Bayes Classifier, 80 as the sensitivity index with LWL for the Tae-Eum type, and 93.5 as the specificity index with the Naive Bayes Classifier for the So-Eum type. The classification algorithm with the highest PCP index after attribute selection, 69.62, was the Naive Bayes Classifier. These results indicate that the best-fit algorithm for traditional medicine depends on the case, and that the characteristics of the clinical circumstances, the data-mining algorithms, and the study purpose should all be considered to obtain the highest validity, even with well-defined data sets. They also confirm that there is no one-size-fits-all algorithm and that many trial-and-error studies are needed. This study will serve as a pivotal foundation for the development of medical instruments for Pattern Identification and Sasang type diagnosis on the basis of traditional Korean Medicine.
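
The sensitivity, specificity, and PCP indices named in the abstract above can be computed per class in a one-vs-rest fashion. The following is a minimal Python sketch under that assumption; the function name `per_class_metrics` and the toy labels are hypothetical illustrations, not the study's data or its WEKA workflow.

```python
def per_class_metrics(y_true, y_pred):
    """Return (pcp, {label: (sensitivity, specificity)}) from one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    # PCP: percentage of instances whose predicted class equals the true class.
    pcp = 100.0 * sum(t == p for t, p in pairs) / n
    metrics = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        metrics[label] = (sensitivity, specificity)
    return pcp, metrics


if __name__ == "__main__":
    # Toy labels only (hypothetical), chosen to resemble Sasang type names.
    y_true = ["Tae-Eum", "Tae-Eum", "So-Eum", "So-Eum", "So-Yang", "So-Yang"]
    y_pred = ["Tae-Eum", "So-Eum", "So-Eum", "So-Eum", "So-Yang", "Tae-Eum"]
    pcp, metrics = per_class_metrics(y_true, y_pred)
    print(f"PCP = {pcp:.2f}")
    for label, (sens, spec) in metrics.items():
        print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting the indices per class, rather than a single overall score, makes it visible when the classifier with the best PCP is not the one with the best sensitivity for a given type.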

Android Malware Analysis Technology Research Based on Naive Bayes (Naive Bayes 기반 안드로이드 악성코드 분석 기술 연구)

  • Hwang, Jun-ho;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.5 / pp.1087-1097 / 2017
  • As the penetration rate of smartphones increases, the amount of malicious code targeting smartphones is also increasing. 360 Security's smartphone malware statistics show that malicious code increased 437 percent in the first quarter of 2016 compared to the fourth quarter of 2015. In particular, malicious applications, which are the main means of distributing malicious code on smartphones, aim at leaking user information, destroying data, and withdrawing money. Such applications often operate through APIs, the interfaces that allow control of the functions provided by the operating system or programming language. In this paper, we propose a mechanism that detects malicious applications based on the similarity of API patterns in normal and malicious applications, by learning the API patterns derived from static analysis of applications. In addition, we present a technique for improving the overall detection rate and the per-label detection rate obtained by applying this mechanism to the sample data. In particular, the proposed mechanism can detect a new malicious application when its API pattern is similar, to a certain degree, to previously learned patterns. Future research that examines additional application features and applies them to this mechanism is expected to enable anti-malware systems to detect new malicious applications.

An Analysis of the Characteristics of Companies introducing Smart Factory System Using Data Mining Technique (데이터 마이닝 기법을 활용한 스마트팩토리 도입 기업의 특성 분석)

  • Oh, Jeong-yoon;Choi, Sang-hyun
    • Journal of the Korea Convergence Society / v.9 no.5 / pp.179-189 / 2018
  • Research on smart factories is steadily being carried out in terms of implementation strategies and considerations in construction. However, few studies have examined the companies that have introduced smart factories. This study conducted a questionnaire survey of SMEs applying the basic stage of the smart factory, and used cluster analysis to examine the characteristics of those companies. In addition, we applied Decision Tree and Naive Bayes analyses to examine how the characteristics of a company are derived and to compare the results. The cluster analysis confirmed that the companies divide into a high-satisfaction group and a low-satisfaction group. The Decision Tree and Naive Bayes analyses showed that the high-satisfaction group has high productivity.

Performance Analysis of Machine Learning Algorithms for Application Traffic Classification (애플리케이션 트래픽 분류를 위한 머신러닝 알고리즘 성능 분석)

  • Kim, Sung-Yun;Kim, Myung-Sup
    • Proceedings of the Korea Information Processing Society Conference / 2008.05a / pp.968-970 / 2008
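  • Traffic classification has traditionally relied on payload analysis or on well-known ports. However, as the number of applications with dynamically changing behavior grows, classifying application traffic with these methods has become difficult. As an alternative, application traffic classification using Machine Learning (ML) algorithms is being studied. Because existing papers use data sets collected over a fixed period of time, they can show good overall performance even when applications that generated little traffic are not classified properly. To address this problem, this paper proposes a method that classifies application traffic after collecting the same number of data sets for each application. The accuracy of application traffic classification is compared using the J48, REPTree, BayesNet, NaiveBayes, and Multilayer Perceptron ML algorithms.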
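
The idea of giving every application the same number of samples can be sketched as follows. This is only an illustration: the paper collects equal amounts of data per application, whereas the hypothetical `balanced_subset` helper below approximates that by downsampling an already collected set to the size of the rarest application. The flow features and application names are made up.

```python
import random
from collections import defaultdict


def balanced_subset(flows, seed=0):
    """Downsample (features, app) pairs so every application has equally many.

    Assumption: downsampling to the rarest application's count stands in for
    collecting the same number of samples per application.
    """
    rng = random.Random(seed)
    by_app = defaultdict(list)
    for features, app in flows:
        by_app[app].append((features, app))
    per_app = min(len(group) for group in by_app.values())
    balanced = []
    for app in sorted(by_app):
        # rng.sample draws without replacement, so no flow is duplicated.
        balanced.extend(rng.sample(by_app[app], per_app))
    return balanced


if __name__ == "__main__":
    # Hypothetical flows: (mean packet size, packet count) with an app label.
    flows = (
        [((500 + i, 10), "web") for i in range(5)]
        + [((1400, 200 + i), "p2p") for i in range(2)]
        + [((900, 50 + i), "ftp") for i in range(3)]
    )
    for features, app in balanced_subset(flows):
        print(app, features)
```

With balanced classes, a classifier can no longer reach a high overall accuracy simply by favoring the applications that generated the most traffic.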

Enhancing E-commerce Security: A Comprehensive Approach to Real-Time Fraud Detection

  • Sara Alqethami;Badriah Almutanni;Walla Aleidarousr
    • International Journal of Computer Science & Network Security / v.24 no.4 / pp.1-10 / 2024
  • In the era of big data, the growth of e-commerce transactions brings forth both opportunities and risks, including the threat of data theft and fraud. To address these challenges, an automated real-time fraud detection system leveraging machine learning was developed. Four algorithms (Decision Tree, Naïve Bayes, XGBoost, and Neural Network) underwent comparison using a dataset from a clothing website that encompassed both legitimate and fraudulent transactions. The dataset exhibited an imbalance, with 9.3% representing fraud and 90.07% legitimate transactions. Performance evaluation metrics, including Recall, Precision, F1 Score, and AUC ROC, were employed to assess the effectiveness of each algorithm. XGBoost emerged as the top-performing model, achieving an impressive accuracy score of 95.85%. The proposed system proves to be a robust defense mechanism against fraudulent activities in e-commerce, thereby enhancing security and instilling trust in online transactions.

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / v.21 no.2 / pp.229-237 / 2021
  • Our modern 'information-hungry' age demands delivery of information at unprecedented speed. Timely delivery of noteworthy information about recent events can help people from different walks of life in a number of ways. As the world has become a global village, the volume and speed of the news flow demand the involvement of machines to help humans handle the enormous amount of data. News is presented to the public in the form of video, audio, images, and text. News text available on the internet is a source of knowledge for billions of internet users. Urdu is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables these people to improve their understanding of the world and to make decisions. This paper uses available online Urdu news data to train machines to automatically categorize news. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms on all performance parameters. The maximum accuracy achieved for the dataset was 94.278% with the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for the text categorization process.

Slangs and Short forms of Malay Twitter Sentiment Analysis using Supervised Machine Learning

  • Yin, Cheng Jet;Ayop, Zakiah;Anawar, Syarulnaziah;Othman, Nur Fadzilah;Zainudin, Norulzahrah Mohd
    • International Journal of Computer Science & Network Security / v.21 no.11 / pp.294-300 / 2021
  • Today's society relies on social media on an everyday basis, which raises the question of which supervised machine learning algorithms used in sentiment analysis are more accurate at detecting Malay internet slang and short forms, which can be offensive to a person. This paper aims to determine which of the chosen supervised machine learning algorithms detects internet slang and short forms with higher accuracy. To analyze the results of the supervised machine learning classifiers, we chose two types of datasets: one is political topic-based, and the other is the same set mixed with 50 tweets per targeted keyword. The datasets were manually labelled positive or negative before the 275 tweets were separated into training and testing sets. Naïve Bayes and Random Forest classifiers were then analyzed and evaluated on their performance. Our experimental results show that Random Forest is a better classifier than Naïve Bayes.

A Classification Model for Predicting the Injured Body Part in Construction Accidents in Korea

  • Lim, Jiseon;Cho, Sungjin;Kang, Sanghyeok
    • International conference on construction engineering and project management / 2022.06a / pp.230-237 / 2022
  • It is difficult to predict industrial accidents in the construction industry because many accident factors, such as human-related and environment-related factors, affect the accidents. Many studies have analyzed the severity of injuries and the types of accidents; however, few have addressed the prediction of injured body parts. This study aims to develop a classification model that predicts the injured body part based on accident-related factors. Construction accident cases from June 2018 to July 2021 provided by the Korea Construction Safety Management Integrated Information were collected through web crawling and then preprocessed. A naïve Bayes classifier, one of the supervised learning algorithms, was employed to construct a classification model of the injured body part, which has four categories: 1) torso, 2) upper extremity, 3) head, and 4) lower extremity. The predictor variables are accident type, type of work, facility type, injury source, and activity type. The resulting average accuracy for each injured body part was 50.4%. The accuracy for the upper extremity and lower extremity was relatively higher than for the torso and head. Unlike in other classification tasks such as spam mail filtering, the naïve Bayes classifier did not provide good classification performance for construction accidents. The reasons are discussed in the study. Based on the results of this study, more detailed guidelines for construction safety management can be provided, which help establish safety measures at the construction site.

Emotional States Recognition of Text Data Using Hidden Markov Models (HMM을 이용한 채팅 텍스트로부터의 화자 감정상태 분석)

  • 문현구;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.127-129 / 2001
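  • We propose an emotion recognition system that analyzes an input sentence and outputs the transitions of its emotional state according to predefined categories. Unlike the previous method, which used the Naive Bayes algorithm, the newly studied system uses a Hidden Markov Model (HMM). An HMM is well suited to finding the transitions of the states that cause a phenomenon occurring with a particular distribution, and under the assumption that several emotions are expressed in a single sentence, it can be regarded as an ideal algorithm for emotion recognition. This paper gives an overview of the HMM-based emotion recognition system and presents experimental results that improve on the previous version.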

COVID_19 fake news and real news discrimination system (코로나19 가짜뉴스와 진짜뉴스 판별 시스템)

  • Lee, Jimin;Lee, Jisun;Woo, Jiyoung
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.411-412 / 2022
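  • In this paper, we use datasets of COVID-19 news and COVID-19 fake news to predict the probability that an input news article is fake. In the bodies of fake news articles, keywords such as COVID-19, president, government, fake, and press appeared with high frequency. Based on these keywords, we built a Naive Bayes model and applied it to develop a web page that identifies fake news.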
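
A keyword-based Naive Bayes model of the kind described in the abstract above might look like the following minimal Python sketch. It assumes a multinomial model with Laplace smoothing restricted to a fixed keyword list; the function names `train_nb` and `predict_proba`, the English keyword tokens, and the four toy documents are all hypothetical and are not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, keywords):
    """Count documents per class and keyword occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        class_docs[label] += 1
        for tok in tokens:
            if tok in keywords:
                word_counts[label][tok] += 1
    return class_docs, word_counts


def predict_proba(tokens, model, keywords):
    """Return {label: posterior probability} for a tokenized article."""
    class_docs, word_counts = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in keywords:
                # Laplace (add-one) smoothing over the keyword vocabulary.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total + len(keywords))
                )
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long articles.
    peak = max(log_scores.values())
    exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


if __name__ == "__main__":
    keywords = {"covid19", "president", "government", "fake", "press"}
    docs = [  # toy training data, hypothetical
        (["covid19", "government", "fake", "press"], "fake"),
        (["president", "fake", "covid19"], "fake"),
        (["covid19", "vaccine", "hospital"], "real"),
        (["government", "covid19", "statistics"], "real"),
    ]
    model = train_nb(docs, keywords)
    print(predict_proba(["fake", "press", "president"], model, keywords))
```

The returned probability for the fake class is the kind of score a web page could display for an input article; tokens outside the keyword list are ignored by design in this sketch.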
