• Title/Summary/Keyword: Naive Bayes classification

Prediction model of peptic ulcer diseases in middle-aged and elderly adults based on machine learning (머신러닝 기반 중노년층의 기능성 위장장애 예측 모델 구현)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.289-294 / 2020
  • Peptic ulcer disease is a gastrointestinal disorder caused by Helicobacter pylori infection and the use of nonsteroidal anti-inflammatory drugs. Although many studies have sought the risk factors of peptic ulcers, none has proposed a peptic ulcer prediction model for Koreans. The purpose of this study is therefore to implement a peptic ulcer prediction model for middle-aged and elderly people using machine learning based on demographic, obesity, blood, and nutritional information. A wrapper-based variable selection method and the naive Bayes algorithm were used to build the model. The female prediction model achieved an area under the receiver operating characteristic curve (AUC) of 0.712, while the male model showed a lower AUC of 0.674. These results can be used for the prediction and prevention of peptic ulcers in middle-aged and elderly people.
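
An AUC such as those reported in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way. The labels and scores are hypothetical illustration values, not data from the study, and the function name `auc` is an assumption.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for six subjects (1 = disease present).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(round(auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.712 and 0.674 figures.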

A Hierarchical CPV Solar Generation Tracking System based on Modular Bayesian Network (베이지안 네트워크 기반 계층적 CPV 태양광 추적 시스템)

  • Park, Susang;Yang, Kyon-Mo;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.41 no.7 / pp.481-491 / 2014
  • Power production from renewable energy is becoming more important because of limited fossil fuel reserves and the problem of global warming. As solar power generation spreads, concentrative photovoltaic (CPV) systems are coming into the spotlight for their high energy production. These systems, however, need sophisticated tracking methods to achieve high power production. In this paper, we propose a hierarchical tracking system using modular Bayesian networks and a naive Bayes classifier. Bayesian networks can respond flexibly in uncertain situations and can be designed from domain knowledge even when data are scarce. The Bayesian network modules infer the weather state, which is classified into nine classes. The naive Bayes classifier then selects the most effective tracking method given the inferred weather state, and the system makes its decision using rules. We collected real weather data for the experiments, and the average accuracy of the proposed method is 93.9%. In addition, photovoltaic efficiency improved by about 16.58% compared with the pinhole camera system.

Sensitivity Identification Method for New Words of Social Media based on Naive Bayes Classification (나이브 베이즈 기반 소셜 미디어 상의 신조어 감성 판별 기법)

  • Kim, Jeong In;Park, Sang Jin;Kim, Hyoung Ju;Choi, Jun Ho;Kim, Han Il;Kim, Pan Koo
    • Smart Media Journal / v.9 no.1 / pp.51-59 / 2020
  • From the era of PC communication through the development of the internet, new words have continually been coined on social media, and with the spread of smartphones these coinages have become part of a social media culture. With social networking sites and smartphones serving as a bridge, the volume of data grows in real time. New words offer advantages, such as shorter sentences that fit character-limited messengers and reduce data. However, new words have no dictionary meaning, which limits and degrades algorithms such as those used in data mining. Therefore, in this paper, we determine the opinion of a document by collecting data through web crawling, extracting the new words contained in the text, and building a sentiment classification. The experiment proceeds in three steps. First, new words collected from social media are labeled and learned as positive or negative. Next, to derive and verify sentiment values using standard-language documents, TF-IDF is used to score the sentiment of nouns and assign sentiment values to the data. As with the new words, the classified sentiment values are applied to verify that sentiment is classified correctly in standard-language documents. Finally, the sentiment values of the new words and of the standard language are combined to perform a comparative analysis of the technique.
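
The TF-IDF scoring step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant was used, so the weighting here (length-normalized term frequency times log(N/df)) is an assumption, as are the toy documents and the function name `tf_idf`.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical tokenized documents.
docs = [["good", "movie", "good"], ["bad", "movie"], ["good", "day"]]
w = tf_idf(docs)
print(w[1])
```

A term that appears in every document gets weight zero under this variant, so only terms that distinguish documents contribute to a score.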

Study on Automatic Bug Triage using Deep Learning (딥 러닝을 이용한 버그 담당자 자동 배정 연구)

  • Lee, Sun-Ro;Kim, Hye-Min;Lee, Chan-Gun;Lee, Ki-Seong
    • Journal of KIISE / v.44 no.11 / pp.1156-1164 / 2017
  • Existing studies on automatic bug triage have mostly designed prediction systems based on machine learning algorithms. Applying a high-performance machine learning model is therefore central to the performance of an automatic bug triage system. Related research mainly uses machine learning models with high performance, such as SVM and Naïve Bayes. In this paper, we apply Deep Learning, which has recently shown good performance in the field of machine learning, to automatic bug triage and evaluate its performance. Experimental results show that the Deep Learning based bug triage system achieves 48% accuracy in active developer experiments, an improvement of up to 69% over conventional machine learning techniques.

Generation and Selection of Nominal Virtual Examples for Improving the Classifier Performance (분류기 성능 향상을 위한 범주 속성 가상예제의 생성과 선별)

  • Lee, Yu-Jung;Kang, Byoung-Ho;Kang, Jae-Ho;Ryu, Kwang-Ryel
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1052-1061 / 2006
  • This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples focused on data with numeric attributes and used domain-specific knowledge to generate useful virtual examples for a particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naive Bayesian network constructed from the given training set. A sampled example is considered useful if adding it to the training set increases the network's conditional likelihood. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that the virtual examples collected this way can help various learning algorithms derive classifiers of improved accuracy.

Investigating the Performance of Bayesian-based Feature Selection and Classification Approach to Social Media Sentiment Analysis (소셜미디어 감성분석을 위한 베이지안 속성 선택과 분류에 대한 연구)

  • Chang Min Kang;Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.24 no.1 / pp.1-19 / 2022
  • Social media-based communication has become a crucial part of our personal and official lives. It is therefore no surprise that social media sentiment analysis has emerged as an important way for all kinds of companies to detect potential customers' sentiment trends. However, social media sentiment analysis suffers from the huge number of sentiment features obtained in the course of the analysis. To address this, this study proposes a novel method using a Bayesian Network, in which MBFS (Markov Blanket-based Feature Selection) is used to reduce the number of sentiment features. To show the validity of the proposed model, we used online review data from Yelp, a well-known social media platform for reviewing and recommending restaurants, bars, and beauty salons. As benchmarks, we used several feature selection methods, such as correlation-based feature selection, information gain, and gain ratio. Several machine learning classifiers were also used in the validation tasks, such as TAN, NBN, Sons & Spouses BN (Bayesian Network), and Augmented Markov Blanket. Furthermore, we conducted a Bayesian Network-based what-if analysis to see how the knowledge map between the target node and related explanatory nodes could offer a meaningful glimpse into the sentiments underlying the target dataset.

An Efficient kNN Algorithm (효율적인 kNN 알고리즘)

  • Lee Jae Moon
    • The KIPS Transactions:PartB / v.11B no.7 s.96 / pp.849-854 / 2004
  • This paper proposes an algorithm to reduce the execution time of kNN in document classification. The proposed algorithm reduces execution time by minimizing the cost of computing the similarity between two documents by using the list of pairs, while the conventional kNN uses the list of pairs. The list of pairs can be obtained by applying matrix transposition to the list of pairs at the training phase of document classification. This paper analyzes the time complexity of the proposed algorithm and compares it with that of the conventional kNN. It also compares the proposed algorithm with the conventional kNN experimentally using the Reuters-21578 data. The experimental results show that the proposed algorithm outperforms the conventional kNN by about 90% in terms of execution time.
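
The abstract above does not say what the two lists of pairs contain. One common reading of "matrix transposition at the training phase" is an inverted index, and the sketch below illustrates that reading only. It assumes each training document is a {term: weight} map that is transposed into per-term lists of (document, weight) pairs, so that a query is compared only with documents sharing at least one term. All names and data are hypothetical, and this is not presented as the paper's algorithm.

```python
import heapq
from collections import defaultdict


def build_inverted_index(train_docs):
    """Transpose per-document {term: weight} maps into per-term [(doc_id, weight)] lists."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(train_docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def knn_classify(query, index, labels, k=3):
    """Score only documents that share a term with the query, then vote among the top k."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, weight in index.get(term, ()):
            scores[doc_id] += q_weight * weight  # accumulate dot-product similarity
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    votes = defaultdict(float)
    for doc_id, sim in top:
        votes[labels[doc_id]] += sim  # similarity-weighted vote
    return max(votes, key=votes.get) if votes else None


# Hypothetical training set of term-weight vectors.
train_docs = [
    {"stock": 1.0, "market": 1.0},
    {"market": 1.0, "bank": 1.0},
    {"goal": 1.0, "match": 1.0},
    {"match": 1.0, "team": 1.0},
]
labels = ["finance", "finance", "sports", "sports"]
index = build_inverted_index(train_docs)
print(knn_classify({"market": 1.0, "stock": 0.5}, index, labels))
```

Under this assumption the similarity loop touches only the posting lists of the query's terms instead of every training document, which is where a saving in execution time would come from.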

Performance Comparison of Statistics-Based Machine Learning Model for Classification of Technical Documents (기술문서 분류를 위한 통계기반 기계학습 모델 성능비교 및 한계 연구)

  • Kim, Jin-gu;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.393-396 / 2022
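  • This study trains four statistics-based machine learning models on patents and papers in the field of defense science and technology, and analyzes the results of feeding in data from an actual target institution in order to identify the limits of the models' practicality. Whereas previous studies classified documents by patent classification code for special purposes, or served detailed micro-level work such as topic exploration and feature analysis within a narrow research scope, this study aims to capture the overall flow and trends of research from a macroscopic perspective. To this end, each model was trained on 30,965 patents and papers covering 138 ICT technologies and 23,406 patents and papers covering 192 defense science and technology categories. The statistics-based models compared are Support Vector Machines, Decision Tree, Naive Bayes, and XGBoost. In the validation stage on the training data, the models reached a performance of up to 99.4%. However, when 12,824 patents and papers from the actual target institution were fed in, we identified model-specific bias, data preprocessing issues, and multi-class and multi-label problems; we propose solutions to these problems and suggest directions for further research.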

A New Policy-Based Attack Detection Method (정책기반의 새로운 공격 탐지 방법)

  • 김형훈
    • Review of KIISC / v.13 no.1 / pp.64-67 / 2003
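  • For computing environments to be used more reliably and practically, security is required as an essential function. Intrusion detection based on known attack patterns can easily be defeated by an attacker's variant or novel attack methods. In addition, many attack methods that cleverly evade individual security policies are constantly being developed and attempted. Many successful intrusions can therefore be seen as exploiting the gaps between existing attack patterns and security policies. To detect new attacks, the method proposed in this paper obtains the feature values needed for detection through a rule set. The rule set is written, through analysis of known attacks, security policies, and administrators' empirical knowledge, so that it can sense the characteristics of attacks. In the training phase, the feature values obtained by the rule set are turned into statistical feature values for attacks using the Naive Bayes classification technique. Using the statistical feature values obtained in the training phase, the proposed method can detect variant or new attacks.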
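
The pipeline in the abstract above (a rule set produces feature values, and Naive Bayes learns statistics over them) can be sketched as follows. The abstract gives no concrete rules or data, so the three rules, the event fields, and the training events are hypothetical, and a Bernoulli naive Bayes with Laplace smoothing stands in for whatever estimator the paper used.

```python
import math

# Hypothetical rule set: each rule maps an event (a dict) to a binary feature.
RULES = [
    lambda e: e["failed_logins"] > 3,
    lambda e: e["port"] not in (80, 443),
    lambda e: e["payload_bytes"] > 1000,
]


def extract(event):
    """Apply every rule to the event and return the binary feature vector."""
    return [int(rule(event)) for rule in RULES]


def train(events, labels):
    """Estimate class priors and per-rule Bernoulli likelihoods (Laplace smoothing)."""
    model = {}
    for c in set(labels):
        rows = [extract(e) for e, y in zip(events, labels) if y == c]
        prior = len(rows) / len(events)
        probs = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(len(RULES))
        ]
        model[c] = (prior, probs)
    return model


def predict(model, event):
    """Return the class with the highest log-posterior for the event."""
    x = extract(event)

    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for xi, p in zip(x, probs)
        )

    return max(model, key=log_posterior)


# Hypothetical training events.
events = [
    {"failed_logins": 5, "port": 22, "payload_bytes": 200},
    {"failed_logins": 8, "port": 23, "payload_bytes": 1500},
    {"failed_logins": 0, "port": 80, "payload_bytes": 300},
    {"failed_logins": 1, "port": 443, "payload_bytes": 1200},
]
labels = ["attack", "attack", "normal", "normal"]
model = train(events, labels)
print(predict(model, {"failed_logins": 6, "port": 8080, "payload_bytes": 100}))
```

The query event uses a port that never occurs in the training data, yet it triggers the same rules as the known attacks. That is the sense in which rule-derived features let the classifier generalize to variant attacks.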

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine-learning-based automatic classification of domestic journal articles in the field of LIS. In particular, with respect to the performance of automatically assigning class labels to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assigning methods) through diversified experiments. The results show that it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be treated as multi-label classification, which assigns more than one category to an article. Therefore, I proposed an optimal classification model that uses a simple and fast classification algorithm and a small training set, taking this environment into account.