• Title/Summary/Keyword: naive bayes

Search Result 238

COVID-19 fake news and real news discrimination system (코로나19 가짜뉴스와 진짜뉴스 판별 시스템)

  • Lee, Jimin; Lee, Jisun; Woo, Jiyoung
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.411-412 / 2022
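  • In this paper, we use datasets of COVID-19 news and COVID-19 fake news to predict the probability that an input news article is fake. In the body text of fake news, keywords such as COVID-19, president, government, fake, and press appeared with high frequency. Based on these keywords, we built a naive Bayes model and applied it to develop a web page that identifies fake news.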

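
The abstract describes naive Bayes modeling over keyword frequencies to estimate the probability that an article is fake. A minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing is shown below, assuming whitespace tokenization of English text (Korean text would normally need a morphological analyzer). The class name `KeywordNaiveBayes` and the four training documents are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization of lower-cased text.
    return text.lower().split()


class KeywordNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (hypothetical name)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior P(c)
            for w in tokenize(doc):
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Normalize in log space (log-sum-exp) to obtain P(c | doc).
        m = max(log_scores.values())
        exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}


docs = [
    "government president fake press covid",
    "covid vaccine clinic schedule announced",
    "fake press government covid cover up",
    "clinic reports covid cases daily",
]
labels = ["fake", "real", "fake", "real"]
model = KeywordNaiveBayes().fit(docs, labels)
print(model.predict_proba("president government fake covid")["fake"])
```

Working in log space avoids numerical underflow when an article contains many words; the final normalization turns the two class scores into a probability that can be shown directly on a web page.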

The Comparison Study on Observational Before-After Studies: Case Study on Safety Evaluation on Highways (관찰적 사전·사후 평가연구 방법의 비교 연구: 공용중인 고속도로 안전진단사업 효과평가를 사례로)

  • Mun, Sung Ra; Lee, Young-Ihn
    • Journal of Korean Society of Transportation / v.31 no.6 / pp.67-89 / 2013
  • This study performs an empirical analysis of three observational before-after evaluation methods, the Naive method, the Comparison Group (CG) method, and the Empirical Bayes (EB) method, compares their results, and proposes ways to apply them in evaluation research. For this purpose, we evaluated the road safety audits carried out on the Yŏng-dong freeway in 2005 and 2006. All three methods showed safety improvements attributable to the treatments. The estimated safety effectiveness was largest for the Naive method, followed by the CG method and then the EB method. The Naive method overestimates effectiveness because it does not account for the general downward trend in traffic accidents, and the CG method is affected by external causal factors acting on the comparison group. In the EB method, the regression-to-the-mean phenomenon is controlled through an accident model fitted to a reference group, so its results are relatively more accurate than those of the other methods. When conducting evaluation studies, analysts need to understand the strengths and weaknesses of each method, and should survey accident trends at all related sites before performing the evaluation so that bias is minimized.
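
To make the contrast between the Naive and EB estimates concrete, here is a simplified single-site sketch in the style of Hauer's EB before-after method. The safety performance function predictions `mu_before` and `mu_after`, the overdispersion parameter `k`, and all crash counts are hypothetical inputs; the variance-based bias correction used in full EB studies, multi-site aggregation, and the CG method are omitted.

```python
def naive_index(before, after):
    """Naive before-after index: assumes the after-period count would have
    equalled the before-period count without treatment (equal period lengths)."""
    return after / before


def eb_index(before, after, mu_before, mu_after, k):
    """Simplified Empirical Bayes before-after index for one site.

    mu_before, mu_after: crashes predicted by a safety performance function
        fitted to a reference group (hypothetical inputs here).
    k: overdispersion parameter of that negative binomial model.
    """
    w = 1.0 / (1.0 + k * mu_before)                      # weight on the model prediction
    eb_before = w * mu_before + (1.0 - w) * before       # shrink observed count toward the model
    expected_after = eb_before * (mu_after / mu_before)  # expected crashes with no treatment
    return after / expected_after                        # < 1 means fewer crashes than expected


# Made-up example: 20 crashes before and 12 after at a treated site.
print(naive_index(20, 12))
print(eb_index(20, 12, mu_before=12.0, mu_after=11.0, k=0.1))
```

With these made-up numbers the naive index is 0.6 (an apparent 40% reduction) while the EB index is about 0.8 (about 20%): because the before-period count of 20 lies well above the model prediction of 12, part of the apparent drop is attributed to regression to the mean rather than to the treatment.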

Prediction Model of Hypertension Using Sociodemographic Characteristics Based on Machine Learning (머신러닝 기반 사회인구학적 특징을 이용한 고혈압 예측모델)

  • Lee, Bum Ju
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.541-546 / 2021
  • Recently, many identification and prediction models for hypertension have been developed worldwide using clinical information with artificial intelligence and machine learning. However, most previous studies on such models have not considered non-invasive, cost-effective variables, nor differences in race, region, and country. The objective of this study is therefore to present an easily understood hypertension prediction model that uses only general, simple sociodemographic variables. The data were drawn from the Korea National Health and Nutrition Examination Survey (2018). In men, naive Bayes with wrapper-based feature subset selection showed the highest predictive performance (ROC = 0.790, kappa = 0.396). In women, naive Bayes with correlation-based feature subset selection showed the strongest predictive performance (ROC = 0.850, kappa = 0.495). Predictive performance based only on sociodemographic variables was higher in women than in men. Because our machine learning models use only simple sociodemographic characteristics, we expect that they can readily be used in public health and epidemiology in the future.
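
Kappa is reported alongside ROC because it corrects raw agreement between predictions and true labels for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A small sketch for two label lists follows; the labels are made up and the function name is illustrative.

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label lists corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    # Chance agreement from the marginal label frequencies of each list.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1.0:
        return 1.0  # degenerate case (single shared label); conventions differ
    return (p_o - p_e) / (1.0 - p_e)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(cohens_kappa(y_true, y_pred))
```

Here 8 of 10 labels agree (p_o = 0.8), but both lists are 40% positive, so chance agreement is p_e = 0.4·0.4 + 0.6·0.6 = 0.52 and kappa is 0.28 / 0.48 ≈ 0.583, noticeably lower than raw accuracy.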

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation (단어 중의성 해소를 위한 지도학습 방법의 통계적 자질선정에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.2 / pp.5-25 / 2011
  • This study aims to identify the most effective statistical feature selection method and context window size for word sense disambiguation using supervised methods. Features were selected by four methods: information gain, document frequency, chi-square, and relevancy. A comparison of the weighting methods showed that selecting the most appropriate features can improve word sense disambiguation performance, and information gain performed best. The SVM classifier was not affected by feature selection and performed better with larger feature sets and context sizes. The Naive Bayes classifier performed best with 10 percent of the feature set, and the kNN classifier with less than 10 percent. When feature selection is applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a smaller context window, yields the greatest performance improvement.
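
Information gain, the best-performing selection method here, scores a context word by how much knowing whether it appears in the context window reduces the entropy of the sense label. The sketch below assumes binary presence features and keeps the top fraction of the vocabulary (for example 10 percent, as for Naive Bayes above); the toy contexts for an ambiguous word like "bank" are made up.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(contexts, senses, feature):
    """IG of a binary feature (word present in the context window) w.r.t. the sense."""
    with_f = [s for ctx, s in zip(contexts, senses) if feature in ctx]
    without_f = [s for ctx, s in zip(contexts, senses) if feature not in ctx]
    n = len(senses)
    conditional = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(senses) - conditional


def select_top_features(contexts, senses, ratio=0.1):
    """Rank the vocabulary by IG and keep the top `ratio` fraction (at least one word)."""
    vocab = sorted(set().union(*contexts))
    ranked = sorted(vocab, key=lambda f: information_gain(contexts, senses, f), reverse=True)
    k = max(1, int(len(ranked) * ratio))
    return ranked[:k]


# Context windows (as word sets) around the ambiguous word "bank".
contexts = [{"river", "water", "shore"}, {"money", "loan", "account"},
            {"river", "fishing"}, {"money", "deposit"}]
senses = ["river", "finance", "river", "finance"]
print(select_top_features(contexts, senses, ratio=0.25))
```

"river" and "money" each split the four contexts perfectly by sense, so they receive the maximum gain of 1 bit and are kept, while a word seen only once, such as "water", scores much lower.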

Web Mining Using Fuzzy Integration of Multiple Structure Adaptive Self-Organizing Maps (다중 구조적응 자기구성지도의 퍼지결합을 이용한 웹 마이닝)

  • Kim, Kyung-Joong; Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.31 no.1 / pp.61-70 / 2004
  • Finding an appropriate web site is difficult because the exponentially growing web contains millions of documents. Web search can be personalized by recommending suitable sites based on a user profile, but a more efficient method for estimating preference is needed because a user's evaluations of web content reflect many aspects of his or her characteristics. Because user profiles are non-linear, a classifier is needed to estimate them, and a combination of classifiers is needed to capture their diverse properties. The structure-adaptive self-organizing map (SASOM), an enhanced SOM suited to pattern classification and visualization, may be useful for web mining. The fuzzy integral is a combination method that uses subjectively defined relevance values for the individual classifiers. In this paper, user profiles are estimated by an ensemble of independently trained SASOMs combined with the fuzzy integral, and the method is evaluated on the Syskill & Webert UCI benchmark data. Experimental results show that the proposed method outperforms both a previous naive Bayes classifier and voting over SASOMs.
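
The fuzzy integral fuses the outputs of several classifiers using a relevance value (density) g_i for each one. The abstract does not say which fuzzy integral was used; the sketch below assumes the Sugeno integral over a Sugeno λ-fuzzy measure, where λ is the root in (−1, ∞), λ ≠ 0, of ∏(1 + λg_i) = 1 + λ. The densities, class scores, and function names are hypothetical.

```python
import math


def solve_lambda(densities, tol=1e-12):
    """Find lambda for the Sugeno lambda-measure; densities must lie in (0, 1)."""
    s = sum(densities)
    if abs(s - 1.0) < 1e-12:
        return 0.0  # additive measure
    f = lambda lam: math.prod(1 + lam * g for g in densities) - (1 + lam)
    if s > 1:
        lo, hi = -1.0 + 1e-12, -1e-12  # root lies in (-1, 0)
    else:
        lo, hi = 1e-12, 1.0  # root is positive; grow the bracket until f changes sign
        while f(hi) < 0:
            hi *= 2
    for _ in range(200):  # bisection
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f(lo) > 0):
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def sugeno_integral(scores, densities, lam):
    """Sugeno fuzzy integral of one class's scores (in [0, 1]) from several classifiers."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    g_prev, best = 0.0, 0.0
    for i in order:
        g_cur = densities[i] + g_prev + lam * densities[i] * g_prev  # measure of top-ranked set
        best = max(best, min(scores[i], g_cur))
        g_prev = g_cur
    return best


def fuse(class_scores, densities):
    """class_scores[c] = per-classifier scores for class c; returns (decision, fused scores)."""
    lam = solve_lambda(densities)
    fused = {c: sugeno_integral(s, densities, lam) for c, s in class_scores.items()}
    return max(fused, key=fused.get), fused


densities = [0.3, 0.4, 0.5]  # subjective relevance of three classifiers
decision, fused = fuse({"like": [0.9, 0.6, 0.2], "dislike": [0.1, 0.4, 0.8]}, densities)
print(decision, fused)
```

Because the densities here sum to more than 1, λ is negative (about −0.45), and the measure of all three classifiers together comes out to 1, as the λ-rule requires.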

Rank-based Multiclass Gene Selection for Cancer Classification with Naive Bayes Classifiers based on Gene Expression Profiles (나이브 베이스 분류기를 이용한 유전발현 데이타기반 암 분류를 위한 순위기반 다중클래스 유전자 선택)

  • Hong, Jin-Hyuk; Cho, Sung-Bae
    • Journal of KIISE:Computer Systems and Theory / v.35 no.8 / pp.372-377 / 2008
  • Multiclass cancer classification based on gene expression profiles, which determines the type of cancer by analyzing the large amounts of gene expression data collected with DNA microarray technology, has been actively investigated. Because gene expression data include many genes unrelated to the target cancer, informative genes must be selected to obtain highly accurate classification. Conventional rank-based gene selection methods often rely on ideal marker genes devised for binary classification, so they are difficult to apply directly to multiclass classification. In this paper, we propose a novel multiclass gene selection method that does not use ideal marker genes but instead directly analyzes the distribution of gene expression. It measures class discriminability by discretizing gene expression levels into several regions and analyzing the frequency of training samples in each region, and then classifies samples with the naive Bayes classifier. We demonstrate the usefulness of the proposed method on several representative benchmark datasets for multiclass cancer classification.
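
The method discretizes each gene's expression into regions, scores genes from the class frequencies of training samples in each region, and classifies with naive Bayes. The exact discriminability score is not given in the abstract, so the sketch below assumes equal-width bins and a hypothetical "region purity" score (the share of samples whose region's majority class is their own), followed by a categorical naive Bayes with Laplace smoothing on the selected genes. The toy data and all function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def fit_bins(column, n_bins):
    """Equal-width bins for one gene (assumed discretization scheme)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant gene
    return lo, width


def to_bin(value, lo, width, n_bins):
    return max(0, min(int((value - lo) / width), n_bins - 1))


def region_purity(bins, labels):
    """Hypothetical score: share of samples whose region's majority class is their own."""
    by_region = defaultdict(Counter)
    for b, y in zip(bins, labels):
        by_region[b][y] += 1
    return sum(c.most_common(1)[0][1] for c in by_region.values()) / len(labels)


def select_genes(X, y, n_bins=3, top_k=2):
    n_genes = len(X[0])
    edges = [fit_bins([row[g] for row in X], n_bins) for g in range(n_genes)]
    scored = []
    for g in range(n_genes):
        bins = [to_bin(row[g], *edges[g], n_bins) for row in X]
        scored.append((region_purity(bins, y), g))
    chosen = [g for _, g in sorted(scored, reverse=True)[:top_k]]
    return chosen, edges


def train_nb(X, y, genes, edges, n_bins):
    prior = Counter(y)
    counts = defaultdict(Counter)  # (class, gene) -> region frequencies
    for row, label in zip(X, y):
        for g in genes:
            counts[(label, g)][to_bin(row[g], *edges[g], n_bins)] += 1
    return prior, counts


def predict_nb(row, prior, counts, genes, edges, n_bins):
    n = sum(prior.values())
    best, best_score = None, -math.inf
    for c, n_c in prior.items():
        score = math.log(n_c / n)  # log prior
        for g in genes:
            b = to_bin(row[g], *edges[g], n_bins)
            score += math.log((counts[(c, g)][b] + 1) / (n_c + n_bins))  # Laplace smoothing
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data: rows are samples, columns are genes; genes 0 and 1 track the class.
X = [[1.0, 9.0, 5.0], [1.2, 8.8, 1.0], [5.0, 5.1, 9.0],
     [5.2, 4.9, 2.0], [9.0, 1.0, 5.5], [8.8, 1.2, 1.5]]
y = ["A", "A", "B", "B", "C", "C"]
genes, edges = select_genes(X, y)
prior, counts = train_nb(X, y, genes, edges, 3)
print(genes, predict_nb([1.1, 8.9, 9.0], prior, counts, genes, edges, 3))
```

Bin edges are fitted on training data only and reused for new samples, so a test value outside the training range is clamped into the first or last region.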

Relation Based Bayesian Network for NBNN

  • Sun, Mingyang; Lee, YoonSeok; Yoon, Sung-eui
    • Journal of Computing Science and Engineering / v.9 no.4 / pp.204-213 / 2015
  • Under the assumption of conditional independence among local features, the Naive Bayes Nearest Neighbor (NBNN) classifier was recently proposed; it performs classification without any training or quantization phase. While the original NBNN achieves high classification accuracy without an explicit training phase, conditional independence among local features conflicts with the compositionality of objects, that is, the fact that different but related parts of an object appear together. As a result, the conditional independence assumption weakens the accuracy of NBNN-based classification techniques. In this work, we address this issue and propose a novel Bayesian network for NBNN-based classification that considers the conditional dependence among features. To this end, we extract a high-level feature and its corresponding multiple low-level features for each image patch. We then represent them with a simple two-level layered Bayesian network and design a classification function based on this network. To achieve a low memory requirement and fast query-time performance, we further optimize the representation and classification function, named the relation-based Bayesian network, by encoding the relationship between a high-level feature and its low-level features in a compact relation vector whose dimensionality equals the number of low-level features (four elements in our tests). We demonstrate the benefits of our method over the original NBNN and its recent improvement, local NBNN, on two different benchmarks. Our method improves accuracy by up to 27% over the tested methods. This gain is mainly due to modeling the conditional dependence between each high-level feature and its corresponding low-level features.
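
For reference, the baseline NBNN decision rule that this work improves on can be sketched in a few lines: each query descriptor is matched to its nearest descriptor in every class, and the class with the smallest summed (image-to-class) distance wins. This is not the relation-based Bayesian network itself; it uses brute-force nearest-neighbor search, and the 2-D descriptors and function names are made up.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(query_descriptors, class_descriptors):
    """Baseline NBNN: pick the class with the smallest image-to-class distance.

    class_descriptors maps a class label to all local descriptors extracted
    from that class's training images (no training or quantization step).
    """
    totals = {}
    for label, pool in class_descriptors.items():
        # For each query descriptor, squared distance to its nearest neighbor in this class.
        totals[label] = sum(min(sq_dist(d, p) for p in pool) for d in query_descriptors)
    return min(totals, key=totals.get), totals


classes = {"cat": [[0.0, 0.0], [1.0, 0.0]], "car": [[10.0, 10.0], [11.0, 10.0]]}
label, totals = nbnn_classify([[0.5, 0.0], [1.0, 1.0]], classes)
print(label, totals)
```

Summing per-descriptor distances is exactly where the conditional independence assumption enters: each descriptor contributes separately, with no term linking related parts of the same object.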

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

  • Kim, Dong-Sung
    • Korean Journal of Cognitive Science / v.23 no.3 / pp.295-322 / 2012
  • In this paper, we cluster similar word senses by applying the vector space model and HAL (Hyperspace Analog to Language). HAL measures the correlation among words within a context window of fixed size (Lund and Burgess 1996). Similarity between a word pair is measured by cosine similarity based on the vector space model, which reduces the distortion of the space between high-frequency and low-frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce the large number of dimensions produced by the similarity matrix. For sense similarity clustering, we adopt both supervised and unsupervised learning methods: clustering for the unsupervised method, and SVM (Support Vector Machine), a Naive Bayes Classifier, and the Maximum Entropy Method for the supervised methods.

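
A HAL-style word vector records, for every other word, how strongly it co-occurs before and after the target within a sliding window; cosine similarity then compares these vectors independently of their length. The sketch below assumes the linear weighting commonly described for HAL (a word d positions away gets weight window − d + 1) and concatenates left- and right-context counts; PCA/SVD reduction is omitted, and the toy sentence is made up.

```python
import math
from collections import defaultdict


def hal_vectors(tokens, window=3):
    """HAL-style co-occurrence vectors: [left-context weights] + [right-context weights]."""
    left = defaultdict(lambda: defaultdict(float))   # target -> preceding word -> weight
    right = defaultdict(lambda: defaultdict(float))  # word -> following word -> weight
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            ctx = tokens[i - d]
            w = window - d + 1  # closer words get larger weights
            left[target][ctx] += w
            right[ctx][target] += w
    vocab = sorted(set(tokens))
    return {word: [left[word][c] for c in vocab] + [right[word][c] for c in vocab]
            for word in vocab}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


tokens = "the cat drinks milk the dog drinks milk".split()
vecs = hal_vectors(tokens, window=2)
print(cosine(vecs["cat"], vecs["dog"]))
```

"cat" and "dog" occur in nearly identical contexts, so their cosine similarity is high (about 0.95) even though the two words never co-occur.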

A Study on Classification of Medical Information Documents using Word Correlation (색인어 연관성을 이용한 의료정보문서 분류에 관한 연구)

  • Lim, Hyeong-Geon; Jang, Duk-Sung
    • The KIPS Transactions:PartB / v.8B no.5 / pp.469-476 / 2001
  • As information services delivered through the web increase, hospitals receive many questions and consultation requests through their home pages and e-mail, but managing these questions is burdensome and answers are often delayed. In this paper, we investigate document classification methods as preliminary research toward an automatic answering system. Of 1,200 documents containing patients' questions, 66% were used for training and 34% for testing. The documents were classified using the NBC (Naive Bayes Classifier) and two proposed methods based on common words and on the coefficient of correlation. In the experiments, the two proposed methods, common words and coefficient of correlation, performed 3% and 5% better, respectively, than the basic NBC method. This result shows that the correlation between index terms and categories is more effective than word frequency for document classification.

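
The abstract does not give the exact correlation formula. One standard way to measure the correlation between an index term and a category is the phi coefficient (Pearson correlation of two binary variables) computed from a 2×2 contingency table; the sketch below assumes that measure, and the function name and toy documents are hypothetical.

```python
import math


def phi_correlation(docs, categories, term, category):
    """Phi coefficient between 'term occurs in the document' and
    'document belongs to category' (assumed correlation measure)."""
    n11 = n10 = n01 = n00 = 0
    for words, cat in zip(docs, categories):
        has_term, in_cat = term in words, cat == category
        if has_term and in_cat:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_cat:
            n01 += 1
        else:
            n00 += 1
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den if den else 0.0


docs = [{"fever", "child"}, {"fever", "cough"}, {"knee", "surgery"}, {"surgery", "cost"}]
categories = ["pediatrics", "internal", "orthopedics", "orthopedics"]
print(phi_correlation(docs, categories, "surgery", "orthopedics"))
print(phi_correlation(docs, categories, "fever", "orthopedics"))
```

Unlike a raw frequency count, the coefficient is also strongly negative for terms that reliably signal *not* belonging to a category, which is one reason term-category correlation can carry more information than word frequency alone.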

Video-based Facial Emotion Recognition using Active Shape Models and Statistical Pattern Recognizers (Active Shape Model과 통계적 패턴인식기를 이용한 얼굴 영상 기반 감정인식)

  • Jang, Gil-Jin; Jo, Ahra; Park, Jeong-Sik; Seo, Yong-Ho
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.3 / pp.139-146 / 2014
  • This paper proposes an efficient method for automatically distinguishing various facial expressions. To recognize emotions from facial expressions, facial images are captured with digital cameras and a number of feature points are extracted. The extracted feature points are then transformed into 49-dimensional feature vectors that are robust to scale and translation, and the facial emotions are recognized by statistical pattern classifiers such as Naive Bayes, MLP (multi-layer perceptron), and SVM (support vector machine). In experiments with 5-fold cross-validation, SVM performed best among the classifiers, with a performance of 50.8% for six-emotion classification and 78.0% for three-emotion classification.
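
The abstract does not describe how the 49-dimensional scale- and translation-robust features are computed. One common way to remove translation and scale from 2-D feature points is to subtract their centroid and divide by their RMS distance to it; the sketch below assumes that scheme, and the four points are made up.

```python
import math


def normalize_landmarks(points):
    """Make 2-D feature points invariant to translation and scale (assumed scheme)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]  # removes translation
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)  # RMS distance to centroid
    return [(x / scale, y / scale) for x, y in centered]  # removes scale


pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
moved = [(3.0 * x + 10.0, 3.0 * y - 5.0) for x, y in pts]  # scaled and translated copy
print(normalize_landmarks(pts))
print(normalize_landmarks(moved))
```

Both calls print the same normalized points, so the flattened coordinates (or distances derived from them) can be fed to Naive Bayes, an MLP, or an SVM without the classifier having to learn face position or size.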