• Title/Summary/Keyword: naïve Bayesian classifier model

Extending Data Model of Naïve-Bayesian Classifier in e-Catalog Classification

  • Kim Sung-hwan;Kim Hyun-chul;Lee Tae-hee;Lee Sang-goo
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.100-102 / 2005
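  • The emergence of B2B marketplaces on the Internet has enabled multilateral transactions between sellers and buyers, and the use of electronic catalogs containing product information is growing daily. However, because classification schemes and criteria differ even for the same product, reclassifying electronic catalogs remains an unavoidable and costly problem. To address this problem, this study uses a naïve Bayesian classifier model, a machine learning technique. The naïve Bayesian algorithm requires training data, and an electronic catalog carries relatively less training information than an ordinary document; we compensate for this drawback by extending the data model to generate additional training information. Experiments confirm that generating effective and abundant training data has an important effect on improving classification accuracy in automatic e-catalog classification.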

A novel nomogram of naïve Bayesian model for prevalence of cardiovascular disease

  • Kang, Eun Jin;Kim, Hyun Ji;Lee, Jea Young
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.297-306 / 2018
  • Cardiovascular disease (CVD) is the leading cause of death worldwide and has a high mortality rate after onset; CVD management therefore requires both the development of treatment plans and the prediction of prevalence rates. In our study, age, income, education level, marital status, diabetes, and obesity were identified as risk factors for CVD. Using these six factors, we propose a nomogram based on a naïve Bayesian classifier model for CVD. The attributes of each factor were assigned point values between -100 and 100 using Bayes' theorem, and these values represent whether an attribute is negative or positive for CVD. Additionally, the prevalence rate can be calculated even in cases with some missing attribute values. The nomogram was verified with a receiver operating characteristic (ROC) curve and a calibration plot. Consequently, when the attribute values for these risk factors are known, the prevalence rate of CVD can be predicted using the proposed nomogram.
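
The point-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' nomogram: the probabilities, the prior, the two factors shown, and the rule of scaling log likelihood ratios so that the largest magnitude equals 100 are all assumptions made for illustration. It shows how each attribute contributes additively to the log-odds in a naïve Bayesian model and why a missing attribute can simply be skipped.

```python
import math

# Hypothetical values, for illustration only: (P(value | CVD), P(value | no CVD)).
LIKELIHOODS = {
    "diabetes": {"yes": (0.30, 0.10), "no": (0.70, 0.90)},
    "obesity": {"yes": (0.45, 0.30), "no": (0.55, 0.70)},
}
PRIOR_CVD = 0.05  # hypothetical prior prevalence


def log_likelihood_ratio(factor, value):
    """Additive contribution of one attribute value to the log-odds of CVD."""
    p_pos, p_neg = LIKELIHOODS[factor][value]
    return math.log(p_pos / p_neg)


def points_table():
    """Scale contributions so the largest magnitude maps to 100 points (assumed rule)."""
    raw = {
        (factor, value): log_likelihood_ratio(factor, value)
        for factor, values in LIKELIHOODS.items()
        for value in values
    }
    scale = 100 / max(abs(x) for x in raw.values())
    return {key: x * scale for key, x in raw.items()}


def posterior(observed):
    """Posterior P(CVD | observed); attributes set to None are treated as missing."""
    log_odds = math.log(PRIOR_CVD / (1 - PRIOR_CVD))
    for factor, value in observed.items():
        if value is None:
            continue  # a missing attribute contributes nothing to the sum
        log_odds += log_likelihood_ratio(factor, value)
    return 1 / (1 + math.exp(-log_odds))
```

Positive points mark attribute values that raise the estimated probability and negative points mark values that lower it; with every attribute missing, `posterior` falls back to the prior.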

Software Quality Classification using Bayesian Classifier

  • Hong, Euy-Seok
    • Journal of Information Technology Services / v.11 no.1 / pp.211-221 / 2012
  • Many metric-based classification models have been proposed to predict the fault-proneness of software modules. This paper presents two prediction models using the Bayesian classifier, one of the most popular modern classification algorithms. A Bayesian model, grounded in Bayesian probability theory, can be a promising technique for software quality prediction because it represents uncertainty with probabilities and can partly incorporate expert knowledge into the training data. The two models, Naïve Bayes (NB) and Bayesian Belief Network (BBN), are constructed, and dimensionality reduction is performed on the training and test data before model evaluation. Prediction accuracy is evaluated using two prediction error measures, Type I error and Type II error, and compared with two well-known prediction models, a backpropagation neural network and a support vector machine. The results show that the prediction performance of the BBN model is slightly better than that of NB. For the data set with ambiguity, the BBN model's prediction accuracy is not as good as that of the compared models, but for the data set without ambiguity it outperforms them.
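
The two error measures named in this abstract can be sketched as follows. The abstract does not define them, so this sketch assumes a common convention in fault-proneness prediction: a Type I error is a module that is not fault-prone but is predicted fault-prone, a Type II error is a fault-prone module predicted as not fault-prone, and each is reported as a rate over its actual class. The function name and the choice of denominators are assumptions.

```python
def type_error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate) for boolean labels.

    True means "fault-prone". The denominators (actual negatives and
    actual positives) are an assumed convention, not taken from the paper.
    """
    pairs = list(zip(actual, predicted))
    false_positives = sum(1 for a, p in pairs if not a and p)  # Type I
    false_negatives = sum(1 for a, p in pairs if a and not p)  # Type II
    negatives = sum(1 for a in actual if not a)
    positives = len(actual) - negatives
    type_i = false_positives / negatives if negatives else 0.0
    type_ii = false_negatives / positives if positives else 0.0
    return type_i, type_ii
```

Reporting the two rates separately matters because their costs usually differ: a Type II error lets a faulty module through, while a Type I error only spends review effort on a sound one.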

Text Categorization Using TextRank Algorithm

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of KIISE:Computing Practices and Letters / v.16 no.1 / pp.110-114 / 2010
  • We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If each word is treated as a vertex and the co-occurrence of two adjacent words as an edge, a graph can be built from a document. We then use TextRank to find the important words in the graph and construct features as word pairs, each consisting of an important word and a word adjacent to it. We use four classifiers: SVM, naïve Bayesian classifier, Maximum Entropy Model, and k-NN classifier. We use the non-cross-posted version of the 20 Newsgroups data set. As a result, performance improved for all classifiers, which suggests that the TextRank algorithm is promising for text categorization.
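
The feature construction described in this abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the undirected, unweighted graph, the damping factor of 0.85, the fixed iteration count, the `top_k` cutoff, and all function names are assumptions.

```python
from collections import defaultdict


def build_graph(tokens):
    """Undirected graph: each word is a vertex, adjacent words share an edge."""
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """PageRank-style scores on the word graph (parameter values are assumed)."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


def pair_features(tokens, top_k=2):
    """Pairs of (important word, adjacent word) for the top_k ranked words."""
    graph = build_graph(tokens)
    scores = textrank(graph)
    important = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return {(word, neighbor) for word in important for neighbor in graph[word]}
```

The resulting pairs could then be fed as features to any of the classifiers listed in the abstract; in practice, stop words would likely be filtered out before building the graph so that function words do not dominate the ranking.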