• Title/Abstract/Keywords: Bayes method

Search results: 365

A Study on Incremental Learning Model for Naive Bayes Text Classifier (Naive Bayes 문서 분류기를 위한 점진적 학습 모델 연구)

  • 김제욱;김한준;이상구
    • The Journal of Information Technology and Database, v.8 no.1, pp.95-104, 2001
  • In the text classification domain, labeling the training documents is an expensive process because it requires human expertise and is a tedious, time-consuming task. Therefore, it is important to reduce the manual labeling of training documents while improving the text classifier. Selective sampling, a form of active learning, reduces the number of training documents that need to be labeled by examining the unlabeled documents and selecting the most informative ones for manual labeling. We apply this methodology to Naive Bayes, a classifier renowned as a successful method in text classification. One of the most important issues in selective sampling is the criterion used to select training documents from the large pool of unlabeled documents. In this paper, we propose two measures for this criterion: the Mean Absolute Deviation (MAD) and the entropy measure (an illustrative sketch of entropy-based selection follows this entry). The experimental results, using the Reuters 21578 corpus, show that the proposed learning method improves the Naive Bayes text classifier more than the existing methods.

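The selection criterion described above can be illustrated with a small, hypothetical sketch. This is not the authors' implementation; it only shows how an entropy score over Naive Bayes class posteriors could rank unlabeled documents for manual labeling (the toy documents and the scikit-learn classes used are assumptions of this sketch).

```python
# Hypothetical sketch of entropy-based selective sampling for a Naive Bayes
# text classifier; illustrative only, not the paper's code.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

labeled_docs = ["grain prices rise", "oil futures fall"]      # toy labeled pool
labels = ["grain", "crude"]
unlabeled_docs = ["wheat exports grow", "crude supply tightens", "markets open"]

vec = CountVectorizer()
X_lab = vec.fit_transform(labeled_docs)
X_unl = vec.transform(unlabeled_docs)

nb = MultinomialNB().fit(X_lab, labels)

# Entropy of the predicted class distribution: higher entropy = less certain,
# hence more informative to label next.
probs = nb.predict_proba(X_unl)
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Ask a human annotator to label the most uncertain document first.
most_informative = int(np.argmax(entropy))
print(unlabeled_docs[most_informative], entropy)
```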

Bayesian Method for the Multiple Test of an Autoregressive Parameter in Stationary AR(1) Model (AR(1)모형에서 자기회귀계수의 다중검정을 위한 베이지안방법)

  • 김경숙;손영숙
    • The Korean Journal of Applied Statistics, v.16 no.1, pp.141-150, 2003
  • This paper presents a multiple testing method for the autoregressive parameter in the stationary AR(1) model using the usual Bayes factor. As prior distributions of the parameters in each model, a uniform prior and noninformative improper priors are assumed. Posterior probabilities obtained through the usual Bayes factors are used for model selection (the general form is recalled after this entry). Finally, to check whether these theoretical results are correct, simulated data and real data are analyzed.
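
A generic recap (not this paper's specific derivation; the model and prior symbols are introduced here only for illustration): with candidate models $M_1,\dots,M_k$ having prior probabilities $p(M_i)$ and marginal likelihoods $m_i(\mathbf{x})=\int f(\mathbf{x}\mid\theta_i,M_i)\,\pi_i(\theta_i)\,d\theta_i$, the Bayes factor and the posterior model probabilities used for selection are

$$B_{ij}=\frac{m_i(\mathbf{x})}{m_j(\mathbf{x})},\qquad P(M_i\mid\mathbf{x})=\frac{p(M_i)\,m_i(\mathbf{x})}{\sum_{l=1}^{k}p(M_l)\,m_l(\mathbf{x})}=\left[\sum_{l=1}^{k}\frac{p(M_l)}{p(M_i)}\,B_{li}\right]^{-1},$$

and the model with the largest posterior probability is selected.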

On using Bayes Risk for Data Association to Improve Single-Target Multi-Sensor Tracking in Clutter (Bayes Risk를 이용한 False Alarm이 존재하는 환경에서의 단일 표적-다중센서 추적 알고리즘)

  • 김경택;최대범;안병하;고한석
    • Proceedings of the IEEK Conference, 2001.06d, pp.159-162, 2001
  • In this paper, a new multi-sensor single-target tracking method for cluttered environments is proposed. Unlike established methods such as the probabilistic data association filter (PDAF), the proposed method reflects information from the detection phase in the tracking parameters so as to reduce the uncertainty due to clutter. This is achieved by first modifying the Bayes risk in the Bayesian detection criterion to incorporate the likelihood of measurements from multiple sensors. The final estimate is then computed by taking a linear combination of the likelihood and the estimate of the measurements. We develop the procedure and discuss results from representative simulations.


Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks

  • Sarkar, Kamal;Nasipuri, Mita;Ghose, Suranjan
    • Journal of Information Processing Systems, v.8 no.4, pp.693-712, 2012
  • The paper presents three machine learning based keyphrase extraction methods that respectively use Decision Trees, Naïve Bayes, and Artificial Neural Networks for keyphrase extraction. We consider keyphrases to be phrases that consist of one or more words and that represent the important concepts in a text document. The three machine learning based keyphrase extraction methods used for experimentation have been compared with a publicly available keyphrase extraction system called KEA (a toy comparison sketch follows this entry). The experimental results show that the Neural Network based keyphrase extraction method outperforms the two other keyphrase extraction methods that use the Decision Tree and Naïve Bayes. The results also show that the Neural Network based method performs better than KEA.
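
A rough, hypothetical sketch of such a comparison is given below. The two candidate-phrase features (TF-IDF and relative first-occurrence position) follow the KEA tradition, but the feature values, labels, and classifier settings here are made up for illustration and are not the paper's data or configuration.

```python
# Hypothetical sketch: scoring candidate phrases with three classifiers using
# KEA-style features; all numbers below are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Each row: [tfidf, relative_first_position]; label 1 = keyphrase, 0 = not.
X = np.array([[0.42, 0.05], [0.10, 0.80], [0.31, 0.12],
              [0.05, 0.95], [0.38, 0.20], [0.08, 0.60]])
y = np.array([1, 0, 1, 0, 1, 0])

for clf in (DecisionTreeClassifier(), GaussianNB(), MLPClassifier(max_iter=2000)):
    clf.fit(X, y)
    # Classify an unseen candidate phrase with high TF-IDF appearing early.
    print(type(clf).__name__, clf.predict([[0.35, 0.10]]))
```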

Estimation of entropy of the inverse Weibull distribution under generalized progressive hybrid censored data

  • Lee, Kyeongjun
    • Journal of the Korean Data and Information Science Society, v.28 no.3, pp.659-668, 2017
  • The inverse Weibull distribution (IWD) can be readily applied to a wide range of situations, including applications in medicine, reliability, and ecology. It is generally known that the lifetimes of test items may not be recorded exactly. In this paper, therefore, we consider maximum likelihood estimation (MLE) and Bayes estimation of the entropy of an IWD under the generalized progressive hybrid censoring (GPHC) scheme. It is observed that the MLE of the entropy cannot be obtained in closed form, so two non-linear equations have to be solved simultaneously. Further, the Bayes estimators for the entropy of the IWD based on the squared error loss function (SELF), precautionary loss function (PLF), and LINEX loss function (LLF) are derived (the standard forms of these estimators are recalled after this entry). Since the Bayes estimators cannot be obtained in closed form, the Bayes estimates are derived by invoking the Tierney and Kadane approximation method. We carried out Monte Carlo simulations to compare the classical and Bayes estimators. In addition, two real data sets based on the GPHC scheme have also been analysed for illustrative purposes.
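
For reference, the standard Bayes estimators of a quantity $\theta$ (here the entropy) under the three loss functions named above are as follows; the notation is generic and not taken from the paper:

$$\text{SELF: } L(\theta,\hat\theta)=(\hat\theta-\theta)^2 \;\Rightarrow\; \hat\theta_{S}=E(\theta\mid\mathbf{x}),$$
$$\text{PLF: } L(\theta,\hat\theta)=\frac{(\hat\theta-\theta)^2}{\hat\theta} \;\Rightarrow\; \hat\theta_{P}=\sqrt{E(\theta^2\mid\mathbf{x})},$$
$$\text{LINEX: } L(\theta,\hat\theta)=e^{c(\hat\theta-\theta)}-c(\hat\theta-\theta)-1 \;\Rightarrow\; \hat\theta_{L}=-\frac{1}{c}\ln E\!\left(e^{-c\theta}\mid\mathbf{x}\right),\quad c\neq 0.$$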

A Bayesian Criterion for a Multiple Test of Two Multivariate Normal Populations

  • Kim, Hae-Jung;Son, Young-Sook
    • Communications for Statistical Applications and Methods, v.8 no.1, pp.97-107, 2001
  • A simultaneous test criterion for multiple hypotheses concerning the comparison of two multivariate normal populations is considered using the so-called Bayes factor method. A fully parametric frequentist approach to the test is not available, and thus a Bayesian criterion is pursued using a Bayes factor that eliminates the arbitrariness problem induced by improper priors. Specifically, the fractional Bayes factor (FBF) of O'Hagan (1995) is used to derive the criterion (its general form is recalled after this entry). The necessary theory involved in the derivation and computation of the criterion is provided. Finally, an illustrative simulation study is given to show the properties of the criterion.

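For context, O'Hagan's (1995) fractional Bayes factor uses a fraction $b$ of each likelihood in place of a separate training sample, which cancels the arbitrary constants left by improper priors. In its usual form, for comparing models $M_0$ and $M_1$,

$$B^{F}_{10}(b)=\frac{q_1(b,\mathbf{x})}{q_0(b,\mathbf{x})},\qquad q_i(b,\mathbf{x})=\frac{\int \pi_i(\theta_i)\,f_i(\mathbf{x}\mid\theta_i)\,d\theta_i}{\int \pi_i(\theta_i)\,f_i(\mathbf{x}\mid\theta_i)^{\,b}\,d\theta_i},$$

which remains well defined even when the priors $\pi_i$ are improper.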

Mutual Information in Naive Bayes with Kernel Density Estimation (나이브 베이스에서의 커널 밀도 측정과 상호 정보량)

  • Xiang, Zhongliang;Yu, Xiangru;Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2014.05a, pp.86-88, 2014
  • The Naive Bayes (NB) assumption has some harmful effects on the classification of real-world data. To relax this assumption, we propose an approach called Naive Bayes Mutual Information Attribute Weighting with Smooth Kernel Density Estimation (NBMIKDE), which combines smooth kernel density estimation for attributes with an attribute weighting method based on a mutual information measure (a rough sketch of these ingredients follows this entry).

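A rough, hypothetical sketch of the ingredients named above, per-attribute kernel density estimates combined with mutual-information attribute weights inside a Naive Bayes score, might look as follows. This illustrates the general idea under made-up data; it is not the NBMIKDE implementation.

```python
# Hypothetical sketch: Naive Bayes with Gaussian KDE class-conditionals and
# mutual-information attribute weights (illustrative, not the paper's code).
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

weights = mutual_info_classif(X, y, random_state=0)       # attribute weights
priors = {c: np.mean(y == c) for c in np.unique(y)}
kdes = {c: [gaussian_kde(X[y == c, j]) for j in range(X.shape[1])]
        for c in np.unique(y)}

def predict(x):
    # Score: log P(c) + sum_j w_j * log KDE_j(x_j | c), with a small smoothing term.
    scores = {c: np.log(priors[c]) +
                 sum(w * np.log(kdes[c][j](x[j])[0] + 1e-12)
                     for j, w in enumerate(weights))
              for c in priors}
    return max(scores, key=scores.get)

print(predict(np.array([1.8, 2.1])))   # expected to favor class 1
```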

A Bayes Criterion for Selecting Variables in MDA (MDA에서 판별변수 선택을 위한 베이즈 기준)

  • 김혜중;유희경
    • The Korean Journal of Applied Statistics, v.11 no.2, pp.435-449, 1998
  • In this article we introduce a Bayes criterion for variable selection in multiple discriminant analysis (MDA). The criterion is a default Bayes factor for comparing homo/heteroscedasticity of the multivariate normal means. The default Bayes factor is obtained from a development of the imaginary training sample method introduced by Spiegelhalter and Smith (1982). Based on the criterion, we also provide a test for additional discrimination in MDA. The advantage of the criterion is that it is applicable not only to the optimal subset selection method but also to the stepwise method. Moreover, the criterion can be reduced to that for two-group discriminant analysis. Thus the criterion can be regarded as a unified alternative to the variable selection criteria suggested by various sampling-theory approaches. To illustrate the performance of the criterion, a numerical study has been done via Monte Carlo experiments.


Effective Fingerprint Classification using Subsumed One-Vs-All Support Vector Machines and Naive Bayes Classifiers (포섭구조 일대다 지지벡터기계와 Naive Bayes 분류기를 이용한 효과적인 지문분류)

  • Hong, Jin-Hyuk;Min, Jun-Ki;Cho, Ung-Keun;Cho, Sung-Bae
    • Journal of KIISE: Software and Applications, v.33 no.10, pp.886-895, 2006
  • Fingerprint classification reduces the number of matches required in automated fingerprint identification systems by categorizing fingerprints into predefined classes. Support vector machines (SVMs), widely used in pattern classification, have produced a high accuracy rate in fingerprint classification. In order to apply SVMs effectively to multi-class fingerprint classification systems, we propose a novel method in which SVMs are generated with the one-vs-all (OVA) scheme and dynamically ordered with naïve Bayes classifiers (a rough sketch of this decision flow follows this entry). More specifically, it uses representative fingerprint features such as the FingerCode, singularities, and pseudo ridges to train the OVA SVMs and naïve Bayes classifiers. The proposed method has been validated on the NIST-4 database and produced a classification accuracy of 90.8% for 5-class classification. In particular, it effectively manages the tie problems that usually occur when applying OVA SVMs to multi-class classification.
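
The decision flow described above, one-vs-all SVMs evaluated in an order suggested by a naïve Bayes classifier, can be sketched roughly as below. The random features and the five class names are placeholders, not FingerCode features or the NIST-4 data, and the ordering rule is a simplified reading of the idea rather than the authors' exact procedure.

```python
# Hypothetical sketch: OVA SVMs dynamically ordered by naive Bayes posteriors
# (illustrative only; the real system uses FingerCode/singularity/ridge features).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(1)
classes = ["whorl", "left_loop", "right_loop", "arch", "tented_arch"]
X = rng.normal(size=(100, 8))
y = rng.choice(classes, size=100)

nb = GaussianNB().fit(X, y)
ova_svms = {c: SVC().fit(X, (y == c).astype(int)) for c in classes}

def classify(x):
    # Evaluate the binary SVMs in decreasing order of naive Bayes posterior;
    # the first SVM that accepts the sample decides the class (breaking ties).
    order = nb.classes_[np.argsort(nb.predict_proba([x])[0])[::-1]]
    for c in order:
        if ova_svms[c].predict([x])[0] == 1:
            return c
    return order[0]   # fall back to the naive Bayes top choice

print(classify(X[0]))
```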

Breast Cancer Diagnosis using Naive Bayes Analysis Techniques (Naive Bayes 분석기법을 이용한 유방암 진단)

  • Park, Na-Young;Kim, Jang-Il;Jung, Yong-Gyu
    • Journal of Service Research and Studies, v.3 no.1, pp.87-93, 2013
  • Breast cancer is known as a disease that occurs frequently in developed countries. In recent years, however, its incidence among modern Korean women has been increasing steadily. As is well known, breast cancer usually occurs in women over 50; in Korea, however, the incidence among younger women in their 40s has been rising more steadily than in the West. It is therefore a very urgent task to build a manual for the accurate diagnosis of breast cancer in adult Korean women. In this paper, we show how to use data mining techniques to predict breast cancer. Data mining refers to the process of finding regular patterns or relationships among variables within a database; through sophisticated analysis with such models, useful information that is not easily revealed can be found. In this paper, the Decision Tree and Naive Bayes analysis techniques were compared experimentally for diagnosing breast cancer, with the Decision Tree analyzed by applying the C4.5 algorithm (a generic comparison sketch follows this entry). The Decision Tree classification accuracy was fairly good, and the Naive Bayes classification method showed better accuracy than the Decision Tree method.

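As a generic, hypothetical illustration of the comparison described, the snippet below evaluates Naive Bayes against a decision tree on scikit-learn's bundled Wisconsin breast cancer dataset; note that scikit-learn's CART-based tree stands in for the C4.5 algorithm, and this is not the data or code analysed in the paper.

```python
# Hypothetical sketch comparing Naive Bayes and a Decision Tree for breast
# cancer diagnosis on sklearn's Wisconsin dataset (not the paper's data).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for clf in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{type(clf).__name__}: accuracy = {acc:.3f}")
```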