• Title/Summary/Keyword: Bayes method

Sentiment Classification of Movie Reviews using Levenshtein Distance (Levenshtein 거리를 이용한 영화평 감성 분류)

  • Ahn, Kwang-Mo;Kim, Yun-Suk;Kim, Young-Hoon;Seo, Young-Hoon
    • Journal of Digital Contents Society / v.14 no.4 / pp.581-587 / 2013
  • In this paper, we propose a sentiment classification method that uses Levenshtein distance. We generate a BOW (bag-of-words) by applying Levenshtein distance to sentiment features and use it as the training set. The machine learning algorithms used are SVMs (Support Vector Machines) and NB (Naive Bayes). As the data set, we gathered 2,385 movie reviews from an online movie community (Daum movie service). From the collected reviews, we manually picked out sentiment words and compiled a list of 778 words. In the experiment, we train the classifiers on the BOW generated by applying Levenshtein distance to the sentiment words, and evaluate classifier performance by 10-fold cross-validation. The evaluation yielded an accuracy of 85.46% with Multinomial Naive Bayes when the Levenshtein distance was 3. The experimental results show that, with this method, classification performance is less affected by spelling errors in the documents.
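
The idea of building a bag-of-words that tolerates misspellings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `levenshtein` and `fuzzy_bow`, the toy lexicon, and the rule "credit each token to its closest lexicon word if the distance is within a threshold" are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_bow(tokens, lexicon, max_dist):
    """Count lexicon words, crediting each token to its closest lexicon
    word when that word is within max_dist edits (assumed matching rule)."""
    counts = {word: 0 for word in lexicon}
    for tok in tokens:
        best = min(lexicon, key=lambda w: levenshtein(tok, w))
        if levenshtein(tok, best) <= max_dist:
            counts[best] += 1
    return counts


# Hypothetical sentiment lexicon and a review containing two misspellings.
lexicon = ["wonderful", "boring"]
review = ["wonderfull", "plot", "but", "borring", "ending"]
bow = fuzzy_bow(review, lexicon, max_dist=1)
```

With `max_dist=1` both misspelled tokens are counted once each. Raising the threshold to 3 in this toy example also matches "ending" to "boring", which shows why the threshold has to be tuned against accuracy.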

An Automatic Contour Detection of 2-D Echocardiograms Using the Heat Anisotropic Diffusion Method (Heat Anisotropic Diffusion 방법을 이용한 2차원 심초음파도에서 경계선 자동 검출)

  • 신동조;김동윤
    • Progress in Medical Physics / v.7 no.2 / pp.79-90 / 1996
  • In this paper, we present an automatic threshold decision method for detecting the contour of a 2-D echocardiogram by applying the Bayes estimator to the boundary-like region. The boundary-like region is constructed from the conduction coefficient of the heat anisotropic diffusion method, which enhances the blurred image during the preprocessing step. For the boundary-like region, we use the Bayes estimator to select an optimal threshold level. From this threshold value, the contour of the echocardiogram can be detected automatically. Finally, by overlaying the estimated contour on the original echocardiogram, we obtain a contour-enhanced ultrasound echocardiogram.

An Active Learning-based Method for Composing Training Document Set in Bayesian Text Classification Systems (베이지언 문서분류시스템을 위한 능동적 학습 기반의 학습문서집합 구성방법)

  • 김제욱;김한준;이상구
    • Journal of KIISE:Software and Applications / v.29 no.12 / pp.966-978 / 2002
  • There are two important problems in improving text classification systems based on a machine learning approach. The first, called the "selection problem", is how to select a minimum number of informative documents from a given document collection. The second, called the "composition problem", is how to reorganize the selected training documents so that they fit the adopted learning method. The former is addressed by "active learning" algorithms, and the latter by "boosting" algorithms. This paper proposes a new learning method, called AdaBUS, which proactively solves both problems in the context of Naive Bayes classification systems. The proposed method constructs a more accurate classification hypothesis by increasing the variance among the "weak" hypotheses that determine the final classification hypothesis. Consequently, the proposed algorithm yields a perturbation effect that makes the boosting algorithm work properly. Through empirical experiments on the Reuters-21578 document collection, we show that the AdaBUS algorithm improves the Naive Bayes-based classification system more significantly than other conventional learning methods.

Scalable and Accurate Intrusion Detection using n-Gram Augmented Naive Bayes and Generalized k-Truncated Suffix Tree (N-그램 증강 나이브 베이스 알고리즘과 일반화된 k-절단 서픽스트리를 이용한 확장가능하고 정확한 침입 탐지 기법)

  • Kang, Dae-Ki;Hwang, Gi-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.4 / pp.805-812 / 2009
  • The n-gram approach has been widely applied in intrusion detection. However, it has shown several problems, including poor scalability and double counting of features. To address those problems, we applied n-gram augmented Naive Bayes with a k-truncated suffix tree (k-TST) storage mechanism directly to classify intrusive sequences, and compared its performance with that of Naive Bayes and Support Vector Machines (SVM) with n-gram features in experiments on host-based intrusion detection benchmark data sets. Experimental results on the University of New Mexico (UNM) benchmark data sets show that the n-gram augmented method, which solves the independence violation that occurs when n-gram features are applied directly to Naive Bayes (i.e., Naive Bayes with n-gram features), yields intrusion detectors with higher accuracy than Naive Bayes with n-gram features and accuracy comparable to SVM with n-gram features. For scalable and efficient counting of n-gram features, we use the k-truncated suffix tree mechanism to store them. With this storage mechanism, we tested the classifiers on features up to 20-grams, which illustrates the scalability and accuracy of n-gram augmented Naive Bayes with k-truncated suffix tree storage.
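
To make the feature-counting step concrete, the sketch below counts every contiguous n-gram up to a maximum length in a sequence of system calls. It is an assumption-laden illustration: a flat `Counter` stands in for the k-truncated suffix tree named in the abstract (it is not a suffix tree), and `ngram_counts` and the toy trace are hypothetical.

```python
from collections import Counter


def ngram_counts(seq, max_n):
    """Count every contiguous n-gram of length 1..max_n in seq."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


# Hypothetical system-call trace of one process.
trace = ["open", "read", "read", "close"]
counts = ngram_counts(trace, max_n=2)
```

Because overlapping n-grams share symbols, treating them as independent Naive Bayes features counts the same evidence more than once, which is the double-counting problem the abstract refers to.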

Preference Prediction System using Similarity Weight granted Bayesian estimated value and Associative User Clustering (베이지안 추정치가 부여된 유사도 가중치와 연관 사용자 군집을 이용한 선호도 예측 시스템)

  • 정경용;최성용;임기욱;이정현
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.316-325 / 2003
  • User preference prediction methods based on existing collaborative filtering techniques use the nearest-neighbor method based on user preferences for items and compute user similarity from the Pearson correlation coefficient. As a result, they do not reflect any item content, nor do they solve the sparsity problem. This study proposes a preference prediction system that uses a similarity weight assigned a Bayesian estimated value together with associative user clustering, to compensate for the problems of existing collaborative preference prediction methods. The proposed method groups users by genre using the Association Rule Hypergraph Partitioning algorithm, and a new user is classified into one of these genres by a Naive Bayes classifier to solve the sparsity problem in the collaborative filtering system. In addition, to obtain the similarity between a new user and the users belonging to the classified genre, this study assigns different estimated values to the items that users have rated, through Naive Bayes learning. Applying the preferences with estimated values to the existing Pearson correlation coefficient improves prediction precision by reducing the prediction error caused by missing values. To evaluate its performance, the proposed method is compared with existing collaborative filtering techniques. The results show that the proposed method improves prediction accuracy by solving the problems of existing collaborative filtering techniques.
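
The Pearson-based user similarity that the abstract takes as its baseline can be sketched as below. This is a minimal version under stated assumptions: means are taken over co-rated items only, similarity falls back to 0.0 when fewer than two items overlap, and the function name and toy ratings are invented for illustration. It does not include the Bayesian estimated values proposed in the paper.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings keyed by item id.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 2, "m3": 0, "m4": 5}
carol = {"m4": 2}
```

Here `pearson(alice, carol)` falls back to 0.0 because the two users share no rated item, which is the sparsity problem in miniature: missing ratings shrink the overlap on which the similarity is computed.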

On Practical Choice of Smoothing Parameter in Nonparametric Classification (베이즈 리스크를 이용한 커널형 분류에서 평활모수의 선택)

  • Kim, Rae-Sang;Kang, Kee-Hoon
    • Communications for Statistical Applications and Methods / v.15 no.2 / pp.283-292 / 2008
  • The smoothing parameter, or bandwidth, plays a key role in nonparametric classification based on kernel density estimation. We consider choosing the smoothing parameter in nonparametric classification so as to optimize the Bayes risk. Hall and Kang (2005) clarified the theoretical properties of the smoothing parameter in terms of minimizing the Bayes risk and derived its optimal order. They used the bootstrap method to explore its numerical properties. We compare cross-validation and the bootstrap method numerically in terms of the optimal order of the bandwidth. Effects on the misclassification rate are also examined. We confirm that the bootstrap method is superior to cross-validation in both cases.
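
A kernel-based classifier and a cross-validated error estimate for a given bandwidth can be sketched as follows. The sketch assumes one-dimensional data, a Gaussian kernel, a single bandwidth `h` shared by both classes, and leave-one-out cross-validation; the function names and data are illustrative and the bootstrap selector studied in the paper is not shown.

```python
from math import exp, pi, sqrt


def kde(x, sample, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    total = sum(exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (len(sample) * h * sqrt(2 * pi))


def classify(x, sample0, sample1, h, prior0=0.5):
    """Plug-in Bayes rule: choose the class with larger prior * density."""
    score0 = prior0 * kde(x, sample0, h)
    score1 = (1 - prior0) * kde(x, sample1, h)
    return 0 if score0 >= score1 else 1


def loo_error(sample0, sample1, h):
    """Leave-one-out misclassification rate for bandwidth h."""
    errors = 0
    for i, x in enumerate(sample0):
        rest = sample0[:i] + sample0[i + 1:]
        errors += classify(x, rest, sample1, h) != 0
    for i, x in enumerate(sample1):
        rest = sample1[:i] + sample1[i + 1:]
        errors += classify(x, sample0, rest, h) != 1
    return errors / (len(sample0) + len(sample1))


# Hypothetical, well-separated training samples.
class0 = [-1.2, -0.8, -1.0, -0.5]
class1 = [0.9, 1.1, 1.4, 0.7]
labels = [classify(x, class0, class1, h=0.5) for x in (-1.0, 1.0)]
```

In practice one would evaluate `loo_error` over a grid of bandwidths and keep the value with the smallest estimated misclassification rate.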

Watermark Detection Algorithm Using Statistical Decision Theory (통계적 판단 이론을 이용한 워터마크 검출 알고리즘)

  • 권성근;김병주;이석환;권기구;권기용;이건일
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.1 / pp.39-49 / 2003
  • Watermark detection plays a crucial role in copyright protection and authentication of multimedia, and has classically been tackled by means of correlation-based algorithms. However, when watermark embedding does not obey an additive rule, correlation-based detection is not the optimum choice. We therefore propose a new detection algorithm that is optimum for non-additive watermark embedding. Relying on statistical decision theory, the proposed method is derived from Bayes decision theory, the Neyman-Pearson criterion, and the distribution of wavelet coefficients, which makes it possible to minimize the missed detection probability subject to a given false detection probability. The proposed method has been tested from a robustness perspective, and the results confirm its superiority over the classical correlation-based method.

A Novel Method for a Reliable Classifier using Gradients

  • Han, Euihwan;Cha, Hyungtai
    • IEIE Transactions on Smart Processing and Computing / v.6 no.1 / pp.18-20 / 2017
  • In this paper, we propose a new classification method to complement a naïve Bayesian classifier. This classifier assumes data distribution to be Gaussian, finds the discriminant function, and derives the decision curve. However, this method does not investigate finding the decision curve in much detail, and there are some minor problems that arise in finding an accurate discriminant function. Our findings also show that this method could produce errors when finding the decision curve. The aim of this study has therefore been to investigate existing problems and suggest a more reliable classification method. To do this, we utilize the gradient to find the decision curve. We then compare/analyze our algorithm with the naïve Bayesian method. Performance evaluation indicates that the average accuracy of our classification method is about 10% higher than naïve Bayes.
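
The baseline described in the abstract, a naïve Bayesian classifier with Gaussian class-conditional densities, can be sketched as below. This illustrates only the baseline's discriminant function, not the gradient-based method the paper proposes; the function names, the use of maximum-likelihood variances, and the toy data are assumptions.

```python
from math import log, pi


def fit(rows):
    """Per-feature mean and (maximum-likelihood) variance for one class."""
    n = len(rows)
    dims = range(len(rows[0]))
    means = [sum(r[j] for r in rows) / n for j in dims]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in dims]
    return means, variances


def discriminant(x, means, variances, prior):
    """g(x) = log prior + sum of per-feature Gaussian log densities."""
    g = log(prior)
    for xj, m, v in zip(x, means, variances):
        g += -0.5 * log(2 * pi * v) - (xj - m) ** 2 / (2 * v)
    return g


def predict(x, models):
    """Return the label whose discriminant is largest at x."""
    return max(models, key=lambda label: discriminant(x, *models[label]))


# Hypothetical two-class training data with two features each.
class_a = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
class_b = [(5.0, 6.0), (6.0, 5.0), (5.5, 5.5)]
models = {
    "a": (*fit(class_a), 0.5),  # (means, variances, prior)
    "b": (*fit(class_b), 0.5),
}
```

The decision curve is the set of points where the two discriminants are equal; `predict` simply reports which side of that curve a point falls on.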

Bayesian and maximum likelihood estimations from exponentiated log-logistic distribution based on progressive type-II censoring under balanced loss functions

  • Chung, Younshik;Oh, Yeongju
    • Communications for Statistical Applications and Methods / v.28 no.5 / pp.425-445 / 2021
  • A generalization of the log-logistic (LL) distribution, called the exponentiated log-logistic (ELL) distribution, is considered along the lines of the exponentiated Weibull distribution. In this paper, based on progressive type-II censored samples, we derive the maximum likelihood estimators and Bayes estimators for the three parameters, the survival function, and the hazard function of the ELL distribution. Then, under the balanced squared error loss (BSEL) and the balanced LINEX loss (BLEL) functions, the corresponding Bayes estimators are obtained using Lindley's approximation (see Jung and Chung, 2018; Lindley, 1980), the Tierney-Kadane approximation (see Tierney and Kadane, 1986), and Markov Chain Monte Carlo methods (see Hastings, 1970; Gelfand and Smith, 1990). To check the convergence of the MCMC chains, the Gelman and Rubin diagnostic (see Gelman and Rubin, 1992; Brooks and Gelman, 1997) was used. On the basis of their risks, the performance of the Bayes estimators is compared with that of the maximum likelihood estimators in simulation studies. The results support the conclusion that the ELL distribution is an efficient distribution for modeling survival data. In addition, Bayes estimators under various loss functions are useful for many estimation problems.
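
Given posterior draws from an MCMC run, Bayes estimators under balanced losses can be approximated by Monte Carlo averages, as sketched below. The sketch assumes the usual balanced-loss form, a weight `omega` on closeness to a target estimator (here taken to be the MLE) and `1 - omega` on closeness to the parameter, and the LINEX loss exp(a*d) - a*d - 1 with d the estimation error. The function names, the draws, and the MLE value are hypothetical and not taken from the paper.

```python
from math import exp, log


def bayes_bsel(draws, target, omega):
    """Balanced squared error loss: weighted average of the target
    estimator and the posterior mean."""
    post_mean = sum(draws) / len(draws)
    return omega * target + (1 - omega) * post_mean


def bayes_balanced_linex(draws, target, omega, a):
    """Balanced LINEX loss with shape a != 0:
    -(1/a) * log(omega * exp(-a * target) + (1 - omega) * E[exp(-a * theta)])."""
    post_exp = sum(exp(-a * theta) for theta in draws) / len(draws)
    return -log(omega * exp(-a * target) + (1 - omega) * post_exp) / a


# Hypothetical posterior draws of a scalar parameter and its MLE.
draws = [1.8, 2.0, 2.2, 2.4, 1.6]
mle = 2.5
est_bsel = bayes_bsel(draws, mle, omega=0.5)
est_blinex = bayes_balanced_linex(draws, mle, omega=0.5, a=1.0)
```

Setting `omega=0` recovers the ordinary squared error and LINEX Bayes estimators, and `omega=1` returns the target estimator itself. For `a > 0` the LINEX estimator with `omega=0` lies below the posterior mean, reflecting the heavier penalty on overestimation.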

Bayesian Estimation of the Reliability Function of the Burr Type XII Model under Asymmetric Loss Function

  • Kim, Chan-Soo
    • Communications for Statistical Applications and Methods / v.14 no.2 / pp.389-399 / 2007
  • In this paper, Bayes estimates of the parameters k and c and of the reliability function of the Burr type XII model are obtained from type II censored samples under asymmetric loss functions, namely the LINEX and SQUAREX loss functions. An approximation based on the Laplace approximation method (Tierney and Kadane, 1986) is used to obtain the Bayes estimators of the parameters and the reliability function. Monte Carlo simulations are used to compare the Bayes estimators under the squared error, LINEX, and SQUAREX loss functions with the maximum likelihood estimators of the parameters and the reliability function.