• Title/Summary/Keyword: Bayes decision


Automatic Identification of Database Workloads by using SVM Workload Classifier (SVM 워크로드 분류기를 통한 자동화된 데이터베이스 워크로드 식별)

  • Kim, So-Yeon;Roh, Hong-Chan;Park, Sang-Hyun
    • The Journal of the Korea Contents Association / v.10 no.4 / pp.84-90 / 2010
  • DBMSs are used for a range of applications, from data warehousing to on-line transaction processing. As a result of this demand, DBMSs have continued to grow in size. This growth makes manual performance tuning an important issue. DBMS tuning should adapt to the type of workload placed on the system, but identifying workloads in mixed database applications can be quite difficult. A method is therefore needed for identifying workloads in a mixed database environment. In this paper, we propose an SVM workload classifier that automatically identifies a DBMS workload. Database workloads were collected from the TPC-C and TPC-W benchmarks while changing the resource parameters. The parameters of the SVM workload classifier, C and the kernel parameter, were chosen experimentally. The experiments revealed that the accuracy of the proposed SVM workload classifier is about 9% higher than that of the Decision Tree, Naive Bayes, Multilayer Perceptron, and K-NN classifiers.

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society / v.19 no.1 / pp.133-140 / 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient method for classifying Korean sentiment using word2vec and ensemble methods, both of which have recently been studied widely. For 200,000 Korean movie review texts, we generated a POS-based BOW feature, a word2vec feature, and an integrated feature combining the two representations. For sentiment classification, we used the single classifiers Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine, and the ensemble classifiers Adaptive Boost, Bagging, Gradient Boosting, and Random Forest. The integrated feature representation, composed of the BOW feature including adjectives and adverbs and the word2vec feature, showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance, while the ensemble classifiers show similar or slightly lower performance than the single classifier.

Convergence study to detect metabolic syndrome risk factors by gender difference (성별에 따른 대사증후군의 위험요인 탐색을 위한 융복합 연구)

  • Lee, So-Eun;Rhee, Hyun-Sill
    • Journal of Digital Convergence / v.19 no.12 / pp.477-486 / 2021
  • This study was conducted to detect metabolic syndrome risk factors and gender differences in adults. Data on 18,616 adults were collected by the Korea Health and Nutrition Examination Study from 2016 to 2019. Four machine learning methods (Logistic Regression, Decision Tree, Naïve Bayes, Random Forest) were used to predict metabolic syndrome. The results showed that Random Forest was superior to the other methods for both men and women. For both groups, BMI, diet (fat, vitamin C, vitamin A, protein, energy intake), number of underlying chronic diseases, and age were among the most important factors. In women, education level, age at menarche, and menopause were additional important factors, and age and the number of underlying chronic diseases were more important than in men. Future studies should verify various strategies to prevent metabolic syndrome.

A comparative study of the performance of machine learning algorithms to detect malicious traffic in IoT networks (IoT 네트워크에서 악성 트래픽을 탐지하기 위한 머신러닝 알고리즘의 성능 비교연구)

  • Hyun, Mi-Jin
    • Journal of Digital Convergence / v.19 no.9 / pp.463-468 / 2021
  • Although the IoT is growing explosively owing to advances in technology, the spread of IoT devices, and the activation of services, the activities of various botnets are causing serious security risks and financial damage. It is therefore important to detect the activities of these botnets accurately and quickly. Because security in the IoT environment must operate with minimal processing performance and memory, in this paper the minimum set of features for detection is selected, and a comparative study is conducted on the performance of machine learning algorithms such as KNN (K-Nearest Neighbor), Naïve Bayes, Decision Tree, and Random Forest in detecting botnet activity. Experimental results using the Bot-IoT dataset showed that, among the applied machine learning algorithms, KNN can detect DDoS, DoS, and Reconnaissance attacks most effectively and efficiently.

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on a factor analysis (FA) of integrated Korean Meteorological Agency data and natural gas leakage data, in order to account for complex factors. The work is divided into three modules. First, we filled in missing data in the integrated data set using linear interpolation and selected essential features using FA with OrdinalEncoder (OE)-based normalization. Next, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method improved performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors in that it relies entirely on various natural and climatic factors. Climate change affects land and agriculture in many ways, including lack of annual rainfall, pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. The performance metrics used are precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area. The proposed methodology achieves a precision of 94.40%, compared with the decision tree (83.94%) and the K-nearest neighbor algorithm (86.89%), for agricultural ontology construction.

The Modified LVQ method for Performance Improvement of Pattern Classification (패턴 분류 성능을 개선하기 위한 수정된 LVQ 방식)

  • Eom Ki-Hwan;Jung Kyung-Kwon;Chung Sung-Boo
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.2 s.308 / pp.33-39 / 2006
  • This paper presents a modified LVQ method for improving the performance of pattern classification. The proposed method uses the skewness of the probability distribution between the input vectors and the reference vectors. During training, the reference vectors are brought close to the input vectors using the probability distribution of the input vectors, and they are positioned to approximate the decision surfaces of the theoretical Bayes classifier. To verify the effectiveness of the proposed method, we performed experiments on a Gaussian distribution data set and Fisher's Iris data set. The experimental results show that the proposed method considerably improves on the performance of LVQ1, LVQ2, and GLVQ.

A comparison of activity recognition using a triaxial accelerometer sensor (3축 가속도 센서를 이용한 행동 인식 비교)

  • Wang, ChangWon;Ho, JongGab;Na, YeJi;Jung, HwaYung;Nam, YunYoung;Min, Se Dong
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1361-1364 / 2015
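  • In this study, we extracted features of seven types of activity that elderly people frequently perform in daily life and applied them to seven classification algorithms in order to identify the algorithm with the highest recognition rate. The activity patterns comprised seven types: normal walking, limping, walking with a cane, slow walking, walking with a stooped back, propelling a wheelchair by oneself, and being pushed in a wheelchair by someone else. The features of the activity patterns were the triaxial accelerometer values, their mean and standard deviation, and the vertical- and horizontal-axis data. The classification algorithms used were Naive Bayes, Bayes Net, k-NN, SVM, Decision Tree, Multilayer Perceptron, and Logistic Regression. The results showed that the k-NN algorithm achieved a recognition rate of 98.7%, higher than that of the other classification algorithms.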

An Analysis of the Characteristics of Companies introducing Smart Factory System Using Data Mining Technique (데이터 마이닝 기법을 활용한 스마트팩토리 도입 기업의 특성 분석)

  • Oh, Jeong-yoon;Choi, Sang-hyun
    • Journal of the Korea Convergence Society / v.9 no.5 / pp.179-189 / 2018
  • Research on smart factories is steadily being carried out in terms of implementation strategies and construction considerations. However, few studies have examined the companies that have introduced smart factories. This study conducted a questionnaire survey of SMEs applying the basic stage of the smart factory, and cluster analysis was conducted to examine the characteristics of the companies. In addition, we applied Decision Tree and Naive Bayes analyses to examine how the characteristics of a company are derived and to compare the results. The cluster analysis confirmed that the companies were divided into a high satisfaction group and a low satisfaction group. The Decision Tree and Naive Bayes analyses showed that the higher satisfaction group has high productivity.

Comparative Study of Tokenizer Based on Learning for Sentiment Analysis (고객 감성 분석을 위한 학습 기반 토크나이저 비교 연구)

  • Kim, Wonjoon
    • Journal of Korean Society for Quality Management / v.48 no.3 / pp.421-431 / 2020
  • Purpose: The purpose of this study is to compare and analyze tokenizers in natural language processing for sentiment analysis of customer satisfaction. Methods: In this study, the supervised learning-based tokenizer Mecab-Ko and the unsupervised learning-based tokenizer SentencePiece were compared. Three algorithms, Naïve Bayes, k-Nearest Neighbor, and Decision Tree, were selected to compare the performance of each tokenizer. Three metrics, accuracy, precision, and recall, were used for the performance comparison. Results: Performance evaluation and verification confirmed that SentencePiece shows better classification performance than Mecab-Ko. To confirm the robustness of the derived results, independent t-tests were conducted on the evaluation results for the two tokenizers. The tests confirmed that the classification performance of the SentencePiece tokenizer was high with the k-Nearest Neighbor and Decision Tree algorithms. In addition, the Decision Tree showed slightly higher accuracy than the other two classification algorithms. Conclusion: The SentencePiece tokenizer can be used to classify and interpret customer sentiment more accurately based on online reviews in Korean. In addition, it appears possible to assign a specific meaning to a short word or piece of jargon that users often employ when evaluating products but that is not defined in advance.