• Title/Summary/Keyword: naive bayes

Comparison of Machine Learning Techniques for Cyberbullying Detection on YouTube Arabic Comments

  • Alsubait, Tahani;Alfageh, Danyah
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.1-5 / 2021
  • Cyberbullying is a problem faced in many cultures. Because of their popularity and interactive nature, social media platforms have also been affected by it, and social media users in Arab countries have reported being targets of cyberbullying. Machine learning techniques have been a prominent approach for detecting and combating this phenomenon. In this paper, we compare the performance of different machine learning algorithms for cyberbullying detection on a labeled dataset of Arabic YouTube comments. Three machine learning models are considered: Multinomial Naïve Bayes (MNB), Complement Naïve Bayes (CNB), and Logistic Regression (LR). In addition, we experiment with two feature extraction methods: Count Vectorizer and Tfidf Vectorizer. Our results show that, with Count Vectorizer feature extraction, the Logistic Regression model can outperform both the Multinomial and Complement Naïve Bayes models. With Tfidf Vectorizer feature extraction, however, the Complement Naïve Bayes model can outperform the other two models.

A Study of Data Mining Methodology for Effective Analysis of False Alarm Event on Mechanical Security System (기계경비시스템 오경보 이벤트 분석을 위한 데이터마이닝 기법 연구)

  • Kim, Jong-Min;Choi, Kyong-Ho;Lee, Dong-Hwi
    • Convergence Security Journal / v.12 no.2 / pp.61-70 / 2012
  • The objective of this study is to identify the most suitable data mining approach for effective analysis of false alarm events in mechanical security systems. To this end, the study investigates the causes of false alarms and proposes data conversion and analysis methods for use with several algorithms in WEKA, a data mining program, based on statistical data on the number of movement cases caused by false alarms, the false alarm rate, and the causes of false alarms. The analysis methods are used to estimate false alarms and to set more effective responses to them by applying several algorithms. To select suitable data for effective analysis of false alarm events in mechanical security systems, this study uses the Decision Tree, Naive Bayes, BayesNet, Apriori, and J48Tree algorithms and applies the algorithm that yields the highest value.

Development of Supervised Machine Learning based Catalog Entry Classification and Recommendation System (지도학습 머신러닝 기반 카테고리 목록 분류 및 추천 시스템 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services / v.20 no.1 / pp.57-65 / 2019
  • The Domeggook B2B online shopping mall has a market share of over 70%, with more than 2 million members and 800,000 items sold per day. However, because the same or similar items are registered under different catalog entries, buyers have difficulty searching for items, and managing a large B2B shopping mall also becomes problematic. In this study, we therefore developed a system that automatically classifies and recommends catalog entries for products, using a semi-supervised machine learning method based on a large volume of past shopping mall purchase information. Specifically, when the seller enters item registration information in natural language, KoNLPy morphological analysis is performed, and the Naïve Bayes classification method is applied to automatically recommend the most suitable catalog information for the item. As a result, it was possible to improve both the search speed and the total sales of the shopping mall by efficiently improving the accuracy of catalog entries.

A novel classification approach based on Naïve Bayes for Twitter sentiment analysis

  • Song, Junseok;Kim, Kyung Tae;Lee, Byungjun;Kim, Sangyoung;Youn, Hee Yong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.6 / pp.2996-3011 / 2017
  • With the rapid growth of web technology and the spread of smart devices, social networking services (SNS) are widely used. As a result, huge amounts of data are generated from SNS such as Twitter, and sentiment analysis of SNS data is very important for various applications and services. In existing sentiment analysis based on the Naïve Bayes algorithm, the same number of attributes is usually employed to estimate the weight of each class. Moreover, uncountable and meaningless attributes are included, which decreases the accuracy of sentiment analysis. In this paper, two methods are proposed to resolve these issues: one reflects the difference between the numbers of positive and negative words when calculating the weights, and the other eliminates insignificant words in the feature selection step using the Multinomial Naïve Bayes (MNB) algorithm. A performance comparison demonstrates that the proposed scheme significantly increases accuracy compared with the existing Multivariate Bernoulli Naïve Bayes (BNB) algorithm and the MNB scheme.
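
For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of a plain Multinomial Naïve Bayes classifier with Laplace smoothing. It is not the weighting or feature-selection scheme proposed in the paper; the function names (`train_mnb`, `predict_mnb`), the smoothing parameter `alpha`, and the toy tweets are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs  : list of (tokens, label) pairs (hypothetical toy format)
    alpha : Laplace smoothing constant (assumed value, not from the paper)
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_docs.items()}

    log_lik = {}
    for c in class_docs:
        # Smoothed estimate of P(word | class)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_lik, vocab


def predict_mnb(model, tokens):
    """Return the class with the highest log posterior; unseen words are skipped."""
    log_prior, log_lik, vocab = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the paper.
TOY_TWEETS = [
    ("great phone love it".split(), "pos"),
    ("love the great camera".split(), "pos"),
    ("terrible battery hate it".split(), "neg"),
    ("hate the terrible screen".split(), "neg"),
]

model = train_mnb(TOY_TWEETS)
print(predict_mnb(model, "love great screen".split()))
```

In this sketch every vocabulary word contributes to every class score, which is the "same number of attributes per class" behavior the abstract identifies as a limitation.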

Intelligent Traffic Prediction by Multi-sensor Fusion using Multi-threaded Machine Learning

  • Aung, Swe Sw;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.5 no.6 / pp.430-439 / 2016
  • Estimation and analysis of traffic jams play a vital role in an intelligent transportation system and advance safety in the transportation system as well as mobility and optimization of environmental impact. For these reasons, many researchers currently focus on machine learning-based prediction approaches for traffic prediction systems. This paper primarily addresses the analysis and comparison of prediction accuracy between two machine learning algorithms: Naïve Bayes and K-Nearest Neighbor (K-NN). Because the optimized estimation accuracy of these methods depends mainly on a large amount of recounted data, and because they require much time to compute the same function heuristically for each action, we propose an approach that applies multi-threading to these heuristic methods. The greater the amount of historical data, the more processing time is necessary. For a real-time system, operational response time is vital, and the proposed system also focuses on the time complexity cost as well as computational complexity. It is experimentally confirmed that K-NN does much better than Naïve Bayes, not only in prediction accuracy but also in processing time. Multi-threading-based K-NN could compute four times faster than classical K-NN, whereas multi-threading-based Naïve Bayes could process only twice as fast as classical Naïve Bayes.

Empirical Bayes Interval Estimation by a Sample Reuse Method

  • Cho, Kil-Ho;Choi, Dal-Woo;Chae, Hyeon-Sook
    • Journal of the Korean Data and Information Science Society / v.8 no.1 / pp.41-48 / 1997
  • We construct empirical Bayes (EB) confidence intervals that attain a specified level of EB coverage for the unknown scale parameter of the Weibull distribution with known shape parameter under type II censored data. Our general approach is to use the EB bootstrap samples introduced by Laird and Louis (1987). We also compare the coverage probability and the expected interval length of these bootstrap intervals with those of the naive intervals through Monte Carlo simulation.

Development of Probability Based Defect Verification Algorithm for Automatic Visual Inspection (자동외관검사를 위한 확률기반 불량 확인 알고리즘 개발)

  • Kim, Youngheub;Ryu, Sun-Joong
    • Journal of the Semiconductor & Display Technology / v.16 no.2 / pp.1-8 / 2017
  • The visual inspection of electronic parts consists of two steps: automatic visual inspection and verification inspection. In the verification inspection stage, a human inspector sequentially inspects all the areas that were detected in the automatic inspection. In this study, we propose an algorithm that determines the order of verification inspection using Bayes inference, which is well known in the field of machine learning. The method prioritizes regions estimated to have a high probability of defect, using experience data from past inspections. The algorithm was applied to the visual inspection of ultraviolet filters to verify its effectiveness. The comparison experiment confirmed that, by adopting the proposed algorithm, the verification inspection can be completed in 30% of what the conventional method requires.
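
The idea of ordering verification by estimated defect probability can be illustrated with a small sketch. The abstract does not specify the paper's probabilistic model, so this example substitutes a simple Beta-Bernoulli posterior mean computed from past inspection counts; the function names, the prior parameters, and the region history are hypothetical.

```python
def defect_posterior(defects, inspections, prior_defects=1.0, prior_inspections=2.0):
    """Posterior mean defect probability for one region.

    Assumes a Beta(1, 1) prior by default, which gives (d + 1) / (n + 2).
    This is an illustrative stand-in, not the paper's model.
    """
    return (defects + prior_defects) / (inspections + prior_inspections)


def verification_order(history):
    """Sort regions so the most defect-prone region is verified first.

    history: {region_name: (defects_found, times_inspected)}
    """
    return sorted(
        history,
        key=lambda region: defect_posterior(*history[region]),
        reverse=True,
    )


# Hypothetical past-inspection data: region -> (defects found, times inspected).
HISTORY = {
    "edge": (30, 100),
    "center": (2, 100),
    "corner": (12, 40),
}

order = verification_order(HISTORY)
print(order)
```

With these made-up counts, "corner" ranks just ahead of "edge" because its smoothed defect rate (13/42) is slightly higher than 31/102, even though it has fewer recorded defects.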

A Model to Infer Users' Behavior Patterns for Personalized Recommendation Service based Context-Awareness (컨텍스트 인식 기반 개인화 추천 서비스를 위한 사용자 행동패턴 추론 모델)

  • Seo, Hyo-Seok;Lee, Sang-Yong
    • Journal of Digital Convergence / v.10 no.2 / pp.293-297 / 2012
  • To provide a personalized recommendation service in a context-aware environment, the collected context data must be analyzed quickly and the user's objective must be inferred effectively. However, context collected from mobile devices is not suitable for existing inference algorithms as they stand, because of omitted or uncertain information, and efficient algorithms are required for the mobile environment. In this paper, behavior patterns were classified using naive Bayes classification to minimize the loss caused by omitted or erroneous information, and pattern matching was used to effectively learn the user's inclinations and infer the purpose of the behavior. The accuracy of the proposed inference model was evaluated by applying it to an application recommendation service on smartphones.

A Secure Encryption-Based Malware Detection System

  • Lin, Zhaowen;Xiao, Fei;Sun, Yi;Ma, Yan;Xing, Cong-Cong;Huang, Jun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1799-1818 / 2018
  • Malware detection continues to be a challenging task, as attackers may be aware of the rules used in malware detection mechanisms and constantly generate new breeds of malware to evade them. Consequently, novel and innovative malware detection techniques need to be investigated to deal with this circumstance. In this paper, we propose a new secure malware detection system in which API call fragments are used to recognize potential malware instances, and these API call fragments together with the homomorphic encryption technique are used to construct a privacy-preserving Naive Bayes classifier (PP-NBC). Experimental results demonstrate that the proposed PP-NBC can successfully classify instances of malware with a hit-rate as high as 94.93%.

Android Malware Detection Using Permission-Based Machine Learning Approach (머신러닝을 이용한 권한 기반 안드로이드 악성코드 탐지)

  • Kang, Seongeun;Long, Nguyen Vu;Jung, Souhwan
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.3 / pp.617-623 / 2018
  • This study focuses on detecting malicious code using AndroidManifest permission features extracted through Android static analysis. Features are built on the permissions in AndroidManifest, which saves resources and time for analysis. The malicious app detection model consists of SVM (support vector machine), NB (Naive Bayes), Gradient Boosting Classifier (GBC), and Logistic Regression models, which were trained on 1,500 normal apps and 500 malicious apps and achieved a 98% detection rate. In addition, malicious app family identification is implemented with a multi-classifier model using the SVM, GPC (Gaussian Process Classifier), and GBC (Gradient Boosting Classifier) algorithms. The trained family identification model identified 92% of malicious app families.