• Title/Summary/Keyword: Bayes method


Detection of Depression Trends in Literary Cyber Writers Using Sentiment Analysis and Machine Learning

  • Faiza Nasir;Haseeb Ahmad;CM Nadeem Faisal;Qaisar Abbas;Mubarak Albathan;Ayyaz Hussain
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.3
    • /
    • pp.67-80
    • /
    • 2023
  • Nowadays, psychologists consider social media an important tool to examine mental disorders. Among these disorders, depression is one of the most common yet least cured diseases. Since many writers with extensive followings express their feelings on social media, and depression is significantly increasing, exploring the literary text shared on social media may provide multidimensional features of depressive behaviors: (1) Background: Several studies have observed that depressive data contains certain language styles and self-expressing pronouns, but the current study provides evidence that posts featuring self-expressing pronouns and depressive language styles carry high emotional temperatures. Therefore, the main objective of this study is to examine literary cyber writers' posts to discover symptomatic signs of depression. For this purpose, our research emphasizes extracting data from writers' public social media pages, blogs, and communities; (3) Results: To examine the emotional temperatures and sentence usage of depressive and non-depressive groups, we employed the SentiStrength algorithm as a psycholinguistic method, TF-IDF and N-grams for ranked phrase extraction, and Latent Dirichlet Allocation for topic modelling of the extracted phrases. The results reveal a strong connection between depression and negative emotional temperatures in writers' posts. Moreover, we used Naïve Bayes, Support Vector Machine, Random Forest, and Decision Tree algorithms to validate the classification of depressive and non-depressive content in terms of sentences, phrases, and topics.
The results reveal that, compared with the others, the Support Vector Machine algorithm best validates the classification, attaining the highest F-score of 79%; (4) Conclusions: Experimental results show that the proposed system performs well for detecting depression trends in literary cyber writers using sentiment analysis.
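The TF-IDF phrase-ranking step mentioned in the abstract can be sketched from scratch; the toy documents and terms below are illustrative, not the study's corpus:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document by term frequency x inverse document frequency."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return scores

# hypothetical tokenized posts; terms concentrated in one document rank highest there
docs = [["sad", "alone", "sad"],
        ["happy", "sunny", "night"],
        ["night", "tired", "alone"]]
scores = tf_idf(docs)
top = max(scores[0], key=scores[0].get)   # highest-ranked term of the first post
```

Terms that are frequent in a post but rare across the corpus (here, "sad") receive the largest weights, which is what makes TF-IDF usable for extracting candidate depressive phrases.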

Durability Analysis and Development of Probability-Based Carbonation Prediction Model in Concrete Structure (콘크리트 구조물의 확률론적 탄산화 예측 모델 개발 및 내구성 해석)

  • Jung, Hyunjun
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.30 no.4A
    • /
    • pp.343-352
    • /
    • 2010
  • Recently, many studies have been carried out to estimate the service life and long-term performance of carbonated concrete structures in a more controlled way. Probability-based durability analysis and design have been introduced for new concrete structures. This paper provides a carbonation prediction model based on Fick's first law of diffusion using statistical data from carbonated concrete structures, and a probabilistic analysis of the durability performance is carried out using Bayes' theorem. The influence of relevant design parameters, such as the $CO_2$ diffusion coefficient, atmospheric $CO_2$ concentration, absorbed quantity of $CO_2$, and degree of hydration, was investigated. Using monitoring data, this probability-based model predicted carbonation depth and remaining service life for concrete structures in a variety of environments. From the results, the proposed realistic carbonation prediction model can be applied to estimate erosion-open-time, to control durability, and to support decision making for suitable repair and maintenance of carbonated concrete structures.
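The Bayes-theorem update the abstract describes can be illustrated with the common square-root-of-time carbonation model x(t) = K·√t that follows from Fick's first law; the candidate rates, noise level, and measured depth below are hypothetical, not the paper's calibration:

```python
import math

def posterior_K(candidates, prior, depth_obs, t_years, sigma):
    """Grid-based Bayes update for the carbonation rate coefficient K in
    x(t) = K * sqrt(t), assuming Gaussian measurement noise of std sigma."""
    likes = [math.exp(-0.5 * ((depth_obs - K * math.sqrt(t_years)) / sigma) ** 2)
             for K in candidates]
    unnorm = [lk * p for lk, p in zip(likes, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

Ks = [2.0, 3.0, 4.0]            # mm per sqrt(year), hypothetical candidates
prior = [1 / 3, 1 / 3, 1 / 3]   # flat prior before monitoring
post = posterior_K(Ks, prior, depth_obs=15.0, t_years=25, sigma=2.0)
```

A measured depth of 15 mm at 25 years concentrates the posterior on K = 3.0, and the sharpened posterior is what drives the updated remaining-service-life estimate.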

Improving SARIMA model for reliable meteorological drought forecasting

  • Jehanzaib, Muhammad;Shah, Sabab Ali;Son, Ho Jun;Kim, Tae-Woong
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.141-141
    • /
    • 2022
  • Drought is a global phenomenon that affects almost all landscapes and causes major damage. Due to the non-linear nature of the contributing factors, drought occurrence and its severity are characterized as stochastic. Early warning of impending drought can aid in the development of drought mitigation strategies and measures. Thus, drought forecasting is crucial in the planning and management of water resource systems. The primary objective of this study is to improve existing drought forecasting techniques. Therefore, we proposed an improved version of the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, MD-SARIMA, for reliable drought forecasting with a three-year lead time. In this study, we selected four watersheds of the Han River basin in South Korea to validate the performance of the MD-SARIMA model. Meteorological data from 8 rain gauge stations were collected for the period 1973-2016 and converted to the watershed scale using the Thiessen polygon method. The Standardized Precipitation Index (SPI) was employed to represent meteorological drought at the seasonal (3-month) time scale. The performance of the MD-SARIMA model was compared with existing models such as the Seasonal Naive Bayes (SNB) model; the Exponential Smoothing (ES) model; the Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS) model; and the SARIMA model. The results showed that all the models were able to forecast drought, but the performance of MD-SARIMA was more robust than that of the other statistical models, with Willmott Index (WI) = 0.86, Mean Absolute Error (MAE) = 0.66, and Root Mean Square Error (RMSE) = 0.80 for the 36-month lead time forecast. The outcomes of this study indicated that the MD-SARIMA model can be utilized for drought forecasting.
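The three evaluation scores quoted in the abstract (WI, MAE, RMSE) have standard definitions and are easy to compute from scratch; the observed/forecast series below are hypothetical stand-ins for an SPI series, not the study's data:

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def willmott_index(obs, pred):
    """Willmott's index of agreement: 1 means perfect agreement, 0 means none."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

spi_obs = [1.0, 2.0, 3.0, 4.0]    # hypothetical observed values
spi_pred = [1.1, 1.9, 3.2, 3.9]   # hypothetical forecasts
```

Unlike RMSE and MAE, which are in the units of the variable, WI is bounded in [0, 1], which is why the paper can report WI = 0.86 as an absolute agreement level alongside the error magnitudes.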


An Ensemble Classification of Mental Health in Malaysia related to the Covid-19 Pandemic using Social Media Sentiment Analysis

  • Nur 'Aisyah Binti Zakaria Adli;Muneer Ahmad;Norjihan Abdul Ghani;Sri Devi Ravana;Azah Anir Norman
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.2
    • /
    • pp.370-396
    • /
    • 2024
  • COVID-19 was declared a pandemic by the World Health Organization (WHO) on 30 January 2020, and the lifestyle of people all over the world has changed since. In many cases, the pandemic appears to have created severe mental disorders, anxiety, and depression among people. Researchers have mostly conducted surveys to identify the impacts of the pandemic on people's mental health. Despite the better-quality, tailored, and more specific data that surveys can generate, social media offers great insight into the impact of the pandemic on mental health. Since people feel connected on social media, this study aims to capture people's sentiments about the pandemic related to mental health issues. Word clouds were used to visualize and identify the most frequent keywords related to COVID-19 and mental health disorders. This study employs Majority Voting Ensemble (MVE) classification alongside individual classifiers such as Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR) to classify the sentiment of tweets. The tweets were classified as positive, neutral, or negative using the Valence Aware Dictionary and sEntiment Reasoner (VADER). Confusion matrices and classification reports provide the precision, recall, and F1-score for identifying the best algorithm for classifying the sentiments.
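The majority-voting step of an MVE classifier can be sketched directly; the per-classifier labels below are made up for illustration, and the first-classifier tie-break is an assumption, not necessarily the paper's rule:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists sample by sample with a majority vote;
    on a three-way tie, fall back to the first classifier's label."""
    combined = []
    for labels in zip(*predictions):
        top, freq = Counter(labels).most_common(1)[0]
        combined.append(top if freq > 1 else labels[0])
    return combined

# hypothetical per-tweet outputs of the three base classifiers
nb  = ["positive", "negative", "neutral"]
svm = ["positive", "neutral",  "neutral"]
lr  = ["negative", "negative", "positive"]
ensemble = majority_vote([nb, svm, lr])
```

Each tweet gets the label that at least two of the three base classifiers agree on, which is how the ensemble smooths out individual classifier errors.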

A Method to Find Feature Set for Detecting Various Denial Service Attacks in Power Grid (전력망에서의 다양한 서비스 거부 공격 탐지 위한 특징 선택 방법)

  • Lee, DongHwi;Kim, Young-Dae;Park, Woo-Bin;Kim, Joon-Seok;Kang, Seung-Ho
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.2 no.2
    • /
    • pp.311-316
    • /
    • 2016
  • A network intrusion detection system based on machine learning methods such as artificial neural networks is quite dependent on the selected features in terms of accuracy and efficiency. Nevertheless, choosing the optimal combination of features, one that guarantees accuracy and efficiency, from the many features generally used to detect network intrusion requires extensive computing resources. In this paper, we deal with an optimal feature selection problem for distinguishing 6 denial-of-service attacks and normal usage in the NSL-KDD data. We propose an optimal feature selection algorithm based on the multi-start local search algorithm, a representative meta-heuristic for solving optimization problems. In order to evaluate the performance of the proposed algorithm, a comparison with the case of using all 41 features of the NSL-KDD data is conducted. In addition, comparisons among 3 well-known machine learning methods (multi-layer perceptron, Bayes classifier, and support vector machine) are performed to find the machine learning method that shows the best performance when combined with the proposed feature selection method.
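A minimal sketch of multi-start local search over feature subsets: the bit-flip neighborhood and restarts are the generic meta-heuristic, while the toy scoring function (reward for two "useful" features minus a size penalty) stands in for the classifier-accuracy evaluation the paper would actually run:

```python
import random

def local_search(score, n_features, start):
    """Hill-climb by flipping one feature bit at a time until no flip improves."""
    current = start[:]
    improved = True
    while improved:
        improved = False
        for i in range(n_features):
            cand = current[:]
            cand[i] = 1 - cand[i]
            if score(cand) > score(current):
                current, improved = cand, True
    return current

def multi_start(score, n_features, starts=10, seed=0):
    """Restart local search from random subsets and keep the best local optimum."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        start = [rng.randint(0, 1) for _ in range(n_features)]
        sol = local_search(score, n_features, start)
        if best is None or score(sol) > score(best):
            best = sol
    return best

# toy objective: features {0, 2} are informative; each selected feature costs 0.4
target = {0, 2}
def score(mask):
    chosen = {i for i, b in enumerate(mask) if b}
    return len(chosen & target) - 0.4 * len(chosen)

best = multi_start(score, n_features=5)
```

The restarts guard against local optima of the subset landscape; in the paper the score would instead be detection accuracy on NSL-KDD, which is far more expensive to evaluate.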

A study of Bayesian inference on auto insurance credibility application (자동차보험 신뢰도 적용에 대한 베이지안 추론 방식 연구)

  • Kim, Myung Joon;Kim, Yeong-Hwa
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.4
    • /
    • pp.689-699
    • /
    • 2013
  • This paper studies a partial credibility application method that assumes empirical or noninformative prior information in the auto insurance business, where intensive rating segmentation has expanded because of premium competition. Expanding rating-factor segmentation increases the number of pricing cells, and as a result, the number of cells for partial credibility application increases correspondingly. This study suggests a more accurate estimation method within the Bayesian framework. By using empirically well-known or noninformative information, deriving the proper posterior distribution, and applying the Bayes estimate that minimizes the error loss to the credibility method, we show the advantage of Bayesian inference by comparison with current approaches. The comparison is implemented against the square root rule, a widely accepted method in the insurance business. The level of convergence toward the true risk is compared among the various approaches. This study introduces an alternative way of reducing the error for auto insurance business fields in need of various methods because of increasing segmentation.
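The contrast the paper draws can be sketched with two credibility factors: the classical square-root rule, and a Bühlmann-style Bayes factor from a normal-normal model. The full-credibility standard, variance components, and cell values below are hypothetical illustration, not the paper's data:

```python
import math

def sqrt_rule_Z(n, n_full):
    """Classical partial-credibility factor: Z = sqrt(n / n_full), capped at 1."""
    return min(1.0, math.sqrt(n / n_full))

def bayes_Z(n, sigma2, tau2):
    """Normal-normal (Buhlmann-style) credibility: Z = n / (n + sigma2 / tau2),
    where sigma2 is within-cell variance and tau2 is between-cell variance."""
    return n / (n + sigma2 / tau2)

def credibility_estimate(Z, cell_mean, overall_mean):
    """Credibility-weighted premium: blend the cell's own experience with the book."""
    return Z * cell_mean + (1 - Z) * overall_mean

Z_sq = sqrt_rule_Z(n=400, n_full=1082)               # hypothetical full-credibility standard
Z_b = bayes_Z(n=400, sigma2=100.0, tau2=1.0)          # hypothetical variance components
est = credibility_estimate(Z_b, cell_mean=120.0, overall_mean=100.0)
```

Both rules shrink a thin cell's mean toward the overall mean, but the Bayes factor adapts to the estimated variance structure rather than to a fixed full-credibility count, which is the source of the error reduction the paper reports.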

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data, produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, a problem that has raised the interest of many researchers. Managing it also requires professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis: it must assign a text document to one or more predefined categories or classes. In the text classification field, different techniques are available, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. Depending on the type of words used in the corpus and the type of features created for classification, the performance of a text classification model can vary. Most previous attempts propose a new algorithm or modify an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying one, we focus on a way to modify the use of the data. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built. Real-world datasets most of the time contain noise, and this noisy data can affect the decisions made by classifiers built from it.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise characteristics that can be utilized in the classification process. Machine learning algorithms build classifiers on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents; if the viewpoints of the training data and target data differ, the features may appear different between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely formatted differently, which causes difficulties for traditional machine learning algorithms because they are not developed to recognize different types of data representation at one time and to put them together in the same generalization. Therefore, in order to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore further propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
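The document-selection idea behind semi-supervised schemes of this kind (keep only unlabeled documents the current model labels confidently) can be sketched as follows; the classifier stub, threshold, and document names are hypothetical, not RSESLA's actual rule-selection procedure:

```python
def select_confident(unlabeled_docs, classify, threshold=0.9):
    """Keep only unlabeled documents that the current classifier labels with
    high confidence; only these pseudo-labeled documents would be added to
    the training set in the next round."""
    selected = []
    for doc in unlabeled_docs:
        label, confidence = classify(doc)
        if confidence >= threshold:
            selected.append((doc, label))
    return selected

# hypothetical classifier stub returning (label, confidence) per document
stub_scores = {"doc_a": ("news", 0.95), "doc_b": ("blog", 0.55)}
picked = select_confident(["doc_a", "doc_b"], lambda d: stub_scores[d])
```

Filtering by confidence is what keeps low-quality pseudo-labels (here, "doc_b") from degrading the classifier, which is the failure mode of naive semi-supervised learning that the paper's rule selection addresses.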

The big data method for flash flood warning (돌발홍수 예보를 위한 빅데이터 분석방법)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Digital Convergence
    • /
    • v.15 no.11
    • /
    • pp.245-250
    • /
    • 2017
  • A flash flood is defined as flooding from intense rainfall over a relatively small area that flows through rivers and valleys rapidly, in a short time and with no advance warning, so it can cause property damage and casualties. This study establishes a flash-flood warning system using 38 accident records reported by the National Disaster Information Center and a Land Surface Model (TOPLATS) between 2009 and 2012. Three variables from the Land Surface Model were used: precipitation, soil moisture, and surface runoff. The three variables from the 6 hours preceding each flash flood were reduced to 3 factors through factor analysis. Decision tree, random forest, Naive Bayes, Support Vector Machine, and logistic regression models were considered as big data methods. Prediction performance was evaluated by comparing Accuracy, Kappa, TP Rate, FP Rate, and F-Measure. The best method was suggested based on a reproducibility evaluation at each point of flash flood occurrence and on predicted versus actual counts using 4 years of data.
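Of the scores the abstract compares, Kappa is the least self-explanatory: it corrects raw accuracy for chance agreement. A from-scratch sketch, with hypothetical flood/no-flood labels:

```python
def cohens_kappa(actual, predicted):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from each side's label frequencies."""
    n = len(actual)
    labels = set(actual) | set(predicted)
    p_obs = sum(a == p for a, p in zip(actual, predicted)) / n
    p_exp = sum((actual.count(l) / n) * (predicted.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# hypothetical labels: 1 = flash flood, 0 = no flash flood
kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
```

Here raw accuracy is 0.75 but kappa is 0.5, illustrating why kappa is the more honest score for rare-event problems like flash floods, where a trivial classifier can achieve high accuracy by chance.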

Prototype based Classification by Generating Multidimensional Spheres per Class Area (클래스 영역의 다차원 구 생성에 의한 프로토타입 기반 분류)

  • Shim, Seyong;Hwang, Doosung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.2
    • /
    • pp.21-28
    • /
    • 2015
  • In this paper, we propose prototype-based classification learning using the nearest-neighbor rule. The nearest-neighbor rule is applied to segment the class area of all the training data into spheres, each containing only data of the same class. Prototypes are the centers of the spheres, and each radius is computed as the midpoint between the distance to the farthest point of the same class and the distance to the nearest point of another class. We then transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that includes all the training data. The proposed prototype selection method is based on a greedy algorithm applied to the training data per class; its computational complexity is low, and it lends itself well to parallel implementation. Prototype-based classification takes the set of prototypes and predicts the class of test data by the nearest-neighbor rule. In experiments, the generalization performance of our prototype classifier is superior to those of the nearest-neighbor classifier, the Bayes classifier, and another prototype classifier.
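The greedy set-cover step the abstract describes can be sketched directly; the sphere memberships below are hypothetical, standing in for "which training points fall inside each candidate sphere":

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset (here, the sphere) that
    contains the most still-uncovered training points."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            break  # remaining points cannot be covered by any subset
        chosen.append(best)
        uncovered -= best
    return chosen

# hypothetical sphere memberships over training points 1..5
spheres = [{1, 2, 3}, {2, 4}, {4, 5}, {3, 5}]
cover = greedy_set_cover({1, 2, 3, 4, 5}, spheres)
```

Greedy covering does not guarantee the true minimum (set cover is NP-hard), but it gives the well-known logarithmic approximation cheaply, which matches the paper's emphasis on low complexity and easy parallelization per class.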

Analysis of Traffic Accident Reduction Performance of High-quality and Long-life Pavement Marking Materials (고기능·장수명 차선도료의 교통사고 감소효과 분석)

  • Lee, Myunghwan;Choi, Keechoo;Oh, In Seop;Kim, Junghwa
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.35 no.4
    • /
    • pp.921-929
    • /
    • 2015
  • Road conditions that prevent the driver from seeing road markings, such as in the evening or in the rain when a light source is lacking, increase the risk of traffic accidents. In order to increase road safety, the Korea Expressway Corporation set up a pilot project using high-quality pavement marking materials along a section of the Gyeongbu Expressway between Daejeon and Pangyo. There is little research on high-quality materials in the field of pavement marking. This study introduces the high-quality materials used, presents the results of a survey conducted to examine the effect of the pilot project, and analyzes traffic accident data from before and after its implementation. The survey results show that 87% of the respondents were highly satisfied with the pilot project. To evaluate the effect of the pilot project, this study used traffic accident data provided by the Korea Expressway Corporation and performed a before-after study of the number of traffic accidents. The data analysis shows that there were 62 traffic accidents before and 48 after the implementation of the pilot project. In addition, the result of the Empirical Bayes method indicates a 41.7% decrease in the number of traffic accidents as an effect of the high-quality pavement marking materials.
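The Empirical Bayes before-after approach blends the observed before-period count with a model-based prediction so that a randomly high (or low) before count does not inflate the estimated treatment effect. A minimal sketch of the standard weighting, with a hypothetical safety-performance-function prediction and overdispersion parameter rather than the study's fitted values:

```python
def eb_expected(observed, predicted, k):
    """Empirical Bayes estimate of the expected before-period accident count:
    a weighted mix of the model prediction and the observed count, with
    weight w = 1 / (1 + predicted / k), k being the overdispersion parameter."""
    w = 1.0 / (1.0 + predicted / k)
    return w * predicted + (1.0 - w) * observed

# hypothetical values: 62 observed accidents, SPF predicts 50.0, k = 2.0
eb = eb_expected(observed=62, predicted=50.0, k=2.0)
```

Comparing the after-period count against this EB-adjusted expectation, rather than against the raw before count, is what lets the study report a 41.7% reduction that is larger than the naive 62-to-48 comparison would suggest, since regression-to-the-mean is accounted for.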