• Title/Summary/Keyword: Bayes Factors

Enhancing Workers' Job Tenure Using Directions Derived from Data Mining Techniques (데이터 마이닝 기법을 활용한 근로자의 고용유지 강화 방안 개발)

  • An, Minuk;Kim, Taeun;Yoo, Donghee
    • The Journal of the Korea Contents Association / v.18 no.5 / pp.265-279 / 2018
  • This study conducted an experiment using data mining techniques to develop prediction models of worker job turnover. The experiment used data from the '2015 Graduate Occupational Mobility Survey' by the Korea Employment Information Service. We developed the prediction models using a decision tree, Bayes net, and artificial neural network. We found that the decision tree-based prediction model reported the best accuracy. We also found that the six influential factors affecting employees' turnover intention are type of working time, job status, full-time or not full-time, regular working hours per week, regular working days per week, and personal development opportunities. From the decision tree-based prediction model, we derived 12 rules of employee turnover for all job types. Using the derived rules, we proposed helpful directions for enhancing workers' job tenure. In addition, we analyzed the influential factors affecting employees' job turnover intention according to four job types and derived rules for each: office (ten rules), culture and art (nine rules), construction (four rules), and information technology (six rules). Using the derived rules, we proposed customized directions for improving the job tenure for each group.

A Study on the Effects of Online Word-of-Mouth on Game Consumers Based on Sentimental Analysis (감성분석 기반의 게임 소비자 온라인 구전효과 연구)

  • Jung, Keun-Woong;Kim, Jong Uk
    • Journal of Digital Convergence / v.16 no.3 / pp.145-156 / 2018
  • In the past, distributors sold games through retail stores; today they sell digital content through online distribution channels. This study analyzes the effects of eWOM (electronic word of mouth) on the sales volume of games sold on Steam, an online digital content distribution channel. Data mining techniques based on big data have recently attracted research attention. In this study, an emotion index of eWOM is derived by sentiment analysis, a text mining technique that can analyze the emotion of each review among the factors of eWOM. The sentiment analysis uses Naive Bayes and SVM classifiers, and the emotion index is calculated with the SVM classifier, which showed high accuracy. Regression analysis is performed on the dependent variable, sales variation, using the emotion index, the number of reviews of each game, the size of eWOM, and the user score of each game, which is a rating of eWOM. The regression analysis revealed that the size of eWOM and the emotion index of eWOM, both independent variables, influenced the dependent variable, sales variation. The study identifies the eWOM factors that affect sales volume when Korean game companies enter overseas markets through Steam.

A Study on Fault Classification by EEMD Application of Gear Transmission Error (전달오차의 EEMD적용을 통한 기어 결함분류연구)

  • Park, Sungho;Choi, Joo-Ho
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.2 / pp.169-177 / 2017
  • In this paper, classification of spall and crack faults of gear teeth is studied by applying ensemble empirical mode decomposition (EEMD) to the gear transmission error (TE). Finite element models of the gears with the two faults are built, and the TE is obtained by simulating the gears under loaded contact. EEMD is applied to the residuals of the TE, which are the differences between the normal and faulty signals. The results show that spall and crack faults are clearly distinguished by the intrinsic mode functions (IMFs). A simple test bed consisting of a motor, a brake, and a pair of spur gears is installed to illustrate the approach. Two gears are employed to obtain the TE for the normal, spalled, and cracked gears, and the fault types are separated by the same EEMD application process. To quantify the results, crest factors are computed for each IMF. The characteristics of spall and crack are well represented by the crest factors of the first and third IMFs, which are used as the feature signals. The classification is carried out with Bayes decision theory using the feature signals acquired through the experiments.
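
The crest factor mentioned in the abstract above is conventionally defined as the peak absolute amplitude of a signal divided by its root-mean-square value. The following is a minimal sketch of that definition only, not the authors' implementation: the function name `crest_factor` and the synthetic signals standing in for an IMF are assumptions made for illustration.

```python
import math


def crest_factor(signal):
    """Return the peak absolute amplitude divided by the RMS of the signal."""
    if not signal:
        raise ValueError("signal must be non-empty")
    peak = max(abs(x) for x in signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for an all-zero signal")
    return peak / rms


# Hypothetical stand-in for one IMF: a pure sine wave sampled over one period.
n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]

# The same wave with one sharp impulse added, loosely mimicking a localized fault.
impulsive = list(sine)
impulsive[250] += 5.0

print(crest_factor(sine))       # close to sqrt(2) for a pure sine wave
print(crest_factor(impulsive))  # larger, because the impulse raises the peak
```

An impulsive component raises the peak much more than the RMS, which is why the crest factor can serve as a scalar feature for each IMF.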

Statistical Analysis for Risk Factors and Prediction of Hypertension based on Health Behavior Information (건강행위정보기반 고혈압 위험인자 및 예측을 위한 통계분석)

  • Heo, Byeong Mun;Kim, Sang Yeob;Ryu, Keun Ho
    • Journal of Digital Contents Society / v.19 no.4 / pp.685-692 / 2018
  • The purpose of this study is to develop a prediction model of hypertension in middle-aged adults using statistical analysis. The statistical analysis and prediction models were developed using the National Health and Nutrition Survey (2013-2016). Binary logistic regression analysis identified statistically significant risk factors for hypertension, and predictive models were developed using logistic regression and the Naive Bayes algorithm with the wrapper approach. In the statistical analysis, WHtR (p<0.0001, OR = 2.0242) in men and AGE (p<0.0001, OR = 3.9185) in women were the factors most strongly related to hypertension. In the performance evaluation of the prediction models, the logistic regression model showed the best predictive power in both men (AUC = 0.782) and women (AUC = 0.858). Our findings provide important information for developing large-scale screening tools for hypertension and can be used as a basis for hypertension research.

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine learning-based automatic classification of domestic journal articles in the field of LIS. In particular, with respect to the performance of automatically assigning class labels to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assigning methods) through a variety of experiments. The results show that it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be regarded as multi-label classification, which assigns more than one category to a given article. Therefore, taking this environment into account, I proposed an optimal classification model that uses a simple and fast classification algorithm and a small training set.

A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model

  • Ma, Jialin;Zhang, Yongjun;Wang, Zhijian;Chen, Bolun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.2 / pp.604-625 / 2018
  • SMS spam is now overflowing in many countries, and the standards for filtering it differ from country to country. However, current technologies and research on SMS spam filtering all focus on dividing SMS messages into two classes: legitimate and illegitimate. This does not conform to the actual situation and needs. Furthermore, they face several difficulties: (1) High-quality, large-scale SMS spam corpora are very scarce, and finely categorized SMS spam corpora do not exist at all, which seriously handicaps researchers' studies. (2) The limited length of SMS messages leads to a lack of sufficient features. These factors seriously degrade the performance of traditional classifiers (such as SVM, K-NN, and Bayes). In this paper, we present a new finely categorized SMS spam corpus that is unique and, as far as we know, the largest of its kind. In addition, we propose a classifier based on a probabilistic topic model. The classifier can alleviate the feature sparsity problem in SMS spam filtering. Moreover, we compare the approach with three typical classifiers on the new SMS spam corpus. The experimental results show that the proposed approach is more effective for the task of SMS spam filtering.

A tutorial on generalizing the default Bayesian t-test via posterior sampling and encompassing priors

  • Faulkenberry, Thomas J.
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.217-238 / 2019
  • With the advent of so-called "default" Bayesian hypothesis tests, scientists in applied fields have gained access to a powerful and principled method for testing hypotheses. However, such default tests usually come with a compromise, requiring the analyst to accept a one-size-fits-all approach to hypothesis testing. Further, such tests may not have the flexibility to test problems the scientist really cares about. In this tutorial, I demonstrate a flexible approach to generalizing one specific default test (the JZS t-test) (Rouder et al., Psychonomic Bulletin & Review, 16, 225-237, 2009) that is becoming increasingly popular in the social and behavioral sciences. The approach uses two results, the Savage-Dickey density ratio (Dickey and Lientz, 1980) and the technique of encompassing priors (Klugkist et al., Statistica Neerlandica, 59, 57-69, 2005) in combination with MCMC sampling via an easy-to-use probabilistic modeling package for R called Greta. Through a comprehensive mathematical description of the techniques as well as illustrative examples, the reader is presented with a general, flexible workflow that can be extended to solve problems relevant to his or her own work.
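
The Savage-Dickey density ratio named in the abstract above expresses the Bayes factor for a point null hypothesis nested in an alternative as BF01 = p(δ = 0 | data, H1) / p(δ = 0 | H1), that is, the posterior density at the null value divided by the prior density at the same value. The sketch below is not the JZS t-test and does not use MCMC; it assumes a deliberately simple conjugate model (normal data with known variance `sigma2` and a normal prior with variance `tau2` on the mean) so that both densities are available in closed form. All function and parameter names are hypothetical. As a sanity check, it compares the result with the Bayes factor computed directly from the two marginal likelihoods.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def savage_dickey_bf01(data, sigma2, tau2):
    """BF01 for H0: mu = 0 versus H1: mu ~ Normal(0, tau2), known variance sigma2.

    Computed as posterior density at mu = 0 divided by prior density at mu = 0.
    """
    n = len(data)
    ybar = sum(data) / n
    # Conjugate normal update for the mean under H1.
    post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
    post_mean = post_var * (n * ybar / sigma2)
    return normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, tau2)


def marginal_likelihood_bf01(data, sigma2, tau2):
    """The same Bayes factor from the marginal densities of the sample mean."""
    n = len(data)
    ybar = sum(data) / n
    s2 = sigma2 / n  # sampling variance of the mean
    return normal_pdf(ybar, 0.0, s2) / normal_pdf(ybar, 0.0, s2 + tau2)


# Made-up observations, used only to exercise the two functions.
sample = [0.3, -0.1, 0.8, 0.5, 0.2, 0.9, -0.4, 0.6]
print(savage_dickey_bf01(sample, sigma2=1.0, tau2=1.0))
print(marginal_likelihood_bf01(sample, sigma2=1.0, tau2=1.0))
```

The two printed values agree up to floating-point error. When the posterior has no closed form, the density at the null value would instead have to be estimated, for example from posterior samples.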

Corporate Corruption Prediction Evidence From Emerging Markets

  • Kim, Yang Sok;Na, Kyunga;Kang, Young-Hee
    • Asia-Pacific Journal of Business / v.12 no.4 / pp.13-40 / 2021
  • Purpose - The purpose of this study is to predict corporate corruption in emerging markets such as Brazil, Russia, India, and China (BRIC) using different machine learning techniques. Since corruption is a significant problem that can affect corporate performance, particularly in emerging markets, it is important to correctly identify whether a company engages in corrupt practices. Design/methodology/approach - To address the research question, we employ predictive analytic techniques (machine learning methods). Using the World Bank Enterprise Survey Data, this study evaluates various predictive models generated by seven supervised learning algorithms: k-Nearest Neighbour (k-NN), Naïve Bayes (NB), Decision Tree (DT), Decision Rules (DR), Logistic Regression (LR), Support Vector Machines (SVM), and Artificial Neural Network (ANN). Findings - We find that DT, DR, SVM, and ANN create highly accurate models (over 90% accuracy). Among the various factors, firm age is the most significant, while several other determinants, such as source of working capital, top manager experience, and the number of permanent full-time employees, also contribute to company corruption. Research implications or Originality - This research demonstrates how machine learning can be applied to predict corporate corruption and also identifies the major causes of corporate corruption.

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi;Asiah, Lokman
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.53-63 / 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of them are based on unbalanced datasets, which favour the majority class. However, it is imperative to correctly identify cancer patients, who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. The validation data was used to test several ways of balancing the classifiers. The final prediction model was built on the approach that performed best overall.
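
The abstract above evaluates classifiers by sensitivity and specificity because accuracy alone can look good on unbalanced data. The sketch below illustrates that point with made-up labels: a hypothetical classifier that always predicts the majority (negative) class reaches high accuracy while detecting no positive cases. The function name and the 95/5 class split are assumptions for illustration, not figures from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Sensitivity: share of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Made-up unbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# Hypothetical classifier that always predicts the majority class.
y_majority = [0] * 100

sens, spec, acc = sensitivity_specificity(y_true, y_majority)
print(sens, spec, acc)  # 0.0 1.0 0.95
```

Class-balancing methods such as those named in the abstract aim to improve minority-class sensitivity, usually at some cost in specificity.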

The Impact of Capital on Growth of Small and Medium Enterprises: Evidence from Vietnam

  • HA, Van Dung;NGUYEN, Van Tung;DANG, Truong Thanh Nhan
    • The Journal of Asian Finance, Economics and Business / v.9 no.1 / pp.353-362 / 2022
  • Small and medium enterprises (SMEs) play a critical role in the economy, yet they are plagued by a shortage of finance. Determining the influence of cash sources both inside and outside the firm is critical to a company's survival and growth. The purpose of this research is therefore to determine the impact of capital on the growth of SMEs in Vietnam. The key factors of this research are equity and liabilities, which are two proxies for a firm's capital. The data is based on the results of a survey conducted every two years from 2005 to 2015, which included over 2,600 SMEs in 20 processing and manufacturing industries in ten provinces and cities: Hanoi, Hai Phong, Ho Chi Minh City, Ha Tay, Phu Tho, Nghe An, Quang Nam, Khanh Hoa, Lam Dong, and Long An. The findings show that characteristics such as equity capital, total workforce growth rate, and male entrepreneurs have a positive impact on enterprise growth, whereas liabilities, firm age, and export have a negative impact on enterprise growth. The study has demonstrated that equity has a positive impact while liabilities have a negative impact on the growth of Vietnamese SMEs.