• Title/Summary/Keyword: Decision Tree Regression

A Study of Performance Comparison of MOOC Dropout Prediction utilizing Machine Learning (기계학습 방법을 이용한 MOOC 학습자의 중도 포기 예측 성능 비교 연구)

  • Hur, Yun-A;Lim, Heui-Seok
    • Annual Conference of KIPS / 2016.10a / pp.323-326 / 2016
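  • MOOCs (Massive Open Online Courses) are web-based online courses open to large numbers of learners. In a MOOC, classes proceed interactively through a community shared by instructors and learners. However, because the lectures are free and no grades are given, learners have little motivation: many enroll, but markedly few complete the course. To help address this problem, this paper selects variables related to dropout from the MOOC data provided for KDD Cup 2015 and measures prediction accuracy with six machine learning algorithms: Decision Tree, KNN, Logistic Regression, Naive Bayesian, SVM, and Neural Network. Naive Bayesian achieved the highest accuracy, 89.3%. We expect this study to enable accurate prediction of dropout and, in the future, to help learners complete their courses through targeted motivation.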

Main Gene Combinations and Genotype Identification of Hanwoo Quality with SNPHarvester

  • Bae, Jae-Young;Lee, Jea-Young
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.799-808 / 2012
  • It is known that human disease and the economic traits of livestock are significantly affected by gene combination effects rather than single-gene effects. Existing methods for studying gene combination effects have drawbacks such as heavy computation, cost, and time; SNPHarvester was developed to overcome these drawbacks and find the main gene combinations. In this paper, we searched for gene combinations using an adjusted linear regression model. Using SNPHarvester, we find superior gene combinations among sets of SNPs that are related to the quality of Korean beef cattle. We also use a decision tree to identify, within the selected SNP combination, the superior genotypes that can enhance various quality traits of Korean beef.

Neuro-Fuzzy System and Its Application Using CART Algorithm and Hybrid Parameter Learning (CART 알고리즘과 하이브리드 학습을 통한 뉴로-퍼지 시스템과 응용)

  • Oh, B.K.;Kwak, K.C.;Ryu, J.W.
    • Proceedings of the KIEE Conference / 1998.07b / pp.578-580 / 1998
  • This paper presents an approach to structure identification based on the CART (Classification And Regression Tree) algorithm and to parameter identification by a hybrid learning method in a neuro-fuzzy system. Using the CART algorithm, the proposed method roughly estimates the number of membership functions and fuzzy rules from the centers of the decision regions. Parameter identification is then carried out from the numerical data by a hybrid learning scheme that combines BP (Back-propagation) and RLSE (Recursive Least Square Estimation). Finally, we show its usefulness for fuzzy modeling by applying it to truck backer-upper control.
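
As a rough illustration of the CART step mentioned in this abstract, the sketch below finds the best single split of a one-dimensional regression problem by minimizing the summed squared error, then reports a center for each resulting region. The function names (`sse`, `best_split`, `region_centers`), the toy data, and the choice of the x-range midpoint as a region "center" are all assumptions made for illustration; this is not the authors' implementation.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best binary split on xs.

    Candidate thresholds are midpoints between consecutive distinct
    x values, the usual CART choice for a numeric feature.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def region_centers(xs, threshold):
    """Midpoint of the x-range on each side of the split.

    Using the midpoint as the region center is a hypothetical choice.
    """
    left = [x for x in xs if x <= threshold]
    right = [x for x in xs if x > threshold]
    return (min(left) + max(left)) / 2, (min(right) + max(right)) / 2


# Toy data: two flat regimes separated by a jump between x=3 and x=7.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [1.1, 0.9, 1.0, 5.0, 5.2, 4.8]
threshold, cost = best_split(xs, ys)
centers = region_centers(xs, threshold)
print(threshold, centers)
```

In a neuro-fuzzy setting, each region found this way could seed one membership function and one rule, with the parameters then refined by the learning scheme; how the paper maps regions to rules in detail is not stated in the abstract.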

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.111-116 / 2017
  • This paper proposes a data mining approach to predicting stock price direction. The stock market fluctuates due to many factors, so predicting stock price direction has become an important issue in stock market analysis. However, few studies in the literature apply data mining approaches to this problem. To contribute to the literature, this paper compares single classifiers and ensemble classifiers. The single classifiers are logistic regression, decision tree, neural network, and support vector machine. The ensemble classifiers are AdaBoost, random forest, bagging, stacking, and vote. For the experiments, we gathered a dataset from the Korea Stock Exchange (KRX) covering 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of the ensemble classifiers, shows the best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.
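
To make the two evaluation metrics and the idea of a voting ensemble concrete, here is a minimal pure-Python sketch. It computes AUC as the fraction of positive/negative pairs ranked correctly, computes accuracy at a 0.5 threshold, and combines two score lists by averaging. The function names, the toy scores, and the use of score averaging as the voting rule are assumptions; the paper's WEKA configuration is not reproduced here.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def vote(*score_lists):
    """Combine classifiers by averaging their scores per example."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


# Toy data (hypothetical): 1 = price went up, 0 = price went down.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.6, 0.4, 0.2]  # misranks the third example
model_b = [0.4, 0.3, 0.8, 0.6]  # misranks the first example
ensemble = vote(model_a, model_b)

print(auc(labels, model_a), auc(labels, model_b), auc(labels, ensemble))
```

The toy scores are constructed so that the two models make different mistakes, which is the situation where averaging helps; an ensemble is not guaranteed to beat its members on real data.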

A study on Reliability Enhancement Method and the Prediction Model Construction of Medium-Voltage Customers Causing Distribution Line Fault Using Data Mining Techniques (데이터 마이닝 기법을 이용한 특별고압 파급고장 발생가능 고객 예측모델 구축 및 신뢰도 향상방안에 관한 연구)

  • Bae, Sung-Hwan;Kim, Ja-Hee;Hong, Jung-Sik;Lim, Han-Seung
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.10 / pp.1869-1880 / 2009
  • Distribution line faults have gradually decreased thanks to efforts to improve the quality of electrical materials and the maintenance of distribution systems. However, faults caused by medium-voltage customers have gradually increased despite considerable effort. The problem is that we do not know which customers will cause a fault. This paper presents an approach for finding these customers using data mining techniques based on the accumulated fault records of medium-voltage customers. It also proposes a prediction model of medium-voltage customers likely to cause distribution line faults, along with methods to enhance the reliability of the distribution system. We expect this to effectively reduce faults originating from medium-voltage customers, which account for 30% of total faults.

Performance Analysis of Opinion Mining using Word2vec (Word2vec을 이용한 오피니언 마이닝 성과분석 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.7-8 / 2018
  • This study analyzes Word2vec-based machine learning classifiers for opinion mining tasks. BOW (Bag-of-Words) was adopted as the benchmark method. Using Word2vec and BOW as feature extraction methods, we applied LR, DT, SVM, and RF classifiers to the Laptop and Restaurant datasets. The results showed that Word2vec feature extraction yields better performance.

The Model for Churn Defense using Alternation of Mobile-phone (핸드폰 기기변경을 통한 해지방어 모델 개발)

  • Seo, Jong-Hyen;Chang, Yoong-Soon
    • 한국IT서비스학회:학술대회논문집 / 2006.11a / pp.375-380 / 2006
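  • Because call quality is maintained at a similar level across mobile carriers, dissatisfaction with handsets has been identified as an important factor in service cancellation. Carriers therefore pursue a strategy of preventing customer churn through handset changes, which is known to be quite effective in practice. In addition, with the introduction of number portability in 2004, campaigns for handset changes have emerged as an important issue. This study therefore presents a method that uses customer analysis to build a model identifying customers with a strong desire to change handsets, groups the targeted customers by the factors driving the handset change, and runs personalized campaigns. Existing techniques such as regression, decision trees, and neural networks are used to select the variables that strongly influence handset changes and, on that basis, to select the customers who can raise the campaign success rate.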

Evaluation of Predictive Models for Early Identification of Dropout Students

  • Lee, JongHyuk;Kim, Mihye;Kim, Daehak;Gil, Joon-Min
    • Journal of Information Processing Systems / v.17 no.3 / pp.630-644 / 2021
  • Educational data analysis is attracting increasing attention with the rise of the big data industry. The amounts and types of learning data available are increasing steadily, and the information technology required to analyze these data continues to develop. The early identification of potential dropout students is very important because education matters for social movement and social achievement. Here, we analyze educational data and generate predictive models for student dropout using logistic regression, a decision tree, a naïve Bayes method, and a multilayer perceptron. The multilayer perceptron model using independent variables selected via variance analysis performed better than the other models. In addition, we experimentally found that not only grades but also extracurricular activities were important in preventing student dropout.

Performance Comparison of Word Embeddings for Sentiment Classification (감성 분류를 위한 워드 임베딩 성능 비교)

  • Yoon, Hye-Jin;Koo, Jahwan;Kim, Ung-Mo
    • Annual Conference of KIPS / 2021.11a / pp.760-763 / 2021
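  • Word embedding, which represents words as vectors, is one way of quantifying words so that their linguistic characteristics are reflected and text can be fed to natural language processing models. It has become an essential element of language models that let computers understand and analyze human language. Various word embedding techniques such as Word2vec have been proposed, and sentiment classification is an important part of natural language processing, yet studies comparing the performance of sentiment classification models across embedding techniques remain scarce. In this paper, using the Emotion-stimulus data, we trained sentiment classifiers for seven emotion classes and for two emotion classes with five embedding techniques and three types of classification models. We used widely adopted machine learning classifiers such as Logistic Regression, Decision Tree, and Random Forest, and compared the results in terms of training accuracy and test accuracy. In the experiments, pretrained Word2vec generally showed the best accuracy for both the seven-class and the two-class sentiment classification.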

Investigating Factors Influencing University Students' Intention to Dropout based on Education Satisfaction (교육만족도 관점에서 학생의 학업중단 의도에 대한 연구)

  • Han, Dong-Wook;Kang, Min-Chae
    • The Journal of the Korea Contents Association / v.16 no.11 / pp.63-71 / 2016
  • The purpose of this study is to investigate the factors affecting dropout intention, based on an analysis of an education satisfaction survey at local J university. A total of 7,248 highly reliable survey responses were analyzed. Analysis of variance was performed to verify differences by year grade and credit level, and significant differences were found for both. In particular, the results show that freshmen are more satisfied than students in other grades. To verify the relation between dropout intention and satisfaction with university education, logistic regression analysis was applied; satisfaction with academic guidance, vocational guidance, and the educational environment, as well as self-satisfaction with university life, are significantly related to dropout intention. Decision tree analysis shows that the most important variable determining dropout intention is self-satisfaction with university life.