• Title/Summary/Keyword: tree based learning

Performance Comparison of Machine Learning based Prediction Models for University Students Dropout (머신러닝 기반 대학생 중도 탈락 예측 모델의 성능 비교)

  • Seok-Bong Jeong;Du-Yon Kim
    • Journal of the Korea Society for Simulation / v.32 no.4 / pp.19-26 / 2023
  • The increase in the dropout rate of college students nationwide has a serious negative impact on universities and society as well as on individual students. To proactively identify students at risk of dropping out, this study built decision tree, random forest, logistic regression, and deep learning-based dropout prediction models using academic data that can be easily obtained from each university's academic management system. Their performance was then analyzed and compared. The analysis revealed that while the logistic regression-based model exhibited the highest recall, its F1 score and ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) value were comparatively lower. The random forest-based model, on the other hand, demonstrated superior performance on every metric except recall. In addition, to assess model performance over distinct prediction periods, we divided these periods into short-term (within one semester), medium-term (within two semesters), and long-term (within three semesters). The results showed that long-term prediction yielded the highest predictive performance. Through this study, each university is expected to be able to identify early the students who are likely to drop out, reduce the dropout rate through intensive management, and further contribute to the stabilization of university finances.
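
The three metrics named in this abstract (recall, F1, and ROC-AUC) can be illustrated with a minimal sketch. The labels and scores below are made up for illustration and are not taken from the study; the function names and the 0.5 decision threshold are likewise assumptions.

```python
def recall_f1(y_true, y_pred):
    """Return (recall, F1) for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = dropped out, 0 = stayed enrolled.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

recall, f1 = recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(recall, 3), round(f1, 3), round(auc, 3))
```

Recall and F1 depend on the chosen threshold, while ROC-AUC is computed from the ranking of scores alone, which is one reason a model can lead on recall yet trail on ROC-AUC.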

A research on the key factors for classification of diabetes based on random forest

  • Shin, Yong sub;Lee, Namju;Hwang, Chigon
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.102-107 / 2020
  • Recently, the number of people visiting hospitals because of diabetes has been increasing. According to the Korean Diabetes Association, 1 in 7 adults over the age of 30 suffers from diabetes, making it one of the most common diseases among modern people. In addition to blood sugar, which is widely used to detect diabetes, this paper examines BMI, which is known to be related to diabetes, and triglycerides and cholesterol, which cause various complications in diabetics, using random forest and decision tree techniques, both known to be effective for classification. The importance of each factor was confirmed using the results and the feature importances derived from the two techniques. Through this, we studied how BMI, triglycerides, and cholesterol relate to diabetes, alongside blood sugar, a factor to which diabetic patients should pay close attention.
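
The abstract does not say how feature importance was computed. One common basis in tree-based models is the decrease in Gini impurity produced by splitting on a feature, and the sketch below assumes that measure. The measurements, thresholds, and names are invented for illustration and are not the paper's data.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(values, labels, threshold):
    """Weighted Gini decrease from splitting samples on value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical patients: 1 = diabetic, 0 = not diabetic.
glucose = [85, 90, 110, 130, 150, 160]
bmi = [22, 31, 24, 29, 23, 30]
diabetic = [0, 0, 0, 1, 1, 1]

gain_glucose = impurity_decrease(glucose, diabetic, 120)
gain_bmi = impurity_decrease(bmi, diabetic, 26)
print(gain_glucose, gain_bmi)
```

In this toy data the glucose split separates the classes perfectly, so it yields the larger impurity decrease. A random forest would average such decreases over many trees and splits.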

Performance Analysis of Opinion Mining using Word2vec (Word2vec을 이용한 오피니언 마이닝 성과분석 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.7-8 / 2018
  • This study analyzes Word2vec-based machine learning classifiers for opinion mining tasks. BOW (Bag-of-Words) was adopted as the benchmark method. Using Word2vec and BOW as feature extraction methods, we applied the LR, DT, SVM, and RF classifiers to the Laptop and Restaurant datasets. The results showed that Word2vec feature extraction yields better performance than BOW.

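
As a rough illustration of the Bag-of-Words benchmark mentioned in the abstract above, the sketch below turns short texts into count vectors. The whitespace tokenizer, the function names, and the sample reviews are assumptions for illustration. The Word2vec side is not shown because it requires a trained embedding model.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(document, vocabulary):
    """Count how often each vocabulary token occurs in the document."""
    counts = Counter(document.lower().split())
    vec = [0] * len(vocabulary)
    for tok, c in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vec[vocabulary[tok]] = c
    return vec


# Hypothetical reviews, not drawn from the Laptop or Restaurant datasets.
reviews = ["great screen great battery", "slow service bad food"]
vocab = build_vocabulary(reviews)
vectors = [bow_vector(r, vocab) for r in reviews]
print(vocab)
print(vectors)
```

Each vector has one column per vocabulary word, so BOW ignores word order and word similarity. Word2vec instead represents words as dense vectors learned from the contexts in which they occur.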

A Detailed Analysis of Classifier Ensembles for Intrusion Detection in Wireless Network

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Information Processing Systems / v.13 no.5 / pp.1203-1212 / 2017
  • Intrusion detection systems (IDSs) are crucial given the overwhelming increase in attacks on computing infrastructure. They intelligently detect malicious activity and predict future attack patterns based on classification analysis using machine learning and data mining techniques. This paper thoroughly evaluates classifier ensembles for IDSs in IEEE 802.11 wireless networks. Two ensemble techniques, voting and stacking, are employed to combine three base classifiers: decision tree (DT), random forest (RF), and support vector machine (SVM). We use the area under the ROC curve (AUC) as the performance metric. Finally, we conduct two statistical significance tests to evaluate the performance differences among classifiers.
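
The voting ensemble described in this abstract can be sketched as a simple majority vote over the labels predicted by the base classifiers. The abstract does not specify the voting variant, so hard (majority) voting is an assumption here, and the predictions below are invented. Stacking, which trains a further model on the base classifiers' outputs, is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length lists, one per base classifier.
    If labels tie, the one seen first among the votes is returned.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four traffic records.
dt_pred = ["attack", "normal", "attack", "normal"]
rf_pred = ["attack", "attack", "normal", "normal"]
svm_pred = ["normal", "attack", "attack", "normal"]

ensemble = majority_vote([dt_pred, rf_pred, svm_pred])
print(ensemble)
```

With three voters and two labels a tie cannot occur, which is one practical reason to combine an odd number of base classifiers for binary tasks.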

A Study on Methods to Prevent Pima Indians Diabetes using SVM

  • YOU, Sanghyuck;KANG, Minsoo
    • Korean Journal of Artificial Intelligence / v.8 no.2 / pp.7-10 / 2020
  • In this paper, a study was conducted to find the main factors in Pima Indians diabetes based on machine learning. Diabetes is a metabolic disease involving insufficient secretion of insulin or the inability of insulin to function normally, and is characterized by a high blood glucose concentration. According to a situation report from the WHO (World Health Organization), diabetes is a chronic, metabolic disease characterized by elevated levels of blood glucose (or blood sugar), which leads over time to serious damage to the heart, blood vessels, eyes, kidneys, and nerves. About 422 million people worldwide have diabetes, the majority living in low- and middle-income countries, and 1.6 million deaths are directly attributed to diabetes each year. Both the number of cases and the prevalence of diabetes have been steadily increasing over the past few decades. Therefore, in this study, we used a Support Vector Machine (SVM), a Decision Tree, and correlation analysis to discover three important factors that predict Pima Indians diabetes with 70% accuracy. By applying the results suggested in this paper, doctors can quickly diagnose potential Pima Indians diabetics and prevent Pima Indians diabetes.

Random Forest Classifier-based Ship Type Prediction with Limited Ship Information of AIS and V-Pass

  • Jeon, Ho-Kun;Han, Jae Rim
    • Korean Journal of Remote Sensing / v.38 no.4 / pp.435-446 / 2022
  • Identifying ship types is an important process for Vessel Traffic Services Officers (VTSO) to prevent illegal activities in territorial waters and assess marine traffic. However, over 50% of the vessels in Terrestrial Automatic Identification System (T-AIS) data collected at the ground station carry no ship type information. Therefore, this study proposes a method of identifying ship types with the Random Forest Classifier (RFC) from dynamic and static AIS and V-Pass data collected over one year in the Ulsan waters. Under the hypothesis that six features (speed, course, length, breadth, time, and location) enable estimation of the ship type, four classification models were generated depending on the availability of length and breadth information, since 81.9% of ships contain both. Average accuracy was 96.4% with size information and 77.4% without it. The results show that the proposed method is applicable to identifying ship types.

Machine Learning Based Blog Text Opinion Classification System Using Opinion Word Centered-Dependency Tree Pattern Features (의견어중심의 의존트리패턴자질을 이용한 기계학습기반 한국어 블로그 문서 의견분류시스템)

  • Kwak, Dong-Min;Lee, Seung-Wook
    • Annual Conference of KIPS / 2009.11a / pp.337-338 / 2009
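  • Research on opinion polarity classification of blog documents has relied mainly on machine learning methods, with features drawn chiefly from part-of-speech information (nouns, verbs, and so on) and opinion-word lexical information. However, considering only a single opinion word may not provide enough information to determine polarity, which can lead to inaccurate results. This paper hypothesizes that considering multiple words together enables more accurate opinion classification. To extract effective opinion-word features, we extracted dependency tree patterns through dependency parsing, centered on opinion words that are likely to carry an opinion, applied the proposed PF-IDF weighting, and ran comparative experiments with a support vector machine (SVM) and the multinomial naive Bayes (MNNB) algorithm. Compared with the baseline system using TF-IDF weighting, accuracy improved by 5% with the SVM and by 8.9% with multinomial naive Bayes.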
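
For orientation, the TF-IDF baseline named in this abstract can be sketched as follows. TF-IDF has several variants, and the abstract does not say which one was used, so the formula here (relative term frequency times the natural log of the inverse document frequency) is an assumption, as are the function name and sample tokens. The proposed PF-IDF weighting is not defined in the abstract and is not reproduced.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: weight} dict per tokenized document.

    weight = (count / document length) * ln(number of documents / document frequency)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in documents for tok in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            tok: (c / len(doc)) * math.log(n_docs / df[tok])
            for tok, c in counts.items()
        })
    return weights


# Hypothetical tokenized blog sentences.
docs = [["good", "camera", "good"], ["bad", "camera"]]
w = tf_idf(docs)
print(w)
```

A token that appears in every document, such as "camera" here, receives a weight of zero, while tokens concentrated in a single document are weighted up.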

A Study on the Prediction of Mortality Rate after Lung Cancer Diagnosis for the Elderly in their 80s and 90s Based on Deep Learning (딥러닝 기반 80대·90대 노령자 대상 폐암 진단 후 사망률 예측에 관한 연구)

  • Byun, Kyungkeun;Lee, Deoggyu;Shin, Youngtae
    • Annual Conference of KIPS / 2022.05a / pp.452-455 / 2022
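  • With the spread of the Fourth Industrial Revolution, research on predicting disease treatment outcomes using deep learning is active in the medical field as well. However, some studies relied on localized patient data, which made their results difficult to generalize, and experiments focused on specific deep learning algorithms to raise prediction rates, so studies presenting comparative analyses of prediction rates across diverse algorithms have been lacking. This study therefore used large-scale medical care data from the Health Insurance Review and Assessment Service and AutoML, which provides a variety of algorithms, to build models for five algorithms, including Decision Tree, that predict mortality over the 84 months following a lung cancer diagnosis in elderly patients in their 80s and 90s, a group with high mortality. Using these models, we compared mortality prediction performance and derived an analysis of the factors that affect mortality.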

Food Exchange Table Organization Model Based on Decision Tree Using Machine Learning (머신러닝을 이용한 의사결정트리 기반의 식품교환표 구성 모델)

  • Kim, JiYun;Lee, Sangmin;Jeon, Hyeongjun;Kim, Gaeun;Kim, Ji-Hyun;Park, Naeun;Jin, ChangGyun;Kwon, Jin young;Kim Jongwan
    • Annual Conference of KIPS / 2020.11a / pp.680-684 / 2020
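  • As interest in food has recently grown in Korea, food is increasingly associated with health, environmental, and future-oriented values, and new food product development is on the rise in the food industry. The food exchange table, which serves as the standard when composing a diet, takes considerable manpower and time to revise, so it is difficult to reflect changes in food intake quickly. This paper proposes a technique for updating the food exchange table to increase its usefulness. The proposed technique trains a decision tree model, classifies newly added foods into food groups based on their information, and updates the food exchange table accordingly. This is practical for people who need nutritional management, such as diabetic patients, and helps compose diets with high palatability and variety.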

Machine Learning based Fall Detection (기계학습 기반의 낙상 검출)

  • Kim, InKyung;Kim, DaeHee;Heo, Seongsil;Lee, JaeKoo
    • Annual Conference of KIPS / 2020.05a / pp.547-550 / 2020
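  • With the rapid growth of the elderly population, interest in elderly health has increased, and with it interest in methods for detecting falls among the elderly. In fall accidents, the situation that follows when a fall is not detected in time leads to more serious consequences than the cause of the fall itself. A system that can detect a fall immediately when it occurs is therefore needed. Various fall detection methods exist; among them, we performed fall detection using sensor data from wearable devices, which are easy to wear and allow remote observation and management. This paper compares the fall detection performance of machine learning models and proposes a suitable model. Using the machine learning models Decision Tree, Random Forest, and SVM (Support Vector Machine), we quantified fall detection learning capability on actually measured data. In addition, through the data splitting, preprocessing, and feature extraction methods applied to the model inputs, we aim to assess the validity of efficient fall detection from a machine learning perspective.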