• Title/Summary/Keyword: Predictive decision tree

Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia pacific journal of information systems / v.31 no.2 / pp.185-196 / 2021
  • The paper aims to analyze and predict liquor sales in the state of Iowa by applying machine learning algorithms to build predictive models. We used Azure ML and Spark ML for our predictive analysis, which are a legacy machine learning (ML) system and a Big Data ML system, respectively. We worked on the Iowa liquor sales dataset, comprising records from 2012 to 2019 in 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales using both Azure ML and Spark ML. We find that the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time on the sample data set using the legacy Azure ML system. The Decision Tree Regression model in Spark ML has the highest accuracy and the quickest computing time on the entire data set using the Big Data Spark system.

Forecasting Energy Consumption of Steel Industry Using Regression Model (회귀 모델을 활용한 철강 기업의 에너지 소비 예측)

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association / v.1 no.2 / pp.21-25 / 2023
  • The purpose of this study was to compare the performance of several regression models in predicting the energy consumption of the steel industry. Independent variables were selected in consideration of the correlation among attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address multicollinearity. In data preprocessing, we evaluated linear and nonlinear relationships between the attributes through correlation analysis. In particular, we selected variables with high correlation and included only appropriate variables in the final model to prevent multicollinearity problems. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model was able to learn complex patterns effectively while preventing overfitting. Consequently, these predictive models are expected to provide important information for improving energy efficiency and management decision-making in the steel industry. In the future, we plan to improve the performance of the model by collecting more data and extending the variables, and we will also consider applying the model with interactions with external factors taken into account.

A Comparison of Predicting Movie Success between Artificial Neural Network and Decision Tree (기계학습 기반의 영화흥행예측 방법 비교: 인공신경망과 의사결정나무를 중심으로)

  • Kwon, Shin-Hye;Park, Kyung-Woo;Chang, Byeng-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.4 / pp.593-601 / 2017
  • In this paper, we constructed models of production/investment, distribution, and screening using variables that can be considered at each stage of the movie industry value chain. To increase the predictive power of the models, regression analysis was used to derive meaningful variables. Based on the given variables, we compared the predictive power of the artificial neural network (ANN), a machine learning analysis method, with that of the decision tree analysis method. As a result, the accuracy of the artificial neural network was higher than that of the decision tree when all variables were included in the production/investment model and the distribution model. However, the decision tree was more accurate when only the variables selected according to the regression analysis results were applied. In the screening model, the accuracy of the artificial neural network was higher than that of the decision tree regardless of whether the regression analysis results were reflected. The contribution of this paper is an attempt to improve the performance of movie prediction models by using machine learning analysis. In addition, we tried to overcome a limitation of the linear approach by reflecting the results of regression analysis in the ANN and decision tree models.

Decision Tree Model for Predicting Hospice Palliative Care Use in Terminal Cancer Patients

  • Lee, Hee-Ja;Na, Im-Il;Kang, Kyung-Ah
    • Journal of Hospice and Palliative Care / v.24 no.3 / pp.184-193 / 2021
  • Purpose: This study attempted to develop clinical guidelines to help patients use hospice and palliative care (HPC) at an appropriate time after writing physician orders for life-sustaining treatment (POLST) by identifying the characteristics of HPC use among patients with terminal cancer. Methods: This retrospective study used decision tree analysis to understand the characteristics of HPC use among patients with terminal cancer. The participants were 394 terminal cancer patients who were hospitalized at a cancer-specialized hospital in Seoul, South Korea and wrote a POLST between January 1, 2019 and March 31, 2021. Results: The predictive model for the characteristics of HPC use showed three main nodes (living together, pain control, and period to death after writing POLST). The decision tree analysis showed that the group most likely to use HPC was terminal cancer patients who had a cohabitant, received pain control, and died 2 months or more after writing a POLST. The HPC usage rate in this group was 87.5%. The next most likely group to use HPC had a cohabitant and received pain control; 64.8% of this group used HPC. Finally, 55.1% of participants who had a cohabitant used HPC, which was a significantly higher proportion than that of participants who did not have a cohabitant (1.7%). Conclusion: This study provides meaningful clinical evidence to help make decisions on HPC use more easily at an appropriate time.
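
The group rates quoted in the Results can be read as nested rules. The following is a minimal sketch that encodes only the four rates stated in the abstract; the function name `reported_hpc_rate` and its arguments are hypothetical, and the abstract does not give the full tree, so any group it does not report returns `None` rather than a guessed value.

```python
from typing import Optional


def reported_hpc_rate(cohabitant: bool,
                      pain_control: Optional[bool] = None,
                      died_2mo_or_later: Optional[bool] = None) -> Optional[float]:
    """Return the HPC usage rate (%) quoted in the abstract for a group.

    An argument left as None means "not conditioned on this attribute".
    Groups the abstract does not report yield None.
    """
    if not cohabitant:
        # Only the overall rate for patients without a cohabitant is reported.
        if pain_control is None and died_2mo_or_later is None:
            return 1.7
        return None
    if pain_control is None:
        # All patients with a cohabitant.
        return 55.1 if died_2mo_or_later is None else None
    if not pain_control:
        return None  # not reported in the abstract
    if died_2mo_or_later is None:
        # Cohabitant and pain control.
        return 64.8
    # Cohabitant, pain control, and death 2 months or more after POLST.
    return 87.5 if died_2mo_or_later else None


print(reported_hpc_rate(True, True, True))
print(reported_hpc_rate(True, False))
```

Returning `None` for unreported branches keeps the sketch from implying rates the study did not publish.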

Analysis of Predictive Factors for Suicidal Ideation of Adolescents Using Decision Tree Analysis (의사결정나무 분석을 이용한 청소년의 자살 생각 예측 요인 분석: 2019년 아동·청소년 인권실태조사를 중심으로)

  • Han, Myeunghee
    • Journal of Korean Public Health Nursing / v.36 no.2 / pp.157-169 / 2022
  • Purpose: This study aimed to implement a model for predicting the presence or absence of suicidal ideation in adolescents by using the decision tree analysis method. Methods: This study is a secondary data analysis using the 2019 Child and Adolescent Human Rights Survey, the most recent data published by the Korea Youth Policy Institute. To identify the variables predicting suicidal ideation, a decision tree analysis with suicidal ideation as the dependent variable was performed. Results: Life satisfaction, insults from parents, sex, and cyber-bullying experience were selected as significant predictors of suicidal ideation in adolescents. The model predicted that 58.2% of subjects with low life satisfaction would think of suicide. Among them, the probability of thinking of suicide increased to 72.7% for those who were unhappy, and the probability for females increased to 82.9%. Conclusions: It is necessary to consider the family, school, and social environments to prevent suicidal ideation in adolescents.

An Analysis of the Determinants of Government-Funded Defense Companies using a Decision Tree (의사결정나무를 활용한 방산육성지원 수혜기업 결정요인 분석)

  • Gowoon Jeon;Seulah Baek;Jeonghwan Jeon;Donghee Yoo
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.1 / pp.80-93 / 2024
  • This study attempted to analyze the factors that influence the participation of beneficiary companies in the government's defense industry promotion support project. To this end, experimental data were analyzed using the decision tree model, one of the data mining techniques, to construct prediction models from the variables, among various items of company information, that are most important in determining beneficiary companies. In addition, various rules for determining the beneficiary companies of the government's support project were derived from the analysis results expressed as decision trees. Three policy measures were presented, based on the important rules that appear repeatedly across the different predictive models, to increase the effect of the government's industrial development efforts. The analysis methods presented in this study and the determinants of the beneficiary companies of the government support project will help create a sustainable growth environment for the future defense industry.

Analysis of Students Leaving Their Majors Using Decision Tree

  • Park, Cheol-Yong;Song, Gyu-Moon
    • Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.157-165 / 2002
  • Since 1997, when a new educational system that encourages faculties instead of departments in universities was first introduced, students have had much more opportunity to choose and leave their majors than before. As a result, colleges of basic arts and sciences face a serious problem, since many students have left their majors at these colleges. In this paper, we analyze these students and provide a predictive model for them at one university using decision trees.

A Study on Predictive Modeling of I-131 Radioactivity Based on Machine Learning (머신러닝 기반 고용량 I-131의 용량 예측 모델에 관한 연구)

  • Yeon-Wook You;Chung-Wun Lee;Jung-Soo Kim
    • Journal of radiological science and technology / v.46 no.2 / pp.131-139 / 2023
  • High-dose I-131 used for the treatment of thyroid cancer causes localized exposure among radiology technologists handling it. There is a delay between the calibration date and when the dose of I-131 is administered to a patient. Therefore, it is necessary to directly measure the radioactivity of the administered dose using a dose calibrator. In this study, we attempted to apply machine learning modeling to measured external dose rates from shielded I-131 in order to predict their radioactivity. External dose rates were measured at distances of 1 m, 0.3 m, and 0.1 m from a shielded container holding the I-131, with a total of 868 sets of measurements taken. For the modeling process, we used the hold-out method to partition the data at a 7:3 ratio (609 for the training set, 259 for the test set). For the machine learning algorithms, we chose linear regression, decision tree, random forest, and XGBoost. To evaluate the models, we calculated root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) to assess accuracy and R² to assess explanatory power. The evaluation results are as follows: linear regression (RMSE 268.15, MSE 71901.87, MAE 231.68, R² 0.92), decision tree (RMSE 108.89, MSE 11856.92, MAE 19.24, R² 0.99), random forest (RMSE 8.89, MSE 79.10, MAE 6.55, R² 0.99), XGBoost (RMSE 10.21, MSE 104.22, MAE 7.68, R² 0.99). The random forest model achieved the highest predictive ability. Improving the model's performance in the future is expected to contribute to lowering exposure among radiology technologists.
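
For readers unfamiliar with the evaluation described above, the following is a minimal sketch of a hold-out split and the four reported metrics, written in plain Python. The helper names `holdout_split` and `regression_metrics` and the toy values are hypothetical and are not taken from the paper; only the 868/609/259 sizes come from the abstract.

```python
import math
import random


def holdout_split(rows, n_train, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and the coefficient of determination R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # RMSE is the square root of MSE
        "MAE": mae,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Sizes from the abstract: 868 measurement sets, 609 train / 259 test.
train, test = holdout_split(range(868), n_train=609)
print(len(train), len(test))

# Toy targets and predictions (made up) to exercise the metrics.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

Because RMSE is the square root of MSE, the two columns in the reported results carry the same information; MAE and R² add a view that is less dominated by large individual errors and a scale-free measure of fit, respectively.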

A Prediction Model for Internet Game Addiction in Adolescents: Using a Decision Tree Analysis (의사결정나무 분석기법을 이용한 청소년의 인터넷게임 중독 영향 요인 예측 모형 구축)

  • Kim, Ki-Sook;Kim, Kyung-Hee
    • Journal of Korean Academy of Nursing / v.40 no.3 / pp.378-388 / 2010
  • Purpose: This study was designed to build a theoretical framework to provide practical help in preventing and managing adolescent internet game addiction by developing a prediction model through a comprehensive analysis of related factors. Methods: The participants were 1,318 students studying in elementary, middle, and high schools in Seoul and Gyeonggi Province, Korea. Collected data were analyzed using the SPSS program. Decision Tree Analysis using the Clementine program was applied to build an optimum and significant prediction model to predict internet game addiction related to various factors, especially parent-related factors. Results: From the data analyses, the prediction model for factors related to internet game addiction presented 5 pathways. Causative factors included gender, type of school, siblings, economic status, religion, time spent alone, gaming place, payment to Internet café, frequency, duration, parent's ability to use internet, occupation (mother), trust (father), expectations regarding adolescent's study (mother), supervising (both parents), and rearing attitude (both parents). Conclusion: The results suggest preventive and managerial nursing programs for specific groups by path. Use of this predictive model can expand the role of school nurses, not only in counseling addicted adolescents but also in developing and carrying out programs with parents and approaching adolescents individually through databases and computer programming.

Decision Tree of Occupational Lung Cancer Using Classification and Regression Analysis

  • Kim, Tae-Woo;Koh, Dong-Hee;Park, Chung-Yill
    • Safety and Health at Work / v.1 no.2 / pp.140-148 / 2010
  • Objectives: Determining the work-relatedness of lung cancer developed through occupational exposures is very difficult. The aim of the present study is to develop a decision tree for occupational lung cancer. Methods: 153 cases of lung cancer surveyed by the Occupational Safety and Health Research Institute (OSHRI) from 1992 to 2007 were included. The target variable was whether the case was approved as work-related lung cancer, and the independent variables were age, sex, pack-years of smoking, histological type, type of industry, latency, working period, and exposure material in the workplace. The Classification and Regression Tree (CART) model was used to search for predictors of occupational lung cancer. Results: In the CART model, the best predictor was exposure to known lung carcinogens. The second best predictor was a latency of 8.6 years or more, and the third best predictor was a smoking history of less than 11.25 pack-years. The CART model must be used sparingly in deciding the work-relatedness of lung cancer because it is not absolute. Conclusion: We found that exposure to lung carcinogens, latency, and smoking history were predictive factors of approval for occupational lung cancer. Further studies on the work-relatedness of occupational disease are needed.
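
CART grows a tree by choosing, at each node, the split that most reduces impurity; for classification the usual impurity measure is the Gini index. The abstract does not say which criterion or software was used, so the following is only a sketch of the general technique. The function names and the toy latency values and labels are hypothetical and are not data from the study.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values, labels):
    """Find the threshold on one numeric feature that minimizes the
    size-weighted Gini impurity of the two child nodes.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_impurity).
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    distinct = sorted(set(values))
    best_t, best_impurity = None, float("inf")
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


# Made-up example: latency in years and whether the case was approved (1) or not (0).
latency_years = [3, 6, 9, 15, 18, 25]
approved = [0, 0, 0, 1, 1, 1]
print(best_threshold(latency_years, approved))
```

A full CART implementation repeats this search over every candidate variable at every node and then prunes the tree; the sketch covers only the single-feature split step.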