• Title/Summary/Keyword: decision tree regression

A prediction model for adolescents' skipping breakfast using the CART algorithm for decision trees: 7th (2016-2018) Korea National Health and Nutrition Examination Survey (의사결정나무 CART 알고리즘을 이용한 청소년 아침결식 예측 모형: 제7기 (2016-2018년) 국민건강영양조사 자료분석)

  • Sun A Choi;Sung Suk Chung;Jeong Ok Rho
    • Journal of Nutrition and Health / v.56 no.3 / pp.300-314 / 2023
  • Purpose: This study sought to predict the reasons for skipping breakfast by adolescents aged 13-18 years using the 7th Korea National Health and Nutrition Examination Survey (KNHANES). Methods: The participants included 1,024 adolescents. The data were analyzed using a complex-sample t-test, the Rao-Scott χ² test, and the classification and regression tree (CART) algorithm for decision tree analysis with SPSS v. 27.0. The participants were divided into two groups, one regularly eating breakfast and the other skipping it. Results: A total of 579 and 445 study participants were found to be breakfast consumers and breakfast skippers, respectively. Breakfast consumers were significantly younger than those who skipped breakfast. In addition, breakfast consumers had a significantly higher frequency of eating dinner, had been taught about nutrition, and had a lower frequency of eating out. The breakfast skippers did so to lose weight. Adolescents who skipped breakfast consumed less energy, carbohydrates, proteins, fats, fiber, cholesterol, vitamin C, vitamin A, calcium, vitamin B1, vitamin B2, phosphorus, sodium, iron, potassium, and niacin than those who consumed breakfast. The best predictor of skipping breakfast was whether adolescents sought to control their weight by not eating meals. Other participants who had low and middle-low household incomes, ate dinner 3-4 times a week, were more than 14.5 years old, and ate out once a day showed a higher frequency of skipping breakfast. Conclusion: Based on these results, nutrition education targeted at losing weight correctly and emphasizing the importance of breakfast, especially for adolescents, is required. Moreover, nutrition educators should consider designing and implementing specific action plans to encourage adolescents to improve their breakfast-eating practices by also eating dinner regularly and reducing eating out.
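The abstract above reports a CART split at an age of 14.5 years. As a rough illustration of how CART can arrive at such a half-integer threshold, the sketch below searches midpoints between observed values for the split that minimizes weighted Gini impurity. The data are hypothetical toy values, not KNHANES records, and the function names `gini` and `best_threshold` are assumptions made for this example; the study itself used SPSS.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split x < t.

    Candidate thresholds are midpoints between consecutive distinct values,
    which is the usual CART convention for a numeric predictor.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    n = len(labels)
    best = (None, gini(labels))  # start from the unsplit node
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: age in years, and 1 = skips breakfast, 0 = eats breakfast.
ages = [13, 14, 14, 15, 16, 17, 18, 18]
skips = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, impurity = best_threshold(ages, skips)
print(threshold, impurity)
```

On this toy sample the classes separate cleanly between ages 14 and 15, so the midpoint becomes the chosen threshold. A full CART implementation would repeat the search over every predictor and recurse on each child node.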

A Study on the Financial Strength of Households on House Investment Demand (가계 재무건전성이 주택투자수요에 미치는 영향에 관한 연구)

  • Rho, Sang-Youn;Yoon, Bo-Hyun;Choi, Young-Min
    • Journal of Distribution Science / v.12 no.4 / pp.31-39 / 2014
  • Purpose - This study investigates the following two issues. First, we attempt to find the important determinants of housing investment and to rank their significance using survey panel data. Recently, the expansion of global uncertainty in the real estate market has directly and indirectly influenced the Korean housing market; households react sensitively to changes in that market. Therefore, this study aims to understand how the financial strength of households is related to house investment. Second, we attempt to verify the effectiveness of diverse indices of financial strength such as DTI, LTV, and PIR as measures to monitor the housing market. In the continuous housing market recession after the global crisis, the government places top priority on residence stability. However, the government still imposes forceful restraints on indices of financial strength. We believe this study verifies the utility of these regulations when used in the housing market. Research design, data, and methodology - The data source for this study is the "National Survey of Tax and Benefit" from 2007 (1st) to 2011 (5th) by the Korea Institute of Public Finance. Based on this survey data, we use panel data of 3,838 households that have been surveyed continuously for 5 years. We sort the base variables according to their relevance to house investment using the decision tree model (DTM), which is a standard decision-making model among data-mining techniques. The DTM method is known as a powerful methodology for identifying the variables that contribute to predictive power. In addition, we analyze how important explanatory variables and the financial strength index of households affect housing investment with a binary logistic multiple regression model. Based on the analyses, we conclude that the financial strength index has a significant role in house investment demand. Results - The results of this research are as follows: 1) The determinants of housing investment are age, consumption expenditures, income, total assets, rent deposit, housing price, habits satisfaction, housing scale, number of household members, and debt related to housing. 2) The impact of these determinants has changed more or less annually due to economic situations and housing market conditions. The levels of consumption expenditure and income were the main determinants before 2009; however, the determinants of housing investment changed to indices of the financial strength of households, i.e., DTI, LTV, and PIR, after 2009. 3) Most of all, since 2009, housing loans have been a more important variable than the level of consumption in making housing market decisions. Conclusions - The results of this research show that sound financing of households has a stronger effect on housing investment than reduced consumption expenditures. At the same time, the key indices that must be monitored by the government under economic emergency conditions differ from those requiring monitoring under normal market conditions; therefore, political indices to encourage and promote the housing market must be divided based on market conditions.

Analysis of Enactment and Utilization of Korean Industrial Standards(KS) by Time Series Data Mining (시계열 자료의 데이터마이닝을 통한 한국산업표준의 제정과 활용 분석)

  • Yoon, Jaekwon;Kim, Wan;Lee, Heesang
    • Journal of Technology Innovation / v.23 no.3 / pp.225-253 / 2015
  • Standards are among a nation's most important industrial issues: they improve social and economic efficiency and form the basis of industrial development and trade liberalization. This research analyzes the enactment and utilization of Korean industrial standards (KS) across various industries. This paper examines Korean industries' KS utilization status based on KS possession, enactment, and inquiry records. First, we apply multidimensional scaling to visualize and group the KS possession records and the nation's institutional issues. We develop several hypotheses to find the factors that determine how each group's KS possession status affects the standard enactment activities of similar industry sectors, and analyze the data using regression analysis. The results show that capital intensity, R&D activities, and sales revenues affect standardization activities. This suggests that the government should encourage companies with high capital intensity and sales revenues to lead the industry's standard activities, and link its policies with the industry's standard and patent-related activities arising from R&D. Second, we analyze the impact of each KS's inquiry records, year of enactment, form, and industrial segment on utilization status using statistical analysis and the decision tree method. The results show that the enactment year has a significant impact on KS utilization status and that some KSs of a specific form and industrial segment have high utilization records despite a short enactment history. Our study suggests that the government should make policies to promote the use of under-utilized KSs and also consider the utilization of standards during the enactment process.

Severity-Adjusted LOS Model of AMI patients based on the Korean National Hospital Discharge in-depth Injury Survey Data (퇴원손상심층조사 자료를 기반으로 한 급성심근경색환자 재원일수의 중증도 보정 모형 개발)

  • Kim, Won-Joong;Kim, Sung-Soo;Kim, Eun-Ju;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.10 / pp.4910-4918 / 2013
  • This study aims to design a Severity-Adjusted LOS (Length of Stay) Model in order to efficiently manage the LOS of AMI (Acute Myocardial Infarction) patients. We designed the Severity-Adjusted LOS Model using data-mining methods (multiple regression analysis, decision trees, and neural networks) on 6,074 AMI patients with a diagnosis of I21 from the 2004-2009 Korean National Hospital Discharge in-depth Injury Survey. A decision tree model was chosen as the final model because it produced superior results. This study found that the execution of CABG, status at discharge (alive or dead), comorbidity index, etc. were major factors affecting the severity-adjusted LOS of AMI patients. The difference between real LOS and adjusted LOS resulted from hospital location and bed size. Efficient management of the LOS of AMI patients requires performing various activities after identifying the differentiating factors. These factors can be specified by applying each hospital's data to this newly designed Severity-Adjusted LOS Model.

Dynamic Growth Model for Pinus densiflora Stands in Anmyun-Island (안면도(安眠島) 소나무 임분(林分)의 동적(動的) 생장(生長)모델)

  • Seo, Jeong-Ho;Lee, Woo-Kyun;Son, Yowhan;Ham, Bo-Young
    • Journal of Korean Society of Forest Science / v.90 no.6 / pp.725-733 / 2001
  • In this study, the relationships between growth factors for Pinus densiflora stands in Anmyun-Island were analyzed and a dynamic growth model was prepared. A total of 96 sample plots were investigated, in which the dbh and height of individual trees were measured. From these plot data, quadratic mean dbh, mean height, dominant tree height, stem number per ha, basal area per ha, and volume per ha were estimated. Several regression equations between growth factors were derived using the NLIN and REG procedures of SAS. A dynamic growth model, in which the equations were interactively linked, was then prepared for predicting stand growth and yield under different management regimes. The predictions of the dynamic growth model were found to coincide with general growth principles. The dynamic growth model was considered adequate for predicting the growth and yield of Pinus densiflora stands in Anmyun-Island. In practice, the dynamic growth model can be applied to predict the growth and development of stands under various forest treatments and to support decision-making in forest management.

Predicting Corporate Bankruptcy using Simulated Annealing-based Random Forest (시뮬레이티드 어니일링 기반의 랜덤 포레스트를 이용한 기업부도예측)

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.155-170 / 2018
  • Predicting a company's financial bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. In previous studies, prediction models have been proposed by applying or combining statistical and machine learning-based techniques. In this paper, we propose a novel intelligent prediction model based on simulated annealing, which is one of the well-known optimization techniques. Simulated annealing is known to have optimization performance comparable to that of genetic algorithms. Nevertheless, since there has been little research on the prediction and classification of business decision-making problems using simulated annealing, it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we use a combined model of simulated annealing and machine learning to select the input features of the bankruptcy prediction model. Typical ways of combining optimization and machine learning techniques are feature selection, feature weighting, and instance selection. This study proposes a combined model for feature selection, which has been studied the most. In order to confirm the superiority of the proposed model, we apply it to real-world financial data of Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, the performance is significantly improved as compared with the traditional decision tree, random forests, artificial neural network, SVM, and logistic regression analysis.
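The abstract above describes using simulated annealing to choose input features. A minimal sketch of that general idea follows, assuming a bit-flip neighborhood, geometric cooling, and the Metropolis acceptance rule. The function `anneal_feature_subset`, its parameters, and the toy scoring function are all hypothetical and are not the authors' implementation; in practice the score would be something like the validation accuracy of a classifier trained on the selected features.

```python
import math
import random


def anneal_feature_subset(n_features, score, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Search for a boolean feature mask that maximizes score(mask)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(steps):
        # Neighbor: flip the inclusion of one randomly chosen feature.
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score


# Hypothetical stand-in for model accuracy: features 0, 2 and 5 help,
# every other selected feature carries a penalty.
USEFUL = {0, 2, 5}


def toy_score(mask):
    return sum(1.0 if i in USEFUL else -0.5 for i, on in enumerate(mask) if on)


best_mask, best_value = anneal_feature_subset(8, toy_score)
print(best_mask, best_value)
```

Accepting occasional worse subsets early on is what lets the search escape local optima; as the temperature falls the procedure behaves more and more like greedy hill climbing.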

A Study on the Development of Flight Prediction Model and Rules for Military Aircraft Using Data Mining Techniques (데이터 마이닝 기법을 활용한 군용 항공기 비행 예측모형 및 비행규칙 도출 연구)

  • Yu, Kyoung Yul;Moon, Young Joo;Jeong, Dae Yul
    • The Journal of Information Systems / v.31 no.3 / pp.177-195 / 2022
  • Purpose: This paper aims to support full operational readiness by establishing an optimal flight plan that takes weather conditions into account, so that the missions and operations of military aircraft can be performed effectively. It proposes a flight prediction model and rules by analyzing the relationship between flight implementation and cancellation under different weather conditions, using big data collected from historical flight information of military aircraft supplied by Korean manufacturers and meteorological information from the Korea Meteorological Administration. In addition, by deriving flight rules from weather information, it was possible to identify an efficient method of establishing flight schedules that considers the weather. Design/methodology/approach: This study is an analytic study using data mining techniques based on historical data of 44,558 flights of military aircraft accumulated by the Republic of Korea Air Force over a total of 36 months from January 2013 to December 2015 and meteorological information provided by the Korea Meteorological Administration. Four steps were taken to develop optimal flight prediction models and to derive rules for flight implementation and cancellation. First, a total of 10 independent variables and one dependent variable were used to develop the optimal model for flight implementation according to weather conditions. Second, optimal flight prediction models were derived using data mining algorithms such as logistic regression, AdaBoost, KNN, random forest, and LightGBM. Third, we collected the opinions of military aircraft pilots with more than 25 years of experience and evaluated the importance of the independent variables using a heatmap in Python, in order to develop flight implementation and cancellation rules according to weather conditions. Finally, a decision tree model was constructed, and flight rules were derived to see how the weather conditions at each airport affect the implementation and cancellation of flights. Findings: Based on historical flight information of military aircraft and weather information for the flight zone, we developed a flight prediction model using data mining techniques. When optimal flight prediction models were developed for each airbase, the LightGBM algorithm was confirmed to have the best prediction performance in terms of recall. Each flight rule was checked against the weather conditions, and it was confirmed that precipitation, humidity, and total cloud cover had a significant effect on flight cancellation, whereas the effect of visibility was found to be relatively insignificant. When a flight schedule is established, the rules will provide some insight for deciding on flight training more systematically and effectively.

The big data method for flash flood warning (돌발홍수 예보를 위한 빅데이터 분석방법)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Digital Convergence / v.15 no.11 / pp.245-250 / 2017
  • A flash flood is defined as flooding caused by intense rainfall over a relatively small area that flows rapidly through rivers and valleys in a short time with no advance warning. It can therefore cause property damage and casualties. This study aims to establish a flash-flood warning system using data on 38 accidents between 2009 and 2012, reported by the National Disaster Information Center, together with the Land Surface Model (TOPLATS). Three variables from the Land Surface Model were used: precipitation, soil moisture, and surface runoff. The three variables for the 6 hours preceding each flash flood were reduced to 3 factors through factor analysis. Decision tree, random forest, Naive Bayes, Support Vector Machine, and logistic regression models were considered as big data methods. Prediction performance was evaluated by comparing Accuracy, Kappa, TP Rate, FP Rate, and F-Measure. The best method was suggested based on a reproducibility evaluation at each point of flash flood occurrence and on predicted versus actual counts using 4 years of data.
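The five evaluation measures named in the abstract above can all be computed from a binary confusion matrix. The sketch below uses the standard definitions (with Cohen's kappa for "Kappa" and the F1 form of the F-Measure, which is an assumption about what the study used); the function name `binary_metrics` and the counts are hypothetical and are not results from the study. It assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Cohen's kappa, TP rate, FP rate and F-measure from counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    tp_rate = tp / (tp + fn)          # recall / sensitivity
    fp_rate = fp / (fp + tn)          # 1 - specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
    }


# Hypothetical counts: 40 hits, 10 false alarms, 10 misses, 40 correct rejections.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is useful alongside accuracy here because flash floods are rare events: a model that never predicts a flood can score a high accuracy while its kappa stays at zero.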

An Exploratory Study of Fatigue Related Factors among School Personnel in Seoul by Data Mining (데이터 마이닝을 이용한 서울시교직원의 피로요인 탐색연구)

  • Lee, Hui-U;Sin, Seon-Mi
    • Journal of the Korean Society of School Health / v.19 no.1 / pp.79-88 / 2006
  • Purpose: To identify the general characteristics of school personnel with recent fatigue, which was the most frequent of the subjective symptoms, and to explore fatigue-related factors by evaluating physical and perceived health status, lifestyle, and symptoms through data mining techniques. Methods: We collected data on 1,147 people (545 male, 602 female) who were elementary, middle, or high school personnel, answered a questionnaire, and received a physical examination at the Seoul School Health Center from September to November 2000. We investigated the differences between the fatigue group and the non-fatigue group in demographic characteristics, physical health status, perceived health status, symptoms, and laboratory values by frequency, chi-square test, t-test, or simple logistic regression analysis using SAS package 8.1, and then selected the significant variables as input variables for a decision tree analysis with the CART model in SAS E-miner. Results: In terms of general characteristics, the fatigue group made up 41.1% (male 35.2%, female 46.4%) of the 1,147 school personnel. In classical statistics, the factors related to fatigue were female gender, lower mean systolic and diastolic pressure, young age, working in a middle school, irregular eating habits, no weekly exercise or less than 30 minutes of exercise a day, perception of unhealthy status, and subjective symptoms including shortness of breath during exercise. In simple logistic regression examining the relationship between selected independent variables and fatigue as the dependent variable, the odds ratio was 1.58 for gender (female vs. male), 20.67 for young age (20s vs. 60s), and 1.86 for middle vs. high school personnel. However, using SAS E-miner we mined combinations of several characteristics. In the CART model, if health perception was healthy and age was >= 37.5 years, the proportion with fatigue was only 19.3%, but if health perception was not healthy, the symptom of 'short of breath' during exercise was severe, age was < 53.5 years, and BMI was >= 22.69, the proportion with fatigue was up to 84.8%. Conclusions: The fatigue group made up 41.1% (male 35.2%, female 46.4%). In classical statistics, the fatigue-related factors among school personnel were young age, female gender, perceived unhealthy status, subjective physical symptoms, poor lifestyle, and lower blood pressure, rather than physical health status alone. However, in data mining, if health perception was healthy and age was >= 37.5 years, the proportion with fatigue was only 19.3%, but if health perception was not healthy, the symptom of 'short of breath' during exercise was severe, age was < 53.5 years, and BMI was >= 22.69, the proportion with fatigue was up to 84.8%.

Performance Comparison of Machine Learning based Prediction Models for University Students Dropout (머신러닝 기반 대학생 중도 탈락 예측 모델의 성능 비교)

  • Seok-Bong Jeong;Du-Yon Kim
    • Journal of the Korea Society for Simulation / v.32 no.4 / pp.19-26 / 2023
  • The increase in the dropout rate of college students nationwide has a serious negative impact on universities and society as well as on individual students. In order to proactively identify students at risk of dropout, this study built decision tree, random forest, logistic regression, and deep learning-based dropout prediction models using academic data that can be easily obtained from each university's academic management system. Their performances were subsequently analyzed and compared. The analysis revealed that while the logistic regression-based prediction model exhibited the highest recall, its F1 score and ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) value were comparatively lower. On the other hand, the random forest-based prediction model demonstrated superior performance across all metrics except recall. In addition, in order to assess model performance over distinct prediction periods, we divided these periods into short-term (within one semester), medium-term (within two semesters), and long-term (within three semesters). The results underscored that long-term prediction yielded the highest predictive efficacy. Through this study, each university is expected to be able to identify students at risk of dropping out early, reduce the dropout rate through intensive management, and further contribute to the stabilization of university finances.
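The abstract above compares models by ROC-AUC alongside recall and F1. As a reminder of what ROC-AUC measures, the sketch below computes it directly from its ranking interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted dropout probabilities; 1 = student dropped out.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Because ROC-AUC depends only on how cases are ranked, a model can have the highest recall at one decision threshold and still have a lower ROC-AUC than a competitor, which is consistent with the pattern the abstract reports for logistic regression.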