• Title/Summary/Keyword: decision tree regression

Decision Tree of Occupational Lung Cancer Using Classification and Regression Analysis

  • Kim, Tae-Woo;Koh, Dong-Hee;Park, Chung-Yill
    • Safety and Health at Work / v.1 no.2 / pp.140-148 / 2010
  • Objectives: Determining the work-relatedness of lung cancer developed through occupational exposures is very difficult. The aim of the present study was to develop a decision tree for occupational lung cancer. Methods: A total of 153 cases of lung cancer surveyed by the Occupational Safety and Health Research Institute (OSHRI) from 1992 to 2007 were included. The target variable was whether the case was approved as work-related lung cancer, and the independent variables were age, sex, pack-years of smoking, histological type, type of industry, latency, working period, and exposure material in the workplace. The Classification and Regression Tree (CART) model was used to search for predictors of occupational lung cancer. Results: In the CART model, the best predictor was exposure to known lung carcinogens. The second best predictor was a latency of 8.6 years or more, and the third best predictor was a smoking history of less than 11.25 pack-years. The CART model must be used sparingly in deciding the work-relatedness of lung cancer because it is not absolute. Conclusion: We found that exposure to lung carcinogens, latency, and smoking history were predictive factors of approval for occupational lung cancer. Further studies on the work-relatedness of occupational disease are needed.
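To illustrate how a CART-style tree can arrive at a cut-off such as a latency threshold, the following is a minimal sketch of a threshold search that minimizes weighted Gini impurity. The abstract does not state the splitting criterion or software settings used in the study, so the use of Gini impurity, the function names `gini` and `best_threshold`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value < threshold'.

    Returns (None, gini(labels)) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # baseline: impurity without any split
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [lab for val, lab in pairs if val < threshold]
        right = [lab for val, lab in pairs if val >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data (NOT taken from the study):
# latency in years and whether the case was approved (1) or not (0).
latency = [2.0, 4.5, 6.0, 9.0, 12.0, 15.0]
approved = [0, 0, 0, 1, 1, 1]
split = best_threshold(latency, approved)
print(split)
```

A full CART implementation repeats this search over every candidate variable at every node and then prunes the resulting tree; the sketch covers only the single-variable split step.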

A Study on Decision Factors Affecting Utilization of Elderly Welfare Center: Focus on Gimpo City (노인복지관 이용 결정요인에 관한 연구: 김포시 노인을 중심으로)

  • Won, Il;Kim, Keunhong;Kim, SungHyun
    • 한국노년학 / v.38 no.2 / pp.351-364 / 2018
  • The purpose of this study is to identify the factors that determine whether elderly people living in Gimpo city use elderly welfare centers. The motivation is that elderly welfare centers, as providers of general welfare services, need to consider not only state policy but also the inherent role and function of the elderly. Although the number of elderly welfare centers nationwide has increased greatly over the last 10 years, the effect and function of the facilities have remained almost the same, and the elderly, especially those living in rural areas, still lack leisure activities. This has become a serious problem. For these reasons, this study conducted a social survey of 360 elderly people over the age of 65 living in Gimpo city, a rural-urban type city. The relationships among the predisposing, enabling, and need factors of Andersen's behavior model were examined with binary logistic regression analysis and decision tree analysis. The binary logistic regression showed that most factors of Andersen's model are significant: age, gender, and education level among the predisposing factors; monthly income among the enabling factors; and the reserve for old age and the preparation of economic activity for old age among the need factors. The decision tree analysis showed the interaction between factors: the higher the education level among the predisposing factors, the higher the likelihood of using an elderly welfare center, and the lower the level of health-promotion preparation among the need factors, the higher that likelihood as well. Although the regression analysis and the decision tree analysis differed in how their results are interpreted, the results of this study still support the necessity of elderly welfare centers that provide integrated welfare services.

Prediction Models of Mild Cognitive Impairment Using the Korea Longitudinal Study of Ageing (고령화연구패널조사를 이용한 경도인지장애 예측모형)

  • Park, Hyojin;Ha, Juyoung
    • Journal of Korean Academy of Nursing / v.50 no.2 / pp.191-199 / 2020
  • Purpose: The purpose of this study was to compare the sociodemographic characteristics of a normal cognitive group and a mild cognitive impairment group, and to establish prediction models of Mild Cognitive Impairment (MCI). Methods: This study was a secondary data analysis using data from "the 4th Korea Longitudinal Study of Ageing" of the Korea Employment Information Service. A total of 6,405 individuals, including 1,329 individuals with MCI and 5,076 individuals with normal cognitive abilities, were part of the study. Based on the panel survey items, the research used 28 variables. The methods of analysis included a χ2-test, logistic regression analysis, decision tree analysis, predicted error rate, and an ROC curve, calculated using SPSS 23.0 and SAS 13.2. Results: In the MCI group, the mean age was 71.4 and 65.8% of the participants were women. There were statistically significant differences in gender, age, and education between the two groups. Predictors of MCI determined by logistic regression analysis were gender, age, education, instrumental activity of daily living (IADL), perceived health status, participation group, cultural activities, and life satisfaction. Decision tree analysis identified education, age, life satisfaction, and IADL as predictors of MCI. Conclusion: The accuracy of the logistic regression model for MCI is slightly higher than that of the decision tree model. The prediction model for MCI established in this study may be used to identify middle-aged and elderly people at risk of MCI. Therefore, this study may contribute to the prevention and reduction of dementia.

The Construction Methodology of a Rule-based Expert System using CART-based Decision Tree Method (CART 알고리즘 기반의 의사결정트리 기법을 이용한 규칙기반 전문가 시스템 구축 방법론)

  • Ko, Yun-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.6 / pp.849-854 / 2011
  • A rule-based expert system is very effective for minimizing the spreading effect of events in a system. However, because the events of a large-scale system are diverse and the load condition is highly variable, it is very difficult to construct such a rule-based expert system. To solve this problem, this paper studies a methodology that constructs a rule-based expert system by applying a decision tree method based on the CART (Classification and Regression Trees) algorithm to event case examples.

A Study on Factors of Education's Outcome using Decision Trees (의사결정트리를 이용한 교육성과 요인에 관한 연구)

  • Kim, Wan-Seop
    • Journal of Engineering Education Research / v.13 no.4 / pp.51-59 / 2010
  • To manage university lectures efficiently and improve educational outcomes, a process is needed that diagnoses the present educational outcome of each class of a lecture and finds the factors behind that outcome. Most studies that look for the factors of an efficient lecture use statistical methods such as association analysis and regression analysis, and decision tree analysis has recently been employed as well. Decision tree analysis has the merits that the resulting model is easy to understand and easy to apply to decision making, but it has the weakness that it is not robust to characteristics of the input data such as multicollinearity. This paper points out these weaknesses of decision tree analysis and suggests an experimental solution that uses multiple decision tree algorithms to compensate for them. The experimental results show that the suggested method is more effective in finding reliable factors of the educational outcome.

Estimating Indoor Radio Environment Maps with Mobile Robots and Machine Learning

  • Taewoong Hwang;Mario R. Camana Acosta;Carla E. Garcia Moreta;Insoo Koo
    • International journal of advanced smart convergence / v.12 no.1 / pp.92-100 / 2023
  • Wireless communication technology is becoming increasingly prevalent in smart factories, but the rise in the number of wireless devices can lead to interference in the ISM band, and obstacles such as metal blocks within the factory can weaken communication signals, creating radio shadow areas that impede information exchange. Consequently, accurately determining the radio communication coverage range is crucial. To address this issue, a Radio Environment Map (REM) can be used to provide information about the radio environment in a specific area. In this paper, a technique for estimating an indoor REM using a mobile robot and machine learning methods is introduced. The mobile robot first collects and processes data, including the Received Signal Strength Indicator (RSSI) and location estimation. This data is then used to build the REM through machine learning regression algorithms such as Extra Tree Regressor, Random Forest Regressor, and Decision Tree Regressor. Furthermore, the numerical and visual performance of the REM for each model is assessed in terms of R² and Root Mean Square Error (RMSE).
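The two evaluation metrics named in the abstract above, R² and RMSE, can be sketched as follows. This is a plain-Python illustration of the standard definitions, not the authors' evaluation code; the function names and the RSSI values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical RSSI readings in dBm (NOT data from the paper).
measured = [-60.0, -65.0, -70.0, -75.0]
predicted = [-61.0, -64.0, -71.0, -74.0]

error = rmse(measured, predicted)
score = r_squared(measured, predicted)
print(error, score)
```

RMSE is expressed in the same unit as the target (here dBm), while R² is unitless, which is one reason the two are often reported together when comparing regressors.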

Prediction Model for Unpaid Customers Using Big Data (빅 데이터 기반의 체납 수용가 예측 모델)

  • Jeong, Jaean;Lee, Kyouhwan;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.827-833 / 2020
  • In this paper, to reduce the unpaid rate of local governments, the internal data elements in Water-INFOS that affect arrears are identified through interviews with meter readers in certain local governments, and candidate data affecting arrears are derived from national statistical data. The influence of each independent variable on the dependent variable is measured with information gain, which examines the disorder of the dependent variable in the data set. We also compare the prediction rates of the decision tree and logistic regression using n-fold cross-validation. The results confirm that the decision tree can find more accurate customer payment patterns than logistic regression. In the process of developing an analysis algorithm model using machine learning, the optimal values of two environmental variables that directly affect the complexity and accuracy of the decision tree, the minimum number of data and the maximum purity, are derived to improve the accuracy of the algorithm.
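As an illustration of the information gain measure mentioned in the abstract above, here is a minimal sketch that computes the reduction in Shannon entropy of a target variable after grouping by a categorical feature. The function names, the feature values, and the labels are hypothetical and are not taken from the paper or from Water-INFOS.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (NOT from the paper).
status = ["paid", "paid", "unpaid", "unpaid"]
informative = ["A", "A", "B", "B"]    # separates the two classes perfectly
uninformative = ["A", "B", "A", "B"]  # tells us nothing about the class

gain_high = information_gain(informative, status)
gain_zero = information_gain(uninformative, status)
print(gain_high, gain_zero)
```

A feature with higher information gain reduces the disorder of the dependent variable more, so it is a stronger candidate for a split near the root of the tree.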

A Study on the Prediction Models of Used Car Prices for Domestic Brands Using Machine Learning (머신러닝을 활용한 브랜드별 국내 중고차 가격 예측 모델에 관한 연구)

  • Seungjun Yim;Joungho Lee;Choonho Ryu
    • Journal of Service Research and Studies / v.13 no.3 / pp.105-126 / 2023
  • The domestic used car market continues to grow along with used car online platform services. These services disclose vehicle specifications, accident history, inspection history, and detailed options to consumers. Most preceding studies predicted used car prices using vehicle specifications and some vehicle options, and they confirmed a nonlinear relationship between used car prices and some specification variables. Accordingly, researchers tried to solve the nonlinearity problem by applying Machine Learning models. In general, regression-based Machine Learning models have the advantage of revealing the actual influence and direction of variables, but the disadvantage of weaker cost-function figures than decision tree based Machine Learning models. This study attempted to predict used car prices for six domestic brands by using both vehicle specifications and vehicle options, and in doing so tried to combine the advantages of the two types of Machine Learning models. To this end, we sequentially ran a regression-based Machine Learning model and a decision tree based Machine Learning model. As a result of the analysis, the practical influence and direction of the variables for each brand were identified, and the best tree-based Machine Learning model was selected. The implications of this study are as follows. It will help buyers and sellers who use used car online platform services to predict approximate used car prices. It is also hoped that it will help solve the problems caused by information inequality among users of these services.

An application to Multivariate Zero-Inflated Poisson Regression Model

  • Kim, Kyung-Moo
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.177-186 / 2003
  • The Zero-Inflated Poisson regression is a model for count data with excess zeros. When correlated response variables are of interest, the univariate zero-inflated regression model has to be extended to a multivariate model. In this paper, we study and simulate the multivariate zero-inflated regression model. The model is applied to a real example. Regression parameters are estimated by maximum likelihood (MLE). We also compare the fit of the multivariate zero-inflated Poisson regression model with that of the decision tree model.

An application to Zero-Inflated Poisson Regression Model

  • Kim, Kyung-Moo
    • Journal of the Korean Data and Information Science Society / v.14 no.1 / pp.45-53 / 2003
  • The Zero-Inflated Poisson regression is a model for count data with excess zeros. When the response variables have excess zeros, it is not easy to apply the Poisson regression model. In this paper, we study and simulate the zero-inflated Poisson regression model. The model is applied to a real example. Regression parameters are estimated by maximum likelihood (MLE). We also compare the fit of the zero-inflated Poisson model with that of the Poisson regression and decision tree models.
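For readers unfamiliar with the model named in the abstract above, the following sketch shows the standard zero-inflated Poisson probability mass function: with probability `pi` the count is a structural zero, and otherwise it follows a Poisson distribution with mean `lam`. The parameter names and the example values are assumptions for illustration; the abstract does not give the paper's parameterization or its regression link functions.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability P(Y = k).

    pi  : probability of a structural (excess) zero, 0 <= pi < 1
    lam : mean of the Poisson component
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Hypothetical parameter values chosen only for illustration.
lam, pi = 2.0, 0.3
p_zero_zip = zip_pmf(0, lam, pi)
p_zero_poisson = poisson_pmf(0, lam)
total = sum(zip_pmf(k, lam, pi) for k in range(60))
print(p_zero_zip, p_zero_poisson, total)
```

The extra mass at zero is what lets the model fit data that a plain Poisson regression underfits at zero; setting `pi` to 0 recovers the ordinary Poisson distribution.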