• Title/Summary/Keyword: Decision Tree Regression

Search Result 328, Processing Time 0.028 seconds

Integrity Assessment Models for Bridge Structures Using Fuzzy Decision-Making (퍼지의사결정을 이용한 교량 구조물의 건전성평가 모델)

  • 안영기;김성칠
    • Journal of the Korea Concrete Institute
    • /
    • v.14 no.6
    • /
    • pp.1022-1031
    • /
    • 2002
  • This paper presents efficient models for bridge structures using CART-ANFIS (classification and regression tree-adaptive neuro fuzzy inference system). A fuzzy decision tree partitions the input space of a data set into mutually exclusive regions, each region is assigned a label, a value, or an action to characterize its data points. Fuzzy decision trees used for classification problems are often called fuzzy classification trees, and each terminal node contains a label that indicates the predicted class of a given feature vector. In the same vein, decision trees used for regression problems are often called fuzzy regression trees, and the terminal node labels may be constants or equations that specify the predicted output value of a given input vector. Note that CART can select relevant inputs and do tree partitioning of the input space, while ANFIS refines the regression and makes it continuous and smooth everywhere. Thus it can be seen that CART and ANFIS are complementary and their combination constitutes a solid approach to fuzzy modeling.

Comparison of Data Mining Classification Algorithms for Categorical Feature Variables (범주형 자료에 대한 데이터 마이닝 분류기법 성능 비교)

  • Sohn, So-Young;Shin, Hyung-Won
    • IE interfaces
    • /
    • v.12 no.4
    • /
    • pp.551-556
    • /
    • 1999
  • In this paper, we compare the performance of three data mining classification algorithms(neural network, decision tree, logistic regression) in consideration of various characteristics of categorical input and output data. $2^{4-1}$. 3 fractional factorial design is used to simulate the comparison situation where factors used are (1) the categorical ratio of input variables, (2) the complexity of functional relationship between the output and input variables, (3) the size of randomness in the relationship, (4) the categorical ratio of an output variable, and (5) the classification algorithm. Experimental study results indicate the following: decision tree performs better than the others when the relationship between output and input variables is simple while logistic regression is better when the other way is around; and neural network appears a better choice than the others when the randomness in the relationship is relatively large. We also use Taguchi design to improve the practicality of our study results by letting the relationship between the output and input variables as a noise factor. As a result, the classification accuracy of neural network and decision tree turns out to be higher than that of logistic regression, when the categorical proportion of the output variable is even.

  • PDF

A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • Korean Journal of Artificial Intelligence
    • /
    • v.9 no.1
    • /
    • pp.9-14
    • /
    • 2021
  • This paper predicted a model that indicates whether to buy a car based on primary health insurance customer data. Currently, automobiles are being used to land transportation and living, and the scope of use and equipment is expanding. This rapid increase in automobiles has caused automobile insurance to emerge as an essential business target for insurance companies. Therefore, if the car insurance sales are predicted and sold using the information of existing health insurance customers, it can generate continuous profits in the insurance company's operating performance. Therefore, this paper aims to analyze existing customer characteristics and implement a predictive model to activate advertisements for customers interested in such auto insurance. The goal of this study is to maximize the profits of insurance companies by devising communication strategies that can optimize business models and profits for customers. This study was conducted through the Microsoft Azure program, and an automobile insurance purchase prediction model was implemented using Health Insurance Cross-sell Prediction data. The program algorithm uses Two-Class Logistic Regression and Two-Class Boosted Decision Tree at the same time to compare two models and predict and compare the results. According to the results of this study, when the Threshold is 0.3, the AUC is 0.837, and the accuracy is 0.833, which has high accuracy. Therefore, the result was that customers with health insurance could induce a positive reaction to auto insurance purchases.

Comparison of Classification Models for Sequential Flight Test Results (단계별 비행훈련 성패 예측 모형의 성능 비교 연구)

  • Sohn, So-Young;Cho, Yong-Kwan;Choi, Sung-Ok;Kim, Young-Joun
    • Journal of the Ergonomics Society of Korea
    • /
    • v.21 no.1
    • /
    • pp.1-14
    • /
    • 2002
  • The main purpose of this paper is to present selection criteria for ROK Airforce pilot training candidates in order to save costs involved in sequential pilot training. We use classification models such Decision Tree, Logistic Regression and Neural Network based on aptitude test results of 288 ROK Air Force applicants in 1994-1996. Different models are compared in terms of classification accuracy, ROC and Lift-value. Neural network is evaluated as the best model for each sequential flight test result while Logistic regression model outperforms the rest of them for discriminating the last flight test result. Therefore we suggest a pilot selection criterion based on this logistic regression. Overall. we find that the factors such as Attention Sharing, Speed Tracking, Machine Comprehension and Instrument Reading Ability having significant effects on the flight results. We expect that the use of our criteria can increase the effectiveness of flight resources.

An Investigation of Factors Affecting Management Efficiency in Korean General Hospitals Using DEA Model (DEA모형을 이용한 종합병원의 효율성 측정과 영향요인)

  • Ahn, In-Whan;Yang, Dong-Hyun
    • Korea Journal of Hospital Management
    • /
    • v.10 no.1
    • /
    • pp.71-92
    • /
    • 2005
  • The purpose of this study is to analyze the efficiency in management of general hospitals and investigate the major factors on efficiency. Specifically, the management of each general hospital is evaluated by using Data Envelopment Analysis(DEA) technique which is a nonparametric statistical method for measurement of efficiency. Then, the influencing factors are investigated through analyses of Decision-Tree Model and Tobit Regression. The target hospitals were general hospitals in which bed sizes are between 200 and 500 among a total of 276 general hospitals. The main data of financial indicators were collected from 48 hospitals, and it was analyzed by using two statistical models. For Model I, three input and two output variables were used for efficiency evaluation. In particular, three input variables were the number of medical doctors, the number of paramedical personnel, and the bed size. And, two output variables were the numbers of inpatients and outpatients per year, adjusted by bed-size. The results of DEA analysis showed that only seven out of 48 hospitals(15%) turned out to be efficient. The decision-tree analysis also showed that there were six significant influencing factors for Model I. Six factors for Model I were Bed Occupancy Rate, Cost per Adjusted Inpatient, New Visit Ratio of Outpatients, Retired Ratio, Net Profit to Gross Revenues, Net Profit to Total Assets. In addition, the management efficiency of hospital is proved to increase as profit and patient-induced indicators increase and cost-related indicators decrease, by the Tobit regression model of independent variables derived from the decision-tree analysis. This study may be contributable to the development of analytic methodology regarding the efficiency of hospital management in that it suggests the synthetic measures by utilizing DEA model instead of suggesting simple ratio-analyzing results.

  • PDF

Analysis of Korean Adolescents' Life Satisfaction based on Public Database and Data Mining Techniques: Emphasis on Decision Tree (공공 DB 데이터마이닝 기법을 활용한 국내 청소년 삶의 만족도 분석에 관한 실증연구: 의사결정나무 기법을 중심으로)

  • Jo, Hyun Jin;Ko, Geo Nu;Lee, Kun Chang
    • Journal of Digital Convergence
    • /
    • v.18 no.6
    • /
    • pp.297-309
    • /
    • 2020
  • This study focuses on the application of the data mining technique logistic regression analysis and decision tree analysis to the domestic public database called Korean Children Youth Panel Survey (KCYPS) to derive a series of important factors affecting the enhancement of life satisfaction of domestic youth. As a result, the general impact factors on life satisfaction for each grade were derived from logistic regression. Using decision tree analysis, we came to conclusions that those factors such as depression, overall grade satisfaction, household economic level, and school adaptation play crucial roles in affecting high school adolesscents' life satisfaction.

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model and Decision Tree Model (로지스틱 회귀모형과 의사결정나무 모형을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Bak, Su-Ho;Kim, Heung-Min;Kim, Bum-Kyu;Hwang, Do-Hyun;Unuzaya, Enkhjargal;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.13 no.4
    • /
    • pp.777-786
    • /
    • 2018
  • This study propose a new method to detect Cochlodinium polykrikoides on satellite images using logistic regression and decision tree. We used spectral profiles(918) extracted from red tide, clear water and turbid water as training data. The 70% of the entire data set was extracted and used for model training, and the classification accuracy of the model was evaluated by using the remaining 30%. As a result of the accuracy evaluation, the logistic regression model showed about 97% classification accuracy, and the decision tree model showed about 86% classification accuracy.

Crop Yield and Crop Production Predictions using Machine Learning

  • Divya Goel;Payal Gulati
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.9
    • /
    • pp.17-28
    • /
    • 2023
  • Today Agriculture segment is a significant supporter of Indian economy as it represents 18% of India's Gross Domestic Product (GDP) and it gives work to half of the nation's work power. Farming segment are required to satisfy the expanding need of food because of increasing populace. Therefore, to cater the ever-increasing needs of people of nation yield prediction is done at prior. The farmers are also benefited from yield prediction as it will assist the farmers to predict the yield of crop prior to cultivating. There are various parameters that affect the yield of crop like rainfall, temperature, fertilizers, ph level and other atmospheric conditions. Thus, considering these factors the yield of crop is thus hard to predict and becomes a challenging task. Thus, motivated this work as in this work dataset of different states producing different crops in different seasons is prepared; which was further pre-processed and there after machine learning techniques Gradient Boosting Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge Regression, Polynomial Regression, Linear Regression are applied and their results are compared using python programming.

Development of Forecasting Model for the Initial Sale of Apartment Using Data Mining: The Case of Unsold Apartment Complex in Wirye New Town (데이터 마이닝을 이용한 아파트 초기계약 예측모형 개발: 위례 신도시 미분양 아파트 단지를 사례로)

  • Kim, Ji Young;Lee, Sang-Kyeong
    • Journal of Digital Convergence
    • /
    • v.16 no.12
    • /
    • pp.217-229
    • /
    • 2018
  • This paper aims at applying the data mining such as decision tree, neural network, and logistic regression to an unsold apartment complex in Wirye new town and developing the model forecasting the result of initial sale contract by house unit. Raw data are divided into training data and test data. The order of predictability in training data is neural network, decision tree, and logistic regression. On the contrary, the results of test data show that logistic regression is the best model. This means that logistic regression has more data adaptability than neural network which is developed as the model optimized for training data. Determinants of initial sale are the location of floor, direction, the location of unit, the proximity of electricity and generator room, subscriber's residential region and the type of subscription. This suggests that using two models together is more effective in exploring determinants of initial sales. This paper contributes to the development of convergence field by expanding the scope of data mining.

Forecasting Energy Consumption of Steel Industry Using Regression Model (회귀 모델을 활용한 철강 기업의 에너지 소비 예측)

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association
    • /
    • v.1 no.2
    • /
    • pp.21-25
    • /
    • 2023
  • The purpose of this study was to compare the performance using multiple regression models to predict the energy consumption of steel industry. Specific independent variables were selected in consideration of correlation among various attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to solve the multicollinearity problem. In data preprocessing, we evaluated linear and nonlinear relationships between each attribute through correlation analysis. In particular, we decided to select variables with high correlation and include appropriate variables in the final model to prevent multicollinearity problems. Among the many regression models learned, Boosted Decision Tree Regression showed the best predictive performance. Ensemble learning in this model was able to effectively learn complex patterns while preventing overfitting by combining multiple decision trees. Consequently, these predictive models are expected to provide important information for improving energy efficiency and management decision-making at steel industry. In the future, we plan to improve the performance of the model by collecting more data and extending variables, and the application of the model considering interactions with external factors will also be considered.