• Title/Abstract/Keywords: decision tree regression

323 search results (processing time: 0.028 s)

Integrity Assessment Models for Bridge Structures Using Fuzzy Decision-Making

  • 안영기;김성칠
    • 콘크리트학회논문집 / Vol. 14, No. 6 / pp. 1022-1031 / 2002
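  • This study presents a useful model for bridge structures that combines a classification and regression tree with an adaptive neuro-fuzzy inference system. A fuzzy decision tree partitions the input space of a data set into distinct regions, each of which is assigned a label, a value, or an action that characterizes its data points. A decision tree used for classification problems is sometimes called a fuzzy decision tree, and each terminal node represents the predicted class of a given feature vector. A decision tree used for regression problems is sometimes called a fuzzy regression tree, and each terminal node region gives the predicted output value for a given input vector as a constant or an equation. Note that a classification and regression tree can select the relevant inputs and partition the input space, whereas the adaptive neuro-fuzzy inference system refines the regression and makes it more continuous and compact. The two are therefore complementary, and their combination constitutes a practical approximation for fuzzy modeling.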
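
The notion of a regression tree whose terminal regions each hold a constant output can be illustrated with a one-split tree. This is only a minimal sketch in plain Python, not the CART-ANFIS model of the paper; the function names `best_split` and `predict` and the toy data are assumptions made for illustration.

```python
from statistics import mean


def best_split(xs, ys):
    """Find the threshold on a single input that minimises squared error.

    Returns (threshold, left_mean, right_mean); each side of the split
    is a terminal region that predicts a constant.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal inputs
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct input values")
    return best[1:]


def predict(x, split):
    """Return the constant stored in the region that x falls into."""
    threshold, left_mean, right_mean = split
    return left_mean if x <= threshold else right_mean


# Hypothetical toy data: two flat plateaus.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
split = best_split(xs, ys)
```

A full regression tree would apply `best_split` recursively inside each region. The abstract describes the neuro-fuzzy stage as then making this piecewise-constant output more continuous.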

Comparison of Data Mining Classification Algorithms for Categorical Feature Variables

  • 손소영;신형원
    • 산업공학 / Vol. 12, No. 4 / pp. 551-556 / 1999
  • In this paper, we compare the performance of three data mining classification algorithms (neural network, decision tree, logistic regression) under various characteristics of categorical input and output data. A $2^{4-1} \times 3$ fractional factorial design is used to simulate the comparison, where the factors are (1) the categorical ratio of the input variables, (2) the complexity of the functional relationship between the output and input variables, (3) the size of randomness in the relationship, (4) the categorical ratio of the output variable, and (5) the classification algorithm. The experimental results indicate the following: the decision tree performs better than the others when the relationship between output and input variables is simple, while logistic regression is better in the opposite case; and the neural network appears to be a better choice than the others when the randomness in the relationship is relatively large. We also use a Taguchi design to improve the practicality of our results by treating the relationship between the output and input variables as a noise factor. As a result, the classification accuracy of the neural network and the decision tree turns out to be higher than that of logistic regression when the categorical proportion of the output variable is even.
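
The run layout of such an experiment can be sketched as follows: a half fraction of four two-level factors, crossed with the three algorithms. The abstract does not state which generator was used, so the defining relation D = A·B·C below is an assumption, as are the -1/+1 level coding and the function names.

```python
from itertools import product

ALGORITHMS = ["neural network", "decision tree", "logistic regression"]


def half_fraction_design():
    """Return the 8 runs of a 2^(4-1) design over factors A, B, C, D.

    A, B, C take every -1/+1 combination; D is set by the assumed
    generator D = A * B * C rather than varied independently.
    """
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def crossed_design():
    """Cross each fractional run with every classification algorithm."""
    return [(levels, algorithm)
            for levels in half_fraction_design()
            for algorithm in ALGORITHMS]


design = crossed_design()
```

Under these assumptions the design has 8 × 3 = 24 runs instead of the 16 × 3 = 48 a full factorial would need, and each two-level factor still appears equally often at both levels.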

A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • 한국인공지능학회지 / Vol. 9, No. 1 / pp. 9-14 / 2021
  • This paper builds a model that predicts whether a customer will buy car insurance, based on data from existing health insurance customers. Automobiles are now widely used for land transportation and daily life, and their range of uses and equipment is expanding. This rapid increase in automobiles has made automobile insurance an essential line of business for insurance companies. If car insurance sales can be predicted and targeted using information on existing health insurance customers, the insurer can generate continuous profits. This paper therefore analyzes existing customer characteristics and implements a predictive model so that advertising can be directed at customers likely to be interested in auto insurance. The goal of the study is to maximize insurers' profits by devising communication strategies that optimize the business model. The study was conducted in Microsoft Azure, and an automobile insurance purchase prediction model was implemented using the Health Insurance Cross-sell Prediction data. Two-Class Logistic Regression and Two-Class Boosted Decision Tree were trained side by side and their results compared. According to the results, at a threshold of 0.3 the AUC is 0.837 and the accuracy is 0.833. The results suggest that customers with health insurance may respond positively to auto insurance offers.
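
The two reported metrics behave differently: accuracy depends on the chosen threshold, while AUC does not. The following is a minimal sketch in plain Python rather than the Azure components used in the study; the function names and the scores and labels are made-up illustrations, not the paper's data.

```python
def accuracy_at_threshold(scores, labels, threshold):
    """Fraction of cases classified correctly when score >= threshold means 1."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve and does not depend on any threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted purchase probabilities and true outcomes.
scores = [0.10, 0.20, 0.35, 0.45, 0.80, 0.40]
labels = [0, 0, 1, 0, 1, 1]

acc_03 = accuracy_at_threshold(scores, labels, 0.3)
acc_05 = accuracy_at_threshold(scores, labels, 0.5)
area = auc(scores, labels)
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises accuracy because two buyers with moderate scores are then classified as positive, while the AUC stays the same.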

Comparison of Classification Models for Sequential Flight Test Results

  • 손소영;조용관;최성옥;김영준
    • 대한인간공학회지 / Vol. 21, No. 1 / pp. 1-14 / 2002
  • The main purpose of this paper is to present selection criteria for ROK Air Force pilot training candidates in order to reduce the costs of sequential pilot training. We use classification models such as Decision Tree, Logistic Regression, and Neural Network, based on the aptitude test results of 288 ROK Air Force applicants in 1994-1996. The models are compared in terms of classification accuracy, ROC, and lift value. The neural network is evaluated as the best model for each sequential flight test result, while the logistic regression model outperforms the others in discriminating the last flight test result. We therefore suggest a pilot selection criterion based on this logistic regression. Overall, we find that factors such as Attention Sharing, Speed Tracking, Machine Comprehension, and Instrument Reading Ability have significant effects on the flight results. We expect that using our criteria can increase the effectiveness of flight resources.
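
Of the three comparison measures, lift is the least self-explanatory. It can be sketched as the success rate among the top-scored candidates divided by the overall success rate. The function name, the 20% cut-off, and the data below are illustrative assumptions, not values from the study.

```python
def lift(scores, labels, fraction):
    """Success rate in the top `fraction` of cases by score, over the base rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_count = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:top_count]) / top_count
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    return top_rate / base_rate


# Hypothetical model scores for ten candidates and whether each passed.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_20_lift = lift(scores, labels, 0.2)
overall_lift = lift(scores, labels, 1.0)
```

A lift above 1 in the top group means the model concentrates successful candidates there. Selecting everyone always gives a lift of exactly 1.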

An Investigation of Factors Affecting Management Efficiency in Korean General Hospitals Using DEA Model

  • 안인환;양동현
    • 한국병원경영학회지 / Vol. 10, No. 1 / pp. 71-92 / 2005
  • The purpose of this study is to analyze the management efficiency of general hospitals and to investigate the major factors that affect it. The management of each general hospital is evaluated using Data Envelopment Analysis (DEA), a nonparametric method for measuring efficiency. The influencing factors are then investigated with a decision-tree model and Tobit regression. The target hospitals were general hospitals with 200 to 500 beds, drawn from a total of 276 general hospitals. Financial indicator data were collected from 48 hospitals and analyzed with the two statistical models. For Model I, three input variables and two output variables were used for the efficiency evaluation. The three input variables were the number of medical doctors, the number of paramedical personnel, and the bed size. The two output variables were the numbers of inpatients and outpatients per year, adjusted by bed size. The DEA results showed that only seven of the 48 hospitals (15%) were efficient. The decision-tree analysis identified six significant influencing factors for Model I: Bed Occupancy Rate, Cost per Adjusted Inpatient, New Visit Ratio of Outpatients, Retired Ratio, Net Profit to Gross Revenues, and Net Profit to Total Assets. In addition, a Tobit regression on the independent variables derived from the decision-tree analysis showed that hospital management efficiency increases as profit and patient-induced indicators increase and cost-related indicators decrease. This study may contribute to the development of analytic methodology for hospital management efficiency in that it proposes composite measures based on the DEA model rather than simple ratio-analysis results.

Analysis of Korean Adolescents' Life Satisfaction based on Public Database and Data Mining Techniques: Emphasis on Decision Tree

  • 조현진;고건우;이건창
    • 디지털융복합연구 / Vol. 18, No. 6 / pp. 297-309 / 2020
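  • This study applies logistic regression and decision tree analysis, both data mining techniques, to a Korean public database and analyzes the process of extracting meaningful decision rules for improving the life satisfaction of Korean adolescents. The analysis uses waves 4 to 6 of the first-year middle school panel of the Korean Children and Youth Panel Survey (KCYPS), which correspond to the three high school grades. The influencing factors extracted by logistic regression were as follows. Grade 1: overall grade satisfaction, attention problems, depression, ego-resilience, affection, over-interference, learning activities, and teacher relationships. Grade 2: household economic level, health status, overall grade satisfaction, trust, alienation, learning activities, school rules, peer relationships, and teacher relationships. Grade 3: household economic level, overall grade satisfaction, depression, ego-resilience, affection, abuse, school rules, and teacher relationships. The decision tree analysis indicated that the life satisfaction of Korean high school students is jointly influenced by personal emotional problems, school grades, the family's economic environment, and school adjustment.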

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model and Decision Tree Model

  • 박수호;김흥민;김범규;황도현;엥흐자리갈 운자야;윤홍주
    • 한국전자통신학회논문지 / Vol. 13, No. 4 / pp. 777-786 / 2018
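  • This study proposes a method for detecting Cochlodinium polykrikoides red tide pixels in satellite images using a logistic regression model and a decision tree model, both machine learning techniques. Water-leaving radiance spectral profiles (918 samples) extracted from red tide, clear water, and turbid water areas were used as training data. 70% of the full data set was sampled for model training, and the remaining 30% was used to evaluate classification accuracy. In the evaluation, the logistic regression model achieved a classification accuracy of about 97%, and the decision tree model about 86%.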
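
The 70/30 evaluation protocol can be sketched as follows. The abstract does not say how the sampling was done, so the seeded shuffle, the rounding down of the training size, and the function names are assumptions. The integer rows stand in for the 918 spectral profiles.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of test cases whose predicted class matches the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Placeholder rows standing in for the 918 spectral profiles.
rows = list(range(918))
train, test = train_test_split(rows)
```

With 918 samples, this split yields 642 training and 276 test rows. Accuracy is then measured only on the held-out part, which the model never saw during training.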

Crop Yield and Crop Production Predictions using Machine Learning

  • Divya Goel;Payal Gulati
    • International Journal of Computer Science & Network Security / Vol. 23, No. 9 / pp. 17-28 / 2023
  • Agriculture is a significant contributor to the Indian economy: it represents 18% of India's Gross Domestic Product (GDP) and employs half of the nation's workforce. The farming sector must meet the growing demand for food from an increasing population, so yield prediction is carried out in advance. Farmers also benefit, since it helps them estimate crop yield before cultivation. Crop yield is affected by many parameters, such as rainfall, temperature, fertilizers, pH level, and other atmospheric conditions, which makes it hard to predict. Motivated by this, in this work a dataset of different states producing different crops in different seasons is prepared and pre-processed. The machine learning techniques Gradient Boosting Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge Regression, Polynomial Regression, and Linear Regression are then applied, and their results are compared using Python.

Development of Forecasting Model for the Initial Sale of Apartment Using Data Mining: The Case of Unsold Apartment Complex in Wirye New Town

  • 김지영;이상경
    • 디지털융복합연구 / Vol. 16, No. 12 / pp. 217-229 / 2018
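  • This study develops models that predict the initial contract of each household by applying decision tree, neural network, and logistic models, all data mining techniques, to household-level contract data from an unsold apartment complex. Contract data from an unsold apartment complex in Wirye New Town are used, split into a training set and a test set. On the training data, predictive power ranked in the order neural network, decision tree, logistic model, but on the test data the logistic model performed best. This is attributed to the neural network being fitted too closely to the training data and therefore adapting poorly to the test data. Applying the decision tree and logistic models together showed that floor level, orientation, unit location, noise from the electrical and generator room, the applicant's place of residence, and the subscription type affect the initial contract. This implies that using the two models together is more effective for identifying the determinants of initial contracts. The study contributes to the convergence field by extending the application of data mining to housing sales prediction.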

Forecasting Energy Consumption of Steel Industry Using Regression Model

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 2 / pp. 21-25 / 2023
  • The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Independent variables were selected by considering the correlation among attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address multicollinearity. During preprocessing, linear and nonlinear relationships between attributes were evaluated through correlation analysis. In particular, highly correlated variables were identified, and only appropriate variables were included in the final model to prevent multicollinearity problems. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model was able to learn complex patterns effectively while limiting overfitting. These predictive models are expected to provide useful information for improving energy efficiency and for management decision-making in the steel industry. In the future, we plan to improve model performance by collecting more data and adding variables, and to consider applying the model with interactions with external factors taken into account.
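
One common way to limit multicollinearity through correlation analysis is to drop any feature that is almost perfectly correlated with one already kept. The abstract does not give the authors' exact procedure, so this is only a sketch of that general idea. The 0.9 threshold, the function names, the `co2_scaled` column, and all values are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_collinear(features, threshold=0.9):
    """Keep features in order, skipping any with |r| > threshold to a kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns; "co2_scaled" is nearly a multiple of "co2".
features = {
    "co2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "co2_scaled": [2.1, 4.0, 6.2, 8.1, 9.9],
    "nsm": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_collinear(features)
```

Here `co2_scaled` is dropped because it carries almost the same information as `co2`, while `nsm` is kept. The result depends on the order of the columns, which is a simplification of this sketch.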