• Title/Summary/Keyword: 의사결정나무 모형 (decision tree model)

Penalized quantile regression tree (벌점화 분위수 회귀나무모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • The Korean Journal of Applied Statistics, v.29 no.7, pp.1361-1371, 2016
  • Quantile regression provides a variety of useful statistical information for examining how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression, which assumes a linear model, is not appropriate when the relationship between the response and the covariates is nonlinear. Variable selection is also necessary for high-dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has negligible bias in selecting a split variable and a reasonable computational cost. A simulation study and a real data analysis are presented to demonstrate the satisfactory performance and usefulness of the proposed method.

The Prediction Model for Self-Reported Voice Problem Using a Decision Tree Model (의사결정나무 모형을 이용한 주관적 음성장애 예측모형)

  • Byeon, Haewon
    • Journal of the Korea Academia-Industrial cooperation Society, v.14 no.7, pp.3368-3373, 2013
  • The purpose of this study was to analyze the risk factors for self-reported voice problems. Data were from the Korea National Health and Nutritional Examination Survey 2008. Subjects were 3,600 persons (1,501 men, 2,099 women) aged 19 years and older. A prediction model was developed using the exhaustive CHAID (Chi-squared Automatic Interaction Detection) algorithm of the decision tree model. In the decision tree analysis, pain and discomfort during the last 2 weeks, age, the longest-held occupation, and thyroid disorders were significantly associated with self-reported voice problems. The associated factors suggest potential ways of targeting counseling and prevention efforts to control self-reported voice problems.

Comparison of Methodologies for Characterizing Pedestrian-Vehicle Collisions (보행자-차량 충돌사고 특성분석 방법론 비교 연구)

  • Choi, Saerona;Jeong, Eunbi;Oh, Cheol
    • Journal of Korean Society of Transportation, v.31 no.6, pp.53-66, 2013
  • The major purpose of this study is to evaluate methodologies for predicting the injury severity of pedestrian-vehicle collisions. The methodologies evaluated and compared in this study are Binary Logistic Regression (BLR), the Ordered Probit Model (OPM), the Support Vector Machine (SVM), and the Decision Tree (DT) method. Valuable insights into applying these methodologies to analyze the characteristics of pedestrian injury severity are derived. For identifying causal factors affecting injury severity, statistical approaches such as BLR and OPM are recommended. To achieve better prediction performance, on the other hand, heuristic approaches such as SVM and DT are recommended. The outcome of this study is expected to be useful in developing countermeasures for enhancing pedestrian safety.

A Study for the Development of a Bid Price Rate Prediction Model (낙찰률 예측 모형에 관한 연구)

  • Choi, Bo-Seung;Kang, Hyun-Cheol;Han, Sang-Tae
    • Communications for Statistical Applications and Methods, v.18 no.1, pp.23-34, 2011
  • Property auctions have become a new method of real estate investment because the property auction market grows in tandem with the real estate market. This study focuses on a statistical model for predicting the bid price rate, which is the main index for participants in the real estate auction market. For estimating the monthly bid price rate, we propose a new method that makes up for the mean of regions and terms and reduces the prediction error using decision tree analysis. We also propose a linear regression model to predict the bid price rate of an individual auction property. We apply the proposed model to apartment auction properties to predict the bid price rate and to categorize each auction property into an auction grade.

Predictability of emergency water supply using machine learning-based classification techniques (딥러닝 기반 분류기법을 활용한 비상급수 예측 가능성 검토)

  • Oh, Yeoung Rok;Jun, Kyung Soo
    • Proceedings of the Korea Water Resources Association Conference, 2022.05a, pp.303-303, 2022
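  • As climate change makes extreme weather events more frequent, droughts are also occurring more often. Research is therefore needed both to build a proactive drought response system that mitigates drought damage and to minimize damage after a drought has occurred. This study examined whether drought damage can be predicted by treating its occurrence as a binary classification problem. Drought damage was defined from emergency water supply data (restricted supply and transported supply): cases in which emergency water supply was implemented were treated as drought damage, and cases in which it was not were treated as no damage. Precipitation, temperature, relative humidity, and similar quantities were used as meteorological variables. In addition, the ratio of reservoir storage to total annual water supply in each region was used to reflect the current regional situation. In the decision tree analysis, the accuracy computed from the confusion matrix, which is commonly used to evaluate imbalanced-class problems, was 0.95 or higher, and the F1-score was about 0.5. This means that drought damage status can be distinguished with 95% probability when all predictions are considered, but with only about 50% accuracy when only the drought damage cases are considered. However, this study did not consider enough of the environmental variables that trigger emergency water supply and did not analyze a variety of deep learning models. Model accuracy is therefore expected to improve if the factors that trigger emergency water supply are considered more fully and the deep learning techniques are refined.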
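The gap between an accuracy of about 0.95 and an F1-score of about 0.5 is typical of imbalanced classes. The sketch below is illustrative only: the confusion-matrix counts are hypothetical, chosen to reproduce those two figures, and are not taken from the study.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return accuracy, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts: 1,000 cases, of which only 50 are drought-damage (positive) cases.
acc, f1 = accuracy_and_f1(tp=25, fp=25, fn=25, tn=925)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

With these assumed counts the many correctly classified no-damage cases dominate the accuracy, while the F1-score reflects that only half of the damage cases are found and only half of the damage predictions are correct.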

Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society, v.25 no.2, pp.373-384, 2014
  • Data mining clarifies patterns or correlations in large, complexly structured data and predicts diverse outcomes. The technique is used in finance, telecommunication, circulation, medicine, and other fields. In this paper, we selected risk factors for respiratory diseases in the field of medicine. The data, taken from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012, were divided into a respiratory disease group and a healthy group. To select the major risk factors, we applied data mining techniques such as neural networks, logistic regression, Bayesian networks, C5.0, and CART. We divided the data into training and testing sets and applied the models built on the training data to the testing data. In the comparison of prediction accuracy, CART was identified as the best model. Depression, smoking, and stress were identified as the major risk factors for respiratory disease.

A study on the behavior of cosmetic customers (화장품구매 자료를 통한 고객 구매행태 분석)

  • Cho, Dae-Hyeon;Kim, Byung-Soo;Seok, Kyung-Ha;Lee, Jong-Un;Kim, Jong-Sung;Kim, Sun-Hwa
    • Journal of the Korean Data and Information Science Society, v.20 no.4, pp.615-627, 2009
  • In micro-marketing promotion, it is important to know the behavior of customers. In this study we are interested in forecasting customer repurchase from customer behavior. By analyzing cosmetic transaction data, we derive variables that play an important role in understanding customer behavior and in modeling repurchase. As modeling tools we use the decision tree, logistic regression, and neural network models. We select the decision tree as the final model because it yields the smallest RASE (root average squared error) and the highest correct classification rate.

A Recommending System for Care Plan(Res-CP) in Long-Term Care Insurance System (데이터마이닝 기법을 활용한 노인장기요양급여 권고모형 개발)

  • Han, Eun-Jeong;Lee, Jung-Suk;Kim, Dong-Geon;Ka, Im-Ok
    • The Korean Journal of Applied Statistics, v.22 no.6, pp.1229-1237, 2009
  • In the long-term care insurance (LTCI) system, the question of how to provide the most appropriate care has become a major issue for the elderly, their families, and policy makers. To help beneficiaries use LTC services in a way that matches their care needs, the National Health Insurance Corporation (NHIC) provides them with an individualized care plan, named the Long-term Care User Guide. It includes recommendations for the most appropriate type of care for each beneficiary. The purpose of this study is to develop a recommending system for care plan (Res-CP) in the LTCI system. We used the data set for the Long-term Care User Guide from the 3rd long-term care insurance pilot programs. To develop the model, we tested four models: a decision-tree model from data mining, a logistic regression model, and bagging and boosting techniques as ensemble models. A decision-tree model was selected to describe the Res-CP because it makes the algorithm of Res-CP easy to explain to the working groups. Res-CP may be useful in evidence-based care planning in the LTCI system and may help support efficient use of LTC services.

Particulate Matter Prediction using Quantile Boosting (분위수 부스팅을 이용한 미세먼지 농도 예측)

  • Kwon, Jun-Hyeon;Lim, Yaeji;Oh, Hee-Seok
    • The Korean Journal of Applied Statistics, v.28 no.1, pp.83-92, 2015
  • For national health, it is important to develop an accurate method for predicting atmospheric particulate matter (PM), because exposure to such fine dust can trigger not only respiratory diseases but also dermatoses, ophthalmopathies, and cardiovascular diseases. The National Institute of Environmental Research (NIER) employs a decision tree to predict bad weather days with a high PM concentration. However, the decision tree method, aside from its inherent instability, is not a suitable model for predicting bad weather days, which represent only 4% of the entire data. In this paper, while demonstrating the inaccuracy and inappropriateness of the method used by the NIER, we present the utility of a new prediction model that adopts boosting with quantile loss functions. We evaluate the performance of the new method over various values of $\tau$ and justify the proposed method through comparison.
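As a minimal sketch of what a quantile loss function looks like, the standard quantile (pinball) loss for a level $\tau$ can be written as below. The function names and sample numbers are hypothetical, and the abstract does not specify the exact boosting procedure, so this only illustrates the loss and the per-observation negative gradient that a gradient-boosting step would fit.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at level tau, with 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff >= 0) costs tau per unit,
        # over-prediction costs (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def negative_gradient(y_true, y_pred, tau):
    """Negative gradient of the pinball loss with respect to each prediction."""
    return [tau if y > q else tau - 1.0 for y, q in zip(y_true, y_pred)]


# Hypothetical PM concentrations and a constant prediction of 20.
y = [10.0, 20.0, 80.0]
pred = [20.0, 20.0, 20.0]
print(pinball_loss(y, pred, tau=0.5))
print(pinball_loss(y, pred, tau=0.9))
print(negative_gradient(y, pred, tau=0.9))
```

With a high $\tau$ such as 0.9, missing a high-concentration day from below is penalized nine times more heavily than overshooting by the same amount, which is why such a loss can suit rare high-PM days better than a symmetric error.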

A Study on the Node Split in Decision Tree with Multivariate Target Variables (다변량 목표변수를 갖는 의사결정나무의 노드분리에 관한 연구)

  • Kim, Seong-Jun
    • Journal of the Korean Institute of Intelligent Systems, v.13 no.4, pp.386-390, 2003
  • Data mining is the process of discovering patterns useful for decision making in large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way of finding a classification model. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present some methods for measuring node impurity that are applicable to data sets with multivariate target variables. For illustration, a numerical example is given with discussion.
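One simple way to extend node impurity to several categorical target variables is to sum the Gini impurity of each target. The sketch below assumes that choice purely for illustration; it is not necessarily one of the measures proposed in the article, and the function names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a single categorical target within a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def multi_target_impurity(rows):
    """Sum of per-target Gini impurities; each row is a tuple of target values."""
    if not rows:
        return 0.0
    return sum(gini(list(column)) for column in zip(*rows))


def split_gain(parent, left, right):
    """Impurity reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * multi_target_impurity(left) \
        + (len(right) / n) * multi_target_impurity(right)
    return multi_target_impurity(parent) - weighted


# Hypothetical node with two targets per observation: (quality, cost level).
parent = [("ok", "low"), ("ok", "low"), ("bad", "high"), ("bad", "high")]
left, right = parent[:2], parent[2:]
print(split_gain(parent, left, right))
```

A candidate split would then be chosen to maximize this gain, just as in the single-target case; weighting the targets unequally is an obvious variation when some targets matter more than others.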