• Title/Summary/Keyword: Decision Tree Algorithm

Decision Tree 를 이용한 Machine Learning (Machine Learning by Decision Tree Algorithm)

  • 정원찬;최경수;김정호
    • 전자통신동향분석 / Vol. 8, No. 4 / pp.205-211 / 1993
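  • Machine learning, in which a computer builds a logical system on its own given only the necessary data, is a field of artificial intelligence that is being actively studied in many areas. This article introduces the decision tree method, one of the basic approaches to machine learning, and describes its problems and research directions.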

의사결정나무를 활용한 업종별 버스 교통사고 특성 연구 (Study on the Characteristics of Bus Traffic Accidents by Types Using the Decision Tree)

  • 박원일;김경현;한음;박상민;윤일수
    • 한국도로학회논문집 / Vol. 18, No. 5 / pp.105-115 / 2016
  • PURPOSES: This study analyzes the characteristics of bus traffic accidents by bus type, using decision trees, in order to establish customized safety alternatives for each type: the intra-city bus, the rural area bus, and the inter-city bus. METHODS: The major elements involved in bus traffic accidents, and the characteristics of those elements, were identified using decision trees. The CHAID algorithm was applied to branch the trees. RESULTS: The numbers of casualties and severe injuries are high in bus accidents involving pedestrians, bicycles, motorcycles, etc. For light injuries caused by bus accidents, different results are found. In intra-city bus accidents, the probability of light injury is 77.2% when boarding a non-owned car and breach of the duty to drive safely are involved. In rural area bus accidents, the elements showing the highest probability of light injury are boarding an owned car, vehicle-to-vehicle accidents, and breach of the duty to drive safely. In the case of intra-city bus accidents, boarding an owned car, streets, and vehicle-to-vehicle accidents work as the critical elements. CONCLUSIONS: The bus accident data were categorized by bus type, and the influential elements were then identified using decision trees. The characteristics of bus accidents were found to differ by bus type. The findings are expected to be useful in establishing effective alternatives to reduce bus accidents.
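CHAID (Chi-squared Automatic Interaction Detection) chooses the variable to branch on with chi-square tests of independence between each candidate predictor and the target. The following is a minimal sketch of that scoring step only, not the procedure used in the paper: it omits CHAID's category merging and adjusted p-values, and it ranks by the raw statistic, which is comparable only when the candidate tables have the same shape. The variable names (`counterpart`, `road_type`) and the toy records are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def chi_square(feature, target):
    """Pearson chi-square statistic for the feature-by-target contingency table."""
    n = len(feature)
    row_totals = Counter(feature)
    col_totals = Counter(target)
    observed = Counter(zip(feature, target))
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n  # cell count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

def pick_branch_variable(candidates, target):
    """Return the candidate with the largest chi-square statistic, plus all scores."""
    scores = {name: chi_square(col, target) for name, col in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy records (illustration only).
severity = ["severe", "severe", "light", "light"]
candidates = {
    "counterpart": ["pedestrian", "pedestrian", "vehicle", "vehicle"],
    "road_type": ["street", "highway", "street", "highway"],
}
best, scores = pick_branch_variable(candidates, severity)
print(best, scores)
```

In this toy case `counterpart` separates the two severity levels perfectly, so it receives the larger statistic and would be chosen for the first branch.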

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods / Vol. 24, No. 6 / pp.543-559 / 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.
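To make the "greedy" construction mentioned in this abstract concrete, here is a minimal sketch of a single CART-style step: scanning the thresholds of one numeric variable and keeping the one with the lowest weighted Gini impurity. It is an illustration under simplifying assumptions (one variable, classification only, no recursion or pruning); the function names are hypothetical and are not taken from the article.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Greedy search for the threshold minimizing weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            midpoint = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (midpoint, score)
    return best

print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))
```

A full tree repeats this search over every variable at every node and commits to the locally best split without backtracking, which is what makes the procedure greedy.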

유전 알고리즘을 이용한 선형 결정 함수의 결정 및 이진 결정 트리 구성에의 적용 (A determination of linear decision function using GA and its application to the construction of binary decision tree)

  • 정순원;박귀태
    • 한국지능시스템학회:학술대회논문집 / Proceedings of the 1996 Fall Conference of 한국퍼지및지능시스템학회 / pp.271-274 / 1996
  • This paper proposes a new scheme for determining a linear decision function, in which the weights of the function are obtained by a genetic algorithm. By properly selecting the fitness function of the genetic algorithm, the result can account for the balance between clusters as well as the classification error, which is an advantage when the scheme is applied to the construction of a binary decision tree. The proposed scheme is applied to artificial two-dimensional data and real multi-dimensional data. Experimental results show the usefulness of the proposed scheme.

의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발 (A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach)

  • 김덕현;유동희;정대율
    • 한국정보시스템학회지:정보시스템연구 / Vol. 28, No. 3 / pp.249-276 / 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly, based on the Korean Welfare Panel survey data. From this data, we obtained many decision rules for predicting the elderly's suicidal ideation. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive the decision rules. Weka 3.8 was used as the data mining tool, and the decision tree algorithm was J48, Weka's implementation of C4.5. In addition, 66.6% of the total data was divided into learning data and verification data. We considered all possible variables based on previous studies in predicting suicidal ideation of the elderly. Finally, 99 variables including the target variable were used. Classification analysis was performed by introducing a sampling technique through backward elimination and data balancing. Findings: There were significant differences between the data sets, and the selected data sets yielded different decision trees and rules. Based on the decision tree method, we derived rules for suicide prevention. The decision tree derives rules for the suicidal ideation not only of the depressed group but also of the non-depressed group. In developing the predictive model, the over-fitting problem caused by data imbalance was identified directly through the application of data balancing. We conclude that the data must be balanced on the target variable in order to perform a correct classification analysis without over-fitting. Moreover, even with data balancing applied, the prediction rate is not inferior to that of a biased prediction model.

A Study on Split Variable Selection Using Transformation of Variables in Decision Trees

  • Chung, Sung-S.;Lee, Ki-H.;Lee, Seung-S.
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 2 / pp.195-205 / 2005
  • In decision tree analysis, the C4.5 and CART algorithms suffer from computational complexity and from bias in variable selection. The QUEST algorithm addresses these problems by separating variable selection from split point selection. When the input variables are continuous, QUEST uses the ANOVA F-test under the assumptions of normality and homogeneity of variances. In this paper, we investigate the influence of violating the normality assumption and the effect of transforming variables in the QUEST algorithm. In the simulation study, we obtained the empirical power and the empirical bias of variable selection after transforming variables with various types of underlying distributions.
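The variable selection step described here can be sketched as computing a one-way ANOVA F statistic for each continuous input across the target classes, then splitting on the variable with the strongest association. This is a simplified illustration, not the paper's code: QUEST itself compares p-values and includes further refinements, whereas this sketch ranks by the raw F statistic, which gives the same ordering only when all variables share the same degrees of freedom. The function names and the toy data are hypothetical.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a continuous variable across class labels.

    Assumes at least two classes and more observations than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # classes are perfectly separated with no spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_split_variable(columns, labels):
    """Return the variable with the largest F statistic, plus all scores."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return max(scores, key=scores.get), scores

# Toy data (illustration only): x1 separates the classes, x2 does not.
labels = [0, 0, 0, 1, 1, 1]
columns = {"x1": [1, 2, 3, 7, 8, 9], "x2": [5, 1, 4, 2, 5, 3]}
chosen, f_scores = select_split_variable(columns, labels)
print(chosen, f_scores)
```

Because the variable is chosen before any split point is searched, the selection does not favor variables that merely offer more candidate split points. The F-test's reliance on normality is the assumption whose violation the paper examines.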

Feature Selection and Hyper-Parameter Tuning for Optimizing Decision Tree Algorithm on Heart Disease Classification

  • Tsehay Admassu Assegie;Sushma S.J;Bhavya B.G;Padmashree S
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp.150-154 / 2024
  • In recent years, there has been extensive research on applying machine learning to automation and decision support for medical experts during disease detection. However, the performance of machine learning still needs improvement so that models produce more accurate and reliable results for disease detection. Selecting the hyper-parameters that yield the maximum possible classification accuracy on a medical dataset is the most challenging task in developing decision support systems with machine learning algorithms for medical dataset classification. Moreover, selecting the features that best characterize a disease is another challenge in developing a machine learning model with better classification accuracy. In this study, we propose an optimized decision tree model for heart disease classification, using a heart disease dataset collected from the Kaggle data repository. The proposed model is evaluated, and the experimental tests reveal that the performance of the decision tree improves when an optimal number of features is used for training. Overall, the accuracy of the proposed decision tree model is 98.2% for heart disease classification.

침입탐지시스템에서 하이브리드 특징 선택에 관한 연구 (A Study on Hybrid Feature Selection in Intrusion Detection System)

  • 한명묵
    • 한국지능시스템학회:학술대회논문집 / Proceedings of the 2006 Spring Conference of 한국퍼지및지능시스템학회, Vol. 16, No. 1 / pp.279-282 / 2006
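  • As network-based computer systems play an ever more indispensable role in modern society, they have become targets for intruders. Intrusion detection systems (IDS), which protect them, have accordingly become an increasingly important technology. For an IDS to analyze patterns and then judge and predict normal versus abnormal behavior, the initial stage of feature extraction or selection is very important. In this paper, feature selection, an important part of an IDS, is implemented by applying the data mining techniques Genetic Algorithm (GA) and Decision Tree (DT).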

Assessing Factors Linked with Ozone Exceedances in Seoul, Korea through a Decision Tree Algorithm

  • Park, Sun-Kyoung
    • 한국환경과학회지 / Vol. 25, No. 2 / pp.191-216 / 2016
  • Since prolonged exposure to elevated ozone ($O_3$) concentrations is known to be harmful to human health, appropriate ozone control strategies are needed for non-attainment areas such as Seoul, Korea. The goal of this research is to assess factors linked with 1-hour ozone exceedances through a decision tree model. Since ozone is a secondary pollutant, lag times between ozone and the explanatory variables for ozone formation are taken into account in the model to improve the accuracy of the simulation. Results show that while ozone concentrations of the previous day and $NO_2$ concentrations in the morning are major drivers of ozone exceedances in the early afternoon, meteorology plays a more important role in ozone exceedances in the late afternoon. Results also show that the selection of lag times between ozone and the explanatory variables affects the accuracy of predicting 1-hour ozone exceedances. The results of this study can be used for developing ozone control strategies in Seoul, Korea.

CART 알고리즘 기반의 의사결정트리 기법을 이용한 규칙기반 전문가 시스템 구축 방법론 (The Construction Methodology of a Rule-based Expert System using CART-based Decision Tree Method)

  • 고윤석
    • 한국전자통신학회논문지 / Vol. 6, No. 6 / pp.849-854 / 2011
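  • A rule-based expert system driven by real-time conditions is very effective for minimizing the ripple effects of system events, but such a system is not easy to build because the events are diverse and the load conditions are highly variable. This study therefore investigates a methodology for building a rule-based expert system from contingency cases by applying a decision tree method based on the CART algorithm.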