• Title/Abstract/Keywords: Splitting Criteria

Search results: 33 items (processing time: 0.024 s)

New Splitting Criteria for Classification Trees

  • Lee, Yung-Seop
    • Communications for Statistical Applications and Methods / Vol. 8, No. 3 / pp.885-894 / 2001
  • Decision tree methods are one family of data mining techniques. Classification trees are used to predict a class label. As a tree grows, conventional splitting criteria measure node impurity by the weighted average of the impurities of the left and right child nodes. This paper proposes new splitting criteria for classification trees that improve the interpretability of trees compared to the conventional methods. The criteria search only for interesting subsets of the data, as opposed to modeling all of the data equally well. As a result, the tree is very unbalanced but extremely interpretable.

  • PDF
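
The contrast drawn in the abstract above, between a weighted average of both child impurities and a criterion that looks only at the interesting side, can be sketched as follows. This is a minimal illustration, not the paper's definition: Gini impurity is assumed as the impurity measure, and `one_sided_impurity` (the impurity of the purer child) is a hypothetical stand-in for the proposed criteria, which the abstract does not spell out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(left, right):
    """Conventional criterion: size-weighted average of the two child impurities."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def one_sided_impurity(left, right):
    """Hypothetical one-sided criterion (assumption): impurity of the purer child only."""
    return min(gini(left), gini(right))


# Two candidate splits of the same 10 observations (5 "a", 5 "b").
balanced = (["a", "a", "a", "a", "b"], ["a", "b", "b", "b", "b"])
peeling = (["a", "a"], ["a", "a", "a", "b", "b", "b", "b", "b"])

for name, (left, right) in [("balanced", balanced), ("peeling", peeling)]:
    print(name,
          round(weighted_impurity(left, right), 3),
          round(one_sided_impurity(left, right), 3))
```

On this toy data the weighted criterion prefers the balanced split (0.32 against 0.375), while the one-sided criterion prefers the split that peels off a small pure node (0.0 against 0.32). Repeated peeling of this kind would produce the unbalanced trees the abstract describes.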

A Study of Combined Splitting Rules in Regression Trees

  • 이영섭
    • Journal of the Korean Data and Information Science Society / Vol. 13, No. 1 / pp.97-104 / 2002
  • Regression trees, a data mining technique, are constructed by a splitting function, that is, an independent variable and its threshold. Lee (2002) considered one-sided purity (OSP) and one-sided extreme (OSE) splitting criteria for finding an interesting node as early as possible. However, these criteria cannot be mixed within the same tree: a tree must commit in advance to either OSP or OSE. In this paper, a new splitting method that combines and extends OSP and OSE is proposed. With these combined criteria, nodes can be selected by considering both purity and extremeness in the same tree. These criteria are not a generalization of the previous criteria but another option, depending on the circumstances.

  • PDF
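
A rough sketch of how OSP-style and OSE-style scoring might choose different thresholds on the same variable is given below. The definitions are assumptions made for illustration, since the abstract gives no formulas: `osp_score` takes the smaller of the two child variances, `ose_score` takes the larger of the two child means, and `min_leaf` is a hypothetical minimum node size. The combined rule proposed in the paper is not reproduced here.

```python
from statistics import fmean, pvariance


def candidate_splits(xs, ys):
    """Yield (threshold, left_ys, right_ys) at midpoints between distinct sorted x values."""
    pairs = sorted(zip(xs, ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        yield threshold, [y for _, y in pairs[:i]], [y for _, y in pairs[i:]]


def osp_score(left, right):
    """Assumed one-sided purity: variance of the purer child (lower is better)."""
    return min(pvariance(left), pvariance(right))


def ose_score(left, right):
    """Assumed one-sided extreme: mean of the child with the higher mean (higher is better)."""
    return max(fmean(left), fmean(right))


def best_split(xs, ys, min_leaf=2):
    """Return the thresholds chosen by the OSP-style and OSE-style scores."""
    splits = [s for s in candidate_splits(xs, ys)
              if len(s[1]) >= min_leaf and len(s[2]) >= min_leaf]
    by_purity = min(splits, key=lambda s: osp_score(s[1], s[2]))
    by_extreme = max(splits, key=lambda s: ose_score(s[1], s[2]))
    return by_purity[0], by_extreme[0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.0, 5.0, 1.0, 9.0, 2.0, 8.0, 10.0, 12.0]
print(best_split(xs, ys))
```

With this data the purity-oriented score isolates the two identical responses at the low end (threshold 2.5), whereas the extreme-oriented score isolates the two largest responses (threshold 6.5). This shows why a tree restricted to one criterion can miss nodes that the other would find.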

대표적인 의사결정나무 알고리즘의 해석력 비교 (Interpretability Comparison of Popular Decision Tree Algorithms)

  • 홍정식;황근성
    • 산업경영시스템학회지 / Vol. 44, No. 2 / pp.15-23 / 2021
  • Most open-source decision tree algorithms are based on three splitting criteria (Entropy, Gini Index, and Gain Ratio). The advantages and disadvantages of these three popular algorithms therefore need to be studied more thoroughly. Previous comparisons of the three algorithms were mainly performed with respect to predictive performance. In this work, we conducted a comparative experiment on the splitting criteria of the three decision trees, focusing on their interpretability. Depth, homogeneity, coverage, lift, and stability were used as indicators of interpretability. To measure the stability of decision trees, we present measures of the stability of the root node and of the dominating rules, both based on a measure of tree similarity. Using 10 datasets collected from UCI and Kaggle, we compare the interpretability of DT (Decision Tree) algorithms based on the three splitting criteria. The results show that the GR (Gain Ratio) branch-based DT algorithm performs well in terms of lift and homogeneity, while the GINI (Gini Index) and ENT (Entropy) branch-based DT algorithms perform well in terms of coverage. With respect to stability, whether measured by the similarity of the dominating rules or by the similarity of the root node, the DT algorithm using the ENT splitting criterion shows the best results.
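
For reference, the three splitting criteria named in the abstract can be computed for a single candidate split as in the sketch below. It uses the standard textbook definitions (information gain from Shannon entropy, decrease in Gini impurity, and gain ratio as information gain divided by split information). The function name `split_scores` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Information gain, Gini decrease, and gain ratio for one split.

    `children` is a list of non-empty label lists that partition `parent`.
    """
    n = len(parent)
    weights = [len(child) / n for child in children]
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    gini_decrease = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # Split information penalizes splits into many small branches.
    split_info = sum(-w * log2(w) for w in weights)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"info_gain": info_gain, "gini_decrease": gini_decrease, "gain_ratio": gain_ratio}


parent = ["y"] * 4 + ["n"] * 4
children = [["y", "y", "y", "n"], ["y", "n", "n", "n"]]
scores = split_scores(parent, children)
print({name: round(value, 4) for name, value in scores.items()})
```

For a two-way split into equal halves, the split information is exactly 1 bit, so the gain ratio equals the information gain. The two measures diverge once branches become unequal in size or more numerous.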

퍼지의사결정나무 개선방법을 이용한 CRM 적용 사례 (Case Study of CRM Application Using Improvement Method of Fuzzy Decision Tree Analysis)

  • 양승정;이종태
    • 한국콘텐츠학회논문지 / Vol. 7, No. 8 / pp.13-20 / 2007
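  • The decision tree is one of the analysis techniques frequently used to classify large amounts of data into several groups and to predict future situations. A tree grows as splits occur at each node, and the process proceeds so that the purity of the data belonging to each node increases effectively. During tree construction, if the tree acquires more branches (leaves) than necessary, node splitting is stopped, or branches that contribute little to classification performance are pruned. This pruning changes the shape of the decision tree, which means that the original splits were not efficient. This study proposes a method for extracting a superior decision tree that combines correction by pruning with a new splitting process. In particular, it presents a method that applies fuzzy theory to the selection of new split nodes in order to improve the effectiveness of splitting.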

소성 가공의 유한 요소 해석을 위한 자동 요소망 생성 (Automated Mesh Generation For Finite Element Analysis In Metal Forming)

  • 이상훈;오수익
    • 한국소성가공학회: Conference Proceedings / 한국소성가공학회 1997 Fall Conference Proceedings / pp.17-23 / 1997
  • In two-dimensional finite element simulation of forming, mesh generation and remeshing are very significant. In this paper, a modified splitting mesh generation algorithm is used to overcome the limitations of existing techniques and to obtain a mesh with optimal mesh density. A modified splitting algorithm for automatically generating a quadrilateral mesh within a complex domain is described. The unnecessary meshing process for density representation is removed. In particular, for mesh generation with a high density gradient, such as shear band representation, a modified mesh density scheme is introduced that generates a quadrilateral mesh while minimizing the error that affects the FEM solver.

  • PDF

회귀의사결정나무에서의 관심노드 찾는 분류 기준법 (Interesting Node Finding Criteria for Regression Trees)

  • 이영섭
    • 응용통계연구 / Vol. 16, No. 1 / pp.45-53 / 2003
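  • Regression trees, one of the decision tree analysis techniques, are used to predict a continuous response variable. When the tree structure is built, conventional splitting criteria combine the impurities of the left and right child nodes. The new splitting criteria proposed in this paper instead select only the child node of interest and ignore the other, so the impurities are no longer combined. The resulting tree may be unbalanced, but it is easy to understand. That is, by finding interesting subsets as early as possible, they can be expressed with only a few conditions; accuracy is somewhat lower, but interpretability is very high.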

사각형 유한요소망의 자동생성 (Automatic Mesh Generation with Quadrilateral Finite Elements)

  • 채수원;신보성;민중기
    • 대한기계학회논문집 / Vol. 17, No. 12 / pp.2995-3006 / 1993
  • An automatic mesh generation scheme has been developed for finite element analysis with two-dimensional quadrilateral elements. The basic strategy of the method is to transform the analysis domain into loops with key nodes and to recursively subdivide the loops into subloops using best split lines. Finally, the meshes are completed using the basic loop operators. In this algorithm, an eight-node loop operator is proposed, which is useful in areas where the element size changes greatly, and the splitting criteria for subdividing the loops have also been modified relative to existing algorithms. Lines, arcs, and cubic spline curves are used to define the boundaries of the analysis domain. Sample meshes for several geometries are presented to demonstrate the robustness of the algorithm.

Estimation of splitting tensile strength of modified recycled aggregate concrete using hybrid algorithms

  • Zhu, Yirong;Huang, Lihua;Zhang, Zhijun;Bayrami, Behzad
    • Steel and Composite Structures / Vol. 44, No. 3 / pp.389-406 / 2022
  • Recycling concrete construction waste is an encouraging step toward green and sustainable building. A lot of research has been done on recycled aggregate concretes (RACs), but not nearly as much has been done on concrete made with recycled aggregate. Recycled aggregate concrete, on the other hand, has been found to have lower mechanical performance than conventional concrete. Accurately estimating the mechanical behavior of concrete samples is an important topic in civil, structural, and construction engineering. Accurate estimation can save time and effort and reduce cost, because experimental studies are often time-consuming, costly, and troublesome. This study presents a comprehensive data-mining-based model for predicting the splitting tensile strength of recycled aggregate concrete modified with glass fiber and silica fume. For this purpose, 168 splitting tensile strength tests were first performed in the laboratory under different conditions; based on the conditions of each experiment, several variables were taken as input parameters for predicting the splitting tensile strength. Three hybrid models, GWO-RF, GWO-MLP, and GWO-SVR, were then used. The results showed that all the developed GWO-based hybrid prediction models agree well with the measured experimental results. Notably, the GWO-RF model has the best accuracy according to the model performance assessment criteria for both training and testing data.

분할법과 평균거리 개념에 의한 용접 결함 표현 방법 (The Weld Defects Expression Method by the Concept of Segment Splitting Method and Mean Distance)

  • 이정익;고병갑
    • 한국공작기계학회논문집 / Vol. 16, No. 2 / pp.37-43 / 2007
  • In this paper, a laser vision sensor is used in hardware to detect defects in $CO_{2}$-welded specimens. To best express the defects of a welded specimen, the concepts of the segment splitting method and mean distance are introduced in software. The developed GUI software is used to decide in real time whether a welded specimen has a proper shape or defects. The criteria are based on ISO 5817, which specifies limits of imperfections in metallic fusion welds.