• Title/Summary/Keyword: Splitting Criteria

New Splitting Criteria for Classification Trees

  • Lee, Yung-Seop
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.885-894 / 2001
  • Decision tree methods are one family of data mining techniques. Classification trees are used to predict a class label. When a tree grows, the conventional splitting criteria use the weighted average of the impurities of the left and right child nodes to measure node impurity. In this paper, new splitting criteria for classification trees are proposed that improve the interpretability of trees compared to the conventional methods. The criteria search only for interesting subsets of the data, as opposed to modeling all of the data equally well. As a result, the tree is very unbalanced but extremely interpretable.

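The conventional criterion described in this abstract can be sketched as follows. This is a minimal illustration, assuming Gini impurity as the node impurity measure (the abstract does not say which measure is used); the function names `gini` and `weighted_impurity` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_impurity(left, right):
    """Conventional split score: size-weighted average of the two child impurities."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


# Toy split (made-up labels): the right child is pure, the left child is mixed.
left = ["a", "a", "a", "b"]
right = ["b", "b", "b", "b"]
score = weighted_impurity(left, right)
print(score)
```

Because both children contribute in proportion to their size, a split with one very pure child can still score poorly if the other child is large and mixed.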

A Study of Combined Splitting Rules in Regression Trees

  • Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.13 no.1 / pp.97-104 / 2002
  • Regression trees, a technique in data mining, are constructed by splitting functions, each consisting of an independent variable and its threshold. Lee (2002) considered one-sided purity (OSP) and one-sided extreme (OSE) splitting criteria for finding an interesting node as early as possible. But these methods cannot be mixed within the same tree; a tree must commit to either OSP or OSE in advance. In this paper, a new splitting method, which is a combination and extension of OSP and OSE, is proposed. With these combined criteria, nodes can be selected by considering both purity and extremity in the same tree. These criteria are not a generalization of the previous criteria but another option, depending on the circumstances.

Interpretability Comparison of Popular Decision Tree Algorithms (대표적인 의사결정나무 알고리즘의 해석력 비교)

  • Hong, Jung-Sik;Hwang, Geun-Seong
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.2 / pp.15-23 / 2021
  • Most open-source decision tree algorithms are based on three splitting criteria (Entropy, Gini Index, and Gain Ratio). Therefore, the advantages and disadvantages of these three popular algorithms need to be studied more thoroughly. Comparisons of the three algorithms have mainly been performed with respect to predictive performance. In this work, we conducted a comparative experiment on the splitting criteria of three decision trees, focusing on their interpretability. Depth, homogeneity, coverage, lift, and stability were used as indicators for measuring interpretability. To measure the stability of decision trees, we present a measure of the stability of the root node and the stability of the dominating rules based on a measure of the similarity of trees. Based on 10 datasets collected from UCI and Kaggle, we compare the interpretability of DT (Decision Tree) algorithms based on the three splitting criteria. The results show that the GR (Gain Ratio) branch-based DT algorithm performs well in terms of lift and homogeneity, while the GINI (Gini Index) and ENT (Entropy) branch-based DT algorithms perform well in terms of coverage. With respect to stability, considering both the similarity of the dominating rules and the similarity of the root node, the DT algorithm based on the ENT splitting criterion shows the best results.
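For readers unfamiliar with two of the criteria named in this abstract, the sketch below computes entropy-based information gain and gain ratio using their standard textbook definitions. It is an illustration only, not the paper's experimental code; the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    remainder = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - remainder


def gain_ratio(parent, children):
    """Information gain divided by split information (entropy of the branch sizes)."""
    n = len(parent)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    if split_info == 0:
        return 0.0
    return information_gain(parent, children) / split_info


# Toy data: both splits separate the classes perfectly, but the second uses four branches.
parent = ["y"] * 4 + ["n"] * 4
two_way = [["y"] * 4, ["n"] * 4]
four_way = [["y"] * 2, ["y"] * 2, ["n"] * 2, ["n"] * 2]

print(information_gain(parent, two_way), gain_ratio(parent, two_way))
print(information_gain(parent, four_way), gain_ratio(parent, four_way))
```

Both toy splits have the same information gain, but the gain ratio of the four-way split is halved, which shows how gain ratio penalizes splits with many branches.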

Case Study of CRM Application Using Improvement Method of Fuzzy Decision Tree Analysis (퍼지의사결정나무 개선방법을 이용한 CRM 적용 사례)

  • Yang, Seung-Jeong;Rhee, Jong-Tae
    • The Journal of the Korea Contents Association / v.7 no.8 / pp.13-20 / 2007
  • A decision tree is one of the most useful analysis methods for various data mining functions, including prediction and classification, on massive data. A decision tree grows by splitting nodes, during which purity increases. Splitting should stop when purity no longer increases effectively or when new leaves do not contain a meaningful number of records. A branch is pruned if it does not reach a certain level of performance. Pruning changes the structure of the decision tree, and it implies that the previous split of the parent node was not effective. It also implies that the splits of the ancestor nodes were not effective and that the choices of attributes and criteria for splitting them were not successful. New attributes or criteria might therefore be selected to split such nodes in further attempts. In this paper, we suggest a procedure that modifies the decision tree using fuzzy theory and splitting as an integrated approach.

Automated Mesh Generation For Finite Element Analysis In Metal Forming (소성 가공의 유한 요소 해석을 위한 자동 요소망 생성)

  • 이상훈;오수익
    • Proceedings of the Korean Society for Technology of Plasticity Conference / 1997.10a / pp.17-23 / 1997
  • In two-dimensional finite element simulation of forming processes, mesh generation and remeshing are very important. In this paper, a modified splitting mesh generation algorithm is used to overcome the limitations of existing techniques and to obtain a mesh with optimal mesh density. A modified splitting algorithm for automatically generating a quadrilateral mesh within a complex domain is described. The unnecessary meshing process for density representation is removed. In particular, for mesh generation with a high density gradient, such as shear band representation, a modified mesh density scheme is introduced that generates a quadrilateral mesh minimizing the error that affects the FEM solver.

Interesting Node Finding Criteria for Regression Trees (회귀의사결정나무에서의 관심노드 찾는 분류 기준법)

  • 이영섭
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.45-53 / 2003
  • Regression trees are a decision tree method used to predict a continuous response. The general splitting criteria in tree growing are based on a compromise between the impurities of the left and right child nodes. By picking the more interesting subset and ignoring the other, the new splitting criteria proposed in this paper no longer split based on a compromise between child nodes. The tree structure produced by the new criteria might be unbalanced but plausible. The criteria can find an interesting subset as early as possible and express it with a simple clause. As a result, the tree is very interpretable at the cost of a little accuracy.
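The contrast between a compromise criterion and a one-sided criterion can be sketched as follows. The abstract gives no formulas, so this sketch assumes that the conventional score is the size-weighted average of the child variances and that a one-sided purity score is the smaller of the two child variances; both function names and the toy data are hypothetical.

```python
from statistics import pvariance


def weighted_variance(left, right):
    """Compromise criterion: size-weighted average of the two child variances."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n


def one_sided_purity(left, right):
    """Assumed one-sided criterion: score a split by its purer child only."""
    return min(pvariance(left), pvariance(right))


# Two candidate splits of the same eight responses (made-up values).
# Split A isolates a perfectly pure subset and leaves the rest mixed.
split_a = ([10.0, 10.0, 10.0], [1.0, 9.0, 2.0, 8.0, 3.0])
# Split B is balanced in quality: both children are fairly homogeneous, neither is pure.
split_b = ([10.0, 10.0, 10.0, 9.0, 8.0], [1.0, 2.0, 3.0])

for name, (left, right) in (("A", split_a), ("B", split_b)):
    print(name, weighted_variance(left, right), one_sided_purity(left, right))
```

Lower is better for both scores. On this toy data the weighted criterion prefers split B, while the one-sided score prefers split A because it isolates a pure subset immediately, which matches the unbalanced but interpretable trees the abstract describes.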

Automatic Mesh Generation with Quadrilateral Finite Elements (사각형 유한요소망의 자동생성)

  • 채수원;신보성;민중기
    • Transactions of the Korean Society of Mechanical Engineers / v.17 no.12 / pp.2995-3006 / 1993
  • An automatic mesh generation scheme has been developed for finite element analysis with two-dimensional quadrilateral elements. The basic strategies of the method are to transform the analysis domain into loops with key nodes and to recursively subdivide the loops into subloops using best split lines. Finally, the meshes are completed using the basic loop operators. In this algorithm, an eight-node loop operator is proposed, which is useful in areas where the element size changes sharply, and the splitting criteria for subdividing the loops have also been modified relative to the existing algorithms. Lines, arcs, and cubic spline curves are used to define the boundaries of the analysis domain. Sample meshes for several geometries are presented to demonstrate the robustness of the algorithm.

Estimation of splitting tensile strength of modified recycled aggregate concrete using hybrid algorithms

  • Zhu, Yirong;Huang, Lihua;Zhang, Zhijun;Bayrami, Behzad
    • Steel and Composite Structures / v.44 no.3 / pp.389-406 / 2022
  • Recycling concrete construction waste is an encouraging step toward green and sustainable building. A lot of research has been done on recycled aggregate concretes (RACs), but not nearly as much has been done on concrete made with recycled aggregate. Recycled aggregate concrete, on the other hand, has been found to have a lower mechanical productivity than conventional concrete. Accurately estimating the mechanical behavior of concrete samples is an important scientific topic in civil, structural, and construction engineering. Accurate estimation can save time, effort, and cost, because experimental studies are often time-consuming, costly, and troublesome. This study presents a comprehensive data-mining-based model for predicting the splitting tensile strength of recycled aggregate concrete modified with glass fiber and silica fume. For this purpose, 168 splitting tensile strength tests under different conditions were first performed in the laboratory; then, based on the conditions of each experiment, some variables were taken as input parameters to predict the splitting tensile strength. Three hybrid models, GWO-RF, GWO-MLP, and GWO-SVR, were then used for this purpose. The results showed that all of the developed GWO-based hybrid prediction models agree well with the measured experimental results. Notably, the GWO-RF model has the best accuracy according to the model performance assessment criteria for both training and testing data.

The Weld Defects Expression Method by the Concept of Segment Splitting Method and Mean Distance (분할법과 평균거리 개념에 의한 용접 결함 표현 방법)

  • Lee, Jeong-Ick;Koh, Byung-Kab
    • Transactions of the Korean Society of Machine Tool Engineers / v.16 no.2 / pp.37-43 / 2007
  • In this paper, a laser vision sensor is used in hardware to detect defects in $CO_{2}$-welded specimens. To best express the defects of a welded specimen, the concepts of the segment splitting method and mean distance are introduced in software. The developed GUI software is used to decide in real time whether a welded specimen has a proper shape or defects. The criteria are based on the limits for imperfections in metallic fusion welds given in ISO 5817.