• Title/Summary/Keyword: Tree-based algorithms

Interpretability Comparison of Popular Decision Tree Algorithms (대표적인 의사결정나무 알고리즘의 해석력 비교)

  • Hong, Jung-Sik;Hwang, Geun-Seong
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.2 / pp.15-23 / 2021
  • Most open-source decision tree algorithms are based on one of three splitting criteria: entropy, Gini index, or gain ratio. The advantages and disadvantages of these three popular criteria therefore need to be studied more thoroughly. Previous comparisons have mainly addressed predictive performance. In this work, we conduct a comparative experiment on the three splitting criteria, focusing on interpretability. Depth, homogeneity, coverage, lift, and stability are used as indicators of interpretability. To measure the stability of decision trees, we present measures of the stability of the root node and of the dominating rules, both based on a measure of tree similarity. Using 10 datasets collected from UCI and Kaggle, we compare the interpretability of DT (Decision Tree) algorithms based on the three splitting criteria. The results show that the DT algorithm using the GR (Gain Ratio) criterion performs well in terms of lift and homogeneity, while the DT algorithms using the GINI (Gini Index) and ENT (Entropy) criteria perform well in terms of coverage. With respect to stability, whether measured by the similarity of the dominating rules or by the similarity of the root node, the DT algorithm using the ENT splitting criterion shows the best results.
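
For readers unfamiliar with the three splitting criteria named in this abstract, the following is a minimal Python sketch of their textbook definitions. It is not taken from the paper; the function names (`entropy`, `gini`, `split_scores`) and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(parent, children):
    """Score a candidate split; children is a partition of parent's labels."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    # Entropy criterion: reduction in entropy (information gain).
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    # Gini criterion: reduction in Gini impurity.
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # Gain ratio: information gain normalised by the entropy of the split itself.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"info_gain": info_gain, "gini_gain": gini_gain, "gain_ratio": gain_ratio}

# Hypothetical toy example: a split that separates the two classes perfectly.
scores = split_scores(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]])
```

Gain ratio divides information gain by the split's own entropy, which penalises splits into many small branches; this is one plausible reason the criteria produce trees of different shape.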

Performance Evaluation of Anti-collision Algorithms in the Low-cost RFID System (저비용 RFID 시스템에서의 충돌방지 알고리즘에 대한 성능평가)

  • Quan Cheng-hao;Hong Won-kee;Lee Yong-doo;Kim Hie-cheol
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.1B / pp.17-26 / 2005
  • RFID (Radio Frequency IDentification) is a technology that automatically identifies objects carrying electronic tags by using radio waves. To implement an RFID system, an anti-collision algorithm is required to identify multiple tags within the RFID reader's range. Few studies report the performance trade-offs among anti-collision algorithms in terms of the communication traffic between the reader and tags, the identification speed, and so on. In this paper, we analyze both tree-based memoryless algorithms and slotted-ALOHA-based algorithms, which together cover almost every class of existing anti-collision algorithms. To compare performance, we evaluated each class of anti-collision algorithms on a low-cost RFID system with a 96-bit EPC (Electronic Product Code). The results show that the collision tracking tree algorithm outperforms current tree-based and ALOHA-based algorithms by a factor of at least 2 and up to 50.
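
As an illustration of the tree-based memoryless class mentioned in this abstract, here is a minimal Python simulation of the basic query tree protocol as it is commonly described: the reader broadcasts a bit prefix, every tag whose ID starts with that prefix replies, and a collision makes the reader extend the prefix by one bit. This is a sketch, not the evaluation code of the paper; the function name `query_tree` and the 3-bit IDs are assumptions.

```python
from collections import deque

def query_tree(tag_ids):
    """Identify all tags with the memoryless query tree protocol.

    tag_ids: equal-length bit strings, assumed to be distinct.
    Returns (identified IDs in order of identification, number of reader queries).
    """
    tags = list(tag_ids)
    queue = deque([""])              # the reader starts with the empty prefix
    identified, queries = [], 0
    while queue:
        prefix = queue.popleft()
        queries += 1
        responders = [t for t in tags if t.startswith(prefix)]
        if len(responders) == 1:     # exactly one reply: the tag is identified
            identified.append(responders[0])
        elif len(responders) > 1:    # collision: split the prefix into two queries
            queue.append(prefix + "0")
            queue.append(prefix + "1")
        # zero responders: an idle query, nothing more to do
    return identified, queries

# Hypothetical 3-bit tag IDs; real EPC tags use 96 bits.
ids, n_queries = query_tree(["000", "011", "110"])
```

The tags keep no state between queries, which is why the scheme is called memoryless; the cost is the extra collision and idle queries counted in `n_queries`.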

Hybrid Tag Anti-Collision Algorithms in RFID System (RFID 시스템에서 하이브리드 태그 충돌 방지 알고리즘)

  • Shin, Jae-Dong;Yeo, Sang-Soo;Cho, Jung-Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.4A / pp.358-364 / 2007
  • RFID (Radio Frequency Identification) is a contactless automatic identification technology that uses radio frequency. For RFID technology to spread widely, the problem of multiple tag identification, in which a reader must identify many tags in a very short time, has to be solved. To date, many anti-collision algorithms have been developed to solve this problem, and they can be broadly divided into ALOHA-based algorithms and tree-based algorithms. In this paper, two new anti-collision algorithms that combine the characteristics of these two categories are presented. The performance of the two algorithms is evaluated against that of typical anti-collision algorithms: 18000-6 Type A, Type B, Type C, and the query tree algorithm.

A Spanning Tree-based Representation and Its Application to the MAX CUT Problem (신장 트리 기반 표현과 MAX CUT 문제로의 응용)

  • Hyun, Soohwan;Kim, Yong-Hyuk;Seo, Kisung
    • Journal of Institute of Control, Robotics and Systems / v.18 no.12 / pp.1096-1100 / 2012
  • Most previous genetic algorithms for solving graph problems have used a vertex-based encoding. We propose a new genetic algorithm with an edge-based encoding that uses a spanning tree. In contrast to a general edge-based encoding, a spanning-tree-based encoding represents only feasible partitions. As a target problem, we adopted the MAX CUT problem, a well-known representative NP-hard problem, and examined the performance of the proposed genetic algorithm. Experiments on benchmark graphs were carried out and the results compared with those of a vertex-based encoding. Performance improvements of the spanning-tree-based encoding were observed on sparse graphs.
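
The abstract does not spell out the encoding, so the following Python sketch shows only one plausible reading of it: one bit per spanning-tree edge states whether that edge crosses the cut, and vertex sides are recovered by walking the tree. Because a tree has no cycles, every bit string decodes to a consistent two-way partition. The function names, the choice of vertex 0 as the starting vertex, and the 4-cycle example are all assumptions, not details from the paper.

```python
from collections import defaultdict, deque

def decode_partition(n, tree_edges, bits):
    """Turn one bit per spanning-tree edge into a side (0 or 1) for each vertex.

    bits[i] == 1 means tree_edges[i] crosses the cut, so its endpoints get
    opposite sides; bits[i] == 0 means they share a side.
    """
    adj = defaultdict(list)
    for (u, v), b in zip(tree_edges, bits):
        adj[u].append((v, b))
        adj[v].append((u, b))
    side = {0: 0}                    # assumption: vertex 0 is fixed on side 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, b in adj[u]:
            if v not in side:
                side[v] = side[u] ^ b
                queue.append(v)
    return [side[v] for v in range(n)]

def cut_size(edges, side):
    """Number of graph edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])

# Hypothetical example: a 4-cycle with the path 0-1-2-3 as its spanning tree.
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
tree_edges = [(0, 1), (1, 2), (2, 3)]
side = decode_partition(4, tree_edges, [1, 1, 1])
best = cut_size(graph_edges, side)
```

A genetic algorithm would then evolve the bit string and use `cut_size` as the fitness value; that search loop is omitted here.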

Use of Tree Traversal Algorithms for Chain Formation in the PEGASIS Data Gathering Protocol for Wireless Sensor Networks

  • Meghanathan, Natarajan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.6 / pp.612-627 / 2009
  • The high-level contribution of this paper is to illustrate the effectiveness of using graph-theoretic tree traversal algorithms (pre-order, in-order and post-order traversals) to generate the chain of sensor nodes in the classical Power-Efficient Gathering in Sensor Information Systems (PEGASIS) data aggregation protocol for wireless sensor networks. We first construct an undirected minimum-weight spanning tree (ud-MST) on a complete sensor network graph, wherein the weight of each edge is the Euclidean distance between the constituent nodes of the edge. A breadth-first search of the ud-MST, starting with the node located closest to the center of the network, is then conducted to iteratively construct a rooted directed minimum-weight spanning tree (rd-MST). The three tree traversal algorithms are then executed on the rd-MST, and the node sequence resulting from each traversal is used as the chain of nodes for the PEGASIS protocol. Simulation studies on PEGASIS conducted for both TDMA and CDMA systems illustrate that, with the chain of nodes generated by the tree traversal algorithms, the node lifetime can improve by as much as 19%-30% and, at the same time, the energy loss per node can be 19%-35% lower than that obtained with the currently used distance-based greedy heuristic.
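
The pipeline described in this abstract (MST on the complete graph, BFS rooting, tree traversal) can be sketched in Python as follows. This is a minimal illustration, not the paper's simulator: only the pre-order traversal is shown, the centroid of the node coordinates stands in for "the center of the network", and all function names and coordinates are assumptions. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import deque

def mst_prim(points):
    """Undirected MST of the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [math.inf] * n        # cheapest known edge weight into the tree
    link = [-1] * n              # tree endpoint of that cheapest edge
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            adj[u].append(link[u])
            adj[link[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return adj

def root_tree(adj, root):
    """BFS from root over the undirected MST; returns children lists."""
    children = {u: [] for u in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                children[u].append(v)
                queue.append(v)
    return children

def preorder(children, root):
    """Iterative pre-order traversal of the rooted tree."""
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order

def chain_from_points(points):
    """Node sequence usable as a chain, rooted at the node nearest the centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    root = min(range(len(points)), key=lambda i: math.dist(points[i], (cx, cy)))
    return preorder(root_tree(mst_prim(points), root), root)

# Hypothetical sensor positions on a line.
chain = chain_from_points([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)])
```

Consecutive nodes in the returned sequence are not always MST neighbours (here node 0 is followed by node 3), so the chain's total length depends on which traversal is used.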

Construction of UOWHF: New Parallel Domain Extender with Optimal Key Size (UOWHF 구성방법: 최적의 키 길이를 가지는 새로운 병렬 도메인 확장기)

  • Wonil Lee;Donghoon Chang
    • Journal of the Korea Institute of Information Security & Cryptology / v.14 no.2 / pp.57-68 / 2004
  • We present a new parallel algorithm for extending the domain of a UOWHF. Our algorithm is based on a non-complete l-ary tree and has the same key length expansion as Shoup's algorithm, which has the most efficient key length expansion known so far. Using the recent result [8], we can also prove that the key length expansion of this algorithm and of Shoup's sequential algorithm is the minimum possible for any algorithm in a large class of "natural" domain-extending algorithms. Its parallelizability, however, is lower than that of complete-tree-based constructions. As l grows, the parallelizability of the construction approaches that of complete-tree-based constructions.

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease medical data mining. The data we study were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system-reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression trees (CART), chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, leaf number, and tree depth of the different decision tree algorithms. The experiments show that weighting the data can improve the correct classification rate on coronary heart disease data but has little effect on tree depth and leaf number.

Improvement and Performance Analysis of Hybrid Anti-Collision Algorithm for Object Identification of Multi-Tags in RFID Systems (RFID 시스템에서 다중 태그 인식을 위한 하이브리드 충돌방지 알고리즘의 개선 및 성능 분석)

  • Choi, Tae-Jeong;Seo, Jae-Joon;Baek, Jang-Hyun
    • IE interfaces / v.22 no.3 / pp.278-286 / 2009
  • Anti-collision algorithms for identifying a number of tags in real time in RFID systems are divided into those based on framed slotted ALOHA, in which tags randomly select among multiple slots to be identified, and tree-based algorithms, which repeat a query-and-response process to identify the tags. In the hybrid algorithm that combines the advantages of these two approaches, each tag selects one frame, so that the tags are distributed over the frames, and the tags are then identified frame by frame using the query tree. In this hybrid algorithm, however, the time needed to identify all tags may increase if many tags are concentrated in a few frames. In this study, to improve the performance of the hybrid algorithm, we suggest an improved algorithm in which tags select a specific group of frames based on the leading bits of the tag ID, so that the tags are distributed evenly over the frames. Using simulation and mathematical analysis, we show that the suggested algorithm outperforms the traditional hybrid algorithm in terms of the number of queries per frame and the time needed to identify all tags.

Improving Performance of Change Detection Algorithms through the Efficiency of Matching (대응효율성을 통한 변화 탐지 알고리즘의 성능 개선)

  • Lee, Suk-Kyoon;Kim, Dong-Ah
    • The KIPS Transactions:PartD / v.14D no.2 / pp.145-156 / 2007
  • Recently, the need for effective real-time change detection algorithms for XML/HTML documents has increased in fields such as the detection of defacement attacks on web documents and version management. In particular, applications that perform real-time change detection on large numbers of XML/HTML documents require fast heuristic algorithms rather than algorithms that compute minimum-cost edit scripts. Existing heuristic algorithms are fast in execution time but do not produce satisfactory edit scripts. In this paper, we present the existing algorithms XyDiff and X-tree Diff, analyze their problems, and propose the algorithm X-tree Diff+, which addresses the problems of the existing ones. X-tree Diff+ has an execution time similar to that of the existing algorithms, but it improves the matching ratio between nodes of the two documents by refining the matching process based on the notion of efficiency of matching.

Variable Selection with Regression Trees

  • Chang, Young-Jae
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.357-366 / 2010
  • Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them lose prediction accuracy when there are many noise variables. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. The multi-step GUIDE performs better than some well-known algorithms such as Random Forest and MARS. Results from a simulation study show that the multi-step GUIDE outperforms other algorithms in terms of variable selection and prediction accuracy. It generally selects the important variables correctly while admitting relatively few noise variables, and consequently gives good prediction accuracy.