• Title/Summary/Keyword: 트리 마이닝 (tree mining)

Search Result 129, Processing Time 0.027 seconds

Design of Heuristic Decision Tree (HDT) Using Human Knowledge (인간 지식을 이용한 경험적 의사결정트리의 설계)

  • Yoon, Tae-Tok;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.4
    • /
    • pp.525-531
    • /
    • 2009
  • Data mining is the process of extracting hidden patterns from collected data. Because collected data serve as the basic information for prediction and recommendation, incorrect data must be identified and filtered out to improve the quality of the analysis. Existing methods for identifying unexpected data rely mainly on statistics or on simple distances between data points. However, because these methods do not consider the environment and characteristics of the data, they can exclude even meaningful data from the analysis. This study proposes a method that assigns weights to human heuristic knowledge by comparing it with the collected data, and uses those weights to build a decision tree. Data discrimination with the proposed method is more credible because human knowledge is reflected in the resulting tree. The validity of the proposed method is verified through an experiment.

An Adaptive Business Process Mining Algorithm based on Modified FP-Tree (변형된 FP-트리 기반의 적응형 비즈니스 프로세스 마이닝 알고리즘)

  • Kim, Gun-Woo;Lee, Seung-Hoon;Kim, Jae-Hyung;Seo, Hye-Myung;Son, Jin-Hyun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.3
    • /
    • pp.301-315
    • /
    • 2010
  • Recently, competition between companies has intensified, and so has the need to create new business value. A number of business organizations are beginning to realize the importance of business process management. However, processes often do not run the way they were initially designed, or an inefficient process model may be designed in the first place. This can be due to a lack of cooperation and understanding between business analysts and system developers. To solve this problem, business process mining, which can serve as the basis for business process re-engineering, has been recognized as an important concept. Current process mining research has focused only on extracting workflow-based process models from completed process logs, and is therefore limited in expressing various forms of business processes. A further disadvantage is that process discovery and log scanning take a considerable amount of time, because the process logs are re-scanned with each new update. In this paper, we present a modified FP-Tree algorithm for business process mining, based on the FP-Tree used for association analysis in data mining. Our modified algorithm supports the discovery of a process model at the level appropriate to the user's needs without re-scanning the entire process log on each update.
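
The abstract does not describe the authors' modification, so the following is only a minimal sketch of the plain prefix-tree idea behind an FP-Tree: transactions that share a prefix share nodes, and adding a new transaction only increments counters along one path instead of re-scanning earlier logs. The names `Node`, `insert`, and `paths`, the alphabetical item order, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class Node:
    """One prefix-tree node: an item, how many transactions pass through it, and its children."""
    item: str
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert(root: Node, transaction: Iterable[str]) -> None:
    """Add one transaction; a shared prefix only increments existing counters."""
    node = root
    # Assumption: a fixed alphabetical order. A classic FP-Tree orders items
    # by descending frequency, which needs a first pass over the data.
    for item in sorted(set(transaction)):
        node = node.children.setdefault(item, Node(item))
        node.count += 1


def paths(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield (path, count) for every node below the given node."""
    for child in node.children.values():
        path = prefix + (child.item,)
        yield path, child.count
        yield from paths(child, path)


root = Node("root")
for transaction in (["A", "B", "C"], ["A", "B"], ["A", "C"]):
    insert(root, transaction)

# A later log entry is folded in without touching the earlier transactions.
insert(root, ["A", "B", "C"])

for path, count in paths(root):
    print(path, count)
```

Walking the tree with `paths` shows that the late insertion changed only the counters on the `A`, `B`, `C` path.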

Implementation of Data Preparation System for Data Mining on Heterogenious Distributed Environment (이기종 분산환경에서 데이터마이닝을 위한 데이터준비 시스템 구현)

  • Lee sang hee;Lee won sup
    • Journal of the Korea Society of Computer and Information
    • /
    • v.9 no.3
    • /
    • pp.109-113
    • /
    • 2004
  • This paper investigates the efficiency of the data preparation process in existing data mining tools and presents a design principle for a new, efficient data preparation system. We compare commonly used data mining tools in terms of their access methods for local and remote databases and their exchange of information resources between different computers. The tools compared are Answer Tree, Clementine, Enterprise Miner, and Weka. We propose a design principle for an efficient data preparation system for data mining on distributed networks.

Frequent Itemset Search Using LSI Similarity (LSI 유사도를 이용한 효율적인 빈발항목 탐색 알고리즘)

  • Ko, Younhee;Kim, Hyeoncheol;Lee, Wongyu
    • The Journal of Korean Association of Computer Education
    • /
    • v.6 no.1
    • /
    • pp.1-8
    • /
    • 2003
  • We introduce an efficient vertical mining algorithm that significantly reduces the search complexity for frequent k-itemsets. The method sorts items by their LSI (Least Support Itemsets) similarity and then searches for frequent itemsets in a tree-based manner. The search tree structure provides several useful heuristics and therefore reduces the search space significantly at early stages. Experimental results on various data sets show that the proposed algorithm improves search performance compared to other algorithms, especially for databases with long patterns.
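
As a rough illustration of what "vertical" mining usually means, the sketch below stores, for each item, the set of transaction IDs that contain it, so the support of an itemset is the size of the intersection of those sets. It does not implement the LSI similarity ordering or the search tree from the abstract, which are not specified here; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def to_vertical(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Convert a horizontal database (transaction -> items) into item -> transaction-ID set."""
    tidsets: Dict[str, Set[int]] = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def support(itemset: Iterable[str], tidsets: Dict[str, Set[int]]) -> int:
    """Support of a non-empty itemset = number of transactions containing every item."""
    sets = [tidsets.get(item, set()) for item in itemset]
    return len(set.intersection(*sets))


# Hypothetical sample database.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
tidsets = to_vertical(transactions)

print(support(["a", "b"], tidsets))
print(support(["a", "c"], tidsets))
```

Because each support count is a set intersection, extending a k-itemset by one item reuses the intersection already computed for its prefix, which is what makes a tree-shaped search natural in this layout.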

Decision Trees For Multiple Abstraction Level of Data (데이터의 다중 추상화 수준을 위한 결정 트리)

  • 정민아;이도현
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04b
    • /
    • pp.82-84
    • /
    • 2001
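  • Data classification is the task of determining the class of objects that have not yet been classified, based on an analysis of groups of already classified objects, that is, the training data. Among the many classification models proposed so far, the decision tree is especially useful for exploratory data mining because it takes a form that is easy for humans to understand. This paper introduces the multiple abstraction level problem in decision tree classification and proposes a practical method for handling it. After showing that forcing all data to the same abstraction level cannot solve the problem, we present a method that makes use of the data as it exists while preserving the generalization and specialization relationships among data values.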

High Utility Pattern Mining using a Prefix-Tree (Prefix-Tree를 이용한 높은 유틸리티 패턴 마이닝 기법)

  • Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan;Lee, In-Gi;Yong, Hwan-Seong
    • Journal of KIISE:Databases
    • /
    • v.36 no.5
    • /
    • pp.341-351
    • /
    • 2009
  • High utility pattern (HUP) mining has recently become one of the most important research issues in data mining, since it can consider the different weight values of items. However, existing mining algorithms suffer from performance degradation because they cannot easily apply the Apriori principle to pattern mining. In this paper, we introduce a new high utility pattern mining approach that uses a prefix-tree, as in the FP-Growth algorithm. Our approach stores the weight value of each item in a node and uses these values to prune unnecessary patterns. We compare the performance characteristics of three different prefix-tree structures. Through thorough experimentation, we also show that our approach yields a performance improvement to some degree.

Association Service Mining using Level Cross Tree (레벨 교차 트리를 이용한 연관 서비스 탐사)

  • Hwang, Jeong Hee
    • Journal of Digital Contents Society
    • /
    • v.15 no.5
    • /
    • pp.569-577
    • /
    • 2014
  • Users require various services at different times and places, and it is important to provide a service suited to the user's circumstances. It is therefore necessary to provide services by mining the latest information on user activity and service history. In this paper, we propose a mining method that finds association rules from service history, based on spatiotemporal information and a service ontology. The method finds associative service patterns using a level-cross tree over the service ontology. The proposed method is intended as basic research toward finding service patterns that provide high-quality service to users according to season, location, and age under the same context.

Data Mining Algorithm Based on Fuzzy Decision Tree for Pattern Classification (퍼지 결정트리를 이용한 패턴분류를 위한 데이터 마이닝 알고리즘)

  • Lee, Jung-Geun;Kim, Myeong-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.11
    • /
    • pp.1314-1323
    • /
    • 1999
  • With the widespread use of computers, data can be generated and collected easily, and there is a need to acquire useful knowledge from data automatically. In data mining, the acquired knowledge needs to be both accurate and comprehensible. In this paper, we propose an efficient fuzzy rule generation algorithm based on a fuzzy decision tree for data mining. We combine the comprehensibility of rules generated by decision trees such as ID3 and C4.5 with the expressive power of fuzzy sets. In particular, fuzzy rules allow us to effectively classify patterns with non-axis-parallel decision boundaries, which are difficult to handle with methods that draw decision boundaries parallel to the attribute axes. In our algorithm, we first determine an appropriate set of membership functions for each attribute of the data using histogram analysis. Given this set of membership functions, we then construct a fuzzy decision tree in a way similar to ID3 and C4.5. We also apply a genetic algorithm to tune the initial set of membership functions. We tested our algorithm on several benchmark data sets, including the IRIS data, the Wisconsin breast cancer data, and the credit screening data. The experimental results show that our method is superior to other methods, including C4.5, in both performance and rule comprehensibility.
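
For readers unfamiliar with membership functions, here is a minimal sketch of one common shape, the triangular membership function. The abstract does not say which shape the authors derive from the histograms, so the triangular form, the function name `triangular`, and the example breakpoints are all assumptions.

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Degree (0.0 to 1.0) to which x belongs to a fuzzy set shaped like a triangle.

    Assumes left < peak < right. Membership is 0 outside (left, right),
    rises linearly to 1 at peak, then falls linearly back to 0.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set "medium" for an attribute ranging over 0..10.
for value in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(value, triangular(value, left=0.0, peak=5.0, right=10.0))
```

A value such as 2.5 belongs to "medium" only partially, which is how a fuzzy decision tree can send one sample down several branches with different degrees instead of making a hard, axis-parallel split.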

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.1-8
    • /
    • 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining, which discovers useful patterns in such databases, is one of the most significant of these techniques. Frequent pattern mining, one form of pattern mining, extracts patterns whose frequencies satisfy a minimum support threshold; these are called frequent patterns. Traditional frequent pattern mining uses a single minimum support threshold for the whole database. This single support model implicitly assumes that all items in the database have the same nature. In real-world applications, however, each item can have its own characteristics, so a pattern mining technique that reflects those characteristics is required. In a framework where the nature of items is not considered, the single minimum support threshold must be set very low to mine patterns containing rare items, which produces too many patterns that include meaningless items. Conversely, such patterns cannot be mined at all if the threshold is too high. This dilemma is called the rare item problem. To solve it, early research proposed approximate approaches that split the data into several groups according to item frequencies or that group related rare items. However, because they are approximate, these methods cannot find all frequent patterns, including rare frequent patterns. Hence, a pattern mining model with multiple minimum supports was proposed to solve the rare item problem. In this model, each item has its own minimum support threshold, called MIS (Minimum Item Support), which is calculated from item frequencies in the database.
By applying MIS, the multiple minimum supports model finds all rare frequent patterns without generating meaningless patterns or losing significant ones. In the single minimum support model, candidate patterns are extracted during mining and their frequencies are compared only with the single minimum support. The characteristics of the items that make up a candidate pattern are therefore not reflected, and the rare item problem occurs. To address this issue, the multiple minimum supports model uses the smallest MIS value among the items in a candidate pattern as that pattern's minimum support threshold, so that the pattern's characteristics are taken into account. To mine frequent patterns, including rare ones, efficiently under this concept, tree-based algorithms for the multiple minimum supports model sort the items in a tree in descending order of MIS, whereas those for the single minimum support model sort items in descending order of frequency. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and evaluate its performance against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the multiple minimum supports based algorithm outperforms the single minimum support based one but requires more memory for MIS information. Moreover, both algorithms scale well.
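
The thresholding rule described in this abstract (a candidate pattern is judged against the smallest MIS among its items) can be sketched in a few lines. This is not the tree-based algorithm the paper evaluates; the function names, the MIS values, the sample transactions, and the use of `>=` for the comparison are assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List


def pattern_threshold(pattern: Iterable[str], mis: Dict[str, int]) -> int:
    """Minimum support for a candidate pattern: the smallest MIS among its items."""
    return min(mis[item] for item in pattern)


def support(pattern: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def is_frequent(pattern: FrozenSet[str],
                transactions: List[FrozenSet[str]],
                mis: Dict[str, int]) -> bool:
    """Assumption: a pattern is frequent when support >= its MIS-based threshold."""
    return support(pattern, transactions) >= pattern_threshold(pattern, mis)


# Hypothetical data: "caviar" is a rare item, so it gets a lower MIS.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk", "caviar"},
    {"bread", "milk"},
)]
mis = {"bread": 3, "milk": 3, "caviar": 2}

rare = frozenset({"milk", "caviar"})
print(support(rare, transactions), pattern_threshold(rare, mis), is_frequent(rare, transactions, mis))
```

In this toy database, the pattern containing the rare item passes its own threshold of 2, whereas a single threshold of 3 for the whole database would have discarded it.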

A Study on Intermittent Demand Forecasting of Patriot Spare Parts Using Data Mining (데이터 마이닝을 이용한 패트리어트 수리부속의 간헐적 수요 예측에 관한 연구)

  • Park, Cheonkyu;Ma, Jungmok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.22 no.3
    • /
    • pp.234-241
    • /
    • 2021
  • Recognizing the importance of demand forecasting, the military is conducting many studies to improve prediction accuracy for repair parts. Demand forecasting for repair parts is becoming a very important factor in budgeting and equipment availability. However, demand for intermittent repair parts, whose demand sizes and intervals are not constant, is difficult to predict with the time series models currently used in the military. This paper proposes a method to improve the prediction accuracy for intermittent repair parts of the Patriot. The authors collected intermittent repair parts data by classifying the demand types of 701 repair parts from 2013 to 2019. Temperature and operating time, identified as external factors that can affect failures, were selected as input variables. Prediction accuracy was measured using both time series models and data mining models. The data mining models achieved higher prediction accuracy than the time series models, and the multilayer perceptron model showed the best performance.