• Title/Summary/Keyword: Tree mining

Search results: 566

Symbolic tree based model for HCC using SNP data (악성간암환자의 유전체자료 심볼릭 나무구조 모형연구)

  • Lee, Tae Rim
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1095-1106 / 2014
  • Symbolic data analysis (SDA) extends data mining and exploratory data analysis to knowledge mining; using this new SDA approach, we propose an SDA tree model for clinical and genomic data. By applying SDA to large-scale genomic SNP data, we show that the resulting correlations help reveal the hidden structure of HCC data. We confirm the validity of applying SDA to a tree-structured progression model and to quantifying clinical laboratory data and SNP data for the early diagnosis of HCC. Our proposed model is a representative model of HCC survival time and its causal association with patients' SNP gene data. The aim is to fit a simple, easily interpretable tree-structured survival model, reduced from large clinical and genomic data, within the new statistical framework of knowledge mining with SDA.

On the Tree Model grown by one-sided purity (단측 순수성에 의한 나무모형의 성장에 대하여)

  • 김용대;최대우
    • Journal of Intelligence and Information Systems / v.7 no.1 / pp.17-25 / 2001
  • Tree models are the most popular classification algorithms in data mining because their results are easy to interpret. In CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993), the representative tree algorithms, splitting for classification proceeds so as to obtain terminal nodes that are homogeneous with respect to the composition of levels of the target variable. However, in churn prediction modeling for CRM (Customer Relationship Management), for instance, the churn rate is generally very low even though the churners are the group we are interested in mining. It is therefore difficult to obtain accurate prediction models using tree models based on traditional split rules such as Gini or deviance. Buja and Lee (1999) introduced a new split rule, one-sided purity, for classifying a small group of interest. In this paper, we compare one-sided purity with a traditional split rule, deviance, by analyzing churning vs. non-churning data from an ISP company. We also review the results of tree models based on one-sided purity on some simulated data and discuss open problems and topics for further research.

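To make the contrast between split rules concrete, here is a minimal Python sketch for a binary target (e.g. churn vs. non-churn). Gini impurity and deviance (entropy) use their standard definitions. The `one_sided_purity` function is a simplified assumption that scores a split by its purer child alone; it may differ from the exact criterion in Buja and Lee (1999). All function names and example counts are hypothetical, and lower scores are better for every criterion shown.

```python
import math

def gini(p: float) -> float:
    """Gini impurity of a binary node whose class-of-interest proportion is p."""
    return 2.0 * p * (1.0 - p)

def deviance(p: float) -> float:
    """Binary entropy (deviance up to a constant factor) of a node with proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def weighted_impurity(left, right, impurity):
    """Traditional rule: size-weighted average impurity of the two children.

    left and right are (n_interest, n_total) counts for each child node.
    """
    n = left[1] + right[1]
    return sum(t / n * impurity(k / t) for k, t in (left, right))

def one_sided_purity(left, right, impurity):
    """Simplified one-sided rule (assumption): score a split by its purer child only."""
    return min(impurity(k / t) for k, t in (left, right))

# Hypothetical data: 50 churners among 1,000 customers.
split_a = ((5, 5), (45, 995))    # isolates a tiny, fully pure bucket of churners
split_b = ((48, 300), (2, 700))  # lowers the average impurity more
print(weighted_impurity(*split_a, gini), weighted_impurity(*split_b, gini))
print(one_sided_purity(*split_a, gini), one_sided_purity(*split_b, gini))
```

In this example the weighted Gini rule prefers split B, while the one-sided rule prefers split A because it contains a perfectly pure node, which is the behavior that makes one-sided criteria attractive when the group of interest is small.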

Case Study of CRM Application Using Improvement Method of Fuzzy Decision Tree Analysis (퍼지의사결정나무 개선방법을 이용한 CRM 적용 사례)

  • Yang, Seung-Jeong;Rhee, Jong-Tae
    • The Journal of the Korea Contents Association / v.7 no.8 / pp.13-20 / 2007
  • Decision trees are one of the most useful analysis methods for various data mining tasks on massive data, including prediction and classification. A decision tree grows by splitting nodes, and purity increases with each split. Splitting should stop when purity no longer increases effectively or when the new leaves do not contain a meaningful number of records. Pruning is applied when a branch does not reach a certain level of performance. Pruning changes the structure of the decision tree and implies that the earlier split of the parent node was not effective. It further implies that the splits of the ancestor nodes were not effective and that the attributes and criteria chosen to split them were not well chosen. Such nodes could therefore be split again with new attributes or criteria to obtain better results. In this paper, we suggest a procedure that modifies the decision tree by integrating fuzzy theory with splitting.
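The two stopping conditions described in this abstract (insufficient purity gain, or too few records in a new leaf) can be sketched as a single check. This is a generic pre-pruning rule using Gini impurity, not the fuzzy procedure proposed in the paper; the function name and the thresholds `min_gain` and `min_records` are hypothetical.

```python
def gini(p: float) -> float:
    """Gini impurity of a binary node with class proportion p."""
    return 2.0 * p * (1.0 - p)

def should_split(parent, children, min_gain=0.01, min_records=30):
    """Return True if a candidate split passes both stopping checks.

    parent and each child are (n_positive, n_total) counts; min_gain and
    min_records are hypothetical thresholds chosen for illustration.
    """
    # Stop if any new leaf would hold too few records to be meaningful.
    if any(total < min_records for _, total in children):
        return False
    n = sum(total for _, total in children)
    parent_impurity = gini(parent[0] / parent[1])
    child_impurity = sum(total / n * gini(pos / total) for pos, total in children)
    # Stop if the split does not increase purity by at least min_gain.
    return parent_impurity - child_impurity >= min_gain
```

A split that produces a 5-record leaf is rejected by the record check, and a split that leaves both children as mixed as the parent is rejected by the gain check.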

A Decision Tree Approach for Identifying Defective Products in the Manufacturing Process

  • Choi, Sungsu;Battulga, Lkhagvadorj;Nasridinov, Aziz;Yoo, Kwan-Hee
    • International Journal of Contents / v.13 no.2 / pp.57-65 / 2017
  • Recently, driven by the significance of Industry 4.0, the manufacturing industry has been developing globally. The manufacturing industry generates a large volume of data, often related to processes, lines, and products. In this paper, we analyze the causes of defective products in the manufacturing process using the decision tree, a well-known data mining technique. We used data collected from the domestic manufacturing industry, including Manufacturing Execution System (MES) data, Point of Production (POP) data, equipment data accumulated directly in the equipment, and data from in-process/external air-conditioning sensors and static electricity measurements. We propose a model based on the C4.5 decision tree algorithm; specifically, the decision tree is built from the components of a specific part. We then identify the state of the products in which a defect occurred and compare it with the generated decision tree model to determine the cause of the defect.
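C4.5 chooses splits by gain ratio: the information gain of an attribute divided by its split information. A minimal Python sketch for categorical attributes follows; it omits C4.5's handling of continuous attributes, missing values, and pruning, and the attribute values and labels in the example are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5-style gain ratio for splitting on one categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

humidity = ["high", "high", "low", "low"]   # hypothetical sensor level
defect = ["defect", "defect", "ok", "ok"]  # hypothetical product state
print(gain_ratio(humidity, defect))  # prints 1.0
```

Dividing by split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.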

Improved Decision Tree Algorithms by Considering Variables Interaction (교호효과를 고려한 향상된 의사결정나무 알고리듬에 관한 연구)

  • Kwon, Keunseob;Choi, Gyunghyun
    • Journal of Korean Institute of Industrial Engineers / v.30 no.4 / pp.267-276 / 2004
  • Much previous research on decision trees has focused on splitting criteria and the optimization of tree size. Today the quantity of data is increasing and the relationships among variables are becoming very complex. As a result, trees come to contain many unnecessary nodes and leaves, and the reliability of the decision tree's explanations and forecasts declines. In this research report, we propose decision tree algorithms that consider the interaction of predictor variables. We first present a generic algorithm, the k-1 Algorithm, which handles interactions using a combination of all predictor variables. We then present an extended version, the k-k Algorithm, which considers interactions at every k-depth using combinations of some predictor variables. We also present an improved algorithm that introduces a control parameter into these algorithms. The algorithms are tested on real field data, including credit card, census, and bank data.

Development of Decision Tree Program based on Web for Analyzing Clinical Information of Sasang Constitutional Medicine (사상체질 임상정보 분석을 위한 웹 기반의 의사결정 나무 프로그램 개발)

  • Jin, Hee-Jeong;Kim, Myoung-Geun;Kim, Jong-Yeol
    • Korean Journal of Oriental Medicine / v.14 no.3 / pp.81-87 / 2008
  • Sasang Constitutional Medicine (SCM) is a traditional Korean medical theory based on constitutional medicine. It is most important that a person's SCM type is determined accurately before any Sasang treatment is applied. To this end, many studies have attempted to diagnose the SCM type using constitutional clinical data. The decision tree is a tree-structured data mining methodology, and in the Korean traditional medicine community there have recently been several efforts to develop diagnostic tools using the decision tree method. We therefore developed a web-based decision tree program for analyzing constitutional clinical information. It can accept various clinical data as input and offers a filtering function for selecting the clinical data to be used. Using this program, we can find useful factors that influence SCM types.


An Application of Decision Tree Method for Fault Diagnosis of Induction Motors

  • Tran, Van Tung;Yang, Bo-Suk;Oh, Myung-Suck
    • Proceedings of the Korea Committee for Ocean Resources and Engineering Conference / 2006.11a / pp.54-59 / 2006
  • The decision tree is one of the most effective and widely used methods for building classification models. Researchers from various disciplines, such as statistics, machine learning, pattern recognition, and data mining, have considered the decision tree method an effective solution to problems in their fields. In this paper, we propose an application of the decision tree method to classify the faults of induction motors. Features are calculated from the original experimental data to obtain useful information as attributes. These data are then assigned classes based on our experience before being used as inputs to the decision tree. A total of 9 classes are defined. An implementation of the decision tree written in Matlab is used for these data.


Development of a Measurement Method for Three Dimensional Treeing Degradation using a Computerized Tomography Method

  • Masateru-Yanagiwara;Noboru-Yoshimura
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 1990.10a / pp.23-25 / 1990
  • In this paper, we discuss a system that uses image processing to measure three-dimensional tree degradation (treeing) in organic insulating materials. Using a computerized tomography method, we measured the volume of the tree immediately after tree initiation, as well as changes in the tree's configuration, both of which have until now been difficult to measure. The specimens were made of an acrylic acid resin. As a result, it was possible to record cross sections of the tree and to describe the tree's volume through three-dimensional measurement.

An Efficient Algorithm for mining frequent itemsets using L2-tree (L2-tree를 이용한 효율적인 빈발항목 집합 탐사)

  • 박인창;장중혁;이원석
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.259-261 / 2002
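  • Research on mining frequent itemsets has been actively pursued in the field of data mining, but it still requires a great deal of memory and time. In particular, the time and space needed by methods based on the Apriori algorithm grow exponentially as longer patterns are generated. The recently published FP-growth performs well on typical data sets but is not efficient on sparse data sets. In this paper, we propose the L2-tree structure, which is based on L2, the set of frequent itemsets of length 2. We also propose the L2-traverse algorithm, which mines frequent itemsets from the L2-tree. Because the L2-tree is based on L2, it uses little memory on sparse data sets, where L2 is relatively small. Unlike FP-growth, which builds separate extracted databases, the L2-traverse algorithm finds frequent itemsets simply through a single depth-first traversal of the L2-tree. As an optimization, we propose the C3-traverse algorithm, which removes in advance the L2 patterns that cannot become part of a frequent itemset of length 3 (L3), and we compare it experimentally with existing algorithms.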

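The roles of L2 and of the C3-based pruning can be illustrated with a small Apriori-style sketch. This is not the L2-tree or the L2-traverse algorithm: it only computes L2 by counting pairs, then drops L2 pairs that are not contained in any candidate 3-itemset. The pruning is safe because every 2-subset of a frequent 3-itemset must itself be frequent. Function names and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """L2: all 2-itemsets (as sorted tuples) with support count >= min_support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {p for p, c in counts.items() if c >= min_support}

def prune_with_c3(l2):
    """Keep only L2 pairs contained in at least one candidate 3-itemset,
    i.e. a 3-set whose three 2-subsets are all in L2."""
    items = sorted({i for p in l2 for i in p})
    keep = set()
    for triple in combinations(items, 3):
        subsets = list(combinations(triple, 2))
        if all(s in l2 for s in subsets):
            keep.update(subsets)
    return keep

tx = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["c", "d"], ["c", "d"]]
l2 = frequent_pairs(tx, min_support=2)
print(sorted(l2))
print(sorted(prune_with_c3(l2)))
```

Here ("c", "d") is frequent but cannot extend to any frequent 3-itemset, so it is discarded before any deeper mining.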

Analysis of Healthcare Quality Indicators using Data Mining and Development of a Decision Support System (데이터마이닝을 이용한 의료의 질 측정지표 분석 및 의사결정지원시스템 개발)

  • Kim, Hye Sook;Chae, Young-Moon;Tark, Kwan-Chul;Park, Hyun-Ju;Ho, Seung-Hee
    • Quality Improvement in Health Care / v.8 no.2 / pp.186-207 / 2001
  • Background: This study presents an analysis of healthcare quality indicators using data mining and the development of a decision support system for quality improvement. Method: Specifically, important factors influencing the key quality indicators were identified using a decision tree method for data mining, based on 8,405 patients who were discharged from a medical center between December 1, 2000 and January 31, 2001. In addition, a decision support system was developed in Visual Basic 6.0 to analyze and monitor trends in these quality indicators. Guidelines and a tutorial for quality improvement activities were also included in the system. Result: Among 12 selected quality indicators, decision tree analysis was performed for the 3 indicators that had more than 100 cases during the period: unscheduled readmission due to the same or a related condition, unscheduled return to the intensive care unit, and inpatient mortality. The optimum range of the target group for the healthcare quality indicators was identified from the gain chart. Important influencing factors were diagnosis, attributes of the disease, and patient age in the unscheduled-return-to-ICU group, and length of stay, diagnosis, and department in the inpatient mortality group. Conclusion: We developed a decision support system through the analysis of healthcare quality indicators with data mining techniques, which can be effectively implemented for utilization review and quality management in a healthcare organization. In the future, more quality indicators should be developed to effectively support hospital-wide Continuous Quality Improvement activities. Through these endeavors, the decision support system can be further developed, and it should be well integrated with the hospital Order Communication System to support concurrent review, utilization review, and quality and risk management.

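The gain chart used in the Result section plots the cumulative share of target cases (e.g. patients with an unscheduled ICU return) captured when moving down a list ranked by model score. Below is a minimal Python sketch, assuming binary outcome labels and a per-patient score from some model such as a decision tree; the function name and data are hypothetical.

```python
def cumulative_gains(scores, labels):
    """Points (fraction of population, fraction of target cases captured).

    scores: model scores, higher meaning more likely to be a target case.
    labels: 1 for a target case, 0 otherwise (at least one 1 is assumed).
    """
    # Rank cases from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        points.append((rank / len(scores), captured / total_pos))
    return points

scores = [0.9, 0.8, 0.3, 0.1]  # hypothetical model scores
labels = [1, 0, 1, 0]          # hypothetical outcomes
print(cumulative_gains(scores, labels))
```

The further the curve lies above the diagonal at a given population fraction, the more concentrated the target cases are in that top-ranked group, which is one way to read off a target range from the chart.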