• Title/Summary/Keyword: C5.0 decision tree


Major gene identification for FASN gene in Korean cattles by data mining (데이터마이닝을 이용한 한우의 우수 지방산합성효소 유전자 조합 선별)

  • Kim, Byung-Doo;Kim, Hyun-Ji;Lee, Seong-Won;Lee, Jea-Young
    • Journal of the Korean Data and Information Science Society, v.25 no.6, pp.1385-1395, 2014
  • Economic traits of livestock are affected by both environmental and genetic factors. Moreover, they are affected not by a single gene but by interactions among genes. We used a linear regression model to adjust for environmental factors. To identify gene-gene interaction effects, we applied data mining techniques such as neural networks, logistic regression, CART, and C5.0 to five SNPs (single nucleotide polymorphisms) of FASN (fatty acid synthase). We divided the data into training (60%) and testing (40%) sets and applied the model built on the training data to the testing data. In a comparison of prediction accuracy, C5.0 was identified as the best model. Superior genotypes were then selected using the decision tree.
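
The 60%/40% holdout comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' procedure: the genotype rows are invented, and the two toy models (`fit_majority`, `fit_lookup`) are hypothetical stand-ins for the neural network, logistic regression, CART, and C5.0 models named in the abstract.

```python
import random
from collections import Counter, defaultdict

def holdout_split(rows, train_frac=0.6, seed=0):
    """Shuffle a copy of the rows and split it into training and testing lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_frac))
    return shuffled[:cut], shuffled[cut:]

def fit_majority(train):
    """Baseline stand-in: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def fit_lookup(train):
    """Stand-in model: most common training label per genotype combination."""
    votes = defaultdict(Counter)
    for x, y in train:
        votes[x][y] += 1
    fallback = fit_majority(train)

    def predict(x):
        # Fall back to the majority label for combinations unseen in training.
        return votes[x].most_common(1)[0][0] if x in votes else fallback(x)

    return predict

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def best_model(rows, fitters, train_frac=0.6, seed=0):
    """Fit every candidate on the training part and score it on the testing part."""
    train, test = holdout_split(rows, train_frac, seed)
    scores = {name: accuracy(fit(train), test) for name, fit in fitters.items()}
    return max(scores, key=scores.get), scores

# Invented example data: (genotype combination, trait class).
rows = [(("AA", "GG"), "high")] * 50 + [(("AG", "GT"), "low")] * 50
name, scores = best_model(rows, {"majority": fit_majority, "lookup": fit_lookup})
print(name, scores)
```

The key design point is that every candidate is fitted on the same training rows and scored on the same held-out rows, so the accuracies are directly comparable.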

Design and Evaluation of ANFIS-based Classification Model (ANFIS 기반 분류모형의 설계 및 성능평가)

  • Song, Hee-Seok;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems, v.15 no.3, pp.151-165, 2009
  • A fuzzy neural network is an integrated model of an artificial neural network and a fuzzy system, and it has been applied successfully in control and forecasting. Recently, ANFIS (Adaptive Network-based Fuzzy Inference System) has attracted wide attention among fuzzy neural network models because of its outstanding accuracy in control and forecasting. We design a new classification model based on ANFIS and evaluate its classification accuracy. Comparing experimental results, we found that the ANFIS-based classification model achieves higher classification accuracy than an existing classification model, the C5.0 decision tree.

Analyzing Customer Purchase Behavior of a Department Store and Applying Customer Relationship Management Strategies (백화점 고객의 구매 분석 및 고객관계관리 전략 적용)

  • Ha Sung Ho;Baek Kyung Hoon
    • Korean Management Science Review, v.21 no.3, pp.55-69, 2004
  • This study analyzes how customer buying-behavior patterns in a department store change over time and predicts the movement patterns of its customers. Based on these results, it suggests short-term and long-term marketing promotion strategies. RFM techniques are used for customer segmentation. Customers are clustered using Kohonen's Self-Organizing Map, a data mining technique. Then C5.0, a decision tree technique, is used to predict customers' movement patterns. Using real-world data, the study evaluates the prediction accuracy of the predictive models.
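
As a rough illustration of the RFM (recency, frequency, monetary) step mentioned in this abstract, the sketch below derives the three values per customer from a transaction list. The function name `rfm_table`, the tuple layout, and the sample transactions are all assumptions made for illustration; the abstract does not say how the authors computed or binned their RFM values.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, today):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount).
    Recency is the number of days since the customer's latest purchase.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, day, amount in transactions:
        # Keep the latest purchase date seen so far for this customer.
        last_purchase[customer_id] = max(last_purchase.get(customer_id, day), day)
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((today - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }

# Invented sample transactions.
transactions = [
    ("c1", date(2004, 1, 5), 120.0),
    ("c1", date(2004, 3, 1), 80.0),
    ("c2", date(2004, 2, 10), 40.0),
]
table = rfm_table(transactions, today=date(2004, 3, 31))
print(table)
```

The resulting triples could then feed a clustering step such as a self-organizing map, which is outside the scope of this sketch.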

Artificial Neural Network, Induction Rules, and IRANN to Forecast Purchasers for a Specific Product (제품별 구매고객 예측을 위한 인공신경망, 귀납규칙 및 IRANN모형)

  • Jung Su-Mi;Lee Gun-Ho
    • Journal of the Korean Operations Research and Management Science Society, v.30 no.4, pp.117-130, 2005
  • For effective customer relationship management or marketing, it is desirable to focus on specific customers rather than on a large number of unspecified customers. This study forecasts prospective purchasers who have a high probability of purchasing a specific product. An artificial neural network (ANN) can classify the characteristics of prospective purchasers, but its outputs are difficult to interpret. To interpret and analyze the ANN outputs, the ANN is integrated with the induction rules (IR) of the decision tree program C5.0 to form IRANN. We compare and analyze the accuracy of ANN, IR, and IRANN.

Data Mining Model Approach for The Risk Factor of BMI - By Medical Examination of Health Data -

  • Lee Jea-Young;Lee Yong-Won
    • Communications for Statistical Applications and Methods, v.12 no.1, pp.217-227, 2005
  • Data mining is an approach for extracting useful information through effective analysis of large data sets in numerous fields. We used data mining to analyze the medical records of 35,671 people. The data were sorted by BMI score and divided into two groups. We sought BMI risk factors in the overweight group by analyzing the raw data with a data mining approach. The results from the C5.0 decision tree method showed that the important risk factors for BMI score are triglyceride, gender, age, and HDL cholesterol. Odds ratios of the major risk factors were calculated to show the individual effect of each factor.
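
For readers unfamiliar with the odds ratio mentioned in this abstract, the sketch below shows the standard calculation from a 2x2 table, together with the usual log-scale (Woolf) 95% confidence interval. The counts are invented for illustration and do not come from the paper; whether the authors reported confidence intervals is not stated.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed, in the risk group      b: exposed, not in the risk group
    c: unexposed, in the risk group    d: unexposed, not in the risk group
    OR = (a / b) / (c / d) = (a * d) / (b * c)
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval computed on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Invented counts: 40 of 100 exposed and 20 of 100 unexposed are in the risk group.
ratio = odds_ratio(40, 60, 20, 80)
low, high = odds_ratio_ci(40, 60, 20, 80)
print(round(ratio, 3), round(low, 3), round(high, 3))
```

An odds ratio above 1 indicates that the factor is associated with higher odds of being in the risk group; an interval that excludes 1 suggests the association is unlikely to be due to chance alone.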

Churn Analysis for the First Successful Candidates in the Entrance Examination for K University

  • Kim, Kyu-Il;Kim, Seung-Han;Kim, Eun-Young;Kim, Hyun;Yang, Jae-Wan;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society, v.18 no.1, pp.1-10, 2007
  • In this paper, we focus on churn analysis of the first successful candidates in the 2006 entrance examination, using the data mining tool Clementine. The goal of this study is to apply decision trees (including the C5.0 and CART algorithms), neural networks, and logistic regression to predict the churn of successful candidates. Using K University's 2006 entrance examination data, we compare churning and non-churning successful candidates and analyze why successful candidates churn and which ones are most likely to churn in the future.

Business Process Repository for Exception Handling in BPM (예외업무 관리를 위한 비즈니스 프로세스 저장소의 활용)

  • Choi Deok-Won;Sin Jin-Gyu;Jin Jung-Hyeon
    • Proceedings of the Korean Operations and Management Science Society Conference, 2006.05a, pp.265-270, 2006
  • In an organization where major business operations are driven by a business process management system (BPMS), routine tasks are processed according to predefined business processes. However, most business operations are subject to exceptions of some kind, and handling an exceptional situation requires either updating the existing business process model or defining a new one. This paper proposes a system architecture that uses a business process repository as the medium for storing and retrieving the various business process models developed for exception handling. Well-defined situation variables and decision variables play the key role in efficient storage and retrieval of these models. The data mining technique C5.0 was used to build the optimum path for the process repository search tree.

Host based Feature Description Method for Detecting APT Attack (APT 공격 탐지를 위한 호스트 기반 특징 표현 방법)

  • Moon, Daesung;Lee, Hansung;Kim, Ikkyun
    • Journal of the Korea Institute of Information Security & Cryptology, v.24 no.5, pp.839-850, 2014
  • As the social and financial damage caused by APT attacks such as the 3.20 cyber terror increases, technical solutions against APT attacks are required. It is, however, difficult to defend against APT attacks with existing security equipment because the attacks persistently use zero-day malware. In this paper, we propose a host-based anomaly detection method to overcome the limitations of conventional signature-based intrusion detection systems. First, we defined 39 features to distinguish normal from abnormal behavior, and then collected a data set of 8.7 million feature records generated while running both malware and normal executable files. Each process is then represented as an 83-dimensional vector that profiles the frequency of appearance of the features. The vector also includes the frequency of features generated in the child processes of each process. It is therefore possible to represent the whole behavior of a process while it is running. In experiments applying the C4.5 decision tree algorithm, we obtained a false positive rate of 2.0% and a false negative rate of 5.8%.
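
The false positive and false negative figures reported in this abstract are presumably the standard rates FP / (FP + TN) and FN / (FN + TP); the abstract does not spell out the definitions, so that reading is an assumption. A minimal sketch with invented labels (1 = malicious, 0 = normal):

```python
def error_rates(y_true, y_pred, positive=1):
    """Return (false_positive_rate, false_negative_rate).

    FPR = FP / (FP + TN): share of normal samples flagged as malicious.
    FNR = FN / (FN + TP): share of malicious samples that were missed.
    """
    fp = tn = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return fp / (fp + tn), fn / (fn + tp)

# Invented toy labels: 50 normal samples followed by 50 malicious samples.
y_true = [0] * 50 + [1] * 50
y_pred = [1] + [0] * 49 + [0] * 3 + [1] * 47
fpr, fnr = error_rates(y_true, y_pred)
print(fpr, fnr)
```

In an anomaly detection setting the two rates trade off against each other, which is why the abstract reports both rather than a single accuracy figure.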

Establishment of Hygienic Standards for Pizza Restaurant Based on HACCP Concept -Focused on Pizza Production- (HACCP의 적용을 위한 피자 전문 레스토랑의 위생관리 기준 설정 -피자생산을 중심으로-)

  • Lee, Bog-Hieu;Huh, Kyoung-Sook;Kim, In-Ho
    • Korean Journal of Food Science and Technology, v.36 no.1, pp.174-182, 2004
  • Hygienic standards for a pizza specialty restaurant located in Seoul were established during summer 2000 based on the HACCP concept by measuring temperature, time, pH, and $A_{w}$, by microbiological assessment of the pizza, and by evaluating the hygienic conditions of kitchens and workers. Kitchen and worker conditions averaged 1.2 and 1.0 (3-point Sly's scale), respectively. Microbial contamination occurred at $5$-$60^{\circ}C$, pH above 5.0, and $A_{w}$ of 0.93-0.98. Microbial assessments of pizza processing revealed $1.5{\times}10^{2}$-$3.9{\times}10^{8}$ CFU/g of TPC and $0.5{\times}10^{1}$-$1.6{\times}10^{7}$ CFU/g of coliforms, exceeding the standards (TPC $10^{6}$ CFU/g and coliforms $10^{3}$ CFU/g) established by Solberg et al., although the counts decreased significantly after baking. S. aureus was not detected, but Salmonella was found in onions. Tools and containers such as the pizza cutting knife, topping container, serving bowl, pizza plate, working board, and dough kneading board contained $6.2{\times}10^{2}$-$1.1{\times}10^{9}$ CFU/g of TPC and $2.0{\times}10^{1}$-$6.2{\times}10^{3}$ CFU/g of coliforms. Workers' hands carried $3.1{\times}10^{4}$ CFU/g of TPC as well as S. aureus, compared with the safety standards of Harrigan and McCance (500 and 10 CFU/g of TPC and coliforms per $100\,cm^{2}$). CCPs (critical control points) were determined to be receiving, topping, and baking according to CCP decision tree analysis. The results suggest that purchasing quality materials, carefully monitoring time and temperature, using tools and utensils hygienically, and sanitary practices by workers should serve as control points for safe pizza production.

Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society, v.25 no.2, pp.373-384, 2014
  • Data mining clarifies patterns or correlations in large, complexly structured data and predicts diverse outcomes. The technique is used in fields such as finance, telecommunication, circulation, and medicine. In this paper, we selected risk factors for respiratory diseases in the field of medicine. The data, taken from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012, were divided into a respiratory disease group and a healthy group. To select the major risk factors, we applied data mining techniques such as neural networks, logistic regression, Bayesian networks, C5.0, and CART. We divided the data into training and testing sets and applied the model built on the training data to the testing data. In a comparison of prediction accuracy, CART was identified as the best model. Depression, smoking, and stress were identified as the major risk factors for respiratory disease.