• Title/Summary/Keyword: C5.0 decision tree

Identification of Subgroups with Poor Glycemic Control among Patients with Type 2 Diabetes Mellitus: Based on the Korean National Health and Nutrition Examination Survey from KNHANES VII (2016 to 2018) (제 2형 성인 당뇨병 유병자의 혈당조절 취약군 예측: 제7기(2016-2018년도) 국민건강영양조사 자료 활용)

  • Kim, Hee Sun;Jeong, Seok Hee
    • Journal of Korean Biological Nursing Science / v.23 no.1 / pp.31-42 / 2021
  • Purpose: This study was performed to assess the level of blood glucose and to identify poor glycemic control groups among patients with type 2 diabetes mellitus (DM). Methods: Data of 1,022 Korean type 2 DM patients aged 30-64 years were extracted from the Korea National Health and Nutrition Examination Survey VII. Complex samples analysis and a decision-tree analysis were performed using the SPSS WIN 26.0 program. Results: The mean level of hemoglobin A1c (HbA1c) was 7.22±0.25%, and 69.0% of the participants showed abnormal glycemic control (HbA1c≥6.5%). The decision-tree analysis identified six pathways of participant characteristics associated with poor glycemic control. Poor glycemic control groups were classified according to patient characteristics such as period after DM diagnosis, awareness of DM, sleep duration, gender, alcohol drinking, occupation, income status, low density lipoprotein-cholesterol, abdominal obesity, and number of walking days per week. The period after DM diagnosis, with a cut-off point of 6 years, was the most significant predictor of the poor glycemic control group. Conclusion: The findings identified characteristics that predict poor glycemic control, and they can be used to screen for poor glycemic control groups among adults with type 2 DM.

Using a Hybrid Model of DEA and Decision Tree Algorithm C5.0 to Evaluate the Efficiency of Ports (DEA와 의사결정 나무(C5.0)의 하이브리드 모델을 사용한 항만의 효율성 평가)

  • Hong, Han-Kook;Leem, Byung-hak;Kim, Sam-Moon
    • The Journal of the Korea Contents Association / v.19 no.7 / pp.99-109 / 2019
  • Data Envelopment Analysis (DEA), a non-parametric productivity analysis tool, has become an accepted approach for assessing efficiency in a wide range of fields. Despite its extensive applications, some features of DEA remain troublesome. For example, DEA estimates only the "relative" efficiency of a DMU (Decision Making Unit): it tells us how well a unit is doing compared with its peers, but not compared with a "theoretical maximum." Thus, to measure the efficiency of a new DMU, an entirely new DEA must be run together with the data of the previously evaluated DMUs; the efficiency level of the new DMU cannot be predicted without another DEA analysis. We aim to show that DEA can be used to evaluate the efficiency of ports, and we suggest a methodology that overcomes this limitation through a hybrid analysis combining DEA with C5.0. With the proposed methodology, classification rules generated by C5.0 can classify any new port without perturbing the existing evaluation structure.
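
The "relative" nature of DEA scores described above can be illustrated with a minimal sketch. It assumes the simplest possible setting, one input and one output under constant returns to scale (the CCR model), where a unit's score reduces to its output/input ratio divided by the best ratio in the sample. The port names and figures are invented for illustration and are not taken from the paper, whose actual inputs, outputs, and DEA variant are not given in the abstract.

```python
# Toy illustration of "relative" DEA efficiency (CCR, constant returns to scale)
# for the special case of ONE input and ONE output. Port names and figures are
# invented for this example; they are not from the paper.

def ccr_efficiency(units):
    """Return {name: efficiency in (0, 1]} for single-input, single-output DMUs.

    With one input and one output, the CCR score reduces to each unit's
    output/input ratio divided by the best ratio observed in the sample.
    """
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


ports = {
    "Port A": (10.0, 50.0),  # (input, output): ratio 5.0
    "Port B": (20.0, 80.0),  # ratio 4.0
    "Port C": (8.0, 24.0),   # ratio 3.0
}
scores = ccr_efficiency(ports)

# Adding a new, more productive DMU changes every existing score, because
# each score is measured against the best peer rather than a fixed maximum.
ports_plus = dict(ports)
ports_plus["Port D"] = (5.0, 40.0)  # ratio 8.0
rescored = ccr_efficiency(ports_plus)

for name in ports:
    print(f"{name}: {scores[name]:.3f} -> {rescored[name]:.3f}")
```

In this toy case every existing port's score drops once Port D enters the sample, which is the re-evaluation problem the abstract describes. The hybrid approach trains C5.0 rules on DEA results so that a new port can be classified without re-running the DEA; that step is not shown here.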

Search Tree Generation for Efficient Management of Business Process Repository in e-commerce Delivery Exception Handling (전자상거래 배송업무의 예외처리용 프로세스 저장소의 효과적 관리를 위한 검색트리 생성)

  • Choi, Doug-Won;Shin, Jin-Gyu
    • Journal of Intelligence and Information Systems / v.14 no.4 / pp.147-160 / 2008
  • A BPMS (business process management system) facilitates defining new processes or updating existing ones. However, processing exceptional or nonroutine tasks requires the intervention of domain experts or the introduction of a situation-specific resolution process. This paper assumes that a sufficient number of business process exception handling cases are stored in the process repository. Since retrieving the best exception handling process requires a good understanding of the exceptional situation, context awareness is an important issue. To facilitate the understanding of the exceptional situation and to enable efficient selection of the best exception handling process, we adopted the 'situation variable' and 'decision variable' construct. A case example of exception handling in the e-commerce delivery process is provided to illustrate how the proposed construct works. Application of the C5.0 algorithm guarantees the construction of an optimum search tree. It also implies that an efficient search path has been identified for the context-aware selection of the best exception handling process.

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management, using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression with that of two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. The comparison was performed using a test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that was used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, while C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules also provided segment characteristics for the risk factors that may be used in developing hypertension management programs. This showed that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.

MRI Predictors of Malignant Transformation in Patients with Inverted Papilloma: A Decision Tree Analysis Using Conventional Imaging Features and Histogram Analysis of Apparent Diffusion Coefficients

  • Chong Hyun Suh;Jeong Hyun Lee;Mi Sun Chung;Xiao Quan Xu;Yu Sub Sung;Sae Rom Chung;Young Jun Choi;Jung Hwan Baek
    • Korean Journal of Radiology / v.22 no.5 / pp.751-758 / 2021
  • Objective: Preoperative differentiation between inverted papilloma (IP) and its malignant transformation to squamous cell carcinoma (IP-SCC) is critical for patient management. We aimed to determine the diagnostic accuracy of conventional imaging features and histogram parameters obtained from whole tumor apparent diffusion coefficient (ADC) values to predict IP-SCC in patients with IP, using decision tree analysis. Materials and Methods: In this retrospective study, we analyzed data generated from the records of 180 consecutive patients with histopathologically diagnosed IP or IP-SCC who underwent head and neck magnetic resonance imaging, including diffusion-weighted imaging; 62 patients were included in the study. To obtain whole tumor ADC values, the region of interest was placed to cover the entire volume of the tumor. Classification and regression tree analyses were performed to determine the most significant predictors of IP-SCC among multiple covariates. The final tree was selected by cross-validation pruning based on minimal error. Results: Of 62 patients with IP, 21 (34%) had IP-SCC. The decision tree analysis revealed that the loss of convoluted cerebriform pattern and the 20th percentile cutoff of ADC were the most significant predictors of IP-SCC. With these decision trees, the sensitivity, specificity, accuracy, and C-statistics were 86% (18 out of 21; 95% confidence interval [CI], 65-95%), 100% (41 out of 41; 95% CI, 91-100%), 95% (59 out of 62; 95% CI, 87-98%), and 0.966 (95% CI, 0.912-1.000), respectively. Conclusion: Decision tree analysis using conventional imaging features and histogram analysis of whole volume ADC could predict IP-SCC in patients with IP with high diagnostic accuracy.
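
The reported sensitivity, specificity, and accuracy can be recomputed from the counts in the abstract with a short sketch. The abstract does not say how the confidence intervals were obtained; the Wilson score interval used below is an assumption, chosen because it is a common method for binomial proportions. The accuracy denominator is taken as all 62 included patients (18 + 41 = 59 correct classifications).

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Counts from the abstract: 21 IP-SCC cases, 41 IP cases, 62 patients in total.
metrics = {
    "sensitivity": (18, 21),  # correctly identified IP-SCC / all IP-SCC
    "specificity": (41, 41),  # correctly identified IP / all IP
    "accuracy": (59, 62),     # (18 + 41) / 62
}

results = {}
for name, (k, n) in metrics.items():
    lo, hi = wilson_interval(k, n)
    results[name] = (k / n, lo, hi)
    print(f"{name}: {k / n:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

Under this assumption the intervals round to the same percentages as those in the abstract, which suggests, but does not prove, that a Wilson-type interval was used.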

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches building the predictive models from two perspectives. The first is the analysis period. We divide the analysis period into before and after the IMF financial crisis, and examine whether there is a difference between the two periods. The second is the prediction time. In order to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years, and three years later. Therefore, a total of six prediction models are developed and analyzed. In this paper, we employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method, which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed algorithm and yields better performance than the other algorithms. We obtained data for the rights issues and financial analysis from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices. For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The results of the experimental analysis show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the IMF financial crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more apparent. The experimental results also show that stability-related indices have a major impact on conducting a rights issue in the case of short-term prediction. On the other hand, long-term prediction of a rights issue is affected by financial analysis indices on profitability, stability, activity, and productivity. All the prediction models include the industry code as one of the significant variables. This means that companies in different types of industries show different patterns for rights issues. We conclude that it is desirable for stakeholders to take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the differences in accuracy obtained with different data mining techniques such as neural networks, logistic regression, and SVM. Second, we need to develop and evaluate new prediction models that include variables that research on the theory of capital structure has identified as relevant to rights issues.
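
How a decision tree of the C4.5 family picks a splitting variable can be sketched as follows. This is a minimal illustration of the gain ratio criterion used by C4.5, from which C5.0 descends. Whether the C5.0 node in PASW Modeler applies exactly this criterion is an assumption, and the six records, the attribute names `stability` and `industry`, and the `issued` label are all invented for the example rather than taken from the TS2000 data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Information gain from splitting on `attr`, divided by the split information."""
    total = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


# Invented records: "stability" and "industry" stand in for input indices,
# "issued" for the rights-issue output variable (issued or not issued).
firms = [
    {"stability": "low", "industry": "mfg", "issued": "yes"},
    {"stability": "low", "industry": "svc", "issued": "yes"},
    {"stability": "low", "industry": "mfg", "issued": "yes"},
    {"stability": "high", "industry": "svc", "issued": "no"},
    {"stability": "high", "industry": "mfg", "issued": "no"},
    {"stability": "high", "industry": "svc", "issued": "yes"},
]

ratios = {a: gain_ratio(firms, a, "issued") for a in ("stability", "industry")}
best = max(ratios, key=ratios.get)
print(ratios, "-> split on", best)
```

In this toy data the tree would split on `stability` first, because it separates issuers from non-issuers better than `industry` does. A rule set such as the one requested by the "Output type = Rule set" option is derived from the paths of the resulting tree.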

Identifying prospective buyers for specific products using artificial neural network and induction rules (인공신경망과 귀납규칙기법을 이용한 제품별 예상 구매고객예측)

  • Lee Geon-Ho;Jeong Su-Mi;Jeong Byeong-Hui
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.10a / pp.395-398 / 2004
  • For proper customer relationship management (CRM), it is more effective and desirable to email product advertisements to prospective customers than to send spam to unspecified customers. This study identifies prospective customers with a high probability of buying specific products using an Artificial Neural Network (ANN) and an Induction Rule (IR) technique. We propose IRANN, an integrated model that combines ANN with the IR of the decision tree program C5.0, and compare the accuracy of ANN, IR, and IRANN.

Verification Test of High-activity SMEs Using Technology Appraisal Items (기술력 평가항목을 이용한 고활동성 중소기업 판별)

  • Lee, Jun-won
    • Journal of Technology Innovation / v.28 no.1 / pp.31-52 / 2020
  • This study was conducted to verify the preliminary (ex-ante) power of the 'forward-looking' technology appraisal model used in technology financing to discriminate firms' high activity. The firms analyzed are classified by industry (manufacturing / non-manufacturing) and company age (initial / non-initial). High-activity SMEs are defined as those that achieve at least twice the average asset turnover ratio of their cluster. A discriminant model built with the C5.0 method, one of the decision tree models, achieved classification accuracy of more than 99% for every industry and company age group, confirming that the discriminant power of the model is stable. The management expertise, capital involvement, and funding capacity items were identified as critical variables for high-activity SMEs. In addition, technology management capability and technology life cycle were confirmed as items that determine high-activity SMEs in the manufacturing industry. These results confirm some possibility of identifying high-activity SMEs in advance using technology appraisal items, and of applying this in policy.
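
The labeling rule for high-activity SMEs stated above (asset turnover of at least twice the cluster average) can be sketched as follows. The firm records, field names, and the function `label_high_activity` are hypothetical. The sketch also assumes that the cluster average includes the firm being labeled, which the abstract does not specify.

```python
from collections import defaultdict
from statistics import mean


def label_high_activity(firms, multiple=2.0):
    """Flag firms whose asset turnover is >= `multiple` x their cluster's mean."""
    by_cluster = defaultdict(list)
    for f in firms:
        by_cluster[f["cluster"]].append(f["asset_turnover"])
    cluster_mean = {c: mean(v) for c, v in by_cluster.items()}
    return {
        f["name"]: f["asset_turnover"] >= multiple * cluster_mean[f["cluster"]]
        for f in firms
    }


# Invented firms; clusters combine industry and company age as in the abstract.
firms = [
    {"name": "F1", "cluster": "manufacturing/initial", "asset_turnover": 0.5},
    {"name": "F2", "cluster": "manufacturing/initial", "asset_turnover": 0.5},
    {"name": "F3", "cluster": "manufacturing/initial", "asset_turnover": 0.5},
    {"name": "F4", "cluster": "manufacturing/initial", "asset_turnover": 4.5},
    {"name": "F5", "cluster": "non-manufacturing/non-initial", "asset_turnover": 1.0},
    {"name": "F6", "cluster": "non-manufacturing/non-initial", "asset_turnover": 1.5},
]

labels = label_high_activity(firms)
print(labels)
```

The resulting true/false label would serve as the target variable for the decision tree, with the technology appraisal items as inputs.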

A Study on the Documents's Automatic Classification Using Machine Learning (기계학습을 이용한 문서 자동분류에 관한 연구)

  • Kim, Seong-Hee;Eom, Jae-Eun
    • Journal of Information Management / v.39 no.4 / pp.47-66 / 2008
  • This study introduced machine learning algorithms to overcome the many limitations of manual classification and to provide users with a faster and more accurate classification service. The experimental data consisted of 100 literature titles for each of eight subject categories in MeSH. The algorithms used in the experiments included a neural network, C5.0, CHAID, and KNN. The combination of the neural network and C5.0 recorded a classification accuracy of 83.75%, which was 2.5 and 3.75 percentage points higher than that of the neural network alone and C5.0 alone, respectively. This was the highest accuracy among the four classification experiments. Thus, using the neural network and C5.0 together yielded higher accuracy than using either technique individually.

Length of stay in PACU among surgical patients using data mining technique (데이터 마이닝을 활용한 외과수술환자의 회복실 체류시간 분석)

  • Yoo, Je-Bog;Jang, Hee Jung
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3400-3411 / 2013
  • Data mining is a new approach for extracting useful information through effective analysis of huge data sets in numerous fields. This study used a decision tree model, Clementine C&RT (Classification & Regression Tree, CART), as the data mining technique. We applied this technique to the medical records of 1,500 people. The data were sorted by length of stay in the PACU and divided into 3 groups. The results extracted by the C5.0 decision tree method showed that the important factors related to length of stay in the PACU are type of operation, preoperative EKG abnormality, anesthetics, operative duration, and age.