• Title/Summary/Keyword: C5.0 decision tree

Search results: 47

A new Intelligent Yield Management Methodology based on Feature Manipulation (특성 변동 관리에 기반한 지능적 수율관리 방안)

  • 이장희
    • Proceedings of the Korean Society for Quality Management Conference
    • /
    • 2004.04a
    • /
    • pp.148-151
    • /
    • 2004
  • This study presents a new intelligent yield management methodology that forecasts the yield level of a production unit from the behavior of its features. In the proposed methodology, we use C5.0 to identify the existing features, which are combinations of nodes (i.e., variables) in the decision tree generated by C5.0; use SOM (Self-Organizing Map) neural networks to extract and classify the features' patterns; and then derive control rules for the features using C5.0.

Tolerance Computation for Process Parameter Considering Loss Cost : In Case of the Larger is better Characteristics (손실 비용을 고려한 공정 파라미터 허용차 산출 : 망대 특성치의 경우)

  • Kim, Yong-Jun;Kim, Geun-Sik;Park, Hyung-Geun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.40 no.2
    • /
    • pp.129-136
    • /
    • 2017
  • With the recent rapid development of information technology and automation in the manufacturing industries, tens of thousands of quality variables are estimated and categorized in databases every day. Existing statistical methods, and variable selection and interpretation by experts, are limited in supporting proper judgment. Accordingly, various data mining methods, including decision tree analysis, have been developed in recent years. CART and C5.0 are representative algorithms for decision tree analysis, but they have limits in defining the tolerance of continuous explanatory variables. In addition, their target variables are restricted to information that indicates only product quality, such as the rate of defective products. It is therefore essential to develop an algorithm that improves upon CART and C5.0 and allows access to new quality information such as loss cost. In this study, a new algorithm was developed not only to find the major variables that minimize the target variable, loss cost, but also to overcome the limits of CART and C5.0. The new algorithm defines the tolerance of variables systematically by dividing each continuous explanatory variable into 3 categories. To compare the performance of the new algorithm with that of the existing ones, larger-the-better characteristics were assumed in the R programming environment, and 10 simulations were performed with 1,000 data sets for each variable. The performance of the new algorithm was verified through a mean test of loss cost. The verification showed that, for larger-the-better characteristics, the tolerances of continuous explanatory variables found by the new algorithm lowered loss cost more than those of the existing algorithms. In conclusion, the new algorithm can be used to find tolerances of continuous explanatory variables that minimize loss in the process, taking into account the loss cost of the products.
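As background for the limitation described above, the following minimal sketch shows how a conventional decision tree typically handles a continuous explanatory variable: it picks a single cut point that maximizes an entropy-based gain, which yields one boundary rather than a tolerance range. This is not the paper's new algorithm. The function names, the use of midpoints as candidate thresholds, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values (an assumption; implementations differ on this detail).
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between identical values
        cut = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= cut]
        right = [lab for v, lab in pairs if v > cut]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (cut, gain)
    return best


# Hypothetical larger-the-better measurement and a binary quality label.
strength = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
quality = ["bad", "bad", "bad", "good", "good", "good"]
threshold, gain = best_threshold(strength, quality)
print(threshold, gain)
```

On this toy data the split falls between 15.0 and 18.0. A single boundary of this kind says nothing about loss cost, which is the gap the abstract describes.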

Industrial Waste Database Analysis Using Data Mining Techniques

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.2
    • /
    • pp.455-465
    • /
    • 2006
  • Data mining is a method for finding useful information in large amounts of data in a database. It is used to discover hidden knowledge, unexpected patterns, and new rules from massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. We analyze an industrial waste database using data mining techniques: the k-means algorithm for clustering, the C5.0 algorithm for decision trees, and the Apriori algorithm for association rules. These outputs can be used for environmental preservation and environmental improvement.

Industrial Waste Database Analysis Using Data Mining

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2006.04a
    • /
    • pp.241-251
    • /
    • 2006
  • Data mining is a method for finding useful information in large amounts of data in a database. It is used to discover hidden knowledge, unexpected patterns, and new rules from massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. We analyze an industrial waste database using data mining techniques: the k-means algorithm for clustering, the C5.0 algorithm for decision trees, and the Apriori algorithm for association rules. These analysis outputs can be used for environmental preservation and environmental improvement.

A Study on Factors of Education's Outcome using Decision Trees (의사결정트리를 이용한 교육성과 요인에 관한 연구)

  • Kim, Wan-Seop
    • Journal of Engineering Education Research
    • /
    • v.13 no.4
    • /
    • pp.51-59
    • /
    • 2010
  • To manage university lectures efficiently and improve educational outcomes, a process is needed that diagnoses the current educational outcome of each class in a lecture and finds the factors behind it. Most studies seeking the factors of efficient lectures use statistical methods such as association analysis and regression analysis, and decision tree analysis has recently been employed as well. Decision tree analysis has the merits that the resulting model is easy to understand and easy to apply to decision making, but it has the weakness that it is not robust to characteristics of the input data such as multicollinearity. This paper points out the weaknesses of decision tree analysis and suggests an experimental solution that uses multiple decision tree algorithms to compensate for these problems. The experimental results show that the suggested method is more effective in finding reliable factors of educational outcome.

A Study on Stage Classification of Eight Constitution Questionnaire (팔체질 진단을 위한 단계별 설문지 개발 연구)

  • Lee, Joo-Ho;Kim, Min-Yong;Kim, Hee-Ju;Shin, Young-Sup;Oh, Hwan-Sup;Park, Young-Bae;Park, Young-Jae
    • The Journal of the Society of Korean Medicine Diagnostics
    • /
    • v.16 no.2
    • /
    • pp.59-70
    • /
    • 2012
  • Objectives: Pulse diagnosis by an expert is the only way to classify the 8 constitutions, so questionnaire-based methods to supplement the classification have been developed and modified, and the ECM-32 System was designed in 2010. However, the decision tree analysis produced many nodes, and 32 important questions were omitted while processing the data. This study therefore first classified the 8-constitution patients into 2 groups and then analyzed their characteristics in consecutive order. Methods: The participants were 1027 patients who had been classified into one of the 8 constitutions by pulse diagnosis and who answered 251 questionnaire items in 2010. They were divided into a sympathetic nerve acceleration constitution group and a parasympathetic nerve acceleration constitution group and analyzed with a decision tree. Results: The responses to the questionnaire were analyzed with 4 methods: the 5-scale interval method (from 0 to 5); the L, M, H method (Na, Low (1, 2), Medium (3), High (4, 5)); the average value; and the Y/N dichotomy. The average value had no significance. 1. With the 5-scale interval method, 6 questionnaire items with 7 nodes (F5e, B1d, F7f, F2a, F1b, C4L) were significant. The accuracy was 92.5%. 2. With the L, M, H method, 7 questionnaire items with 7 nodes (F5e, B1d, F7f, F1a, B1c, C4L, P3d) were significant. The accuracy was 92.5%. 3. With the Y/N dichotomy, 9 questionnaire items with 9 nodes (F5e, B1d, F7f, F1a, B1c, C4L, B1b, P1i, B2a) were significant. The accuracy was 93.18%. Conclusions: In this study, the Y/N dichotomy method was the most significant of the 4 methods. Unlike previous studies, which used only the interval scale method, the Y/N dichotomy method was more statistically significant for a questionnaire intended to supplement pulse diagnosis. In further studies, analyzing with the decision tree method in consecutive order may allow patients to be divided into the 8 constitutions with higher significance using fewer questionnaire items.
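The L, M, H recoding named in the Results can be sketched as a simple mapping from the 0 to 5 response scale. The groupings Low (1, 2), Medium (3), and High (4, 5) come from the abstract; treating a response of 0 as "Na" is an assumption, and the function name `to_level` is hypothetical. The Y/N dichotomy is not sketched because the abstract does not state its cut-off.

```python
def to_level(score):
    """Recode a 0-5 questionnaire response as Na / Low / Medium / High."""
    if score == 0:
        return "Na"  # assumption: 0 means no applicable answer
    if score in (1, 2):
        return "Low"
    if score == 3:
        return "Medium"
    if score in (4, 5):
        return "High"
    raise ValueError(f"response out of range: {score!r}")


# Hypothetical responses from one participant to six items.
responses = [0, 1, 2, 3, 4, 5]
levels = [to_level(r) for r in responses]
print(levels)
```

Recoding of this kind reduces the number of distinct values a decision tree can split on for each item.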

Analysis of Factors for Seasonal Meat Color Characteristics in Hanwoo(Korean Cattle) Beef using Decision Tree Method (의사결정나무분석기법을 이용한 계절별 한우육의 육색 특성에 미치는 요인분석)

  • Kim, Seok-Jung;Kim, Yong-Sun;Song, Young-Han;Lee, Sung-Ki
    • Journal of Animal Science and Technology
    • /
    • v.44 no.5
    • /
    • pp.607-616
    • /
    • 2002
  • This study analyzed the effects of pH, sex, backfat thickness, ribeye area, cold carcass weight, shipping month, muscle internal temperature, average daily temperature, and average relative humidity on the meat color of slaughtered Hanwoo by season. The analyses focused on the interactions among the factors and the effect of each factor on meat color. In the multiple linear regression analysis, all meat color values decreased as pH increased and increased as backfat thickness increased. In the decision tree analysis by factor, lightness (L*) was highest for cows and steers slaughtered in spring and autumn. For redness (a*), the relevant case was Hanwoo slaughtered in autumn with pH below 5.63 and average relative humidity above 71.5%. Chroma (C*) was highest for Hanwoo slaughtered in summer and autumn with pH below 5.60 and backfat thickness above 8 mm. For hue angle ($h^{\circ}$), the relevant case was Hanwoo slaughtered in spring, summer, and autumn with muscle internal temperature below 4.7 $^{\circ}$C, pH below 5.66, and backfat thickness above 8 mm.

Development of Decision Tree Software and Protein Profiling using Surface Enhanced Laser Desorption/Ionization - Time of Flight - Mass Spectrometry (SELDI-TOF-MS) in Papillary Thyroid Cancer (의사결정트리 프로그램 개발 및 갑상선유두암에서 질량분석법을 이용한 단백질 패턴 분석)

  • Yoon, Joon-Kee;Lee, Jun;An, Young-Sil;Park, Bok-Nam;Yoon, Seok-Nam
    • Nuclear Medicine and Molecular Imaging
    • /
    • v.41 no.4
    • /
    • pp.299-308
    • /
    • 2007
  • Purpose: The aim of this study was to develop bioinformatics software and to test it on serum samples of papillary thyroid cancer using mass spectrometry (SELDI-TOF-MS). Materials and Methods: The 'Protein analysis' software, which performs decision tree analysis, was developed by customizing C4.5. Sixty-one serum samples from 27 papillary thyroid cancer patients, 17 autoimmune thyroiditis patients, and 17 controls were applied to 2 types of protein chips, CM10 (weak cation exchange) and IMAC3 (metal binding - Cu). Mass spectrometry was performed to reveal the protein expression profiles. Decision trees were generated using the 'Protein analysis' software, which automatically detected biomarker candidates. Validation analysis was performed for the CM10 chip by random sampling. Results: Decision tree software that can perform training and validation from profiling data was developed. For the CM10 and IMAC3 chips, 23 of 113 and 8 of 41 protein peaks, respectively, were significantly different among the 3 groups (p<0.05). The decision tree correctly classified the 3 groups with an error rate of 3.3% for CM10 and 2.0% for IMAC3, and 4 and 7 biomarker candidates were detected, respectively. In 2-group comparisons, all cancer samples were correctly discriminated from non-cancer samples (error rate = 0%), for CM10 by a single node and for IMAC3 by multiple nodes. Validation results from 5 test sets showed that SELDI-TOF-MS and the decision tree correctly differentiated cancers from non-cancers (54/55, 98%), while predictability was moderate in the 3-group classification (36/55, 65%). Conclusion: Our in-house software successfully built decision trees and detected biomarker candidates; it could therefore be useful for biomarker discovery and clinical follow-up of papillary thyroid cancer.

Prediction of Slope Hazard Probability around Express Way using Decision Tree Model (의사결정나무모형을 이용한 고속도로 주변 급경사지재해 발생가능성 예측)

  • Kim, Chan-Kee;Bak, Gueon Jun;Kim, Joong Chul;Song, Young-Suk;Yun, Jung-Mann
    • Journal of the Korean Geosynthetics Society
    • /
    • v.12 no.2
    • /
    • pp.67-74
    • /
    • 2013
  • In this study, slope hazard probability was predicted for a study area located in Hadae-ri, Woochun-myeon, Hoengsung-gun, Gangwon Province, near the Youngdong Expressway, using the computer program SHAPP ver 1.0, which was developed using a decision tree model. Soil samples were collected at a total of 10 points, and soil tests were performed to measure soil properties. Thematic maps of soil properties such as the coefficient of permeability and void ratio were made on the basis of the soil test results. The slope angle of the topography was analyzed using a digital map. In the prediction result, 2,120 of the total 27,776 cells were predicted to experience slope hazards. Because the analyzed cell size was $5m{\times}5m$, the predicted slope hazard area is about $53,000m^2$.
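The area figure in this abstract follows directly from the cell counts and the cell size. The short check below reproduces it; the variable names are illustrative, and the share of the study area is a derived value that the abstract itself does not state.

```python
# Figures taken from the abstract.
hazard_cells = 2_120   # cells predicted to experience slope hazards
total_cells = 27_776   # all analyzed cells
cell_side_m = 5        # each cell is 5 m x 5 m

cell_area_m2 = cell_side_m ** 2                 # 25 m^2 per cell
hazard_area_m2 = hazard_cells * cell_area_m2    # predicted hazard area
hazard_share = hazard_cells / total_cells       # derived, not stated in the source

print(hazard_area_m2)                 # 53000
print(round(hazard_share * 100, 1))   # 7.6
```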

A Study on the Big Data Analysis and Predictive Models for Quality Issues in Defense C5ISR (국방 C5ISR 분야 품질문제의 빅데이터 분석 및 예측 모델에 대한 연구)

  • Hyoung Jo Huh;Sujin Ko;Seung Hyun Baek
    • Journal of Korean Society for Quality Management
    • /
    • v.51 no.4
    • /
    • pp.551-571
    • /
    • 2023
  • Purpose: The purpose of this study is to offer useful suggestions by analyzing the cause-and-effect relationship between the quality failure rate and the process variables in the C5ISR domain of the defense industry. Methods: Data collected through in-house systems were analyzed using big data analysis. The analysis of quality data and A/S history data was conducted following the CRISP-DM (Cross-Industry Standard Process for Data Mining) process. Results: After evaluating the performance of candidate models for the influence of inspection data and A/S history data, logistic regression was selected as the final model because it performed relatively well compared to the decision tree, with an accuracy of 82%/67% and an AUC of 0.66/0.57. Based on this model, we estimated the coefficients using 'R', a data analysis tool, and found that a specific variable (continuous maximum discharge current time) had a statistically significant effect on the A/S quality failure rate; the analysis indicated that 82% of the failure rate could be predicted. Conclusion: As the first case of applying big data analysis to quality issues in the defense industry, this study confirms that the market failure rates of defense products can be improved by focusing on the measured values of the main causes of failure derived through the big data analysis process. It also identifies limitations, such as the number of data samples and data collection constraints, to be addressed in subsequent studies for a more reliable analysis model.
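The model comparison in the Results relies on two metrics, accuracy and AUC. The sketch below computes both from predicted scores in plain Python, as an illustration of what the metrics measure. It does not reproduce the study's data or its R workflow: the labels, the scores for the two hypothetical models, and the 0.5 decision threshold are all made up.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases where the thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical failure labels (1 = failure) and scores from two candidate models.
y_true = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]
model_b = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]

for name, scores in (("model_a", model_a), ("model_b", model_b)):
    print(name, round(accuracy(y_true, scores), 2), round(auc(y_true, scores), 2))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why studies such as this one commonly report both.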