• Title/Summary/Keyword: Tree mining


A study on analysis of factors on in-hospital mortality for community-acquired pneumonia (지역사회획득 폐렴 환자의 퇴원시 사망 요인 분석)

  • Kim, Yoo-Mi
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.389-400 / 2011
  • This study analyzed factors related to in-hospital mortality from community-acquired pneumonia using an administrative database. The subjects were 5,353 community-acquired pneumonia inpatients from the Korean National Hospital Discharge Injury Survey 2004-2006 data. The data were analyzed using the chi-squared test and decision tree models, a data mining technique. Among the decision tree models, C4.5 performed best. The critical factors for in-hospital mortality from community-acquired pneumonia were admission route, respiratory failure, and congenital heart failure, along with age, comorbidity, and bed size. This study used an administrative database that includes patients' characteristics and comorbidity. However, further studies should also cover hospital characteristics, regional medical resources, and patient management practices.

The Life Satisfaction Analysis of Middle School Students Using Korean Children and Youth Panel Survey Data (한국아동·청소년패널조사 데이터를 이용한 중학생 삶의 만족도 분석)

  • An, Ji-Hye; Yun, You-Dong; Lim, Heui-Seok
    • Journal of Digital Convergence / v.14 no.2 / pp.197-208 / 2016
  • In this paper, two data mining techniques, regression analysis and decision tree analysis, were used to analyze factors affecting the life satisfaction of middle school students. For this purpose, we analyzed Korean Children and Youth Panel Survey (KCYPS) data. The regression analysis identified the common factors influencing life satisfaction: self-esteem, depression, total grade satisfaction, regional community awareness, career identity, annual delinquency damage experience, siblings' factors, trust, behavioral control, and concentration. According to the decision tree analysis, the factors with a significant impact on the life satisfaction of middle school students were self-esteem, depression, career identity, and attention.

Exploration of Optimal Product Innovation Strategy Using Decision Tree Analysis: A Data-mining Approach

  • Cho, Insu
    • STI Policy Review / v.8 no.2 / pp.75-93 / 2017
  • Global competition is driving firms in the manufacturing sector to conduct product innovation projects to maintain their competitive edge. The key points of a product innovation project are 1) what the purpose of the project is and 2) what results can be expected in the target market from implementing the innovation. This study therefore focuses on the performance of innovation projects from a business viewpoint and proposes the "achievement rate" of product innovation projects as a measure of project performance. It then identifies which innovation activities best optimize the achievement rate. The innovation activities considered comprise three types of R&D activities (internal, joint, and external R&D) and five types of non-R&D activities (acquisition of machines, equipment and software; purchasing external knowledge; job education and training; market research; and design). This study applies decision tree modeling, a data-mining methodology, to explore effective innovation activities, using data from the 'Korean Innovation Survey (KIS) 2014: Manufacturing Sector.' The KIS 2014 gathered information about innovation activities in the manufacturing sector over three years (2011-2013). The results offer practical implications for managing these activities. First, the achievement rate of product diversification projects was increased by a combination of market research, new product design, and job training. Second, for the replacement of outdated products, a combination of internal R&D, job education and training, and market research increased project achievement the most. Third, for new market creation or extension of market share, launching replacement products and continuously upgrading products were most important.

A Study on the Effective Selection of Tunnel Reinforcement Methods using Decision Tree Technique (의사결정트리 기법을 이용한 터널 보조공법 선정방안 연구)

  • Kim, Jong-Gyu; Sagong, Myung; Lee, Jun S.; Lee, Yong-Joo
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.4C / pp.255-264 / 2006
  • The auxiliary reinforcement method is normally applied to prevent a possible collapse of the tunnel face where ground conditions are unfavorable or geologic information is insufficient. Recently, several engineering approaches have been proposed to choose effective reinforcement methods using expert systems based on, among others, neural networks and fuzzy theory. Although expert systems offer many decision aid tools for selecting a reinforcement method, the quantitative assessment items are not easy to estimate. For this reason, this study introduces the data mining technique, which is widely used in social science, medical treatment, banking, and agriculture. Using a decision tree together with PDA, decision aids for reinforcement methods are created from field construction data to derive field rules. Future work will concentrate on applying the proposed methods to a variety of underground development cases.

Genetic Programming with Weighted Linear Associative Memories and its Application to Engineering Problems (가중 선형 연상기억을 채용한 유전적 프로그래밍과 그 공학적 응용)

  • 연윤석
    • Korean Journal of Computational Design and Engineering / v.3 no.1 / pp.57-67 / 1998
  • Genetic programming (GP) is an extension of the genetic algorithm paradigm that deals with tree structures representing computer programs as individuals. Recently, there has been much research on applying GP to engineering problems including system identification, data mining, and function approximation. However, standard GP lacks techniques for estimating the numerical parameters of the GP tree, which are essential in engineering applications involving real-valued function approximation. Unlike other research, where nonlinear optimization methods are employed, I adopt a weighted linear associative memory to estimate these parameters within the GP algorithm. This approach can significantly reduce computational cost while still yielding reasonably accurate parameter values. Because the GP algorithm is likely to fall into a local minimum, it often fails to generate a tree with the desired accuracy. This motivates a group of additive genetic programming trees (GAGPT), which consists of a primary tree and a set of auxiliary trees. The output of the GAGPT is the sum of the outputs of the primary tree and all auxiliary trees. Adding auxiliary trees makes it possible to improve both the learning and the generalization capability of the GAGPT, since each auxiliary tree evolves toward refining the quality of the GAGPT by optimizing its fitness function. The effectiveness of this approach is verified by applying the GAGPT to the estimation of the principal dimensions of bulk cargo ships and the engine torque of a passenger car.
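
As a rough illustration of the additive structure described in this abstract, the following minimal sketch sums the output of a primary tree and a set of auxiliary trees. It is not the author's implementation: the trees are stood in for by plain Python callables, and the names `gagpt_output`, `primary`, and `auxiliaries` as well as the example functions are hypothetical.

```python
from typing import Callable, Sequence

# Assumption: an evolved GP tree is represented here simply as a callable
# that maps one real input to one real output.
Tree = Callable[[float], float]


def gagpt_output(primary: Tree, auxiliaries: Sequence[Tree], x: float) -> float:
    """Return the sum of the primary tree output and all auxiliary tree outputs."""
    return primary(x) + sum(aux(x) for aux in auxiliaries)


# Hypothetical trees, chosen only to show the summation.
primary = lambda x: 2.0 * x
auxiliaries = [lambda x: 0.5, lambda x: -0.1 * x * x]

y = gagpt_output(primary, auxiliaries, 3.0)
print(y)
```

In this sketch each auxiliary tree acts as a correction term on top of the primary tree; how the trees are evolved and how their numerical parameters are estimated is outside its scope.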


A Comparative Study of Predictive Factors for Passing the National Physical Therapy Examination using Logistic Regression Analysis and Decision Tree Analysis

  • Kim, So Hyun; Cho, Sung Hyoun
    • Physical Therapy Rehabilitation Science / v.11 no.3 / pp.285-295 / 2022
  • Objective: The purpose of this study is to use logistic regression and decision tree analysis to identify the factors that affect success or failure in the national physical therapy examination, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 76,727 subjects from the physical therapy national examination data provided by the Korea Health Personnel Licensing Examination Institute. The target variable was pass or fail, and the input variables were gender, age, graduation status, and examination area. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In the logistic regression analysis, subjects in their 20s (odds ratio, OR=1, reference), those expected to graduate (OR=13.616, p<0.001), and those from the examination area of Jeju-do (OR=3.135, p<0.001) had a high probability of passing. In the decision tree, the predictive factors with the greatest influence on the passing result were, in order, graduation status ($\chi^2$=12366.843, p<0.001) and examination area ($\chi^2$=312.446, p<0.001). Logistic regression analysis showed a specificity of 39.6% and a sensitivity of 95.5%, while decision tree analysis showed a specificity of 45.8% and a sensitivity of 94.7%. In classification accuracy, logistic regression and decision tree analysis achieved 87.6% and 88.0%, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Additionally, whether actual test takers passed the national physical therapy examination could be predicted by applying the constructed prediction model and prediction rate.
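
For readers comparing the sensitivity, specificity, and accuracy figures reported above, the sketch below shows how these three measures follow from confusion-matrix counts under their standard definitions. The function name `classification_metrics` and the counts are hypothetical; they are not taken from the study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fn: positives (e.g. "pass") predicted correctly/incorrectly.
    tn/fp: negatives (e.g. "fail") predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)  # share of actual positives that were detected
    specificity = tn / (tn + fp)  # share of actual negatives that were detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=40, fp=60)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When the positive class dominates the sample, accuracy can stay high even if specificity is low, which is one reason to compare models on all three measures rather than on accuracy alone.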

A Comparative Study of Predictive Factors for Hypertension using Logistic Regression Analysis and Decision Tree Analysis

  • SoHyun Kim; SungHyoun Cho
    • Physical Therapy Rehabilitation Science / v.12 no.2 / pp.80-91 / 2023
  • Objective: The purpose of this study is to identify factors that affect the incidence of hypertension using logistic regression and decision tree analysis, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 9,859 subjects from the Korean health panel annual 2019 data provided by the Korea Institute for Health and Social Affairs and the National Health Insurance Service. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In logistic regression analysis, those who were 60 years of age or older (odds ratio, OR=68.801, p<0.001), those who were divorced/widowed/separated (OR=1.377, p<0.001), those who graduated from middle school or younger (OR=1, reference), those who did not walk at all (OR=1, reference), those who were obese (OR=5.109, p<0.001), and those who had poor subjective health status (OR=2.163, p<0.001) were more likely to develop hypertension. In the decision tree, those over 60 years of age, overweight or obese, and those who graduated from middle school or younger had the highest probability of developing hypertension, at 83.3%. Logistic regression analysis showed a specificity of 85.3% and a sensitivity of 47.9%, while decision tree analysis showed a specificity of 81.9% and a sensitivity of 52.9%. In classification accuracy, logistic regression and decision tree analysis achieved 73.6% and 72.6%, respectively. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Both analysis methods are considered useful for constructing a predictive model for hypertension.

A comparison of three design tree based search algorithms for the detection of engineering parts constructed with CATIA V5 in large databases

  • Roj, Robin
    • Journal of Computational Design and Engineering / v.1 no.3 / pp.161-172 / 2014
  • This paper presents three different search engines for the detection of CAD parts in large databases. The contained information is analyzed by exporting the data stored in the structure trees of the CAD models. A preparation program generates one XML file for every model, which, in addition to the data of the structure tree, contains certain physical properties of each part. The first search engine specializes in the discovery of standard parts, such as screws or washers. The second program uses user input as search parameters and can therefore perform personalized queries. The third compares a given reference part with all parts in the database and locates files that are identical or similar to the reference part. All approaches run automatically and have the analysis of the structure tree in common. The implementation uses files constructed with CATIA V5 and search engines written in Python. The paper also includes a short comparison of the advantages and disadvantages of each program, as well as a performance test.
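
To make the idea of comparing a reference part with other parts via their exported structure trees more concrete, here is a minimal Python sketch. It is not the paper's search engine: the XML layout (`<part>` and `<feature type="...">` elements), the function names, and the use of Jaccard similarity over feature labels are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def feature_labels(xml_text: str) -> set:
    """Collect a label for every node of a structure tree.

    Assumption: a node is labelled by its hypothetical 'type' attribute,
    falling back to the element tag when that attribute is missing.
    """
    root = ET.fromstring(xml_text)
    return {node.get("type", node.tag) for node in root.iter()}


def tree_similarity(reference_xml: str, candidate_xml: str) -> float:
    """Jaccard similarity of the two label sets (1.0 means identical sets)."""
    ref = feature_labels(reference_xml)
    cand = feature_labels(candidate_xml)
    return len(ref & cand) / len(ref | cand)


# Hypothetical exported structure trees.
reference = '<part><feature type="Pad"/><feature type="Hole"/></part>'
candidate = '<part><feature type="Pad"/><feature type="Pocket"/></part>'

score_same = tree_similarity(reference, reference)
score_other = tree_similarity(reference, candidate)
print(score_same, score_other)
```

A set-based comparison ignores feature order and repetition; a real search over CAD structure trees would likely also need the tree hierarchy and the physical properties mentioned in the abstract.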

CANCER CLASSIFICATION AND PREDICTION USING MULTIVARIATE ANALYSIS

  • Shon, Ho-Sun; Lee, Heon-Gyu; Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.706-709 / 2006
  • Cancer is one of the major causes of death; however, the survival rate can be increased if it is discovered at an early stage and treated in time. According to 2002 statistics from the World Health Organization, breast cancer was the most prevalent of all cancers occurring in women worldwide, and it accounts for 16.8% of all cancers affecting Korean women today. To classify breast cancer as benign or malignant, this study applied discriminant analysis and the decision tree technique of data mining to breast cancer data published on the web. Discriminant analysis is a statistical method that seeks discriminant criteria and a discriminant function to separate population groups on the basis of observations obtained from two or more groups, and then uses them to assign a given observation to one of those groups. The decision tree analyzes records of data collected in the past to reveal the patterns among them, namely the combinations of attributes that characterize each class, and builds a classification model tree. This type of analysis may provide systematic information on the factors that cause breast cancer in advance and help prevent the risk of recurrence after surgery.


A Study on Occupancy Estimation Method of a Private Room Using IoT Sensor Data Based Decision Tree Algorithm (IoT 센서 데이터를 이용한 단위실의 재실추정을 위한 Decision Tree 알고리즘 성능분석)

  • Kim, Seok-Ho; Seo, Dong-Hyun
    • Journal of the Korean Solar Energy Society / v.37 no.2 / pp.23-33 / 2017
  • Accurate prediction of the stochastic behavior of occupants is a well-known problem in improving the prediction of building energy use. Many researchers have tried various sensors that carry information on occupant status, such as $CO_2$ sensors, infrared motion detectors, and RFID, to predict occupancy, while others have developed algorithms that estimate occupancy probability from those sensors or from indirect monitoring data such as energy consumption in spaces. In this research, various sensor data and energy consumption data are used with decision tree algorithms (C4.5 and CART) to estimate sub-hourly occupancy status. Although the experiment is limited in space (a private room) and period (the cooling season), the prediction results show good agreement, with accuracy above 95%, when energy consumption data are used instead of the measured $CO_2$ value. This result indicates the potential of IoT data for awareness of indoor environmental status.