• Title/Summary/Keyword: data mining(CHAID)

A Study on Variable Selection Bias in Data Mining Software Packages (데이터마이닝 패키지에서 변수선택 편의에 관한 연구)

  • 송문섭;윤영주
    • The Korean Journal of Applied Statistics / v.14 no.2 / pp.475-486 / 2001
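  • Among the classification tree algorithms implemented in data mining packages, we compared the variable selection methods of CART, CHAID, QUEST, and C4.5. It is well known that the exhaustive search method of CART is biased; here, we compared the bias and selection power of these algorithms as implemented in commercial packages through a simulation study. The commercial packages used were CART, Enterprise Miner, AnswerTree, and Clementine. According to the results of the limited simulation study in this paper, both C4.5 and CART have serious bias in variable selection, while CHAID and QUEST show relatively stable results.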
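
As a rough illustration of chi-squared based variable selection in the spirit of CHAID, the sketch below picks the categorical predictor whose cross-tabulation with the target has the smallest Pearson chi-squared p-value. It is a minimal sketch, not the procedure of any package named in the abstract: it omits CHAID's category merging and Bonferroni adjustment, and the predictor names (`kindness`, `age_group`) and toy data are hypothetical.

```python
import math
from collections import Counter

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared variable with df degrees of freedom.

    Uses the recurrence Q(a+1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a+1),
    starting from Q(1, x) = exp(-x) (even df) or Q(1/2, x) = erfc(sqrt(x)) (odd df).
    """
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_test(xs, ys):
    """Pearson chi-squared test of independence between two categorical lists."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)

def best_predictor(predictors, target):
    """Return the predictor name with the smallest p-value, plus all p-values."""
    p_values = {name: chi2_test(values, target)[2] for name, values in predictors.items()}
    return min(p_values, key=p_values.get), p_values

# Hypothetical toy data: 'kindness' separates the target perfectly, 'age_group' does not.
target = ["yes"] * 4 + ["no"] * 4
predictors = {
    "kindness": ["high"] * 4 + ["low"] * 4,
    "age_group": ["young", "old"] * 4,
}
chosen, p_values = best_predictor(predictors, target)
```

Because selection is based on p-values rather than on an exhaustive search over raw split criteria, predictors with different numbers of categories are compared on a common scale through their degrees of freedom.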

On the Determination of Outpatient's Revisit using Data Mining (데이터 마이닝을 활용한 병원 재방문도 영향요인 분석 : 외래환자의 만족도를 중심으로)

  • 이견직
    • Health Policy and Management / v.13 no.3 / pp.21-34 / 2003
  • A patient's revisit to a previously used hospital is a key factor in determining a health care organization's competitive advantage and survival. This article examines the relationship between customer satisfaction and revisits using three methods: Chi-squared Automatic Interaction Detection (CHAID) for segmenting the outpatient group, and logistic regression and neural networks for predicting outpatient revisits. The main findings indicate that the important factors in outpatient revisits are physician's kindness, nurse's skill, overall level of satisfaction, hospital reputation, recommendation, level of diagnoses, and outpatient's age. Among these, physician's kindness is the most important factor guiding the decision to revisit. Hospital decision makers should select a strategy that weighs the level of revisit against the size of the outpatient group, subject to the hospital's time, budget, and manpower constraints. Finally, this study shows that neural networks, a non-parametric technique, appear to predict revisits more accurately than logistic regression, a parametric estimation technique.

Development of an Expert System for Prevention of Industrial Accidents in Manufacturing Industries (제조업에서의 산업재해 예방을 위한 전문가 시스템 개발)

  • Leem Young-Moon;Choi Yo-Han
    • Journal of the Korea Safety Management & Science / v.8 no.1 / pp.53-64 / 2006
  • Many studies and analyses have focused on industrial accidents in order to predict and reduce them. In a similar effort, this paper aims to develop an expert system for the prevention of industrial accidents. Although various previous studies have sought to prevent industrial accidents, they only provide managerial and educational policies using frequency analysis and comparative analysis based on data from past industrial accidents. As an initial step, this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. Decision tree algorithms, a typical data mining technique, are used to predict results from objective and quantified data. Enterprise Miner of SAS and AnswerTree of SPSS will be used to evaluate the validity of the results of the four algorithms. The sample for this work was chosen from 10,536 records related to manufacturing industries over three years $(2002\sim2004)$ in Korea. The initial sample includes a range of different businesses, including the construction and manufacturing industries, which are typically vulnerable to industrial accidents.

A Combinatorial Optimization for Influential Factor Analysis: a Case Study of Political Preference in Korea

  • Yun, Sung Bum;Yoon, Sanghyun;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.5 / pp.415-422 / 2017
  • Finding influential factors from a given clustering result is a typical data science problem. A Genetic Algorithm based method is proposed to derive influential factors, and its performance is compared with two conventional methods, Classification and Regression Tree (CART) and Chi-Squared Automatic Interaction Detection (CHAID), using Dunn's index. To extract the factors influencing preference towards political parties in South Korea, the voting results of the 18th presidential election and 'Demographic', 'Health and Welfare', 'Economic', and 'Business' related data were used. Based on the analysis, reverse engineering was implemented. A reverse engineering based approach to influential factor analysis can provide a new set of influential variables, which can offer new insight to the data mining field.
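
For readers unfamiliar with Dunn's index, the following is a minimal sketch of one common definition: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster (higher values indicate compact, well-separated clusters). The abstract does not state which variant the authors used, so the definition and the toy coordinates here are assumptions.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest within-cluster diameter.

    Assumes at least two clusters and at least one cluster with two distinct points.
    """
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_between / max_diameter

# Hypothetical 2-D points: two tight clusters that are three units apart.
clusters = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
score = dunn_index(clusters)
```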

Data Mining Approach to Clinical Decision Support System for Hypertension Management (고혈압관리를 위한 의사지원결정시스템의 데이터마이닝 접근)

  • 김태수;채영문;조승연;윤진희;김도마
    • Proceedings of the Korea Intelligent Information System Society Conference / 2002.11a / pp.203-212 / 2002
  • This study examined the predictive power of data mining algorithms by comparing the performance of logistic regression and a decision tree algorithm called CHAID (Chi-squared Automatic Interaction Detection). Contrary to previous studies, the decision tree performed better than logistic regression. We have also developed a CDSS (Clinical Decision Support System) with three modules (doctor, nurse, and patient) based on a data warehouse architecture. The data warehouse collects and integrates relevant information from various databases in the hospital information system (HIS). This system can help improve the decision-making capability of doctors and improve the accessibility of educational material for patients.

Analyzing vocational outcomes of people with hearing impairments : A data mining approach (청각장애인의 취업결정요인 분석 연구 -데이터마이닝 기법(Exhaustive CHAID)의 적용)

  • Shin, Hyun-Uk
    • Journal of Digital Convergence / v.13 no.11 / pp.449-459 / 2015
  • The purpose of this study was to examine demographic, human capital, and service factors affecting the employment outcomes of people with hearing impairments. Data on a total of 422 individuals (aged 20 to 65 years) with hearing impairments were collected from the Panel Survey of Employment for the Disabled from the Korea Employment Agency for the Disabled. The dependent variable is employment outcome. The predictor variables include a set of personal history, human capital, and rehabilitation service variables. The chi-squared automatic interaction detector (CHAID) analysis revealed that national basic livelihood security status played a determining role in predicting the employment of people with hearing impairments. It was also found that three factors (national basic livelihood security status, need for help with activities of daily living, and licenses and employment services) created a greater synergy effect when they complemented one another.

An Empirical Study on Influence of Safety on Elementary School Road Considering Commuting Distance & Mode Type (통학거리 및 수단특성을 반영한 초등학교 안전도 영향관계 실증연구)

  • Kim, Tae Ho;Kim, Seung Hyun;Lee, Soo Il
    • Journal of the Korean Society of Safety / v.30 no.6 / pp.139-147 / 2015
  • This study examines actual commuting distance and the influence of risk factors depending on commuting distance and mode, in order to redefine the actual commuting zone of primary school students. Data mining analysis (CHAID) was applied using survey results from 6,927 primary school students in Seoul Metro. Six risk factors (convenience level of commuting path condition, convenience level of road crossing condition, vehicle speed on commuting path, segregation level between commuter and vehicle, congestion level of commuting path, and public security level) and two modes (walking and cycling) are considered in the analysis. The CHAID analysis divided commuting distance into four zones: Internal Zone (0.491 km or less), External Zone (0.492 ~ 1.492 km and 1.493 ~ 2.699 km), and Commutable Zone (2.70 km or more). The awareness level of safety declines as commuting distance increases. Students perceive the risk factors affecting safety differently depending on commuting distance and mode. For students commuting on foot, vehicle speed on the commuting path and convenience level of commuting path condition are recognized as the prime risk factors within the Internal Zone and Commutable Zone, respectively. For students commuting by bicycle, convenience level of road crossing condition and vehicle speed on the commuting path are recognized as the prime risk factors within the Commutable Zone. The results show that improved plans and programs for the commuting paths of primary school students are required, taking actual commuting distance and mode into account.

A Neural Network for Prediction and Sensitivity of Outpatients' Satisfaction (신경망모형을 이용한 외래환자 만족도예측 및 민감도분석)

  • Lee, Kyun-Jick;Chung, Young-Chul;Kim, Mi-Ra
    • Korea Journal of Hospital Management / v.8 no.1 / pp.81-94 / 2003
  • This paper aims to develop a prediction model and perform a sensitivity analysis for outpatients' overall satisfaction with hospital services, using data mining techniques within the context of customer satisfaction. From a total of 900 outpatient cases, 80 percent were randomly selected as the training group and the other 20 percent as the validation group. Cases in the training group were used to develop the CHAID and neural network models. The validation group was used to test the performance of these models. The major findings may be summarized as follows. CHAID identified six useful predictors: satisfaction with treatment level, satisfaction with healthcare facilities and equipment, satisfaction with registration service, awareness of hospital reputation, satisfaction with staff courtesy and responsiveness, and satisfaction with nurses' kindness. The prediction accuracy of the MLP (77.90%) is superior to that of the RBF (76.80%).
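
The 80/20 random partition described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name, the fixed seed, and the use of integer case IDs in place of the actual outpatient records are all hypothetical, and the abstract does not say how the authors drew their split.

```python
import random

def train_validation_split(cases, train_fraction=0.8, seed=0):
    """Shuffle the cases reproducibly and cut them into training and validation groups."""
    rng = random.Random(seed)      # local generator so the global random state is untouched
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the 900 outpatient cases: integer case IDs.
training_group, validation_group = train_validation_split(range(900))
```

With 900 cases, this yields 720 training cases and 180 validation cases; models are fitted on the former and their accuracy is measured only on the latter.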

Evaluation on Performance of Accuracy for Analysis and Classification of Data Related to Industrial Accidents (산업재해 데이터의 분석 및 분류를 위한 정확도 성능 평가)

  • Leem Young-Moon;Ryu Chang-Hyun
    • Proceedings of the Safety Management and Science Conference / 2006.04a / pp.51-56 / 2006
  • Recently, data mining techniques have been used for the analysis and classification of data related to industrial accidents. The main objective of this study is to compare the performance of algorithms for analyzing industrial accident data. This paper provides a comparative analysis of five algorithms, CHAID, CART, C4.5, LR (Logistic Regression), and NN (Neural Network), using ROC charts, lift charts, and response thresholds. In this study, data on 67,278 accidents were analyzed to create risk groups for a number of complications, including the risk of disease and accident. The sample for this work was chosen from data related to manufacturing industries over three years $(2002\sim2004)$ in Korea. According to the results, NN shows excellent performance for the analysis and classification of industrial accident data.

A Study on the Analysis Effect Factors of Illegal Parking Using Data Mining Techniques (데이터마이닝 기법을 활용한 불법주차 영향요인 분석)

  • Lee, Chang-Hee;Kim, Myung-Soo;Seo, So-Min
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.13 no.4 / pp.63-72 / 2014
  • With rapid development in the economy and other fields, the standard of living in South Korea has improved, and consequently the demand for automobiles has quickly increased. This has led to various traffic issues such as traffic congestion, traffic accidents, and parking problems. In particular, illegal parking caused by the increase in the number of automobiles is considered one of the main causes of traffic congestion, and it intensifies disputes between neighbors over parking spaces, which has also come to the fore as a social issue. Therefore, this study examined Daejeon Metropolitan City, which is understood to have the highest automobile sharing rate in South Korea but relatively few illegal parking crackdowns. To investigate the theoretical problems of illegal parking, this study conducted a decision tree model-based Exhaustive CHAID analysis to identify what makes drivers park illegally and which factors tempt them to do so. The study then proposes solutions to the problem. According to the analysis, the factors that encourage drivers to park illegally are, in order, distance, the driver's experience of getting caught, occupation, and use time. The prediction model finally yielded four nodes. Given these results, as solutions to illegal parking, it is necessary to establish additional public parking lots, to secure parking space first for vehicles used for living and working, and to promote campaigns for strengthening illegal parking crackdowns and encouraging civic consciousness.