• Title/Summary/Keyword: Tree mining

Neural Tree Classifier based on LVQ for Data Mining (데이터 마이닝을 위한 LVQ 기반 신경 트리 분류기)

  • 김세현;김은주;이일병
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.157-159 / 2001
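  • A neural tree is a classifier that combines the structures of a neural network and a decision tree. It can form nonlinear decision boundaries and requires less computation for training and output than conventional neural networks. In this paper, we propose a neural tree classifier that, unlike existing methods, uses the supervised LVQ3 algorithm to train the neural networks that make up the nodes of the tree. The tree produced by training is restructured into an efficient tree through pruning based on estimates of the misclassification rate. The performance of the proposed method was verified through experiments on real data sets.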
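
The abstract above names LVQ3 as the learning rule for the node networks. A minimal sketch of one LVQ3 update step follows, based on the commonly described form of the rule rather than on the authors' code. The function name `lvq3_step`, the parameters `alpha`, `window`, and `epsilon`, and their default values are assumptions for illustration; tree construction and pruning are not shown.

```python
import math


def lvq3_step(prototypes, labels, x, y, alpha=0.05, window=0.3, epsilon=0.2):
    """Apply one LVQ3 update in place for sample x with class y.

    prototypes: list of codebook vectors (lists of floats), at least two
    labels:     class label of each prototype
    Returns the indices of the two prototypes nearest to x.
    """
    # Euclidean distance from x to every prototype.
    dists = [math.dist(p, x) for p in prototypes]
    # Indices of the two closest prototypes (i is the nearest).
    i, j = sorted(range(len(prototypes)), key=dists.__getitem__)[:2]
    di, dj = dists[i], dists[j]

    def move(k, sign, rate):
        # sign=+1 pulls prototype k toward x, sign=-1 pushes it away.
        prototypes[k] = [w + sign * rate * (xv - w)
                         for w, xv in zip(prototypes[k], x)]

    same_i, same_j = labels[i] == y, labels[j] == y
    if same_i and same_j:
        # Both neighbours have the sample's class: small stabilising pull.
        move(i, +1, epsilon * alpha)
        move(j, +1, epsilon * alpha)
    elif same_i != same_j:
        # Exactly one neighbour is correct: update only if x lies inside
        # the window around the midplane between the two prototypes.
        s = (1 - window) / (1 + window)
        if di > 0 and dj > 0 and min(di / dj, dj / di) > s:
            correct, wrong = (i, j) if same_i else (j, i)
            move(correct, +1, alpha)
            move(wrong, -1, alpha)
    # If neither neighbour has the sample's class, nothing is updated.
    return i, j
```

In this sketch a sample whose two nearest prototypes both carry the wrong class leaves the codebook unchanged, which matches the usual statement of the rule.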

The Development of Data Mining Solution based on Web (웹 기반의 데이터 마이닝 솔루션 개발에 대하여)

  • 구자용;박헌진;최대우
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.301-306 / 2000
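  • With the recent active construction of data warehouses and fierce competition to secure high-value customers, data mining has attracted great interest from many companies. This study concerns the development of an Internet-based data mining solution that uses Statserver as its core engine and provides a rich set of algorithms and scientific graphs so that users can obtain the best possible data mining results.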

The Factors that Affects the Employment Type of The Graduates by Data-mining Approach (데이터마이닝 기법을 활용한 대졸자 고용에 미치는 영향요인 분석)

  • Kim, Hyoung-Rae;Jeon, Do-Hong
    • Journal of the Korea Society of Computer and Information / v.17 no.7 / pp.167-174 / 2012
  • Data mining techniques can be applied to employment information to discover valuable knowledge in large data sets. As employment issues such as joblessness among college graduates and the recruitment of women and older workers have become social problems, public employment services and researchers have made many efforts to address them. The factors that affect a college graduate's employment type (regular, temporary, daily) can be used to guide college students' employment and help them prepare for it. Conventional statistical methods reach their limits when analyzing a large number of attributes and a huge number of data elements; data mining techniques are therefore more suitable for this data set of about 170 attributes and 20,000 elements. We divide the factors that may affect employment type into personal, school, company, and experience factors, and use a decision tree algorithm to find interesting relationships between the attributes of these factors and employment type. Personal factors such as parents' income and marital status had the greatest effect on employment type. The learned decision tree classified employment type with 87% accuracy. We also assume that the level of the school affects graduates' employment type.

Border-based HSFI Algorithm for Hiding Sensitive Frequent Itemsets (민감한 빈발항목집합을 숨기기 위한 경계기반 HSFI 알고리즘)

  • Lee, Dan-Young;An, Hyoung-Keun;Koh, Jae-Jin
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1323-1334 / 2011
  • This paper proposes a border-based HSFI algorithm for hiding sensitive frequent itemsets. Unlike earlier approaches, the nodes of the FP-Tree hold sensitive and border information along with all transactions, and the border is used to minimize the impact on nonsensitive frequent itemsets during the hiding process. Applying the HSFI algorithm to an example transaction database significantly reduced the number of lost items, showing that HSFI is more effective than the existing algorithm at preserving the quality of the database.

An Analysis of the Characteristics of Companies introducing Smart Factory System Using Data Mining Technique (데이터 마이닝 기법을 활용한 스마트팩토리 도입 기업의 특성 분석)

  • Oh, Jeong-yoon;Choi, Sang-hyun
    • Journal of the Korea Convergence Society / v.9 no.5 / pp.179-189 / 2018
  • Research on smart factories is steadily being carried out, mainly on implementation strategies and considerations for construction. However, few studies have examined the companies that have introduced smart factories. This study conducted a questionnaire survey of SMEs applying the basic stage of the smart factory and used cluster analysis to examine the characteristics of these companies. In addition, we applied decision tree and Naive Bayes analyses to examine how the characteristics of a company are derived, and compared the results. The cluster analysis divided the companies into a high-satisfaction group and a low-satisfaction group. The decision tree and Naive Bayes analyses showed that the high-satisfaction group has high productivity.

A Feature Analysis of Industrial Accidents Using C4.5 Algorithm (C4.5 알고리즘을 이용한 산업 재해의 특성 분석)

  • Leem, Young-Moon;Kwag, Jun-Koo;Hwang, Young-Seob
    • Journal of the Korean Society of Safety / v.20 no.4 s.72 / pp.130-137 / 2005
  • The decision tree algorithm is a data mining technique that groups cases of interest into several subgroups or predicts their class. It can characterize each group and can be used to detect differences among types of industrial accidents. This paper uses the C4.5 algorithm for the feature analysis. The data set consists of 24,887 features selected from a total of 25,159 taken from two years of observation of industrial accidents in Korea. For the purpose of this paper, one target value and eight independent variables are detailed by type of industrial accident. After grouping, the tree has 222 nodes in total, of which 151 are leaf nodes. This paper reports an acceptable level of accuracy (%) and error rate (%) as measures of the quality of the created trees. The objective of this paper is to analyze the efficiency of the C4.5 algorithm in classifying types of industrial accident data and thereby identify potential weak points in disaster risk grouping.

Core Keywords Extraction for Evaluating Online Consumer Reviews Using a Decision Tree: Focusing on Star Ratings and Helpfulness Votes (의사결정나무를 활용한 온라인 소비자 리뷰 평가에 영향을 주는 핵심 키워드 도출 연구: 별점과 좋아요를 중심으로)

  • Min, Kyeong Su;Yoo, Dong Hee
    • The Journal of Information Systems / v.32 no.3 / pp.133-150 / 2023
  • Purpose: This study aims to develop classification models using a decision tree algorithm to identify core keywords and rules influencing online consumer review evaluations for the robot vacuum cleaner on Amazon.com. The difference from previous studies is that we analyze core keywords that affect the evaluation results by dividing the subjects that evaluate online consumer reviews into self-evaluation (star ratings) and peer evaluation (helpfulness votes). We investigate whether the core keywords influencing star ratings and helpfulness votes vary across different products and whether there is a similarity in the core keywords related to star ratings or helpfulness votes across all products. Design/methodology/approach: We used random under-sampling to balance the dataset. We progressively removed independent variables in order of decreasing importance through backward elimination to evaluate the classification model's performance. As a result, we identified classification models that best predict star ratings and helpfulness votes for each product's online consumer reviews. Findings: We have identified that the core keywords influencing self-evaluation and peer evaluation vary across different products, and even for the same model or features, the core keywords are not consistent. Therefore, companies' producers and marketing managers need to analyze the core keywords of each product to highlight the advantages and prepare customized strategies that compensate for the shortcomings.
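
The abstract above mentions random under-sampling as the balancing step. The sketch below shows the general idea in plain Python: every class is cut down at random to the size of the smallest class. The function name `random_undersample`, the `label_of` callback, and the fixed `seed` are assumptions for illustration and do not come from the paper.

```python
import random
from collections import defaultdict


def random_undersample(rows, label_of, seed=0):
    """Return a class-balanced random subset of rows.

    rows:     non-empty list of records
    label_of: function mapping a record to its class label
    seed:     seed for the private random generator (reproducible output)
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    # Every class is reduced to the size of the smallest class.
    n_min = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        # Sample without replacement so no record is duplicated.
        balanced.extend(rng.sample(group, n_min))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return balanced
```

Under-sampling discards records from the larger classes, so it trades information for balance; whether that trade is acceptable depends on how small the minority class is.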

An Analysis of the Determinants of Government-Funded Defense Companies using a Decision Tree (의사결정나무를 활용한 방산육성지원 수혜기업 결정요인 분석)

  • Gowoon Jeon;Seulah Baek;Jeonghwan Jeon;Donghee Yoo
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.1 / pp.80-93 / 2024
  • This study analyzes the factors that influence the participation of beneficiary companies in the government's defense industry promotion support project. To this end, we used the decision tree model, a data mining technique, to build prediction models from the company variables most important in determining beneficiary companies, and analyzed experimental data with them. In addition, various rules for determining the beneficiary companies of the government's support project were derived from the analysis results expressed as decision trees. Three policy measures were presented, based on the important rules that appear repeatedly across different predictive models, to increase the effect of the government's industrial development. The analysis methods presented in this study and the determinants of the beneficiary companies of the government support project will help create a sustainable growth environment for the future defense industry.

Estimation of a Nationwide Statistics of Hernia Operation Applying Data Mining Technique to the National Health Insurance Database (데이터마이닝 기법을 이용한 건강보험공단의 수술 통계량 근사치 추정 -허니아 수술을 중심으로-)

  • Kang, Sung-Hong;Seo, Seok-Kyung;Yang, Yeong-Ja;Lee, Ae-Kyung;Bae, Jong-Myon
    • Journal of Preventive Medicine and Public Health / v.39 no.5 / pp.433-437 / 2006
  • Objectives: The aim of this study is to develop a methodology for estimating a nationwide statistic for hernia operations using the claim database of the Korea Health Insurance Cooperation (KHIC). Methods: According to the insurance claim procedures, the claim database was divided into the electronic data interchange database (EDI_DB) and the sheet database (Paper_DB). The EDI_DB has operation and management codes showing whether an operation took place and of what kind; the Paper_DB does not. Cases of hernia surgery were extracted from the EDI_DB using the hernia-matched management code. To draw the potential cases from the Paper_DB, which lacks the code, a predictive model was developed using the data mining technique called SEMMA. The claim sheets of the cases whose predicted probability of an operation exceeded the threshold chosen from the ROC curve were reviewed to obtain the positive predictive value as an index of the usefulness of the predictive model. Results: In the 2004 claim databases, 14,386 cases submitted through the EDI system had hernia-related management codes. Among the models fitted with the data mining technique, logistic regression was chosen over the neural network method and the decision tree method. From the Paper_DB, 1,019 cases were extracted as potential cases. Direct review of the sheets of the extracted cases showed that the positive predictive value was 95.3%. Conclusions: The results suggest that applying the data mining technique to the KHIC claim database to estimate nationwide surgical statistics would be useful in terms of execution and cost-effectiveness.
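
The positive predictive value reported above is the share of cases flagged by the model that direct review confirmed as real operations. A minimal sketch of that calculation follows. The function name `positive_predictive_value` and its arguments are hypothetical; the threshold is taken as given here, whereas the study chose it from the ROC curve.

```python
def positive_predictive_value(scores, confirmed, threshold):
    """Share of flagged cases that review confirmed as true positives.

    scores:    predicted probability of an operation for each case
    confirmed: True/False outcome of the manual review for each case
    threshold: cases with a score above this value are flagged
    """
    # Keep the review outcome of every case the model flags.
    flagged = [is_true for score, is_true in zip(scores, confirmed)
               if score > threshold]
    if not flagged:
        raise ValueError("no case exceeds the threshold")
    # True counts as 1 and False as 0, so the sum is the number confirmed.
    return sum(flagged) / len(flagged)
```

Cases below the threshold do not enter the calculation at all, which is why the positive predictive value says nothing about operations the model missed.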

Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.373-384 / 2014
  • Data mining clarifies patterns or correlations in large, complexly structured data and predicts diverse outcomes. The technique is used in finance, telecommunications, circulation, medicine, and other fields. In this paper, we selected risk factors for respiratory diseases in the field of medicine. The data, taken from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012, were divided into a respiratory disease group and a healthy group. To select the major risk factors, we applied data mining techniques such as neural networks, logistic regression, Bayesian networks, C5.0, and CART. We divided the data into training and testing sets and applied the models fitted on the training data to the testing data. Comparing prediction accuracy, CART was identified as the best model. Depression, smoking, and stress were identified as the major risk factors for respiratory disease.
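
The comparison described above, splitting the data and ranking models by accuracy on the held-out part, can be sketched as follows. All names (`train_test_split`, `accuracy`, `best_model`) and the 30% test fraction are assumptions for illustration. Model fitting itself is not shown, so each model is passed in as an already trained prediction function.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_rows):
    """Fraction of (features, label) pairs that predict classifies correctly."""
    hits = sum(1 for features, label in test_rows if predict(features) == label)
    return hits / len(test_rows)


def best_model(models, test_rows):
    """Score every model on the test rows and return (best name, all scores).

    models: dict mapping a model name to a fitted prediction function
    """
    scores = {name: accuracy(predict, test_rows)
              for name, predict in models.items()}
    return max(scores, key=scores.get), scores
```

Only the test rows are used for scoring, so a model that memorized its training data gains no advantage in the ranking.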