• Title/Summary/Keyword: decision tree regression


Relationship Between Above- and Below-Ground Biomass for Norway Spruce (Picea abies [L.] Karst): Estimating Root System Biomass from Breast Height Diameter

  • 이도형
    • 한국산림과학회지
    • /
    • Vol. 90, No. 3
    • /
    • pp.338-345
    • /
    • 2001
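  • This study was conducted to clarify the relationship between the below-ground root structure and the above-ground crown of Norway spruce, and to derive regression equations that estimate relative root and needle biomass from tree height and breast height diameter, without laboriously excavating roots or measuring needle biomass. Five dominant and three codominant trees, 30 to 40 years old, were selected from the Barbis stand in the Harz region of central Germany. For the above-ground parts of the sample trees, we measured tree height, breast height diameter, clear bole height, needle mass, branch mass, cross-sectional area, and sapwood area. For five of the trees, we measured below-ground root length, root number, root weight, and root cross-sectional area, separately for horizontal and vertical roots. In the surveyed stand, the above-ground biomass measures (tree height, breast height diameter, needle mass, branch mass, etc.) and the below-ground biomass measures (root length, weight, number, cross-sectional area, etc.) of Norway spruce were closely correlated. Below-ground root biomass was related to the easily measured breast height diameter by Y = 3.56X - 45.94, with a coefficient of determination of 0.96, a very strong relationship. Branch mass, needle mass, and tree height were also highly correlated with below-ground biomass. The regression equation obtained in this study should be useful for estimating relative below-ground root biomass from breast height diameter in 30- to 40-year-old Norway spruce stands.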

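The fitted line reported in this abstract, Y = 3.56X - 45.94, can be applied directly. The following is a minimal sketch, assuming X is breast height diameter and Y is root biomass in whatever units the paper used (the abstract does not state them). The function name is hypothetical.

```python
def root_biomass_from_dbh(dbh: float) -> float:
    """Apply the regression line quoted in the abstract: Y = 3.56 * X - 45.94.

    Units are an assumption; the abstract does not give them.
    """
    return 3.56 * dbh - 45.94


# Illustrative diameters only; these are not measurements from the study.
for dbh in (15.0, 20.0, 25.0):
    print(dbh, round(root_biomass_from_dbh(dbh), 2))
```

The line crosses zero near X = 12.9, so it is only meaningful inside the range of diameters sampled, which the abstract limits to 30- to 40-year-old stands.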

Attribute Selection in Mixed Databases for Data Mining

  • 차운옥;허문열
    • Korean Statistical Society: Conference Proceedings
    • /
    • Proceedings of the Korean Statistical Society 2003 Spring Conference
    • /
    • pp.103-108
    • /
    • 2003
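  • Attribute selection is one of the most widely used ways to reduce large databases for data mining. In this paper, we use three attribute selection methods to reduce the number of condition attributes by more than 60%, apply the reduced attribute sets to decision trees and a logistic regression model, and compare their efficiency. The three attribute selection methods are MDI, information gain, and ReliefF. The decision tree methods are QUEST, CART, and C4.5. The classification accuracy of the attribute selection methods was evaluated by 10-fold cross-validation on the Credit Approval and German Credit databases from the UCI repository.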

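Of the three attribute selection methods named in this abstract, information gain is the simplest to illustrate. The sketch below is a generic textbook version for categorical attributes, not the authors' implementation. The toy attribute names and values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two condition attributes and one decision attribute.
attributes = {
    "employed": ["y", "y", "n", "n"],
    "region": ["a", "b", "a", "b"],
}
approved = ["yes", "yes", "no", "no"]

# Rank attributes by gain; a selection step would keep only the top-ranked ones.
ranked = sorted(attributes, key=lambda a: information_gain(attributes[a], approved), reverse=True)
print(ranked)
```

Keeping roughly the top 40% of the ranked attributes would correspond to the reduction of more than 60% described in the abstract.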

A Study on Improving the predict accuracy rate of Hybrid Model Technique Using Error Pattern Modeling : Using Logistic Regression and Discriminant Analysis

  • Cho, Yong-Jun;Hur, Joon
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 17, No. 2
    • /
    • pp.269-278
    • /
    • 2006
  • This paper presents a new hybrid data mining technique that uses error pattern modeling to improve classification accuracy. The proposed method improves classification accuracy by combining two different supervised learning methods. The main algorithm models the error patterns between the two supervised learning methods (e.g., neural networks, decision trees, logistic regression, and so on). The proposed modeling method was applied to a simulation of 10,000 data sets generated from normal and exponential random distributions. The simulation results show that the proposed method outperforms existing methods such as logistic regression and discriminant analysis.


Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management

  • Kim, Steven H.;Shin, Sung-Woo
    • 정보기술응용연구
    • /
    • Vol. 1
    • /
    • pp.173-211
    • /
    • 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm performs feature-weight optimization in a preprocessing module. Moreover, an inductive tree, an artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple but efficient postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management, which exemplifies a binary classification problem.

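The voting scheme mentioned in this abstract can be sketched as a simple majority vote over the first-order classifiers. The abstract does not specify the voting rule, so plain majority voting is an assumption here, and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most first-order classifiers.

    On a tie, Counter.most_common keeps first-encountered order,
    so the label that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of the four first-order models for one case (1 = fraud, 0 = not fraud).
case_predictions = {"logit": 1, "MDA": 0, "ANN": 1, "kNN": 1}
print(majority_vote(list(case_predictions.values())))
```

With an even number of binary voters, ties are possible, so a real system would need an explicit tie-breaking policy instead of relying on input order.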

Development of Coil Breakage Prediction Model In Cold Rolling Mill

  • Park, Yeong-Bok;Hwang, Hwa-Won
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • Institute of Control, Robotics and Systems, ICCAS 2005
    • /
    • pp.1343-1346
    • /
    • 2005
  • In a cold rolling mill, coil breakage during the rolling process causes various problems, such as reduced productivity and equipment damage. Recent research has relied on mechanical analysis, such as the analysis of roll chattering or strip inclining, and on breakage prevention that detects cracks in the coil, but these approaches cover only some kinds of breakage. Coil breakage is very complicated to predict and occurs rarely. We propose to build effective prediction models for coil breakage in the rolling process based on data mining. We propose three prediction models for coil breakage: (1) a decision tree based model, (2) a regression based model, and (3) a neural network based model. To reduce the number of model parameters, we selected important variables related to the occurrence of coil breakage from the coil setup attributes, using decision trees, variable selection, and the choices of domain experts. We developed these prediction models and chose the best among them using the SEMMA process proposed in the SAS E-miner environment. We estimated model accuracy by scoring the prediction model with the posterior probability. We also developed a software tool that analyzes the data and generates the proposed prediction models either automatically or in a user-driven manner. It also has an effective visualization feature based on PCA (Principal Component Analysis).


A study for improving data mining methods for continuous response variables

  • 최진수;이석형;조형준
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 21, No. 5
    • /
    • pp.917-926
    • /
    • 2010
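  • Bagging and boosting are known to improve predictive performance. This has been verified through comparative experiments, but those experiments mainly considered one specific decision tree algorithm, classification and regression trees, with a categorical target variable. In this paper, we consider other data mining methods in addition to decision trees and conduct comparative experiments to verify the performance of bagging and boosting when the target variable is continuous. Specifically, we combined the bagging and boosting ensemble techniques with linear regression, decision trees, and neural networks, and compared the results on eight data sets. The experiments confirm that bagging and boosting also help improve the performance of several data mining algorithms on continuous data.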
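Bagging for a continuous target, as studied in this abstract, can be illustrated with one of the base learners it names, linear regression. The sketch below is a generic version with a single predictor, not the authors' experimental setup. The function names, the number of models, and the toy data are assumptions.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: every x is identical
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of models fitted on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return mean(predictions)


# Hypothetical noisy data scattered around y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.2, 2.9, 5.1, 6.8, 9.3, 10.9, 13.2, 14.8]
print(bagged_predict(xs, ys, 3.5))
```

Bagging tends to help most with unstable learners such as decision trees. For a stable learner like linear regression the averaged prediction usually stays close to the single-model fit.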

Discovering Relationships between Skin Type and Life Style Using Data Mining Techniques: A Case Study of Korea

  • Kim, Taeheung;Ha, Jihyun;Lee, Jong-Seok;Oh, Younhak;Cho, Yong Ju
    • Industrial Engineering and Management Systems
    • /
    • Vol. 15, No. 1
    • /
    • pp.110-121
    • /
    • 2016
  • With the growing interest in skincare and maintenance, there are increasing numbers of studies on the classification of skin type and the factors influencing each type. This study presents a novel methodology by using data mining, for the determination of the relationships between skin type, lifestyle, and patterns of cosmetic utilization. Eight skin-specific factors, which are moisture, sebum in U-zone (both cheeks), sebum in T-zone (forehead, nose, and chin), pore, melanin, wrinkle, acne, hemoglobin, were measured in 1,246 subjects living in South Korea, in conjunction with a questionnaire survey analyzing their lifestyles and pattern of cosmetic utilization. Using various multivariate statistical methods and data mining techniques, we classified the skin types based on the skin-specific values, determined the relationship between skin type and lifestyle, and accordingly sorted the subjects into clusters. Logistic regression analysis revealed gender-related differences in the skin; therefore, separate analyses were performed for males and females. Using the Gaussian Mixture Modeling (GMM) technique, we classified the subjects based on skin type (two male and four female). Using the ANOVA and decision tree techniques, we attempted to characterize the relationship between each skin type and the lifestyles of the subjects. Menstruation, eating habits, stress, and smoking were identified as the major factors affecting the skin.

Selecting the Best Prediction Model for Readmission

  • Lee, Eun-Whan
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 45, No. 4
    • /
    • pp.259-266
    • /
    • 2012
  • Objectives: This study aims to determine the risk factors predicting rehospitalization by comparing three models and selecting the most successful model. Methods: In order to predict the risk of rehospitalization within 28 days after discharge, 11 951 inpatients were recruited into this study between January and December 2009. Predictive models were constructed with three methods, logistic regression analysis, a decision tree, and a neural network, and the models were compared and evaluated in light of their misclassification rate, root asymptotic standard error, lift chart, and receiver operating characteristic curve. Results: The decision tree was selected as the final model. The risk of rehospitalization was higher when the length of stay (LOS) was less than 2 days, route of admission was through the out-patient department (OPD), medical department was in internal medicine, 10th revision of the International Classification of Diseases code was neoplasm, LOS was relatively shorter, and the frequency of OPD visit was greater. Conclusions: When a patient is to be discharged within 2 days, the appropriateness of discharge should be considered, with special concern of undiscovered complications and co-morbidities. In particular, if the patient is admitted through the OPD, any suspected disease should be appropriately examined and prompt outcomes of tests should be secured. Moreover, for patients of internal medicine practitioners, co-morbidity and complications caused by chronic illness should be given greater attention.
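Two of the comparison criteria named in this abstract, the misclassification rate and the receiver operating characteristic curve, can be computed as follows. This is a generic sketch that summarizes the ROC curve by its area (AUC) using pairwise comparisons. The abstract does not say how the curves were summarized, and the toy labels and scores are hypothetical.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of cases whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive case
    receives a higher score than a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical readmission labels (1 = readmitted) and model scores.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption

print(misclassification_rate(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of cases, which is fine for illustration but slow for a cohort the size of the one in the study.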

Sales Pattern and Related Product Attributes of T-shirts

  • 채진미;김은희
    • 한국의류학회지
    • /
    • Vol. 44, No. 6
    • /
    • pp.1053-1069
    • /
    • 2020
  • This study examined the relationship between sales patterns and product attributes in order to propose a sales forecasting approach for fashion products. We analyzed sales data for 537 T-shirt SKUs from a domestic sports brand using SAS. The sales patterns of fashion products fluctuated and were influenced by exogenous factors; regression analysis identified these factors as price discounts and holiday effects, and we removed their influence. In addition, it was difficult to predict sales from the past sales patterns of the same product, since fashion products are released as new products every year. We therefore proposed a forecasting model that uses the sales patterns of related product attributes, treating the attributes as descriptive variables. To explain the relationship between sales patterns and product attributes, we classified sales patterns using K-means clustering and built a decision tree classifier with attributes as input and sales patterns as output. As a result, the sales patterns of T-shirts were clustered into six types distinguished by the shape of their peak and slope. In the proposed sales pattern prediction model, each pattern was associated with a combination of product attributes and their values.

모델트리의 결측치 처리 방법에 따른 콜레스테롤수치 예측의 성능 변화 (Using Missing Values in the Model Tree to Change Performance for Predict Cholesterol Levels)

  • 정용규;원재강;신성철
    • 서비스연구
    • /
    • Vol. 2, No. 2
    • /
    • pp.35-43
    • /
    • 2012
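  • Data mining is not of interest only to particular fields; it is now widely used and applied in many areas around us. It refers to the process of discovering useful correlations hidden in large amounts of data, predicting and extracting information that will be actionable in the future, and using it in later decision making. However, some data sets contain variables with very many missing values; in other words, there are data sets in which measurements are absent for a large number of records. In this paper, we therefore apply the model tree algorithm under different missing-value treatments to predict cholesterol values, and analyze the performance of each treatment through experiments. Based on these results, we also present an effective application case for missing-value replacement methods.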

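The abstract on missing-value treatment does not name the treatments it compared. As an illustration only, the sketch below shows two common ones, dropping incomplete records and replacing missing values with the column mean. The function names and the toy records are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean


def drop_missing(rows):
    """Listwise deletion: keep only records with no missing value."""
    return [row for row in rows if None not in row]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, col_means)] for row in rows]


# Hypothetical records: two numeric predictors per row, None marks a missing measurement.
records = [[1.0, None], [3.0, 4.0], [None, 6.0]]
print(drop_missing(records))
print(impute_mean(records))
```

Listwise deletion discards two of the three records here, which shows why replacement methods matter when many records have missing measurements.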