• Title/Summary/Keyword: decision tree regression


Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing logistic regression with two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5); logistic regression served as the reference because it has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes from the specific characteristics of each individual case. The models were developed on a training set of 13,689 beneficiaries and compared on a test set of 4,588 beneficiaries. Contrary to a previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, whereas C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules provided segment characteristics for the risk factors that may be used in developing hypertension management programs. These results show that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.
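
To make the CHAID comparison more concrete, here is a minimal Python sketch of the split-selection idea behind CHAID: score each categorical predictor by its Pearson chi-squared statistic against the outcome and split on the strongest one. It is a simplification, assuming no category merging and no Bonferroni-adjusted p-values (both part of full CHAID), and the field names `age_group`, `smoker`, and `hypertension` are hypothetical, not taken from the study's database.

```python
from collections import Counter


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the contingency table of xs against ys."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # expected count under independence
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def best_chaid_predictor(rows, target):
    """Return (best_predictor, scores): the predictor with the largest chi-squared vs target.

    Simplified CHAID step: no category merging, no Bonferroni-adjusted p-values.
    """
    ys = [row[target] for row in rows]
    predictors = [key for key in rows[0] if key != target]
    scores = {p: chi_squared([row[p] for row in rows], ys) for p in predictors}
    return max(scores, key=scores.get), scores


# Hypothetical toy records (not from the study's database).
rows = [
    {"age_group": "young", "smoker": "yes", "hypertension": 1},
    {"age_group": "old", "smoker": "yes", "hypertension": 1},
    {"age_group": "young", "smoker": "no", "hypertension": 0},
    {"age_group": "old", "smoker": "no", "hypertension": 0},
]
best, scores = best_chaid_predictor(rows, "hypertension")
```

On these four toy records `smoker` separates the outcome perfectly, so it is chosen with a statistic of 4.0, while `age_group` scores 0.0.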


Mapping for Biodiversity Using National Forest Inventory Data and GIS (국가 생태정보를 활용한 생물다양성 지도 구축)

  • Jung, Da-Jung;Kang, Kyung-Ho;Heo, Joon;Kim, Chang-Jae;Kim, Sung-Ho;Lee, Jung-Bin
    • Journal of Environmental Impact Assessment / v.19 no.6 / pp.573-581 / 2010
  • Natural ecosystems are essential to linking biodiversity conservation plans with climate change response strategies. To connect biodiversity conservation with climate change strategy, Europe, America, Japan, and China are discussing the need for protection through national biodiversity valuation, but such studies are lacking in Korea. In this study, we made biodiversity maps representing the distribution of biodiversity, using species richness from the National Forest Inventory (NFI) and Forest Description data. Using a regression tree algorithm, we divided the data into classes by decision rules and constructed biodiversity maps with an accuracy of over 70%. The biodiversity maps produced in this study can therefore serve as baseline information for decision makers planning biodiversity conservation and continuous management. Furthermore, this study suggests a strategy for using forest information more efficiently at the national level.
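
The abstract does not say which regression tree implementation was used. As an assumption, the sketch below shows the least-squares split search used by CART-style regression trees: try each candidate threshold on a numeric feature and keep the one that minimises the summed squared error of the two child nodes. The stand attribute and richness values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Least-squares split search on one numeric feature.

    Tries each distinct x value except the largest as a threshold; samples with
    x <= threshold go to the left child. Returns (threshold, total_sse), or
    (None, inf) if xs has fewer than two distinct values.
    """
    best = (None, float("inf"))
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical plot-level data: a stand attribute vs. species richness.
attribute = [1, 2, 3, 10, 11, 12]
richness = [5, 5, 5, 20, 20, 20]
print(best_split(attribute, richness))  # prints (3, 0.0)
```

A full tree applies this search recursively to each child until a stopping rule is met; each leaf's mean richness then defines one map class.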

Using Predictive Analytics to Profile Potential Adopters of Autonomous Vehicles

  • Lee, Eun-Ju;Zafarzon, Nordirov;Zhang, Jing
    • Asia Marketing Journal / v.20 no.2 / pp.65-83 / 2018
  • Technological advances are bringing autonomous vehicles to the ever-evolving transportation system. Anticipating users' adoption of these technologies is essential for vehicle manufacturers to devise more precise production and marketing strategies. This research investigates how regulatory focus and consumer innovativeness relate to consumers' adoption of autonomous vehicles (AVs) and to their subsequent willingness to pay for AVs. An online questionnaire was fielded to test the predictions, and regression analysis was conducted to verify the model's validity. The results show that a promotion focus does not have a significantly positive effect on the automation level at which consumers will adopt AVs, but a prevention focus has a significantly positive effect on conditional AV adoption. Within consumer innovativeness, novelty-seeking has a significantly positive relationship with high and full AV adoption, and independent decision-making has a significantly positive effect on full AV adoption. The higher the level of automation at which a consumer adopts AVs, the higher the willingness to pay for them. Finally, using neural network and decision tree analyses, we describe three categories of potential AV adopters.

A Study on NOx Emission Control Methods in the Cement Firing Process Using Data Mining Techniques (데이터 마이닝을 이용한 시멘트 소성공정 질소산화물(NOx)배출 관리 방법에 관한 연구)

  • Park, Chul Hong;Kim, Yong Soo
    • Journal of Korean Society for Quality Management / v.46 no.3 / pp.739-752 / 2018
  • Purpose: The purpose of this study was to investigate the relationship between kiln processing parameters and the NOx emissions that occur in the sintering and calcination steps of the cement manufacturing process, and to use data mining techniques (category models and classification rules) to identify the main factors responsible for emissions outside the emission limit criteria. The results are expected to be useful as guidelines for NOx emission control standards. Methods: Data were collected from Precalciner Kiln No.3 at a domestic cement plant in Korea. Thirty-four independent variables affecting NOx generation were examined during kiln processing, together with a dependent variable indicating whether emissions exceeded or were below the NOx emission limit (>1 and <0, respectively). These data were used to construct a detection model that identifies whether NOx emissions exceed or fall below the set limits. The model was built and validated in SPSS MODELER 18.0 with three data mining techniques: an artificial neural network, a decision tree (C5.0), and logistic regression. Results: The decision tree (C5.0) algorithm best represented NOx emission behavior and was used to identify 10 processing variables that resulted in NOx emissions outside the limit criteria. Conclusion: The results of this study indicate that the decision tree (C5.0) can be applied to real-time monitoring and management of NOx emissions during the cement firing process, to satisfy NOx emission control standards and to produce a more eco-friendly cement product.
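
C5.0 descends from C4.5, which ranks candidate splits by gain ratio: information gain divided by the entropy of the partition itself. The minimal sketch below computes that criterion for one categorical attribute. It is not the commercial C5.0 (which adds boosting, pruning, and other refinements), and the binned `temperature_band` variable is hypothetical, not one of the study's 34 kiln variables.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """C4.5-style gain ratio of splitting `labels` on a categorical `feature`."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature)  # penalises splits into many small branches
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical data: a binned kiln variable vs. exceeded (1) / within (0) the NOx limit.
temperature_band = ["high", "high", "low", "low"]
exceeded = [1, 1, 0, 0]
print(gain_ratio(temperature_band, exceeded))  # prints 1.0
```

An attribute that tells nothing about the label, such as alternating values against a balanced outcome, scores 0.0 under the same function.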

Pattern Classification Model Design and Performance Comparison for Data Mining of Time Series Data (시계열 자료의 데이터마이닝을 위한 패턴분류 모델설계 및 성능비교)

  • Lee, Soo-Yong;Lee, Kyoung-Joung
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.730-736 / 2011
  • In this paper, we designed pattern classification models that can reflect the latest trends in time series data. Fusion models based on statistical and AI methods were shown to outperform traditional models for pattern classification in support of decision making. In particular, pattern classification models combined with fuzzy theory achieved relatively higher hit rates: statistical SVM models combined with a fuzzy membership function and models combining a neural network with FCM showed good performance. BPN, PNN, FNN, FCM, SVM, FSVM, Decision Tree, Time Series Analysis, and Regression Analysis were used as pattern classification models in the experiments. Two databases were used: an economic indicator DB with time series properties from the financial market (Korea, KOSPI200 DB) and an electrocardiogram DB of arrhythmia patients in hospital emergencies (USA, MIT-BIH DB).
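
The fuzzy component mentioned above (FCM) assigns each sample a graded membership in every cluster rather than a single hard label. As a sketch only, assuming the standard fuzzy c-means membership formula with fuzzifier `m` and Euclidean distance, the function below computes those memberships for one point; it does not reproduce the paper's neural network and FCM fusion model, and the centre-update step of FCM is omitted.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster; values sum to 1.

    point and each center are equal-length sequences of floats; m > 1 is the fuzzifier.
    """
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5 for center in centers]
    # A point lying exactly on a centre belongs entirely to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Hypothetical 2-D feature vector and two cluster centres.
print(fcm_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]]))
```

Here the point is twice as far from the second centre as from the first, so with `m = 2` it receives memberships of about 0.8 and 0.2; larger `m` values make the memberships fuzzier.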

Assessment of compressive strength of high-performance concrete using soft computing approaches

  • Chukwuemeka Daniel;Jitendra Khatti;Kamaldeep Singh Grover
    • Computers and Concrete / v.33 no.1 / pp.55-75 / 2024
  • The present study introduces an optimum-performance soft computing model for predicting the compressive strength of high-performance concrete (HPC), selected by comparing, for the first time on a common database, models based on conventional (kernel-based, covariance function-based, and tree-based), advanced machine learning (least square support vector machine, LSSVM, and minimax probability machine regressor, MPMR), and deep learning (artificial neural network, ANN) approaches. A compressive strength database with results for 1030 concrete samples was compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points were selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new metrics (a20-index, index of agreement, and index of scatter), were computed for each model. The performance comparison reveals that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance, accuracy, Taylor plot, accuracy metric, regression error characteristics curve, Anderson-Darling, Wilcoxon, uncertainty, and reliability, model CS4, based on the ensemble tree, was identified as the optimum-performance model, with a correlation coefficient of 0.9352, a root mean square error of 5.76 MPa, and a mean absolute error of 4.1069 MPa. The present study also reveals that multicollinearity affects the prediction accuracy of Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel finding in research on HPC compressive strength prediction.
The cosine sensitivity analysis reveals that the prediction of compressive strength of HPC is highly affected by cement content, fine aggregate, coarse aggregate, and water content.
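
Four of the metrics named above can be computed directly from paired measured and predicted strengths. The sketch below assumes the common definitions: RMSE, MAE, the Pearson correlation coefficient R, and the a20-index as the share of samples whose measured/predicted ratio lies between 0.8 and 1.2. The remaining nine metrics, including the index of agreement and index of scatter, are not shown, and the strength values are hypothetical.

```python
from math import sqrt


def regression_metrics(measured, predicted):
    """RMSE, MAE, Pearson R, and a20-index for paired measured/predicted values."""
    n = len(measured)
    errors = [m - p for m, p in zip(measured, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / sqrt(ss_m * ss_p)
    # a20-index: share of samples whose measured/predicted ratio lies in [0.8, 1.2].
    a20 = sum(1 for m, p in zip(measured, predicted) if 0.8 <= m / p <= 1.2) / n
    return {"rmse": rmse, "mae": mae, "r": r, "a20": a20}


# Hypothetical compressive strengths in MPa (not from the study's database).
scores = regression_metrics([40, 50, 60, 70], [42, 48, 66, 90])
```

With these four hypothetical samples the MAE is 7.5 MPa and three of the four ratios fall inside [0.8, 1.2], so the a20-index is 0.75.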

Development of Predictive Model for Length of Stay (LOS) in Acute Stroke Patients using Artificial Intelligence (인공지능을 이용한 급성 뇌졸중 환자의 재원일수 예측모형 개발)

  • Choi, Byung Kwan;Ham, Seung Woo;Kim, Chok Hwan;Seo, Jung Sook;Park, Myung Hwa;Kang, Sung-Hong
    • Journal of Digital Convergence / v.16 no.1 / pp.231-242 / 2018
  • Efficient management of the length of stay (LOS) is important in hospitals, because it reduces medical costs for patients and increases profitability for hospitals. To manage LOS efficiently, an artificial intelligence-based prediction model is needed that supports hospitals in benchmarking and in finding ways to reduce LOS. To develop a predictive model of LOS for acute stroke patients, acute stroke cases were extracted from the 2013 and 2014 discharge injury patient data. The data were split into 60% for training and 40% for evaluation. For model development, we used a traditional regression technique (multiple regression analysis), artificial intelligence techniques (an interactive decision tree and a neural network), and an ensemble technique that integrates all of them. Models were evaluated with the root ASE (average squared error) index, which was 23.7 for multiple regression, 23.7 for the interactive decision tree, 22.7 for the neural network, and 22.7 for the ensemble technique. The model evaluation found the neural network, an artificial intelligence technique, to be superior. This demonstrates the utility of artificial intelligence in developing LOS prediction models. Further research is needed on how to use artificial intelligence techniques more effectively in developing LOS prediction models.
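
The evaluation protocol above has two simple parts: a 60/40 split of the records and a root ASE score on the held-out part. A minimal sketch follows, assuming a random split with a fixed seed and assuming ASE stands for average squared error, so that root ASE is the square root of the mean squared prediction error; the helper names are hypothetical and the study's actual tooling is not reproduced.

```python
import random


def holdout_split(records, train_fraction=0.6, seed=42):
    """Shuffle a copy of `records` and split it into (train, evaluation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def root_ase(actual, predicted):
    """Square root of the average squared error between actual and predicted values."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


# Hypothetical patient IDs: a 60/40 split of 10 records gives 6 training and 4 evaluation cases.
train, evaluation = holdout_split(range(10))
```

Because every model is scored on the same held-out 40%, root ASE values such as 23.7 and 22.7 are directly comparable across techniques.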

Convergence analysis for geographic variations and risk factors in the prevalence of hyperlipidemia using measures of Korean Community Health Survey (지역사회건강조사 지표를 이용한 고지혈증 유병율의 지역 간 변이와 위험 요인의 융복합적 분석)

  • Kim, Yoo-Mi;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.8 / pp.419-429 / 2015
  • We investigate how the regional prevalence of hyperlipidemia is affected by health-related and socioeconomic factors, with a special emphasis on geographic variations. We focus on the likelihood of hyperlipidemia as a function of various region-specific attributes. We analyze a data set at the level of 249 small administrative districts, collected from the 2012 Korean Community Health Survey by the Korea Centers for Disease Control and Prevention. For estimation, we use several methods, including correlation analysis, multiple regression, and a decision tree model. We find that the average prevalence of hyperlipidemia across the 249 small districts is 9.6%, with a coefficient of variation of 28.3%. The prevalence of hyperlipidemia is higher in continental and capital regions than in southeast coastal regions. Further findings from the decision tree model suggest that regional variations in hyperlipidemia prevalence are more likely to be associated with each region's employment rate, stress level, and prevalence of hypertension, angina pectoris, and osteoarthritis.
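
Assuming the coefficient of variation is the standard deviation of the district-level prevalences divided by their mean (the usual definition; the abstract does not spell it out), the two reported figures imply the absolute spread of prevalence across districts:

```latex
\mathrm{CV} = \frac{s}{\bar{x}}
\quad\Longrightarrow\quad
s = \mathrm{CV} \times \bar{x} = 0.283 \times 9.6\% \approx 2.72\ \text{percentage points}
```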

A study on the prediction of Korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.123-139 / 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, the market was short-lived; bad debt began to increase again after the 2009 global financial crisis because of the real economic recession. NPLs have become a major investment in recent years, as investment capital from the domestic capital market began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention because of its recent overheating, research on it has been scarce, since the history of capital market investment in the domestic NPL market is short. In addition, more scientific and systematic analysis is required for decision-making because of declining profitability and price fluctuations driven by the real estate business. In this study, we propose a prediction model that, in line with market demand, uses NPL market data to determine whether the benchmark yield will be achieved. To build the model, we used about four years of Korean NPL data, from December 2013 to December 2017, covering 2291 properties in total. From 11 variables describing the characteristics of the real estate, only those related to the dependent variable were selected as independent variables. To select the variables, one-to-one t-tests, stepwise logistic regression, and a decision tree were performed. Seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period). The dependent variable is a binary variable that indicates whether the benchmark rate is reached. 
A binary target was chosen because models predicting binary variables are more accurate than models predicting continuous variables, and model accuracy directly determines model effectiveness. In addition, a special purpose company is mainly concerned with whether to purchase a property, so knowing whether a certain level of return will be achieved is sufficient for decision-making. To ascertain whether 12%, the standard rate of return used in the industry, is a meaningful reference value, we computed the dependent variable at different threshold values and constructed and compared a predictive model for each. The models built with the dependent variable based on the 12% standard rate of return had the best average hit ratio, at 64.60%. To propose an optimal prediction model based on this dependent variable and the 7 independent variables, we constructed and compared prediction models using five methodologies: discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model. To do this, 10 sets of training and testing data were extracted using 10-fold cross-validation. After the models were built on these data, the hit ratio of each set was averaged and performance was compared. The average hit ratios of the prediction models built with discriminant analysis, logistic regression, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively, confirming that the artificial neural network model performs best. This study shows that using the 7 independent variables with an artificial neural network prediction model is effective for the future NPL market. 
The proposed model predicts in advance whether new properties will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
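
The evaluation design above (a binary target at a 12% return threshold, then hit ratios averaged over 10 folds) can be sketched in plain Python. This is a minimal illustration, assuming a return counts as reaching the benchmark when it is at least 12%, that rows are shuffled before contiguous folds are cut, and that any classifier can be plugged in through hypothetical `fit`/`predict` callables; the majority-class baseline stands in for the paper's five methodologies, which are not reproduced here.

```python
def binary_target(returns, threshold=0.12):
    """Label 1 if a property's realised return reaches the threshold, else 0."""
    return [1 if r >= threshold else 0 for r in returns]


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n).

    Assumes n >= k and that the rows were shuffled beforehand.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def mean_hit_ratio(fit, predict, X, y, k=10):
    """Average, over k folds, of the share of test cases predicted correctly."""
    ratios = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        ratios.append(hits / len(test_idx))
    return sum(ratios) / len(ratios)


# Hypothetical baseline classifier: always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model
```

Comparing methodologies then amounts to calling `mean_hit_ratio` once per model with the same folds, which mirrors how the abstract averages the hit ratio of each set.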

Relationship Between Above- and Below-Ground Biomass for Norway Spruce (Picea abies): Estimating Root System Biomass from Breast Height Diameter (독일가문비나무(Picea abies [L.] Karst)의 지상부(地上部)와 지하부(地下部) 생체량(生體量)에 관(關)한 연구(硏究) : 흉고직경(胸高直徑)에 의한 뿌리생체량(生體量) 추정(推定))

  • Lee, Do-Hyung
    • Journal of Korean Society of Forest Science / v.90 no.3 / pp.338-345 / 2001
  • This study was conducted to elucidate the relationship between the root structure and the crown structure of Norway spruce (Picea abies [L.] Karst), and thereby to obtain regression equations for estimating relative root and needle biomass from tree height and diameter at breast height (DBH), without measuring root and needle biomass. The study site was the Barbis stands in the Harz region of central Germany. Five dominant and three co-dominant 30- to 40-year-old Norway spruce trees were selected. For the above-ground part, tree height, diameter at breast height, clear bole length, total needle and branch weight, and cross-sectional and sapwood area at breast height were measured; for the below-ground part, root length, number of roots, root weight, root cross-sectional area, and other variables were measured separately for horizontal and vertical roots. Significant correlations were found between most above-ground biomass variables and the corresponding below-ground variables. For total root weight (Y) as a function of diameter at breast height (X), the regression equation was Y = 3.56X - 45.94, with a coefficient of determination of 0.96, indicating a high correlation. The total weight of branches and needles, tree height, and other above-ground variables showed strong positive relationships with below-ground biomass. These results can be used to estimate below-ground biomass from above-ground variables such as DBH in 30- to 40-year-old Norway spruce stands.
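
The reported equation is an ordinary least-squares line with its coefficient of determination. The sketch below shows, under that assumption, how such a line and its R² are fitted from paired DBH and root-weight measurements, plus a helper that applies the published coefficients. The helper name is hypothetical, and because the abstract does not state units, none are assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def root_weight_from_dbh(dbh):
    """Apply the reported equation Y = 3.56X - 45.94 (X: DBH, Y: total root weight).

    Units are not stated in the abstract; use only within the sampled 30-40-year-old stands.
    """
    return 3.56 * dbh - 45.94
```

The line crosses zero at X = 45.94 / 3.56 ≈ 12.9, so it returns negative root weights for smaller DBH values, which is one reason not to extrapolate it beyond the eight sampled trees.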
