• Title/Summary/Keyword: Decision Tree Regression

Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms (SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법)

  • Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society / v.39 no.6 / pp.5-12 / 2023
  • Proactive assessment of landslide susceptibility is necessary to minimize casualties. This study proposes a methodology for classifying the landslide safety factor using machine-learning classification algorithms. A high-risk area model is adopted for the classification, with eight geotechnical parameters as inputs. Four classification algorithms (decision tree, k-nearest neighbor, logistic regression, and random forest) are compared in terms of classification accuracy for safety factors ranging from 1.2 to 2.0. Notably, accuracy is high for safety factors of 1.2~1.7 but relatively low for 1.8~2.0. To overcome this issue, the synthetic minority over-sampling technique (SMOTE) is adopted to generate additional data. Applying SMOTE improves the average accuracy in the 1.8~2.0 range by approximately 250%. The results demonstrate that the SMOTE algorithm improves the accuracy of classification algorithms when applied to geotechnical data.
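
SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The abstract does not say which implementation the study used. The following is therefore only a minimal standard-library sketch of the idea. The function name `smote`, the random choice of base sample (a common variant of the original per-sample loop), and the defaults `k=5` and `seed=0` are illustrative assumptions.

```python
import random
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Return n_new synthetic minority samples (hypothetical helper).

    Each sample is base + gap * (neighbour - base), where base is a randomly
    chosen minority sample, neighbour is one of its k nearest minority
    neighbours, and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        # the new point lies on the segment between base and its neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nn)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, SMOTE fills in the minority region rather than duplicating points. In practice a library class such as imbalanced-learn's `SMOTE` (with `fit_resample`) would normally be used instead of hand-written code.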

A Study on the Big Data Analysis and Predictive Models for Quality Issues in Defense C5ISR (국방 C5ISR 분야 품질문제의 빅데이터 분석 및 예측 모델에 대한 연구)

  • Hyoung Jo Huh;Sujin Ko;Seung Hyun Baek
    • Journal of Korean Society for Quality Management / v.51 no.4 / pp.551-571 / 2023
  • Purpose: This study analyzes the cause-and-effect relationship between quality failure rates and process variables in the C5ISR domain of the defense industry in order to offer practical suggestions. Methods: Data collected through in-house systems were examined using big data analysis. Quality data and A/S (after-sales service) history data were analyzed following the CRISP-DM (Cross-Industry Standard Process for Data Mining) process. Results: After evaluating candidate models for the influence of inspection data and A/S history data, logistic regression was selected as the final model because it outperformed the decision tree, with an accuracy of 82% versus 67% and an AUC of 0.66 versus 0.57. Using this model, we estimated the coefficients with the data analysis tool R. We found that a specific variable (continuous maximum discharge current time) had a statistically significant effect on the A/S quality failure rate, and the analysis indicated that 82% of the failure rate could be predicted. Conclusion: This study is the first case of applying big data analysis to quality issues in the defense industry. It confirms that the market failure rates of defense products can be improved by focusing on the measured values of the main failure causes identified through the big data analysis process. It also identifies issues to be addressed in subsequent studies to obtain a more reliable analysis model, such as the number of data samples and data collection limitations.
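
The model comparison above rests on two metrics, accuracy and AUC. The sketch below shows one way to compute both from 0/1 labels and predicted scores; it is a hedged illustration, not the authors' R code. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. The 0.5 decision threshold and the function names are assumptions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an accuracy of 0.75 and an AUC of 0.75. The pairwise loop is O(positives × negatives), which is fine for small samples.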

Cloud Computing Adoption Decision-Making Modeling Using CART (CART 방법론을 사용한 클라우드 컴퓨팅 도입 의사 결정 모델링)

  • Baek, Seung Hyun;Chang, Byeong-Yun
    • Journal of the Korea Society for Simulation / v.23 no.4 / pp.189-195 / 2014
  • In this paper, we develop a decision-making model for adopting cloud computing (CC), which can be used independently of place and time. The model is built from panel survey data collected from 65 respondents using CART (classification and regression tree), a data mining approach. Modeling proceeds in two steps: first, significant questions (variables) are selected; then, the CART decision-making model is constructed from the selected variables. In the variable selection step, the number of questions is reduced from 25 to 5. This reduction lets respondents answer more quickly and shortens model-construction time.
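
CART grows a classification tree by choosing, at each node, the split that most reduces an impurity measure. For classification this measure is usually the Gini index. Below is a minimal sketch of that split search for one numeric variable, such as a hypothetical Likert-scale survey answer. It is not the authors' implementation, and the function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs: Sequence[float],
               ys: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best split x <= threshold.

    Returns (None, gini(ys)) if no split lowers the impurity.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    n = len(pairs)
    best_t, best_score = None, gini([y for _, y in pairs])
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # impurity of the children, weighted by their share of the samples
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

A full CART tree applies this search to every candidate variable at each node and recurses. Variables that never produce useful splits are natural candidates for removal, in the spirit of the question-reduction step described above.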

A Study on Regression Class Generation of MLLR Adaptation Using State Level Sharing (상태레벨 공유를 이용한 MLLR 적응화의 회귀클래스 생성에 관한 연구)

  • 오세진;성우창;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.727-739 / 2003
  • In this paper, we propose a method for generating regression classes for adaptation in the HM-Net (Hidden Markov Network) system. MLLR (Maximum Likelihood Linear Regression) adaptation is applied to the HM-Net speech recognition system to represent speaker characteristics effectively and to support the use of HM-Net in various tasks. For state-level sharing, we adopt the contextual-domain state splitting of the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, which performs both contextual and temporal clustering. In each state of the contextual domain, the desired phoneme classes are determined by splitting the context information (classes) that includes the target speaker's speech data. The number of adaptation parameters, such as means and variances, is controlled automatically by the contextual-domain state splitting of PDT-SSS. It depends on the context information and the amount of adaptation speech available from a new speaker. Experiments were performed on the KLE (Center for Korean Language Engineering) 452 data and the YNU (Yeungnam Univ.) 200 data to verify the effectiveness of the proposed method. The results show that phone, word, and sentence recognition accuracies increased by 34∼37%, 9%, and 20%, respectively. When performance is compared across different lengths of adaptation speech, it is also significantly improved even with short adaptation utterances. Therefore, the proposed regression class method is well suited to HM-Net speech recognition systems that employ MLLR speaker adaptation.
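
The standard MLLR mean transform shows why the number of regression classes matters. All Gaussian components $m$ in a regression class $c$ share one affine transform, so fewer classes mean fewer parameters to estimate from limited adaptation data. The formulation below is the textbook MLLR mean update for diagonal covariances, given for reference rather than taken from this paper. Here $\gamma_m(t)$ is the occupation probability of component $m$ at frame $t$, $o_i(t)$ is element $i$ of the observation vector, $\sigma_{m,i}^2$ is the corresponding variance, and $\mathbf{w}_{c,i}^{\top}$ is row $i$ of $\mathbf{W}_c$.

```latex
$$
\hat{\boldsymbol{\mu}}_m = \mathbf{A}_c\boldsymbol{\mu}_m + \mathbf{b}_c = \mathbf{W}_c\,\boldsymbol{\xi}_m,
\qquad \boldsymbol{\xi}_m = \begin{bmatrix} 1 & \boldsymbol{\mu}_m^{\top} \end{bmatrix}^{\top},
\qquad m \in c
$$
$$
\mathbf{w}_{c,i} = \mathbf{G}_{c,i}^{-1}\,\mathbf{z}_{c,i},
\qquad
\mathbf{G}_{c,i} = \sum_{m \in c} \frac{1}{\sigma_{m,i}^{2}} \Big(\sum_{t} \gamma_m(t)\Big) \boldsymbol{\xi}_m \boldsymbol{\xi}_m^{\top},
\qquad
\mathbf{z}_{c,i} = \sum_{m \in c} \sum_{t} \gamma_m(t)\, \frac{o_i(t)}{\sigma_{m,i}^{2}}\, \boldsymbol{\xi}_m
$$
```

Each $\mathbf{G}_{c,i}$ is $(d+1)\times(d+1)$ for $d$-dimensional features. A class therefore needs enough occupied components and frames for it to be invertible, which is the trade-off that adapting the number of classes to the amount of adaptation data addresses.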

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China (Support Vector Machines을 이용한 개인신용평가 : 중국 금융기관을 중심으로)

  • Ding, Xuan-Ze;Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool that helps banks make profitable loan-granting decisions. Many classification algorithms and models have recently been used for personal credit scoring. Credit scoring techniques are usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees; non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, there is no consensus on which method is best for developing a credit scoring model. In this paper, we compare the performance of the most common scoring techniques, namely logistic regression, neural networks, and support vector machines, using personal credit data from a financial institution in China. Specifically, we build the three models, use each to classify customers, and compare the results. The results show that the support vector machine performs better than logistic regression and neural networks.
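
The sketch below makes the SVM side of the comparison concrete. It trains a linear soft-margin SVM with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. The paper does not specify its kernel, solver, or hyperparameters. The linear kernel, the solver, `lam`, `epochs`, and folding the bias in as a constant feature are all assumptions. The inputs would be hypothetical numeric credit attributes with labels +1 (good) and -1 (bad).

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50,
                     seed: int = 0) -> List[float]:
    """Pegasos-style SGD for min lam/2*||w||^2 + mean(max(0, 1 - y * w.x)).

    Labels must be +1/-1. A constant 1.0 feature is appended so the bias is
    learned (and regularised) as the last weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # gradient step on the regulariser term
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # hinge loss is active: move towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Classify x as +1 or -1 using the weights from train_linear_svm."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A kernel SVM, which credit-scoring studies often use, would replace the dot product with a kernel function. In practice it is trained with a dedicated solver rather than this loop.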

Traffic Flow Estimation System using a Hybrid Approach

  • Aung, Swe Sw;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.6 no.4 / pp.281-291 / 2017
  • Because traffic jams are an everyday problem in both developed and developing countries, systems that monitor, predict, and detect traffic conditions play an important role in research. Researchers have tried to address the problem with many kinds of technology, especially roadside sensors. However, these sensors still have limitations, so no single method on its own produces sufficiently accurate traffic predictions, and stand-alone methods may not be the best choice for a traffic prediction system. This paper therefore focuses on predicting traffic conditions with a hybrid approach. The approach is built on an accuracy comparison of three prediction models: multinomial logistic regression, decision trees, and support vector machine (SVM) classifiers. The aim is to select the most suitable approach while integrating the strengths of these models. Test cases and simulations confirmed experimentally that the hybrid method is more effective than the individual methods.
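
The abstract does not state exactly how the three classifiers are combined. The sketch below shows two common ways to build such a hybrid from already-trained models: selecting the model with the best validation accuracy, and taking a per-sample majority vote. It is only a hedged illustration. The model names and traffic-state labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions equal to the true labels."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def select_best_model(val_preds: Dict[str, Sequence[str]],
                      val_labels: Sequence[str]) -> str:
    """Name of the model with the highest validation accuracy (first wins on ties)."""
    return max(val_preds, key=lambda name: accuracy(val_preds[name], val_labels))


def majority_vote(per_model_preds: Sequence[Sequence[str]]) -> List[str]:
    """Per-sample majority label; on ties, the label from the earliest model wins."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_preds)]
```

Selection keeps a single model, while voting can correct individual models' errors when they disagree. Which one counts as "integrating proficiencies" in the paper is not specified.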

Forecasting Export & Import Container Cargoes using a Decision Tree Analysis (의사결정나무분석을 이용한 컨테이너 수출입 물동량 예측)

  • Son, Yongjung;Kim, Hyunduk
    • Journal of Korea Port Economic Association / v.28 no.4 / pp.193-207 / 2012
  • The purpose of this study is to predict export and import container volumes using decision tree analysis. Factors that can influence container cargo volume are selected as independent variables: the producer price index, consumer price index, index of export volume, index of import volume, index of industrial production, and exchange rate (won/dollar). The analysis covers January 2002 to December 2011 using monthly data, and the CRT (Classification and Regression Trees) algorithm is applied. The main findings are as follows. First, when the index of export volume is larger than 152.35, the predicted monthly export volume is 858,19 TEU; when it is between 115.90 and 152.35, the predicted monthly export volume is 716,582 TEU. Second, when the index of import volume is larger than 134.60, the predicted monthly import volume is 869,227 TEU; when it is between 116.20 and 134.60, the predicted monthly import volume is 738,724 TEU.
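
The CRT rules above are leaves of a regression tree. Each leaf predicts the mean of the training months that fall into it, and splits are chosen to minimize the within-node sum of squared errors. Below is a minimal one-split (stump) sketch of that criterion. The function names are illustrative, and the toy numbers in the usage note are made up, not the study's data.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_regression_split(xs: Sequence[float],
                          ys: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, total SSE) of the split x <= threshold minimising SSE."""
    pairs = sorted(zip(xs, ys))
    best_t, best_cost = None, sse([y for _, y in pairs])
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal predictor values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


def stump_predict(xs: Sequence[float], ys: Sequence[float],
                  t: float, x: float) -> float:
    """Predict the mean response of the training points on x's side of t."""
    side = [y for xi, y in zip(xs, ys) if (xi <= t) == (x <= t)]
    return sum(side) / len(side)
```

For toy data `xs = [100, 110, 120, 160, 170]` and `ys = [600, 620, 640, 850, 870]`, the best threshold is 140.0. A new point at 165 is then predicted as 860.0, the mean of its leaf. A full tree repeats this search recursively, which yields interval rules like "between 115.90 and 152.35".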

Analysis on Geographical Variations of the Prevalence of Hypertension Using Multi-year Data (다년도 자료를 이용한 고혈압 유병률의 지역간 변이 분석)

  • Kim, Yoomi;Cho, Daegon;Hong, Sungok;Kim, Eunju;Kang, Sunghong
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.935-948 / 2014
  • As chronic diseases have become more prevalent and problematic, effective care for major chronic diseases has become a focus of healthcare policy. In this regard, this study examines how region-specific characteristics affect the prevalence of hypertension in South Korea. For the analysis, we constructed a unique multi-year data set containing key indicators of health conditions and health behaviors for 237 small administrative districts. The data were collected from the Annual Community Health Survey between 2009 and 2011 by the Korea Centers for Disease Control and Prevention and other government organizations. To investigate regional variations, we estimated Geographically Weighted Regression (GWR) and decision tree models. First, our findings suggest that multi-year data are more appropriate than single-year data for the geographical analysis of chronic diseases, because significant annual differences are observed in most variables. We also find that the prevalence of hypertension tends to be positively associated with the prevalence of diabetes and obesity and negatively associated with population density. More importantly, the GWR results show noticeable geographical variations in these factors. Consistent with this, the decision tree model suggests that the primary factors influencing hypertension prevalence are indeed heterogeneous across regional groups. Taken as a whole, accounting for geographical variations in health conditions, health behaviors, and other socioeconomic factors is very important when implementing regionally customized healthcare policies to reduce hypertension prevalence. In short, our study sheds light on possible ways for local government policymakers to manage chronic diseases.
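
GWR fits a separate weighted least-squares regression at each location, down-weighting observations by their distance from it, so coefficients can vary across districts. Below is a minimal single-predictor sketch with a Gaussian kernel. The kernel choice, the fixed `bandwidth`, and planar coordinates are assumptions; in practice the bandwidth is usually selected by cross-validation or AIC. It is not the authors' estimation code.

```python
import math
from typing import Sequence, Tuple


def gwr_local_fit(coords: Sequence[Tuple[float, float]],
                  x: Sequence[float], y: Sequence[float],
                  target: Tuple[float, float],
                  bandwidth: float) -> Tuple[float, float]:
    """Local (intercept, slope) of y = b0 + b1*x at `target`.

    Observation j gets Gaussian kernel weight exp(-0.5 * (d_j / bandwidth)^2),
    where d_j is its Euclidean distance to `target`.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    # weighted means of predictor and response
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("predictor does not vary near this location")
    b1 = sxy / sxx
    return my - b1 * mx, b1
```

Calling `gwr_local_fit` once per district centroid gives a map of local slopes. Spatial variation in those slopes is the kind of evidence the abstract reports for factors such as diabetes and obesity prevalence.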

Global Big Data Analysis Exploring the Determinants of Application Ratings: Evidence from the Google Play Store

  • Seo, Min-Kyo;Yang, Oh-Suk;Yang, Yoon-Ho
    • Journal of Korea Trade / v.24 no.7 / pp.1-28 / 2020
  • Purpose - This paper empirically investigates the predictors and main determinants of consumers' ratings of mobile applications in the Google Play Store. By comparing linear and nonlinear models to identify the role of user reviews in determining application ratings across countries, this study estimates the direct effects of user reviews on application ratings. In addition, by extending the modelling to sentiment analysis, this paper explores the effects of review polarity and subjectivity on application ratings. It then examines the moderating effect of user reviews on the polarity-rating and subjectivity-rating relationships. Design/methodology - Our empirical model considers nonlinear associations as well as linear causality between features and targets. This study employs competing theoretical frameworks (multiple regression, decision-tree, and neural network models) to identify the predictors and main determinants of app ratings, using data from the Google Play Store. Using cross-validation, our analysis investigates the direct and moderating effects of the predictors and main determinants of application ratings in a global app market. Findings - The main findings can be summarized as follows. The number of user reviews is positively associated with an app's rating, and it positively moderates the polarity-rating relationship. When review polarity measured by sentiment analysis is added to the model, polarity is not significantly associated with the rating. This result is best explained by the role of both positive and negative reviews in spreading word of mouth and in serving as a communication channel that leads to product innovation.
Originality/value - Using binary proxies, previous studies have predominantly focused on positive and negative sentiment in examining the determinants of app ratings, assuming that sentiment is significantly associated with ratings. Given these measurement constraints in current research, this paper uses sentiment analysis to measure users' polarity and subjectivity as numeric scores. This paper also compares the suitability of three distinct models: linear regression, decision-tree, and neural network models. Although comparing methodologies has long been considered important in empirical research, it has hitherto been underexplored in studies of the app market.
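
Moderating effects like those described above are typically modelled with an interaction term (polarity × number of reviews), and competing models are compared by cross-validated error. The sketch below covers both pieces under stated assumptions: a hypothetical feature layout of `(polarity, n_reviews)` and squared error as the metric. `fit` is any function that takes training rows and targets and returns a prediction function, so the same loop could wrap regression, tree, or neural-network learners. It is not the authors' pipeline.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], float]


def add_interaction(row: Tuple[float, float]) -> Tuple[float, float, float]:
    """Hypothetical layout (polarity, n_reviews) -> (polarity, n_reviews, product)."""
    polarity, n_reviews = row
    return (polarity, n_reviews, polarity * n_reviews)


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds, without shuffling."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_mse(X: Sequence[Row], y: Sequence[float],
                  fit: Callable[[List[Row], List[float]], Model], k: int = 5) -> float:
    """Mean squared error of out-of-fold predictions."""
    errors: List[float] = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors.extend((model(X[i]) - y[i]) ** 2 for i in test)
    return sum(errors) / len(errors)


def fit_mean(X: List[Row], y: List[float]) -> Model:
    """Baseline learner that always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda _row: m
```

Comparing each candidate model's `cross_val_mse` against the `fit_mean` baseline is a simple way to check that a model adds predictive value. The coefficient on the interaction feature is what carries the moderation effect in a linear model.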

A Study on the Residential Satisfaction of Single Youth Households Tenants (청년 1인가구 임차인의 주거만족도에 관한 연구: 부산·경남지역을 중심으로)

  • Kwon, Jeongpyo;Kang, Jeonggyu
    • Land and Housing Review / v.13 no.2 / pp.65-79 / 2022
  • To derive implications for future housing problems, this study investigates which characteristics affect the housing satisfaction of young single-person households. Using survey data, we estimate multiple regression and decision tree models with SPSS Statistics 25.0. The main empirical results are as follows. First, housing characteristics and the intention to remain a single-person household had a positive (+) effect on housing satisfaction, in the order of natural, housing, and physical characteristics and the intention to remain single. Second, housing characteristics and the intention to marry in the future had a positive (+) effect on housing satisfaction, in the order of natural, housing, and physical characteristics. Third, housing characteristics and the intention to increase household members in the future had a positive (+) effect on housing satisfaction, in the order of natural, housing, and physical characteristics. Finally, the decision tree model shows that housing satisfaction was highest when the natural-characteristics score exceeded 3.4 and the tenure type was jeonse (a lump-sum deposit lease). The results provide three implications for policymakers. First, improving the residential environment of young single-person households is important. Second, providing customized housing for young single-person households could enhance young people's housing satisfaction. Finally, housing should be supplied with space suited to the lifestyle of young single-person households.
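
A fitted decision tree such as the one summarized above can be read as a set of root-to-leaf rules. The sketch below represents a tree with hypothetical `Leaf` and `Node` classes and lists every rule with its leaf prediction. The example tree only mirrors the shape of the reported split (natural-characteristics score above 3.4, then tenure type). Its leaf values are invented for illustration and are not the study's results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Leaf:
    value: float  # predicted satisfaction in this leaf


@dataclass
class Node:
    feature: str
    threshold: Optional[float] = None  # numeric split: feature <= threshold goes left
    category: Optional[str] = None     # categorical split: feature == category goes left
    left: "Tree" = None
    right: "Tree" = None


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[Tuple[str, float]]:
    """Return (rule, prediction) pairs for every root-to-leaf path."""
    if isinstance(tree, Leaf):
        return [(" AND ".join(path) or "TRUE", tree.value)]
    if tree.threshold is not None:
        left_cond = f"{tree.feature} <= {tree.threshold}"
        right_cond = f"{tree.feature} > {tree.threshold}"
    else:
        left_cond = f"{tree.feature} == {tree.category}"
        right_cond = f"{tree.feature} != {tree.category}"
    return (extract_rules(tree.left, path + (left_cond,))
            + extract_rules(tree.right, path + (right_cond,)))


# Hypothetical tree with invented leaf values, shaped like the reported split.
example = Node("natural", threshold=3.4,
               left=Leaf(3.0),
               right=Node("tenure", category="jeonse",
                          left=Leaf(3.9), right=Leaf(3.5)))
```

Sorting the extracted rules by their prediction picks out the highest-satisfaction segment, which is how a finding like "natural score above 3.4 and jeonse" would be read off a fitted tree.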