• Title/Summary/Keyword: decision tree regression

A Study on Regression Class Generation of MLLR Adaptation Using State Level Sharing

  • 오세진;성우창;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.727-739 / 2003
  • In this paper, we propose a method for generating regression classes for adaptation in the HM-Net (Hidden Markov Network) system. MLLR (Maximum Likelihood Linear Regression) adaptation is applied to the HM-Net speech recognition system to represent speaker characteristics effectively and to make HM-Net usable in a variety of tasks. For state-level sharing, we adopt the contextual-domain state splitting of the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, which performs clustering in both the contextual and time domains. In each state of the contextual domain, the desired phoneme classes are determined by splitting the context information (classes), including the target speaker's speech data. The number of adaptation parameters, such as means and variances, is controlled automatically by the contextual-domain state splitting of PDT-SSS, depending on the context information and the amount of adaptation speech available from a new speaker. Experiments to verify the effectiveness of the proposed method were performed on the KLE (Center for Korean Language Engineering) 452 data and the YNU (Yeungnam Univ.) 200 data. The results show that phone, word, and sentence recognition accuracy increased by 34-37%, 9%, and 20%, respectively. An analysis by length of the adaptation utterances also shows significant improvements even with short adaptation utterances. We therefore argue that the proposed regression class method is well suited to HM-Net speech recognition systems that use MLLR speaker adaptation.
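MLLR adapts every Gaussian mean in a regression class with one shared affine transform, mu_hat = A mu + b. The sketch below estimates that transform for a single class. It is a simplification, assuming identity covariances and caller-supplied occupation probabilities, under which the maximum-likelihood estimate reduces to weighted least squares; it is not the paper's HM-Net/PDT-SSS procedure, and `estimate_mllr_transform` and `adapt_means` are hypothetical names.

```python
import numpy as np


def estimate_mllr_transform(means, frames, gamma):
    """Estimate one MLLR mean transform W = [b | A] for a regression class.

    means:  (M, d) Gaussian means that share the transform
    frames: (T, d) adaptation observations from the new speaker
    gamma:  (T, M) occupation probability of Gaussian m at frame t
    Simplifying assumption: identity covariances, so the ML estimate
    reduces to weighted least squares.
    """
    M = means.shape[0]
    xi = np.hstack([np.ones((M, 1)), means])   # extended means [1, mu], (M, d+1)
    Z = frames.T @ gamma @ xi                  # sum_t sum_m gamma * o_t xi_m^T, (d, d+1)
    occ = gamma.sum(axis=0)                    # total occupancy per Gaussian, (M,)
    G = (xi * occ[:, None]).T @ xi             # sum_m occ_m xi_m xi_m^T, (d+1, d+1)
    # W G = Z  =>  W = Z G^{-1}; G is symmetric, so solve G W^T = Z^T.
    return np.linalg.solve(G, Z.T).T


def adapt_means(means, W):
    """Apply mu_hat = A mu + b to every mean in the class."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

In standard MLLR, one such transform is estimated per regression class, and fewer, larger classes are used when adaptation data is scarce so that each transform still sees enough frames.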

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China

  • Ding, Xuan-Ze;Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool that helps banks make profitable lending decisions. Many classification algorithms and models have recently been used in personal credit scoring. Credit scoring techniques are usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees; non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, there is no consensus on which method is best for developing a credit scoring model. In this paper, we compare the performance of three of the most common scoring techniques (logistic regression, neural networks, and support vector machines) using personal credit data from a financial institution in China. Specifically, we build the three models, use each to classify the customers, and compare the results. The support vector machine outperforms both logistic regression and the neural network.
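A linear support vector machine scores an applicant with a weighted sum of features and is trained by minimizing a regularized hinge loss. The sketch below is a minimal illustration, assuming numeric features and labels encoded as +1 (good credit) and -1 (bad credit); it uses a deterministic, full-batch variant of the Pegasos subgradient update rather than whatever solver the study used, and `train_linear_svm` and `predict` are hypothetical names.

```python
def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Minimize lam/2*||w||^2 + mean(hinge loss) by full-batch subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    Step size 1/(lam*t) follows the Pegasos schedule; the bias is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        gw = [lam * wj for wj in w]   # subgradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:            # hinge loss is active for this applicant
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, gw)]
        b -= eta * gb
    return w, b


def predict(w, b, x):
    """Return +1 (approve) or -1 (reject) under the assumed label encoding."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

A nonlinear kernel, as is often used in credit scoring, would replace the dot product with a kernel function; the linear form keeps the sketch short.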

Traffic Flow Estimation System using a Hybrid Approach

  • Aung, Swe Sw;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.6 no.4 / pp.281-291 / 2017
  • Traffic jams are a daily problem in both developed and developing countries, so systems that monitor, predict, and detect traffic conditions play an important role in research. Researchers have tried to solve the problem with many kinds of technology, especially roadside sensors, but these sensors still have limitations, and no single method has by itself produced sufficiently accurate traffic predictions. It may therefore not be best to use any one method as a stand-alone traffic prediction system. This paper focuses on predicting traffic conditions with a hybrid approach built on an accuracy comparison of three prediction models: multinomial logistic regression, decision trees, and support vector machine (SVM) classifiers. The aim is to select the most suitable approach by integrating the strengths of these methods. Experiments with test cases and simulations confirmed that the hybrid method is more effective than the individual methods.
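The selection step of the hybrid approach rests on comparing the three models' accuracy. A minimal sketch of that comparison, assuming each model's predictions on a held-out validation set are already available as class labels; the traffic-state labels and the names `accuracy` and `select_best_model` are hypothetical, and the paper's actual integration scheme may combine the models differently.

```python
def accuracy(pred, truth):
    """Fraction of validation samples whose predicted class matches the truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def select_best_model(val_predictions, val_truth):
    """Return (name, accuracy) of the model with the highest validation accuracy.

    val_predictions: dict mapping model name -> list of predicted traffic classes.
    Ties are broken by the dict's insertion order.
    """
    scores = {name: accuracy(p, val_truth) for name, p in val_predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```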

Forecasting Export & Import Container Cargoes using a Decision Tree Analysis

  • Son, Yongjung;Kim, Hyunduk
    • Journal of Korea Port Economic Association / v.28 no.4 / pp.193-207 / 2012
  • The purpose of this study is to predict export and import container volumes using decision tree analysis. Factors that can influence container cargo volume are selected as independent variables: the producer price index, consumer price index, export volume index, import volume index, index of industrial production, and exchange rate (won/dollar). Monthly data from January 2002 to December 2011 are analyzed with the CRT (Classification and Regression Trees) algorithm. The main findings are as follows. First, when the export volume index is larger than 152.35, monthly export volume is predicted to be 858,19 TEU, whereas when the index is between 115.90 and 152.35, it is predicted to be 716,582 TEU. Second, when the import volume index is larger than 134.60, monthly import volume is predicted to be 869,227 TEU, whereas when the import volume index is between 116.20 and 134.60, it is predicted to be 738,724 TEU.
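Rules such as "export volume index > 152.35" come from a regression tree choosing, at each node, the threshold that most reduces within-node squared error. A minimal sketch of that single-split search on one predictor, in the CART style; the toy numbers in the test are illustrative, not the study's data, and `best_split` is a hypothetical name.

```python
def best_split(x, y):
    """Find the threshold on one predictor that minimizes the summed squared
    error of the two child nodes, as in a CART regression-tree split.

    Returns (threshold, left_mean, right_mean); the threshold is the midpoint
    between adjacent sorted x values, and rows with x <= threshold go left.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lm, rm)
    return best[1:]
```

Each leaf's prediction is the mean of its training targets, which is why the findings report a single monthly TEU value per index range. A full tree applies this search recursively to every predictor in each child node.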

Analysis on Geographical Variations of the Prevalence of Hypertension Using Multi-year Data

  • Kim, Yoomi;Cho, Daegon;Hong, Sungok;Kim, Eunju;Kang, Sunghong
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.935-948 / 2014
  • As chronic diseases have become more prevalent and problematic, effective care for major chronic diseases has become a focus of healthcare policy. This study examines how region-specific characteristics affect the prevalence of hypertension in South Korea. For the analysis, we built a unique multi-year data set of key indicators of health conditions and health behaviors for 237 small administrative districts. The data were collected from the Annual Community Health Survey (2009-2011) by the Korea Centers for Disease Control and Prevention and other government organizations. To investigate regional variation, we estimated Geographically Weighted Regression (GWR) and decision tree models. First, our findings suggest that multi-year data are more appropriate than single-year data for the geographical analysis of chronic diseases, because most variables show significant year-to-year differences. We also find that hypertension prevalence tends to be positively associated with the prevalence of diabetes and obesity and negatively associated with population density. More importantly, the GWR results show noticeable geographical variation in these factors. Consistent with this, the decision tree model suggests that the primary factors affecting hypertension prevalence differ across regional groups. Taken as a whole, accounting for geographical variation in health conditions, health behaviors, and other socioeconomic factors is very important when implementing regionally customized healthcare policies to reduce hypertension prevalence. In short, our study suggests possible ways for local government policymakers to manage chronic diseases.
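GWR fits a separate weighted least-squares regression at each district, giving nearby districts more weight, so the coefficients are allowed to vary over space. A minimal sketch, assuming district centroid coordinates and a fixed Gaussian kernel with a caller-chosen bandwidth (dedicated GWR software usually selects the bandwidth by cross-validation or AICc); `gwr_coefficients` is a hypothetical name.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Geographically weighted regression: one weighted least-squares fit per location.

    coords: (n, 2) district centroids; X: (n, k) predictors; y: (n,) outcome.
    Uses a fixed Gaussian kernel w_ij = exp(-0.5 * (d_ij / bandwidth)^2).
    Returns an (n, k+1) array of local [intercept, slopes] for each district.
    """
    n = len(y)
    Xd = np.hstack([np.ones((n, 1)), X])           # add intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # nearby districts weigh more
        XtW = Xd.T * w                            # equals Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas
```

Mapping each column of the result (for example, the local slope on diabetes prevalence) is how geographical variation in a factor's effect becomes visible.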

Global Big Data Analysis Exploring the Determinants of Application Ratings: Evidence from the Google Play Store

  • Seo, Min-Kyo;Yang, Oh-Suk;Yang, Yoon-Ho
    • Journal of Korea Trade / v.24 no.7 / pp.1-28 / 2020
  • Purpose - This paper empirically investigates the predictors and main determinants of consumers' ratings of mobile applications in the Google Play Store. Comparing linear and nonlinear models to identify the role of user reviews in determining application ratings across countries, this study estimates the direct effects of user reviews on application ratings. Extending the modelling to sentiment analysis, the paper also explores the effects of review polarity and subjectivity on application ratings, and then examines the moderating effect of user reviews on the polarity-rating and subjectivity-rating relationships. Design/methodology - Our empirical model considers nonlinear associations as well as linear causality between features and targets. The study employs competing frameworks (multiple regression, decision tree, and neural network models) to identify the predictors and main determinants of app ratings, using data from the Google Play Store. Using cross-validation, the analysis investigates the direct and moderating effects of the predictors and main determinants of application ratings in the global app market. Findings - The main findings can be summarized as follows. The number of user reviews is positively associated with an app's rating and positively moderates the polarity-rating relationship. When review polarity, measured by sentiment analysis, is included in the model, it is not significantly associated with the rating. This result is best explained by the role that both positive and negative reviews play as word of mouth and as a channel of communication that leads to product innovation.
Originality/value - Using binary proxies, previous studies have predominantly focused on positive and negative sentiment when examining the determinants of app ratings, assuming a significant association between sentiment and ratings. Given these measurement constraints in current research, this paper uses sentiment analysis to measure users' polarity and subjectivity as real-valued scores. The paper also compares the suitability of three distinct models: linear regression, decision tree, and neural network models. Although comparing methodologies has long been considered important in empirical work, such comparisons have been underexplored in studies of the app market.
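A moderating effect like "the number of reviews moderates the polarity-rating relationship" is commonly tested with an interaction term in a linear regression. A minimal sketch, assuming polarity is a real-valued score and the review count is numeric; only the linear specification is shown (not the decision tree or neural network), and `fit_moderation` and `polarity_slope` are hypothetical names.

```python
import numpy as np


def fit_moderation(polarity, n_reviews, rating):
    """OLS with an interaction term:
    rating = b0 + b1*polarity + b2*n_reviews + b3*polarity*n_reviews.
    b3 captures how the number of reviews changes the polarity-rating slope.
    """
    p = np.asarray(polarity, float)
    r = np.asarray(n_reviews, float)
    X = np.column_stack([np.ones_like(p), p, r, p * r])
    coef, *_ = np.linalg.lstsq(X, np.asarray(rating, float), rcond=None)
    return coef  # [b0, b1, b2, b3]


def polarity_slope(coef, n_reviews):
    """Marginal effect of polarity on rating at a given review count."""
    return coef[1] + coef[3] * n_reviews
```

A positive b3 means the polarity-rating slope grows with the number of reviews, which is the form of moderation the findings describe.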

A Study on the Residential Satisfaction of Single Youth Households Tenants: Focusing on the Busan and Gyeongnam Region

  • Kwon, Jeongpyo;Kang, Jeonggyu
    • Land and Housing Review / v.13 no.2 / pp.65-79 / 2022
  • To draw implications for future housing problems, this study investigates which characteristics affect the housing satisfaction of young single-person households. Using survey data, we estimate multiple regression and decision tree models in SPSS Statistics 25.0. The empirical results show several key features. First, housing characteristics and the intention to remain a single-person household had a positive (+) effect on housing satisfaction, in the order of natural characteristics, housing characteristics, physical characteristics, and the intention to remain a single-person household. Second, housing characteristics and the intention to marry in the future had a positive (+) effect on housing satisfaction, in the order of natural, housing, and physical characteristics. Third, housing characteristics and the intention to increase the number of household members in the future had a positive (+) effect on housing satisfaction, in the order of natural, housing, and physical characteristics. Finally, the decision tree model shows that housing satisfaction was highest when the natural-characteristics score exceeded 3.4 and the tenure type was jeonse (a lump-sum deposit lease). The results offer three implications for policymakers. First, improving the residential environment of young single-person households is important. Second, providing customized housing for young single-person households could raise young people's housing satisfaction. Finally, housing should be supplied with space suited to the lifestyles of young single-person households.

A Pattern Analysis on the Possibility of Near Miss Connection in Construction Sites

  • Sang Hyun Kim;Yeon Cheol Shin;Yu Mi Moon
    • Journal of the Society of Disaster Information / v.19 no.1 / pp.216-230 / 2023
  • Purpose: The purpose is to prevent accidents by predicting disasters through the analysis of near misses. Method: We reviewed the near-miss literature, collected data at construction sites, and conducted a questionnaire survey; logistic regression and decision tree analysis were then used to classify the possibility that a near miss leads to an accident. Result: Analyzing how near-miss types affect mental, physical, and safety habit and behavior factors, we found that for the physical factor the most influential variables were, in order, the need for near-miss management, job type (electrical and information/communication work), and health status; for the mental factor, construction scale had a high influence; and for the habit and behavior factor, the most influential variables were, in order, experience, number of serious injuries, and occupation, with illusion, inappropriate work instructions, and body parts ranked in that order. Decision tree analysis identified the factors and patterns that affect the possibility of a near miss developing into an actual accident. Conclusion: Construction site officials should monitor near misses with attention to workers' mental and physical condition and should specifically manage how physical factors relate to near misses. Through personnel allocation, work plans, work procedures and methods, and feedback that keep inappropriate work instructions from leading to near misses, a work environment with fewer serious accidents is expected.
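A classification tree ranks survey factors by how cleanly a split on them separates near misses that did and did not lead to accidents. A minimal sketch of one split criterion, assuming Gini impurity (as in CART) and one-vs-rest splits on a categorical survey item; the abstract does not state which tree algorithm was used, and the feature name, labels, and the functions `gini` and `best_categorical_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the classes present in `labels`."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_categorical_split(rows, labels, feature):
    """Pick the category of `feature` whose one-vs-rest split gives the lowest
    weighted Gini impurity; returns (category, impurity)."""
    n = len(labels)
    best = None
    for value in sorted({r[feature] for r in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] == value]
        right = [l for r, l in zip(rows, labels) if r[feature] != value]
        if not left or not right:
            continue  # a split must leave both children non-empty
        imp = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or imp < best[1]:
            best = (value, imp)
    return best
```

Comparing the best impurity across all features, and recursing into the children, yields the kind of factor ranking and patterns the study reports.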

Prediction of Water Usage in Pig Farm based on Machine Learning

  • Lee, Woongsup;Ryu, Jongyeol;Ban, Tae-Won;Kim, Seong Hwan;Choi, Heechul
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1560-1566 / 2017
  • With the spread of smart pig farms equipped with Internet-of-Things-based sensors, data on pig farms can now be accumulated, and various machine learning algorithms are being applied to these data to improve pig farm productivity. Here, several machine learning schemes are used to predict water usage, which is known to be one of the most important elements of pig farm management. Specifically, regression algorithms (linear regression, regression tree, and AdaBoost regression) and classification algorithms (logistic classification, decision tree, and support vector machine) are applied to derive a scheme that forecasts water usage from the temperature and humidity of the pig farm. Performance evaluation shows that water usage can be predicted with high accuracy. The proposed scheme can be used to detect malfunctions of the water supply system, which helps prevent pig deaths and reduce farm losses.
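One way a usage forecast can expose a water-system malfunction is to flag readings that stray far from the prediction. A minimal sketch, assuming a linear regression on temperature and humidity (one of the regressors the study lists) and a hypothetical rule that flags residuals larger than k residual standard deviations; the threshold rule and the names `fit_water_model` and `flag_anomalies` are illustrative, not the paper's method.

```python
import numpy as np


def _design(temp, humidity):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(temp)), temp, humidity])


def fit_water_model(temp, humidity, usage):
    """Least-squares fit of usage ~ b0 + b1*temp + b2*humidity.
    Returns the coefficients and the standard deviation of the residuals."""
    y = np.asarray(usage, float)
    X = _design(temp, humidity)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, float(np.std(y - X @ coef))


def flag_anomalies(coef, resid_std, temp, humidity, usage, k=3.0):
    """Flag readings whose usage deviates from the prediction by more than
    k residual standard deviations (hypothetical malfunction rule)."""
    y = np.asarray(usage, float)
    return np.abs(y - _design(temp, humidity) @ coef) > k * resid_std
```

A sudden drop in usage on a hot day, for example, would be flagged, which matches the kind of blocked or broken supply line that the abstract says the scheme can help detect.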

Machine Learning Algorithm for Estimating Ink Usage

  • Se Wook Kwon;Young Joo Hyun;Hyun Chul Tae
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.1 / pp.23-31 / 2023
  • Research on and interest in sustainable printing are increasing in the packaging printing industry. Currently, the amount of ink required for each job is predicted from field workers' experience and intuition. If more ink is prepared than necessary, the leftover ink cannot be reused and is discarded, which harms both the company's productivity and the environment. Machine learning models can be used to address this problem, and this study compares machine learning models for predicting ink usage. A linear model such as multiple regression analysis cannot capture the nonlinear relationships among the variables involved in packaging printing, which limits how accurately it can predict the amount of ink needed. This study therefore builds several prediction models based on CART (Classification and Regression Tree): Decision Tree, Random Forest, Gradient Boosting Machine, and XGBoost. Model accuracy is assessed by K-fold cross-validation, using root mean squared error, mean absolute error, and R-squared as evaluation metrics. Among these models, XGBoost achieves the highest prediction accuracy and can reduce wasted ink by 2,134 g per job. This study thus demonstrates machine learning's potential to help improve productivity and protect the environment.
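The evaluation protocol (K-fold cross-validation scored by RMSE, MAE, and R-squared) can be sketched independently of the model. A minimal version, assuming contiguous folds without shuffling and per-fold scores averaged over folds; `fit` is a hypothetical interface that takes training data and returns a prediction function, and the simple least-squares line stands in for the tree-based models, which are not reproduced here.

```python
import math


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def r2(y, p):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k):
    """Split range(n) into k contiguous, nearly equal folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


def cross_validate(fit, X, y, k=5):
    """Average RMSE, MAE and R^2 over k folds.
    `fit(X_train, y_train)` must return a predict(x) callable (hypothetical interface)."""
    scores = []
    for test_idx in kfold_indices(len(y), k):
        held = set(test_idx)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        yt = [y[i] for i in test_idx]
        pt = [predict(X[i]) for i in test_idx]
        scores.append((rmse(yt, pt), mae(yt, pt), r2(yt, pt)))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))


def fit_line(X_train, y_train):
    """Stand-in model: least-squares line on the first feature."""
    xs = [x[0] for x in X_train]
    mx, my = sum(xs) / len(xs), sum(y_train) / len(y_train)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, y_train))
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x[0]
```

If the rows are ordered in a way that would bias the folds, shuffle them before splitting; R-squared is undefined for a test fold whose targets are all equal.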