• Title/Summary/Keyword: Linear regression models

Hybrid Fuzzy Least Squares Support Vector Machine Regression for Crisp Input and Fuzzy Output

  • Shim, Joo-Yong; Seok, Kyung-Ha; Hwang, Chang-Ha
    • Communications for Statistical Applications and Methods / v.17 no.2 / pp.141-151 / 2010
  • Hybrid fuzzy regression analysis integrates randomness and fuzziness into a regression model. The least squares support vector machine (LS-SVM) has been very successful in pattern recognition and function estimation problems for crisp data. This paper proposes a new method for evaluating hybrid fuzzy linear and nonlinear regression models with crisp inputs and fuzzy output using weighted fuzzy arithmetic (WFA) and LS-SVM. LS-SVM allows fuzzy nonlinear regression analysis to be performed by constructing a fuzzy linear regression function in a high-dimensional feature space. The proposed method is not computationally expensive, since its solution is obtained from a simple system of linear equations. In particular, it is an attractive approach to modeling nonlinear data, and it is nonparametric in the sense that the underlying model function need not be assumed for a fuzzy nonlinear regression model with crisp inputs and fuzzy output. Experimental results are presented to illustrate the performance of the method.
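
A minimal sketch of why an LS-SVM fit reduces to one linear system, shown for the standard crisp-output LS-SVM regression only. The fuzzy-output extension with weighted fuzzy arithmetic described in the abstract is not reproduced here. The RBF kernel, the function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`), the parameters `gamma` and `sigma`, and the sine data are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LS-SVM system [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Assumed toy data: a noiseless sine curve.
X = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=1.0)
y_hat = lssvm_predict(X, b, alpha, X, sigma=1.0)
```

The nonlinearity enters only through the kernel matrix, so the fit itself needs nothing beyond `np.linalg.solve`.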

Determination of Research Octane Number using NIR Spectral Data and Ridge Regression

  • Jeong, Ho Il; Lee, Hye Seon; Jeon, Ji Hyeok
    • Bulletin of the Korean Chemical Society / v.22 no.1 / pp.37-42 / 2001
  • Ridge regression is compared with multiple linear regression (MLR) for the determination of Research Octane Number (RON) when the baseline and signal-to-noise ratio are varied. MLR analysis of near-infrared (NIR) spectroscopic data usually encounters a collinearity problem, which adversely affects long-term prediction performance. The collinearity problem can be eliminated or greatly reduced by using ridge regression, which is a biased estimation method. To evaluate the robustness of each calibration, the models developed by both methods were used to predict the RONs of gasoline spectra in which the baseline and signal-to-noise ratio were varied. The ridge calibration model showed more stable prediction performance than the MLR model, especially when the spectral baselines were varied. In conclusion, ridge regression is shown to be a viable method for calibrating RON with NIR data when only a few wavelengths are available, as in a hand-carried device using a few diodes.
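
The effect of collinearity on MLR, and how a ridge penalty stabilizes the coefficients, can be sketched on synthetic data. This is not the paper's calibration: the helper `ridge_fit`, the penalty `lam`, and the two nearly identical synthetic "wavelength" columns are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data: solve (Xc^T Xc + lam*I) coef = Xc^T yc."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)   # true coefficients are (1, 1)

coef_mlr, _ = ridge_fit(X, y, lam=0.0)   # lam = 0 reduces to ordinary MLR
coef_ridge, _ = ridge_fit(X, y, lam=1.0)
```

With `lam=0` the two coefficients can swing far from (1, 1) in opposite directions because the data barely distinguish the columns. The penalty trades a small bias for much lower variance.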

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun; Lim, Youngin; Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study predicts the average scores of the top 150 PGA golf players in 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. It also aims to predict the Top 10 and Top 25 players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores. The linear methods were stepwise regression, best subset selection, LASSO, ridge regression and principal component regression. The nonlinear methods were tree, bagging, gradient boosting, neural network, random forest and KNN. We found that the average score increases as fairway firmness, green height or average maximum wind speed increases, and decreases as the number of one-putts, the scrambling variable or the longest driving distance increases. All 11 models have low prediction error when predicting the average scores of the 2015 PGA tournaments, which were not included in the training set. However, the bagging and random forest models perform best among all models, and these two models have the highest prediction accuracy when predicting the Top 10 and Top 25 players in the 4 playoffs.

A Comparative Study on the Spatial Statistical Models for the Estimation of Population Distribution

  • Oh, Doo-Ri; Hwang, Chul Sue
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.33 no.3 / pp.145-153 / 2015
  • This study aims to estimate population distribution accurately at a finer resolution than administrative units using an RK (Regression-Kriging) model. The RK model is an areal interpolation technique that combines linear regression with kriging. To estimate the population distribution in a sample region, four models were used: a regression model, an RK model, an OK (Ordinary Kriging) model and a CK (Co-Kriging) model. The results were then compared with each other. Accuracy and validity were evaluated on the basis of RMSE (Root Mean Square Error), MAE (Mean Absolute Error), the G statistic and the correlation coefficient (ρ). In the sample regions, the RK model showed better results than the other models on every statistic. The results of this comparative study will be useful for estimating the population distribution of metropolitan areas with high population density.

Variable Selection in Linear Random Effects Models for Normal Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.27 no.4 / pp.407-420 / 1998
  • This paper is concerned with selecting covariates to be included in linear random effects models designed to analyze clustered normal response data. It takes a Bayesian approach and develops a procedure that uses probabilistic considerations to select promising subsets of covariates. The approach reformulates the linear random effects model as a hierarchical normal and point mass mixture model by introducing a set of latent variables that identify subset choices. The hierarchical model is flexible enough to easily accommodate sign constraints in the number of regression coefficients. Using the Gibbs sampler, the appropriate posterior probability of each subset of covariates is obtained. Thus, in this procedure, the most promising subset of covariates can be identified as the one with the highest posterior probability. The procedure is illustrated through a simulation study.

Network traffic prediction model based on linear and nonlinear model combination

  • Lian Lian
    • ETRI Journal / v.46 no.3 / pp.461-472 / 2024
  • We propose a network traffic prediction model based on a combination of linear and nonlinear models. Network traffic is modeled by an autoregressive moving average model, and the error between the measured and predicted network traffic values is obtained. Then, an echo state network is used to fit the prediction error, which contains the nonlinear components. In addition, an improved slime mold algorithm is proposed for reservoir parameter optimization of the echo state network, further improving the regression performance. The predictions of the linear (autoregressive moving average) and nonlinear (echo state network) models are added to obtain the final prediction. Test results on two network traffic datasets from mobile and fixed networks show that the proposed prediction model has smaller error and difference measures than other prediction models. In addition, the coefficient of determination and index of agreement are close to 1, indicating better data fitting performance. Although the proposed prediction model has a slight increase in time complexity for training and prediction compared with some models, it shows practical applicability.

A Flexible Statistical Growth Model for Describing Plant Disease Progress (식물병(植物病) 진전(進展)의 한 유연적(柔軟的)인 통계적(統計的) 생장(生長) 모델)

  • Kim, Choong-Hoe
    • Korean journal of applied entomology / v.26 no.1 s.70 / pp.31-36 / 1987
  • A piecewise linear regression model able to describe disease progress curves with simplicity and flexibility was developed in this study. The model divides the whole epidemic into several pieces of simple linear regression based on changes in the pattern of disease progress, and then incorporates the pieces into a single mathematical function using indicator variables. When twelve epidemic data sets obtained from field experiments were fitted to the piecewise linear regression model, the logistic model and the Gompertz model to compare statistical fit, goodness of fit was greatly improved with piecewise linear regression compared to the other two models. The simplicity, flexibility, accuracy and ease of parameter estimation of the piecewise linear regression model are described with examples of real epidemic data. The results of this study suggest that the piecewise linear regression model is a useful technique for modeling plant disease epidemics.
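
One common way to join linear pieces into a single function with indicator variables is y = b0 + b1·t + b2·(t − c)·I(t > c), fitted by ordinary least squares. The sketch below assumes that form, a known breakpoint `c`, and a synthetic noiseless progress curve. The helper name `piecewise_design` and the data are hypothetical and are not taken from the paper.

```python
import numpy as np

def piecewise_design(t, breakpoints):
    """Columns: intercept, t, and (t - c) * I(t > c) for each breakpoint c."""
    cols = [np.ones_like(t), t]
    for c in breakpoints:
        cols.append((t - c) * (t > c))   # the indicator switches on the slope change
    return np.column_stack(cols)

# Synthetic disease-progress curve: slope 0.5 up to day 10, slope 2.0 afterwards.
t = np.arange(0.0, 21.0)
y = np.where(t <= 10, 0.5 * t, 5.0 + 2.0 * (t - 10))

X = piecewise_design(t, breakpoints=[10.0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta holds [intercept, initial slope, change in slope after the breakpoint]
```

Because the hinge term is zero at the breakpoint, the fitted pieces meet continuously, and each additional breakpoint adds only one parameter.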

Optimized Neural Network Weights and Biases Using Particle Swarm Optimization Algorithm for Prediction Applications

  • Ahmadzadeh, Ezat; Lee, Jieun; Moon, Inkyu
    • Journal of Korea Multimedia Society / v.20 no.8 / pp.1406-1420 / 2017
  • Artificial neural networks (ANNs) play an important role in the fields of function approximation, prediction, and classification. ANN performance is critically dependent on the input parameters, including the number of neurons in each layer and the values of the weights and biases assigned to each neuron. In this study, we apply the particle swarm optimization method, a popular optimization algorithm, to determine the optimal values of weights and biases for every neuron in the different layers of the ANN. Several regression models, including general linear regression, Fourier regression, smoothing spline, and polynomial regression, are fitted to evaluate the proposed method's prediction power compared to multiple linear regression (MLR) methods. In addition, residual analysis is conducted to evaluate the accuracy of the optimized ANN on both training and test datasets. The experimental results demonstrate that the proposed method can effectively determine optimal values for neuron weights and biases, and that high accuracy is obtained for prediction applications. Evaluations of the proposed method reveal that it can be used for prediction and estimation purposes with high accuracy, and that the designed model provides a reliable technique for optimization. The simulation results show that the optimized ANN exhibits superior performance to MLR for prediction purposes.

Development of the Algorithm for Optimizing Wavelength Selection in Multiple Linear Regression

  • Hoeil Chung
    • Near Infrared Analysis / v.1 no.1 / pp.1-7 / 2000
  • A convenient algorithm for optimizing wavelength selection in multiple linear regression (MLR) has been developed. MOP (MLR Optimization Program) tests all possible MLR calibration models in a given spectral range and finds an optimal MLR model with external validation capability. MOP generates calibration models from all possible combinations of wavelengths, and simultaneously calculates SEC (Standard Error of Calibration) and SEV (Standard Error of Validation) by predicting samples in a validation data set. Finally, from the determined SEC and SEV, it calculates another parameter called SAD, the sum of SEC, SEV, and the absolute difference between SEC and SEV: SAD = SEC + SEV + |SEC − SEV|. SAD is a useful parameter for finding an optimal calibration model without over-fitting, because it simultaneously evaluates SEC, SEV, and the difference in error between calibration and validation. The calibration model with the smallest SAD value is chosen as the optimum because its errors in calibration and validation are both minimal and similar in scale. To evaluate the capability of MOP, the determination of benzene content in unleaded gasoline was examined. MOP successfully found the optimal calibration model and showed better calibration and independent prediction performance than conventional MLR calibration.
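
The selection loop described above can be sketched as follows. This is not the MOP program itself. It assumes that SEC and SEV are plain root-mean-square errors (the exact definitions used in the paper may differ), that the number of wavelengths `k` is fixed, and that the spectra are synthetic. The function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def with_intercept(S, cols):
    return np.column_stack([np.ones(len(S)), S[:, cols]])

def rms_error(X, y, coef):
    resid = y - X @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

def select_by_sad(S_cal, y_cal, S_val, y_val, k):
    """Fit every k-wavelength MLR model and return (SAD, columns) with the smallest SAD."""
    best = None
    for cols in combinations(range(S_cal.shape[1]), k):
        cols = list(cols)
        X_cal = with_intercept(S_cal, cols)
        coef, *_ = np.linalg.lstsq(X_cal, y_cal, rcond=None)
        sec = rms_error(X_cal, y_cal, coef)                         # calibration error
        sev = rms_error(with_intercept(S_val, cols), y_val, coef)   # validation error
        sad = sec + sev + abs(sec - sev)
        if best is None or sad < best[0]:
            best = (sad, tuple(cols))
    return best

rng = np.random.default_rng(1)
S = rng.normal(size=(60, 5))             # 60 synthetic spectra, 5 wavelengths
y = 2.0 * S[:, 0] - 1.0 * S[:, 3] + 0.01 * rng.normal(size=60)
sad, cols = select_by_sad(S[:40], y[:40], S[40:], y[40:], k=2)
```

Algebraically, SEC + SEV + |SEC − SEV| equals twice the larger of the two errors, so minimizing SAD amounts to minimizing the worse of the calibration and validation errors.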

Application of Regularized Linear Regression Models Using Public Domain data for Cycle Life Prediction of Commercial Lithium-Ion Batteries (상업용 리튬 배터리의 수명 예측을 위한 고속대량충방전 데이터 정규화 선형회귀모델의 적용)

  • KIM, JANG-GOON; LEE, JONG-SOOK
    • Journal of Hydrogen and New Energy / v.32 no.6 / pp.592-611 / 2021
  • This study uses a rarely available high-throughput cycling data set of 124 commercial lithium iron phosphate/graphite cells cycled under fast-charging conditions, with widely varying cycle lives ranging from 150 to 2,300 cycles, including in-cycle temperature and per-cycle IR measurements. We wrote our own Python code, which reproduced the various data plots and machine learning approaches for cycle life prediction using early cycles, as well as further details not presented in the article and the supplementary information. In particular, we applied regularized ridge, lasso and elastic net linear regression models using features extracted from capacity fade curves, discharge voltage curves, and other data such as internal resistance and cell can temperature. We found that, because costly and lengthy battery testing limits the quantity and quality of the data, careful hyperparameter tuning may be required and model features need to be extracted on the basis of domain knowledge.