Statistical notes for clinical researchers: simple linear regression 3 - residual analysis

  • Kim, Hae-Young
    • Restorative Dentistry and Endodontics
    • /
    • v.44 no.1
    • /
    • pp.11.1-11.8
    • /
    • 2019
  • In the previous sections, simple linear regression (SLR) 1 and 2, we developed an SLR model and evaluated its predictability. To obtain the best-fitting line, the intercept and slope were calculated using the least squares method. The predictability of the model was assessed by the proportion of the total variation of the response variable that the model explains. In this section, we discuss the four basic assumptions of regression models that justify the estimated regression model, and the residual analysis used to check them.
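
The workflow summarized above (least squares fit, proportion of explained variation, residuals) can be sketched in Python as follows. This is a minimal illustration, not the article's own code: the function names `fit_slr` and `residual_analysis` and the data are hypothetical, and the abstract does not itemize the four assumptions, which are commonly taken to be linearity, independence, normality, and equal variance of the errors.

```python
import numpy as np

def fit_slr(x, y):
    """Least squares estimates of intercept and slope for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def residual_analysis(x, y):
    """Return fitted values, residuals, and R^2 for the fitted SLR line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    intercept, slope = fit_slr(x, y)
    fitted = intercept + slope * x
    residuals = y - fitted
    # R^2: proportion of the total variation in y explained by the line.
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return fitted, residuals, r_squared

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
fitted, residuals, r_squared = residual_analysis(x, y)
```

Plotting `residuals` against `fitted` is the usual visual check for non-linearity and unequal variance; a normal quantile plot of `residuals` is a common check for normality.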

Effects of curvature on leverage in nonlinear regression

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.5
    • /
    • pp.913-917
    • /
    • 2009
  • The measures of leverage in linear regression have been extended to nonlinear regression models. We consider several curvature measures of nonlinearity in an estimation situation. The relationship between measures of leverage and statistical curvature is explored in nonlinear regression models. The circumstances under which the Jacobian leverage reduces to a tangent plane leverage are discussed in connection with the effective residual curvature of the nonlinear model.
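
For orientation, the linear-regression leverage that these measures extend is the diagonal of the hat matrix $H = X(X^\top X)^{-1}X^\top$. The sketch below computes it with NumPy; the function name `leverages` and the data are hypothetical. The tangent plane leverage applies the same formula with $X$ replaced by the Jacobian of the model function at the estimate. The Jacobian leverage, which also depends on the residuals, is not shown here.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T.

    Uses a reduced QR decomposition: if X = QR has full column rank,
    then H = Q Q^T and h_ii is the squared norm of the i-th row of Q.
    """
    X = np.asarray(X, dtype=float)
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

# Hypothetical predictor values; the last point lies far from the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor
h = leverages(X)
```

The leverages sum to the number of columns of `X`, and the isolated point at `x = 10` receives by far the largest value.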

Water consumption prediction based on machine learning methods and public data

  • Kesornsit, Witwisit;Sirisathitkul, Yaowarat
    • Advances in Computational Design
    • /
    • v.7 no.2
    • /
    • pp.113-128
    • /
    • 2022
  • Water consumption is strongly affected by numerous factors, such as population, climatic, geographic, and socio-economic factors. Therefore, implementing a reliable predictive model of water consumption patterns is a challenging task. This study investigates the performance of predictive models based on multi-layer perceptron (MLP), multiple linear regression (MLR), and support vector regression (SVR). To identify the significant factors affecting water consumption, the stepwise regression (SW) procedure is used in MLR to select suitable variables. The study then implements three further predictive models based on these significant variables, namely SWMLR, SWMLP, and SWSVR. Annual data on water consumption in Thailand during 2006 - 2015 were compiled and categorized by province and distributor. Among the models using all variables, the MLP models outperformed the MLR and SVR models. Among the models using selected variables, the predictive capability of SWMLP was superior to that of SWMLR and SWSVR. The SWMLP therefore still provided satisfactory results with the minimum number of explanatory variables, which in turn reduced the computation time and other resources required for the predictive task. It can be concluded that the MLP exhibited the best results and can be used as a reliable water demand predictive model in both the all-variables and selected-variables cases. These findings provide a feasible water consumption predictive model that can be used in water resources management to produce sufficient tap water to meet the demand in each province of Thailand.

Forecasting Energy Consumption of Steel Industry Using Regression Model (회귀 모델을 활용한 철강 기업의 에너지 소비 예측)

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association
    • /
    • v.1 no.2
    • /
    • pp.21-25
    • /
    • 2023
  • The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Independent variables were selected in consideration of the correlation among attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address the multicollinearity problem. In data preprocessing, we evaluated linear and nonlinear relationships between the attributes through correlation analysis. In particular, we selected variables with high correlation and included only appropriate variables in the final model to prevent multicollinearity problems. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model was able to learn complex patterns effectively while preventing overfitting. Consequently, these predictive models are expected to provide important information for improving energy efficiency and management decision-making in the steel industry. In the future, we plan to improve the performance of the model by collecting more data and extending the variables, and we will also consider applying the model with interactions with external factors taken into account.
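
One simple way to screen for multicollinearity through correlation analysis, as described above, is to flag pairs of candidate predictors whose pairwise correlation is very high. The sketch below is an assumption about how such a screen might look, not the authors' procedure: the function name `high_correlation_pairs`, the 0.9 threshold, and the synthetic columns are all hypothetical.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.9):
    """List predictor pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic columns, for illustration only: "c" is almost a multiple of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
c = 2.0 * a + 0.01 * rng.normal(size=100)
pairs = high_correlation_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
```

Here only the pair `("a", "c")` is flagged, so one of those two columns would be a candidate for removal before fitting.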

Bayesian Curve-Fitting in Semiparametric Small Area Models with Measurement Errors

  • Hwang, Jinseub;Kim, Dal Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.4
    • /
    • pp.349-359
    • /
    • 2015
  • We study a semiparametric Bayesian approach to small area estimation under a nested error linear regression model with an area-level covariate subject to measurement error. Within the nested error linear regression model setup, we use radial basis functions for the regression spline, with knots on a grid of equally spaced sample quantiles of the covariate measured with error. We consider a hierarchical Bayesian structural measurement error model for small areas and, because some priors are specified as non-informative improper priors, prove the propriety of the joint posterior under the given hierarchical Bayesian framework; the model is fitted using Markov chain Monte Carlo methods. Our methodology is illustrated using numerical examples that compare possible models based on model adequacy criteria; in addition, an analysis is conducted on real data.

Predicting the Soluble Solids of Apples by Near Infrared Spectroscopy (I) - Multiple Linear Regression Models - (근적외선을 이용한 사과의 당도예측 (I) - 다중회귀모델 -)

  • ;W. R. Hruschka;J. A. Abbott;;B. S. Park
    • Journal of Biosystems Engineering
    • /
    • v.23 no.6
    • /
    • pp.561-570
    • /
    • 1998
  • Multiple linear regression (MLR) models that estimate soluble solids content non-destructively are presented as a basis for selecting an optimal photosensor for measuring the soluble solids content of apples. Visible and NIR absorbance in the 400 to 2498 nanometer (nm) wavelength region, soluble solids content (sugar content), hardness, and weight were measured for 400 apples (Gala). A spectrophotometer with a fiber optic probe was used for spectrum measurement, and a digital refractometer was used for soluble solids content. The correlation between the absorbance spectrum and soluble solids content was analyzed to pick out the optimal wavelengths and to develop the corresponding prediction models by means of MLR. To achieve a coefficient of determination ($R^2$) above 0.92, the MLR models on the original absorbance were built on 7 wavelengths (992, 904, 1096, 1032, 880, 824, and 1048 nm), and those on the second derivative absorbance on 5 wavelengths (784, 1056, 992, 808, and 872 nm). The best model on the second derivative absorbance spectrum had $R^2$ = 0.91, bias = -0.02 bx, and SEP = 0.28 bx for unknown samples.

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.1
    • /
    • pp.239-246
    • /
    • 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so the measures cannot be used directly to compare different types of models, as is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which begins by perturbing the values of the input variables and monitors the change in the output. The scope of the research is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
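
The general idea of perturbing one input at a time and monitoring the output change can be sketched as below. The abstract does not specify the perturbation scheme or the summary statistic, so the choices here (shifting each column by one standard deviation and averaging the absolute change in prediction) are assumptions, and the name `perturbation_importance` is hypothetical. The function only needs a `predict` callable, so it applies equally to regressions, neural networks, or trees.

```python
import numpy as np

def perturbation_importance(predict, X, scale=1.0):
    """Mean absolute change in model output when each input is shifted in turn.

    predict: callable mapping an (n, p) array to n continuous predictions.
    scale:   size of the shift, in standard deviations of each column (assumed).
    """
    X = np.asarray(X, dtype=float)
    baseline = predict(X)
    importance = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_shifted = X.copy()
        X_shifted[:, j] += scale * X[:, j].std()  # perturb one input only
        importance[j] = np.mean(np.abs(predict(X_shifted) - baseline))
    return importance

# Hypothetical model for illustration: a known linear function of three inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
coef = np.array([2.0, 0.0, -5.0])
importance = perturbation_importance(lambda Z: Z @ coef, X)
```

For this linear example each importance equals the absolute coefficient times the column's standard deviation, so the second input, which the model ignores, scores zero.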

Safety Performance Models of Improvement Projects of Frequent Traffic Accident Locations (사고잦은곳 개선사업의 안전성과 모형)

  • Park, Byung-Ho;Park, Gil-Su;Kim, Tae-Young
    • Journal of the Korean Society of Safety
    • /
    • v.25 no.2
    • /
    • pp.89-94
    • /
    • 2010
  • This study deals with traffic accidents in relation to improvement projects at frequent accident locations. The objective is to analyze the impact of the improvements on accident reduction. To this end, the study gives particular attention to developing models based on data from 70 improved intersections. The main results are as follows. First, 4 statistically significant multiple linear regression accident models (total, side right-angle, rear-end, and sideswipe accidents) were developed. Second, the analysis associated reductions in total accidents with sight-distance and turning traffic flow improvements; in side right-angle accidents with sight-distance, over-speed, and lane operation improvements; in rear-end accidents with turning traffic flow, signal, and lane operation improvements; and in sideswipe accidents with traffic impedance improvements. Finally, the 4 models were evaluated to be statistically significant through correlation analysis and a paired-sample t-test.

Development of Statistical Model and Neural Network Model for Tensile Strength Estimation in Laser Material Processing of Aluminum Alloy (알루미늄 합금의 레이저 가공에서 인장 강도 예측을 위한 회귀 모델 및 신경망 모델의 개발)

  • Park, Young-Whan;Rhee, Se-Hun
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.24 no.4 s.193
    • /
    • pp.93-101
    • /
    • 2007
  • Aluminum alloy, one of the lightweight materials, has been considered for use in lightweight vehicle bodies, for which welding technology is very important. In aluminum laser welding, the strength of the welded part is reduced by porosity, underfill, and magnesium loss. To overcome these problems, laser welding of aluminum with filler wire has been suggested. In this study, laser welding experiments on AA5182 aluminum alloy with AA5356 filler wire were performed for varying process parameters: laser power, welding speed, and wire feed rate. The tensile strength was measured to assess the weldability of laser welding with filler wire. Models to estimate tensile strength were proposed using three regression models and one neural network model. The regression models were a multiple linear regression model, a second order polynomial regression model, and a multiple nonlinear regression model. A neural network model with 2 hidden layers, which had 5 and 3 nodes respectively, was investigated to find the most suitable model for the system. The estimation performance of each model was evaluated using the average error rate. Among the three regression models, the second order polynomial regression model had the best estimation performance. Among all models, the neural network model had the best estimation performance.
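
The comparison between the multiple linear and second order polynomial regression models can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins for the three process parameters, the "average error rate" is taken here to mean the mean absolute percentage error, and the names `design_matrix` and `fit_and_score` are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(X, degree):
    """Columns for an intercept plus all monomials of the inputs up to degree."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    columns = [np.ones(n)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(p), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

def fit_and_score(X, y, degree):
    """Least squares fit; returns the in-sample average error rate in percent."""
    A = design_matrix(X, degree)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    return np.mean(np.abs(y - predicted) / np.abs(y)) * 100.0

# Synthetic response with interaction and squared terms, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(30, 3))
y = 100 + 10 * X[:, 0] - 5 * X[:, 1] + 8 * X[:, 0] * X[:, 2] + 3 * X[:, 1] ** 2
linear_error = fit_and_score(X, y, degree=1)
quadratic_error = fit_and_score(X, y, degree=2)
```

Because the synthetic response contains second order terms, the degree-2 model fits it almost exactly while the linear model does not. The errors here are in-sample; a fair comparison on real data would use held-out observations.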

Optimize OTDOA-based Positioning Accuracy by Utilizing Multiple Linear Regression Model under NB-IoT Technology (NB-IoT 기술에서 Multiple Linear Regression Model을 활용하여 OTDOA 기반 포지셔닝 정확도 최적화)

  • Pan, Yichen;Kim, Jaesoo
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.07a
    • /
    • pp.139-142
    • /
    • 2020
  • NB-IoT (Narrow Band Internet of Things) is an emerging LPWAN (Low Power Wide Area Network) radio technology. NB-IoT has many advantages, such as low power, low cost, and wide coverage. However, its low bandwidth and low sampling rates also lead to poor positioning accuracy. This paper proposes a solution that optimizes positioning accuracy under the OTDOA (Observed Time Difference of Arrival) approach by using MLR (Multiple Linear Regression) models. The MLR model predicts the degree to which weather (temperature, humidity, light intensity, and air pressure) influences the arrival time of the transmitted signal, and this prediction is used to improve measurement accuracy. The improved measurement accuracy can greatly benefit IoT applications based on NB-IoT.
