• Title/Summary/Keyword: multiple linear regression

Prediction of Pitting Corrosion Characteristics of AL-6XN Steel with Sensitization and Environmental Variables Using Multiple Linear Regression Method (다중선형회귀법을 활용한 예민화와 환경변수에 따른 AL-6XN강의 공식특성 예측)

  • Jung, Kwang-Hu;Kim, Seong-Jong
    • Corrosion Science and Technology
    • /
    • v.19 no.6
    • /
    • pp.302-309
    • /
    • 2020
  • This study aimed to predict the pitting corrosion characteristics of AL-6XN super-austenitic steel using multiple linear regression. The variables used in the model are degree of sensitization, temperature, and pH. Experiments were designed and cyclic polarization curve tests were conducted accordingly. The data obtained from the cyclic polarization curve tests were used as training data for the multiple linear regression model. The significance of each factor in the response (critical pitting potential, repassivation potential) was analyzed. The multiple linear regression model was validated using experimental conditions that were not included in the training data. As a result, the degree of sensitization showed a greater effect than the other variables. Multiple linear regression showed poor performance for prediction of repassivation potential. On the other hand, the model showed a considerable degree of predictive performance for critical pitting potential. The coefficient of determination (R2) was 0.7745. The possibility for pitting potential prediction was confirmed using multiple linear regression.
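A minimal sketch of the workflow this abstract describes: fit a multiple linear regression on designed-experiment data, then compute the coefficient of determination. The numbers below are made up for illustration and are not the paper's measurements; the helper names (`fit_mlr`, `r_squared`) and the units are assumptions, and the normal-equation solver merely stands in for whatever software the authors used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical training conditions: (degree of sensitization, temperature, pH)
train_x = [(0, 25, 2), (0, 50, 7), (10, 25, 7), (10, 50, 2),
           (20, 25, 4), (20, 75, 7), (5, 75, 2), (15, 50, 4)]
# Hypothetical responses standing in for critical pitting potential
train_y = [868.0, 888.0, 886.0, 706.0, 747.0, 654.0, 679.0, 707.0]

coefs = fit_mlr(train_x, train_y)
train_pred = [predict(coefs, r) for r in train_x]
r2_train = r_squared(train_y, train_pred)

# A condition left out of the training set, as in the validation step
held_out_prediction = predict(coefs, (5, 50, 4))
print("coefficients:", coefs)
print("training R^2:", r2_train)
print("held-out prediction:", held_out_prediction)
```

For validation in the sense of the abstract, `r_squared` would be applied to predictions at held-out conditions rather than to the training fit.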

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems
    • /
    • v.18 no.2
    • /
    • pp.268-281
    • /
    • 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used as a technique to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose an approach based on applying multiple linear regression models by interlacing data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data. A combination of these multiple models is then used as the prediction model for classifying similar software. Experiments are performed to evaluate the proposed approach as compared to conventional linear regression, and the experimental results show that the proposed method classifies similar software more accurately than the conventional model. We anticipate the proposed approach to be applied to various kinds of classification problems to improve the accuracy of conventional linear regression.

Inter-comparison of Prediction Skills of Multiple Linear Regression Methods Using Monthly Temperature Simulated by Multi-Regional Climate Models (다중 지역기후모델로부터 모의된 월 기온자료를 이용한 다중선형회귀모형들의 예측성능 비교)

  • Seong, Min-Gyu;Kim, Chansoo;Suh, Myoung-Seok
    • Atmosphere
    • /
    • v.25 no.4
    • /
    • pp.669-683
    • /
    • 2015
  • In this study, we investigated the prediction skills of four multiple linear regression methods for monthly air temperature over South Korea. We used simulation results from four regional climate models (RegCM4, SNURCM, WRF, and YSURSM) driven by two boundary conditions (NCEP/DOE Reanalysis 2 and ERA-Interim). We selected 15 years (1989~2003) as the training period and the last 5 years (2004~2008) as the validation period. The four regression methods used in this study are as follows: 1) Homogeneous Multiple linear Regression (HMR), 2) HMR with the regression coefficients constrained to be nonnegative (HMR+), 3) non-homogeneous multiple linear regression (EMOS; Ensemble Model Output Statistics), and 4) EMOS with positive coefficients (EMOS+), which is the same as the third method except that the coefficients are constrained to be nonnegative. The four regression methods showed similar prediction skills for the monthly air temperature over South Korea. However, the prediction skills of the methods that do not constrain the regression coefficients to be nonnegative are clearly affected by outliers. Among the four multiple linear regression methods, HMR+ and EMOS+ showed the best skill during the validation period, and their performance was very similar in terms of MAE and RMSE. Therefore, we recommend HMR+ as the best method because it is easier to develop and apply.
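The nonnegativity constraint that separates the "+" variants from their unconstrained counterparts can be illustrated with a small least-squares fit in which each weight is clipped at zero during coordinate descent. This is a sketch under assumptions, not the procedure of the paper: the function name `fit_nonnegative`, the two hypothetical model series, and the choice of coordinate descent are all illustrative.

```python
def fit_nonnegative(columns, y, sweeps=500):
    """Least squares with a free intercept and weights constrained to be >= 0.

    Cyclic coordinate descent: each weight takes its one-dimensional
    least-squares update and is then clipped at zero.
    """
    n = len(y)
    weights = [0.0] * len(columns)
    intercept = 0.0
    residual = list(y)  # y minus current prediction (all parameters start at 0)
    for _ in range(sweeps):
        # The intercept is unconstrained: absorb the mean residual.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        for j, col in enumerate(columns):
            norm = sum(v * v for v in col)
            if norm == 0.0:
                continue  # a constant-zero predictor carries no information
            step = sum(v * r for v, r in zip(col, residual)) / norm
            new_w = max(0.0, weights[j] + step)
            delta = new_w - weights[j]
            weights[j] = new_w
            residual = [r - delta * v for r, v in zip(residual, col)]
    return intercept, weights


# Hypothetical temperature anomalies from two climate models (made-up numbers)
model_a = [-2.0, -1.0, 0.0, 1.0, 2.0, 0.0]
model_b = [-1.0, -2.0, 0.0, 2.0, 1.0, 0.0]

# Made-up "observations" for which an unconstrained fit would assign
# model_b a negative weight
observed = [a - 0.4 * b for a, b in zip(model_a, model_b)]

intercept, weights = fit_nonnegative([model_a, model_b], observed)
print("intercept:", intercept)
print("weights:", weights)
```

In this toy case the constrained fit sets the second weight to zero rather than letting it go negative, which is the behaviour the constraint is meant to enforce.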

Pre-processing and Bias Correction for AMSU-A Radiance Data Based on Statistical Methods (통계적 방법에 근거한 AMSU-A 복사자료의 전처리 및 편향보정)

  • Lee, Sihye;Kim, Sangil;Chun, Hyoung-Wook;Kim, Ju-Hye;Kang, Jeon-Ho
    • Atmosphere
    • /
    • v.24 no.4
    • /
    • pp.491-502
    • /
    • 2014
  • As a part of the KIAPS (Korea Institute of Atmospheric Prediction Systems) Package for Observation Processing (KPOP), we have developed modules for Advanced Microwave Sounding Unit-A (AMSU-A) pre-processing and bias correction. The KPOP system calculates the airmass bias correction coefficients by multiple linear regression, in which the scan-corrected innovation is the dependent variable and the thicknesses of the 850~300, 200~50, 50~5, and 10~1 hPa layers are the independent variables. Multicollinearity among the four airmass predictors was identified with the Variance Inflation Factor (VIF), which quantifies the severity of multicollinearity in a least-squares regression. To resolve the multicollinearity, we adopted simple linear regression and Principal Component Regression (PCR) to calculate the airmass bias correction coefficients and compared the results with those from the multiple linear regression. The analysis shows that performance ranks, from best to worst, as multiple linear, principal component, and simple linear regression. For bias correction of AMSU-A channel 4, which is the most sensitive to the lower troposphere, the multiple linear regression with all four airmass predictors is superior to the simple linear regression with the single airmass predictor of 850~300 hPa. PCR retaining the eigenvalues that account for 95% of the accumulated variance gave results similar to those of the multiple linear regression.
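The Variance Inflation Factor mentioned above has a standard definition: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below computes it from scratch. The three predictor series are invented placeholders rather than AMSU-A thickness data, and the helper names are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def vif(columns, j):
    """VIF of predictor j: regress it on all other predictors, then 1/(1-R^2).

    Raises ZeroDivisionError if predictor j is an exact linear combination
    of the others (perfect multicollinearity).
    """
    target = columns[j]
    others = [c for i, c in enumerate(columns) if i != j]
    rows = list(zip(*others))
    coefs = fit_ols(rows, target)
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in rows]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 / (1.0 - (1.0 - ss_res / ss_tot))


# Made-up predictors: p1 and p2 are nearly collinear, p3 is not
p1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
p2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
p3 = [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0]
predictors = [p1, p2, p3]
for idx in range(len(predictors)):
    print("VIF of predictor", idx + 1, "=", vif(predictors, idx))
```

Large values for the first two predictors and a value near 1 for the third would signal that the first two carry largely redundant information.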

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.22 no.3
    • /
    • pp.75-87
    • /
    • 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydro-geological factors. A significant correlation with run-off in a river basin can therefore be expected in advance, and the effect of each causal and associated factor (independent variables: present-month rainfall, previous-month run-off, evapotranspiration, relative humidity, etc.) on present-month run-off (dependent variable) may be determined by multiple regression analysis. The functions relating the independent and dependent variables should be fitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. The reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients before the first estimated multiple regression model in historical sequence is adopted. Some error between observed and estimated run-off nevertheless remains. The error arises because the model is an inadequate description of the system and because the data constituting the record represent only a sample from a population of monthly discharge observations, so the estimates of the model parameters are subject to sampling error. Since this error, a deviation from the multiple regression plane, cannot be explained by the first estimated multiple regression equation, it can be regarded as a random error governed by the laws of chance. The variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error is simulated by multiplying the standard error of estimate by a random normal variate. Finally, a hybrid model for estimating monthly run-off in nonhistorical sequence is obtained by combining the deterministic component of the multiple regression equation with the stochastic component of random errors.
Monthly run-off at Naju station in the Yong-San river basin is estimated by the multiple regression model and the hybrid model. Observed and estimated run-off are compared, as are the multiple regression model and existing estimation methods such as the Gajiyama formula, the tank model and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly run-off in historical sequence is the multiple linear regression equation in overall-month unit, that is, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly runoff can be explained by the multiple linear regression equation, and its coefficient of determination (R2) is 0.843. This means that monthly runoff in historical sequence can be estimated with high significance from a short observation record using the above equation. (2) The optimal function for estimating monthly runoff in nonhistorical sequence is the hybrid model, which combines the multiple linear regression equation in overall-month unit with a stochastic component, that is, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$. The remaining 15% of unexplained variance of monthly runoff can be accounted for by adding the stochastic process, and somewhat more reliable statistical characteristics of monthly runoff in non-historical sequence are derived. Through the random component, the monthly runoff estimated in non-historical sequence exhibits extreme values (maximum and minimum) that do not appear in the observed runoff. (3) The "frequency best fit coefficient" (R2f) of the multiple linear regression equation is 0.847, which is the same value as that of the Gajiyama formula. This implies that the multiple linear regression equation and the Gajiyama formula are theoretically rather reasonable functions.
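The hybrid model in result (2) can be sketched as a short simulation: the regression equation supplies the deterministic part, and a standard normal variate scaled by the standard error of estimate supplies the random part. The coefficients come from the abstract; the standard error value, the input series, the starting run-off, and the function names are hypothetical, since the abstract does not report them.

```python
import random


def deterministic_runoff(p_n, q_prev, e_n):
    """Regression part: Q_n = 0.788 P_n + 0.130 Q_{n-1} - 0.273 E_n - 0.10."""
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def simulate_runoff(rainfall, evapotranspiration, q_initial, s_y, seed=0):
    """Generate a run-off sequence with the hybrid model Q_n + S_y * t.

    s_y is the standard error of estimate (a hypothetical value here) and
    t is drawn from a standard normal distribution for each month.
    """
    rng = random.Random(seed)
    q_prev = q_initial
    series = []
    for p_n, e_n in zip(rainfall, evapotranspiration):
        t = rng.gauss(0.0, 1.0)
        q_n = deterministic_runoff(p_n, q_prev, e_n) + s_y * t
        series.append(q_n)
        q_prev = q_n  # this month's run-off feeds next month's equation
    return series


# Made-up monthly inputs; units are assumed consistent with the regression
rainfall = [120.0, 80.0, 200.0, 310.0, 150.0, 60.0]
evapotranspiration = [40.0, 55.0, 70.0, 85.0, 75.0, 50.0]

historical_style = simulate_runoff(rainfall, evapotranspiration, 30.0, s_y=0.0)
hybrid_style = simulate_runoff(rainfall, evapotranspiration, 30.0, s_y=15.0, seed=42)
print("deterministic only:", historical_style)
print("with stochastic component:", hybrid_style)
```

Setting `s_y=0.0` reduces the simulation to the deterministic equation of result (1). The sketch does not clip negative values, which a practical generator would probably need to handle.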

Quantitative Analysis by Derivative Spectrophotometry (III) -Simultaneous quantitation of vitamin B group and vitamin C in by multiple linear regression analysis-

  • Park, Man-Ki;Cho, Jung-Hwan
    • Archives of Pharmacal Research
    • /
    • v.11 no.1
    • /
    • pp.45-51
    • /
    • 1988
  • The resolution enhancement provided by the derivative operation is combined with a multivariate analysis method, multiple linear regression, with two options: all-possible and stepwise regression. The samples examined were synthetic mixtures of five vitamins: thiamine mononitrate, riboflavin phosphate, nicotinamide, pyridoxine hydrochloride and ascorbic acid. All components in the mixtures were quantified with reasonably good accuracy and precision. The whole data-processing procedure was carried out on-line using three computer programs developed in the APPLESOFT BASIC language.

Forecasting of Seasonal Inflow to Reservoir Using Multiple Linear Regression (다중선형회귀분석에 의한 계절별 저수지 유입량 예측)

  • Kang, Jaewon
    • Journal of Environmental Science International
    • /
    • v.22 no.8
    • /
    • pp.953-963
    • /
    • 2013
  • Reliable long-term streamflow forecasting is invaluable for water resource planning and management, which allocate water supply according to the demand of water users. Seasonal inflow to Andong dam is forecast and assessed using statistical methods based on hydrometeorological data. The predictors used to forecast seasonal inflow to Andong dam are selected from the southern oscillation index, sea surface temperature, and 500 hPa geopotential height data in the northern hemisphere. Predictors are selected in two steps: primary predictor sets are obtained, and the final predictors are then determined from those sets. The primary predictor sets for each season are identified using cross correlation and mutual information. The final predictors are identified using partial cross correlation and partial mutual information. Three predictors are selected for each season. The values are determined using a bootstrapping technique considering a specific significance level for predictor selection. Seasonal inflow forecasting is performed by multiple linear regression analysis using the selected predictors for each season, and the forecasts are assessed using cross validation. Multiple linear regression analysis is performed using SAS. The results are assessed by mean squared error and mean absolute error. A contingency table is also constructed and assessed with the Heidke skill score. The assessment reveals that the forecasts by multiple linear regression analysis are better than the reference forecasts.
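The three assessment measures named in this abstract have standard definitions, sketched below. The Heidke skill score here is the usual multi-category form, (PC - E) / (1 - E), where PC is the proportion of correct forecasts and E is the proportion expected by chance from the table margins. The category labels and sample values are invented, and the abstract does not say how the paper categorised inflow.

```python
def mean_squared_error(observed, forecast):
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)


def mean_absolute_error(observed, forecast):
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def contingency_table(observed_cats, forecast_cats, categories):
    """Count table with rows = forecast category, columns = observed category."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    table = [[0] * k for _ in range(k)]
    for o, f in zip(observed_cats, forecast_cats):
        table[index[f]][index[o]] += 1
    return table


def heidke_skill_score(table):
    """HSS = (PC - E) / (1 - E); 1 is perfect, 0 is no better than chance."""
    k = len(table)
    n = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(k)) / n
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / (n * n)
    return (correct - expected) / (1.0 - expected)


# Made-up seasonal inflow values (arbitrary units) and category labels
observed = [310.0, 120.0, 540.0, 220.0, 400.0, 150.0]
forecast = [280.0, 160.0, 470.0, 260.0, 350.0, 210.0]
categories = ["below", "normal", "above"]
observed_cats = ["normal", "below", "above", "normal", "above", "below"]
forecast_cats = ["normal", "below", "above", "normal", "normal", "normal"]

table = contingency_table(observed_cats, forecast_cats, categories)
print("MSE:", mean_squared_error(observed, forecast))
print("MAE:", mean_absolute_error(observed, forecast))
print("contingency table:", table)
print("HSS:", heidke_skill_score(table))
```

A score above zero indicates skill relative to chance; comparison against a different reference forecast, as in the abstract, would require that reference's own table.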

Bayesian Estimation for the Multiple Regression with Censored Data : Multivariate Normal Error Terms

  • Yoon, Yong-Hwa
    • Journal of the Korean Data and Information Science Society
    • /
    • v.9 no.2
    • /
    • pp.165-172
    • /
    • 1998
  • This paper considers a linear regression model with censored data in which each error term follows a multivariate normal distribution. We adopt a diffuse prior distribution for the parameters of the linear regression model. With censored data, we derive the full conditional densities of the parameters of a multiple regression model in order to obtain the marginal posterior densities of the relevant parameters through the Gibbs sampler, which was proposed by Geman and Geman (1984) and applied by Gelfand and Smith (1990) from a statistical viewpoint.

Quantitative Analysis by Diffuse Reflectance Infrared Fourier Transform and Linear Stepwise Multiple Regression Analysis I -Simultaneous quantitation of ethenzamide, isopropylantipyrine, caffeine, and allylisopropylacetylurea in tablet by DRIFT and linear stepwise multiple regression analysis-

  • Park, Man-Ki;Yoon, Hye-Ran;Kim, Kyoung-Ho;Cho, Jung-Hwan
    • Archives of Pharmacal Research
    • /
    • v.11 no.2
    • /
    • pp.99-113
    • /
    • 1988
  • Quantitation of ethenzamide, isopropylantipyrine and caffeine takes about 41 hrs by the conventional GC method, and quantitation of allylisopropylacetylurea takes about 40 hrs by the conventional UV method, whereas quantitation of all four takes about 6 hrs by the DRIFT method developed here. Each standard and sample was sieved and powdered, and its DRIFT spectrum was acquired. From each spectrum, a peak for each component was selected and the ratio of each peak to the standard peak was calculated; linear stepwise multiple regression was then performed on these ratios and the concentrations. The reflectance value, the Kubelka-Munk equation and the inverse Kubelka-Munk equation were modified by the authors. The inverse Kubelka-Munk equation compensated for the deficiency of the Kubelka-Munk equation. Correlation coefficients between the conventional GC and UV results and the DRIFT results were greater than 0.95.

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.4
    • /
    • pp.249-255
    • /
    • 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious. Because it is so infectious, it has a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of COVID-19 cases from COVID-19 infection status data (open data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks people's movement across various place categories. The data were divided into two sets. The first dataset consists of the COVID-19 infection status data and all six variables of Google Mobility Data. The second dataset consists of the COVID-19 infection status data and only two variables of Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance was compared using the mean absolute error. We also performed a correlation analysis for the random forest model and the multiple linear regression model.