• Title/Abstract/Keywords: multiple linear regression models

327 search results (processing time: 0.023 s)

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems
    • /
    • Vol. 18, No. 2
    • /
    • pp.268-281
    • /
    • 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used as a technique to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose an approach based on applying multiple linear regression models by interlacing data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data. A combination of these multiple models is then used as the prediction model for classifying similar software. Experiments are performed to evaluate the proposed approach as compared to conventional linear regression, and the experimental results show that the proposed method classifies similar software more accurately than the conventional model. We anticipate the proposed approach to be applied to various kinds of classification problems to improve the accuracy of conventional linear regression.
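The abstract above does not spell out how the data are interlaced, so the following is only a minimal sketch under a stated assumption: "interlacing" is taken here to mean splitting the training rows into k interleaved subsets (rows i, i+k, i+2k, ...), fitting one ordinary least squares model per subset, and averaging the model outputs before thresholding. The function names, the value of k, the 0.5 threshold, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Evaluate a fitted linear model on new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def fit_interlaced(X, y, k=3):
    """ASSUMPTION: 'interlacing' = k interleaved row subsets, one model per subset."""
    return [fit_ols(X[i::k], y[i::k]) for i in range(k)]

def predict_interlaced(models, X, threshold=0.5):
    """Average the regression outputs of all models, then threshold to a 0/1 label."""
    scores = np.mean([predict_ols(m, X) for m in models], axis=0)
    return (scores >= threshold).astype(int)

# Synthetic stand-in for software feature vectors (1 = similar, 0 = not similar).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

models = fit_interlaced(X, y, k=3)
pred = predict_interlaced(models, X)
accuracy = float(np.mean(pred == y))
print(f"training accuracy: {accuracy:.3f}")
```

Averaging several models trained on different slices is one simple way to combine them; the paper may combine its models differently.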

Developing Accident Models of Rotary by Accident Occurrence Location (로터리 사고발생 위치별 사고모형 개발)

  • Na, Hee;Park, Byung-Ho
    • International Journal of Highway Engineering
    • /
    • Vol. 14, No. 4
    • /
    • pp.83-91
    • /
    • 2012
  • PURPOSES : This study examines accidents at rotaries by the location where they occur. Its purpose is to develop accident models of rotaries by location. METHODS : To this end, the study focuses on developing appropriate models using multiple linear, Poisson, and negative binomial regression together with statistical analysis tools. RESULTS : First, four statistically significant multiple linear regression models (with $R^2$ values of 0.781, 0.300, 0.784, and 0.644, respectively) and four statistically significant Poisson regression models (with ${\rho}^2$ values of 0.407, 0.306, 0.378, and 0.366, respectively) are developed. Second, goodness-of-fit tests using RMSE, %RMSE, MPB, and MAD show that the Poisson regression model is appropriate for the circulatory roadway, pedestrian crossings, and other locations, while the multiple linear regression model is appropriate for entry/exit sections. Finally, traffic volume is identified as the common variable affecting accidents. CONCLUSIONS : Eight statistically significant models are developed, and the common and model-specific variables related to them are derived.

Traffic Accident Models of 3-Legged Signalized Intersections in the Case of Cheongju (3지 신호교차로의 교통사고 발생모형 - 청주시를 사례로 -)

  • Park, Byung-Ho;Han, Sang-Uk;Kim, Tae-Young
    • Journal of the Korean Society of Safety
    • /
    • Vol. 24, No. 2
    • /
    • pp.94-99
    • /
    • 2009
  • This study deals with traffic accidents at 3-legged signalized intersections in Cheongju. The goals are to analyze the geometric, traffic, and operational conditions of the intersections and to develop various functional forms that predict accidents. The models are developed through correlation analysis and multiple linear, multiple nonlinear, Poisson, and negative binomial regression analysis. In this study, two multiple linear, two multiple nonlinear, and two negative binomial regression models were calibrated. All of these models were found to be statistically significant. All the models include two common variables (traffic volume and lane width) as well as model-specific variables. These variables are therefore judged to be critical to accident reduction in Cheongju.

Inter-comparison of Prediction Skills of Multiple Linear Regression Methods Using Monthly Temperature Simulated by Multi-Regional Climate Models (다중 지역기후모델로부터 모의된 월 기온자료를 이용한 다중선형회귀모형들의 예측성능 비교)

  • Seong, Min-Gyu;Kim, Chansoo;Suh, Myoung-Seok
    • Atmosphere
    • /
    • Vol. 25, No. 4
    • /
    • pp.669-683
    • /
    • 2015
  • In this study, we investigated the prediction skills of four multiple linear regression methods for monthly air temperature over South Korea. We used simulation results from four regional climate models (RegCM4, SNURCM, WRF, and YSURSM) driven by two boundary conditions (NCEP/DOE Reanalysis 2 and ERA-Interim). We selected 15 years (1989-2003) as the training period and the last 5 years (2004-2008) as the validation period. The four regression methods used in this study are as follows: 1) Homogeneous Multiple linear Regression (HMR), 2) Homogeneous Multiple linear Regression constraining the regression coefficients to be nonnegative (HMR+), 3) non-homogeneous multiple linear regression (EMOS; Ensemble Model Output Statistics), and 4) EMOS with positive coefficients (EMOS+), which is the same as the third method except that the coefficients are constrained to be nonnegative. The four regression methods showed similar prediction skills for the monthly air temperature over South Korea. However, the prediction skills of the regression methods that do not constrain the regression coefficients to be nonnegative are clearly affected by the existence of outliers. Among the four multiple linear regression methods, the HMR+ and EMOS+ methods showed the best skill during the validation period, with very similar performance in terms of MAE and RMSE. Therefore, we recommend HMR+ as the best method because of its ease of development and application.
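The nonnegativity constraint described for HMR+ can be illustrated with nonnegative least squares. This is a minimal sketch, not the authors' implementation: it assumes HMR+ regresses the observed temperature on the model-simulated temperatures with an unconstrained intercept and nonnegative slopes, and it uses `scipy.optimize.nnls` on centered data to obtain that fit. The function name `fit_nonneg`, the noise levels, and the synthetic "observations" are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def fit_nonneg(X, y):
    """Least squares with nonnegative slopes and a free intercept.

    Centering X and y removes the intercept from the problem, so nnls
    constrains only the slopes; the intercept is recovered afterwards.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    w, _ = nnls(X - x_mean, y - y_mean)
    b = y_mean - x_mean @ w
    return w, b

# Synthetic stand-in: 180 months (15 years) of "observed" temperature and
# four simulated series with different noise levels (all values assumed).
rng = np.random.default_rng(1)
obs = 12.0 + 10.0 * np.sin(np.linspace(0.0, 30.0 * np.pi, 180))
sims = np.column_stack([obs + rng.normal(0.0, s, 180) for s in (1.0, 1.5, 2.0, 2.5)])

w, b = fit_nonneg(sims, obs)
pred = sims @ w + b

# The two skill scores named in the abstract.
mae = float(np.mean(np.abs(pred - obs)))
rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
print("weights:", np.round(w, 3), "intercept:", round(float(b), 3))
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```

In this sketch the skill scores are computed on the training data for brevity; the study evaluates them on a separate validation period.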

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • Vol. 17, No. 4
    • /
    • pp.249-255
    • /
    • 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious; because it is so infectious, it has had a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of COVID-19 cases using COVID-19 infection status data (open source data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks movement across various categories of places. The data are divided into two sets. The first dataset consists of the COVID-19 infection status data and all six variables of Google Mobility Data. The second dataset consists of the COVID-19 infection status data and only two variables of Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance is compared using the mean absolute error. We also perform a correlation analysis of the random forest model and the multiple linear regression model.

Multiple Structural Change-Point Estimation in Linear Regression Models

  • Kim, Jae-Hee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 19, No. 3
    • /
    • pp.423-432
    • /
    • 2012
  • This paper is concerned with the detection of multiple change-points in linear regression models. The proposed procedure relies on the local estimation for global change-point estimation. We propose a multiple change-point estimator based on the local least squares estimators for the regression coefficients and the split measure when the number of change-points is unknown. Its statistical properties are shown and its performance is assessed by simulations and real data applications.

Development of the Index for Estimating the Arc Status in the Short-circuiting Transfer Region of GMA Welding (GMA용접의 단락이행영역에 있어서 아크 상태 평가를 위한 모델 개발)

  • 강문진;이세헌;엄기원
    • Journal of Welding and Joining
    • /
    • Vol. 17, No. 4
    • /
    • pp.85-92
    • /
    • 1999
  • In GMAW, spatter is generated because of variation in the arc state. If the arc state can be assessed quantitatively, a control method for reducing spatter can be developed. This study attempted to develop an optimal model that estimates the arc state quantitatively. To do this, the generated spatter was captured under limited welding conditions, and the waveforms of the arc voltage and welding current were collected. From the collected waveforms, the waveform factors and their standard deviations were computed, and linear and non-linear regression models built from these factors and their standard deviations are proposed to estimate the arc state. The proposed models were then tested for performance. The results are as follows. Correlation analysis between the factors and the amount of generated spatter showed that the standard deviations of the waveform factors have higher multiple regression coefficients than the waveform factors themselves. Because the correlation coefficients between T and $T_{a}$, and between s[T] and s[$T_{a}$], were nearly one, these factors were found to have the same effect on spatter generation. The linear and non-linear regression models for estimating the arc state were found to consist of similar factors. In addition, the linear regression model was assessed to be the optimal model for estimating the arc state because the variance of the data was narrow and its multiple regression coefficient was the highest among the models. However, under welding conditions where the amount of generated spatter was small, the non-linear regression model was found to estimate spatter generation better than the linear model.


Prediction Models of Residual Chlorine in Sediment Basin to Control Pre-chlorination in Water Treatment Plant (정수장 전염소 공정 제어를 위한 침전지 잔류 염소 농도 예측모델 개발)

  • Lee, Kyung-Hyuk;Kim, Ju-Hwan;Lim, Jae-Lim;Chae, Seon Ha
    • Journal of Korean Society of Water and Wastewater
    • /
    • Vol. 21, No. 5
    • /
    • pp.601-607
    • /
    • 2007
  • In order to maintain a constant residual chlorine level in a sedimentation basin, it is necessary to develop a real-time prediction model of residual chlorine that considers water treatment plant data such as water quality, weather, and plant operating conditions. Based on operation data acquired from the K water treatment plant, prediction models of residual chlorine in the sedimentation basin were developed. The input parameters applied in the models were water temperature, turbidity, pH, conductivity, flow rate, alkalinity, and pre-chlorination dosage. Linear and non-linear multiple regression models were established with 5,448 data sets. The correlation coefficients (R) for the linear and non-linear models were 0.39 and 0.374, respectively. These low correlation coefficients indicate that the multiple regression models cannot represent residual chlorine from input parameters that vary independently over time with weather conditions. Artificial neural network models were applied under three different conditions. Their input parameters consisted of water quality data observed in the water treatment process, structured as an auto-regressive model that considers a time lag. The artificial neural network models predicted residual chlorine in the sedimentation basin better than the conventional linear and non-linear multiple regression models. The determination coefficients of the models in the verification process were 0.742, 0.754, and 0.869, respectively. Consequently, comparing the results of each model, neural networks can simulate residual chlorine in the sedimentation basin better than mathematical regression models in terms of prediction performance. These results are expected to contribute to the automated control of water treatment processes.

MLR & ANN approaches for prediction of compressive strength of alkali activated EAFS

  • Ozturk, Murat;Cansiz, Omer F.;Sevim, Umur K.;Bankir, Muzeyyen Balcikanli
    • Computers and Concrete
    • /
    • Vol. 21, No. 5
    • /
    • pp.559-567
    • /
    • 2018
  • In this study, alkali activation of Electric Arc Furnace Slag (EAFS) is studied with a comprehensive test program. Three silicate moduli (1, 1.5, and 2), three sodium concentrations (4%, 6%, and 8%) for each silicate modulus, two curing conditions (45% and 98% relative humidity) for each sodium concentration, two curing temperatures ($400^{\circ}C-800^{\circ}C$) for each relative humidity condition, and two curing times (6 h and 12 h) for each curing temperature are selected as variables, and their effects on compressive strength are evaluated. Regression equations are then fitted using multiple linear regression methods, and the regression models are compared with one another to select the best model for these variables. Artificial Neural Network (ANN) models that use silicate modulus, sodium concentration, relative humidity, curing temperature, and curing time as variables are also formed. After the results of these ANN models are investigated, the ANN and multiple linear regression based models are compared with each other. An explicit formula is then developed from the values of the ANN model. As a result of this study, the fluctuations in the compressive strength data set were reflected very well by both methods, multiple linear regression with quadratic terms and ANN.