• Title/Abstract/Keywords: multiple linear regression models


Autocovariance based estimation in the linear regression model

  • 박철용
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 22, No. 5
    • /
    • pp.839-847
    • /
    • 2011
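  • In this study, we derive an estimator of the regression coefficients based on autocovariance in the multiple linear regression model. The autocovariance-based method, proposed in Park (2009), is not intuitively appealing, but the estimator based on it is an unbiased estimator of the regression coefficients. If the vector of explanatory variables satisfies certain regularity conditions, then under a mild condition that holds when the errors follow an autoregressive moving average model, this estimator is shown to have asymptotically the same distribution as the least squares estimator and to converge in probability to the regression coefficients. Finally, a simulation study shows that these properties also hold in small samples.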

Prediction of unconfined compressive and Brazilian tensile strength of fiber reinforced cement stabilized fly ash mixes using multiple linear regression and artificial neural network

  • Chore, H.S.;Magar, R.B.
    • Advances in Computational Design
    • /
    • Vol. 2, No. 3
    • /
    • pp.225-240
    • /
    • 2017
  • This paper presents the application of multiple linear regression (MLR) and artificial neural network (ANN) techniques for developing models to predict the unconfined compressive strength (UCS) and Brazilian tensile strength (BTS) of fiber reinforced cement stabilized fly ash mixes. UCS and BTS are highly nonlinear functions of the mix constituents, which makes their modeling and prediction a difficult task. To establish the relationship between the independent and dependent variables, a computational technique such as ANN is employed, which provides an efficient and easy approach to modeling the complex and nonlinear relationship. The data generated in the laboratory through a systematic experimental programme for evaluating the UCS and BTS of fiber reinforced cement fly ash mixes after 7, 14 and 28 days of curing are used to develop the MLR and ANN models. The data used in the models are arranged as four input parameters, covering the contents of cement and fibers along with maximum dry density (MDD) and optimum moisture content (OMC), and one dependent variable, either the unconfined compressive strength or the Brazilian tensile strength. ANN models are trained and tested for various combinations of input and output data sets. Performance of the networks is checked with the statistical error criteria of correlation coefficient (R), mean square error (MSE) and mean absolute error (MAE). It is observed that the ANN model predicts both the unconfined compressive strength and the Brazilian tensile strength quite well in terms of R, RMSE and MAE. This study shows that, as an alternative to classical modeling techniques, the ANN approach can be used to accurately predict the unconfined compressive strength and Brazilian tensile strength of fiber reinforced cement stabilized fly ash mixes.
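
The error criteria named in this abstract (R, MSE, RMSE, MAE) can be computed as in the minimal sketch below. The function name `error_metrics` and the toy strength values are assumptions made for illustration; they are not taken from the paper.

```python
import math

def error_metrics(observed, predicted):
    """Return correlation coefficient R, MSE, RMSE and MAE for two sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Pearson correlation between observed and predicted values
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    # Average squared and absolute prediction errors
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return {"R": cov / (norm_o * norm_p), "MSE": mse,
            "RMSE": math.sqrt(mse), "MAE": mae}

# Hypothetical strengths: every prediction is 0.5 too high
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
metrics = error_metrics(observed, predicted)
```

In this toy case R is 1 (up to rounding) even though every prediction is biased, which is one reason R is usually reported together with MSE or MAE rather than alone.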

Testing for A Change Point by Model Selection Tools in Linear Regression Models

  • Yoon, Yong-Hwa;Kim, Jong-Tae;Cho, Kil-Ho;Shin, Kyung-A
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 7, No. 3
    • /
    • pp.655-665
    • /
    • 2000
  • Several information criteria, namely the Schwarz information criterion (SIC), the Akaike information criterion (AIC), and the modified Akaike information criterion ($AIC_c$), are proposed to locate a change point in the multiple linear regression model. These methods are applied to a stock exchange data set and their results are compared.
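
As a rough illustration of locating a change point with an information criterion, the sketch below compares a no-change regression with every two-segment regression and keeps the split with the smallest SIC. It is not the authors' procedure: it assumes a single predictor, Gaussian errors with a common variance, and a parameter count (3 versus 5) chosen for this example. All function names and the data are hypothetical.

```python
import math

def fit_rss(xs, ys):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def sic(rss, n, n_params):
    """SIC of a Gaussian regression, up to an additive constant."""
    return n * math.log(max(rss, 1e-12) / n) + n_params * math.log(n)

def locate_change_point(xs, ys, min_seg=3):
    """Return (k, sic); k is the first index of the second regime, or None."""
    n = len(xs)
    # No-change model: intercept, slope, variance (assumed count of 3)
    best_k, best = None, sic(fit_rss(xs, ys), n, 3)
    for k in range(min_seg, n - min_seg + 1):
        rss = fit_rss(xs[:k], ys[:k]) + fit_rss(xs[k:], ys[k:])
        value = sic(rss, n, 5)  # two intercepts, two slopes, common variance
        if value < best:
            best_k, best = k, value
    return best_k, best

# Made-up series whose regression line changes at index 10
xs = list(range(20))
ys = [1 + 0.5 * x + 0.1 * (-1) ** x for x in range(10)] + \
     [10 - 1.0 * x + 0.1 * (-1) ** x for x in range(10, 20)]
k, _ = locate_change_point(xs, ys)  # k == 10 for this data
```

Swapping `sic` for an AIC-type penalty (2 per parameter instead of log n) gives the corresponding AIC-based rule under the same assumptions.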


Statistical Models of Air Temperatures in Seoul

  • 김학열;김운수
    • 한국조경학회지
    • /
    • Vol. 31, No. 3
    • /
    • pp.74-82
    • /
    • 2003
  • Under the assumption that the temperature of one location is closely related to land use characteristics around that location, this study assesses the impact of urban land use patterns on air temperature. To investigate the relationship, GIS techniques and statistical analyses are used after spatially linking urban land use data for the Seoul Metropolitan Area with atmospheric data observed at Automatic Weather Stations (AWS). The research method is as follows: (1) to identify the land use factors that matter for temperature, simple linear regressions on urban land use characteristics are conducted for a specific time period (pilot study); (2) to build a final model, multiple regressions are carried out with those factors; and (3) to verify that the final model can be applied to explain temperature variations beyond that period, the model is applied to 5 different time periods: 1999 as a whole; summer in 1999; 1998 as a whole; summer in 1998; August in 1998. The results of the simple linear regression models in the pilot study show that transportation facilities area and open space area strongly influence urban air temperature variations, explaining 66 and 61 percent of the variations, respectively. However, the other land use variables (residential, commercial, and mixed land use) are found to have a weak or insignificant relationship to air temperature. A multiple linear regression with the two important variables from the pilot study is then estimated; the model explains 75 percent of the variability in air temperatures, with correct signs of the regression coefficients. Thus, it is empirically shown that an increase in open space and a decrease in transportation facilities area can lead to a decrease in air temperature.
When the final model is applied to the 5 different time periods, the estimated models explain 68 to 75 percent of the variation in temperature, with significant regression coefficients for all explanatory variables. This result suggests that an air temperature model for one specific time period could be a good model for other nearby time periods. The important implications of this result for lessening high air temperatures are: (1) to expand and conserve open space and (2) to control transportation-related factors such as transportation facilities area, road pavement and traffic congestion.

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology
    • /
    • Vol. 40, No. 2
    • /
    • pp.138-145
    • /
    • 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary in order to reliably construct a multiple linear regression model. Its applications range widely, from machine learning, time series prediction, and multi-class classification to noise detection. Since this problem is NP-complete in nature, it becomes more difficult to find the optimal solution as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, these two methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.

Prediction of compressive strength of concrete using multiple regression model

  • Chore, H.S.;Shelke, N.L.
    • Structural Engineering and Mechanics
    • /
    • Vol. 45, No. 6
    • /
    • pp.837-851
    • /
    • 2013
  • In the construction industry, strength is a primary criterion in selecting a concrete for a particular application. Concrete used for construction gains strength over a long period of time after pouring. The characteristic strength of concrete is defined as the compressive strength of a sample that has been aged for 28 days. Waiting 28 days for such a test does not serve the need for rapid construction, yet neglecting it does not serve the quality control process for concrete on large construction sites. Therefore, rapid and reliable prediction of the strength of concrete would be of great significance. Against this backdrop, a method is proposed to establish a predictive relationship between the properties and proportions of the ingredients of concrete, the compaction factor, the weight of concrete cubes and the strength of concrete, whereby the strength of concrete can be predicted at an early age. Multiple regression analysis was carried out to predict the compressive strength of concrete containing Portland Pozzolana Cement, using the concrete data obtained from the experimental work done in this study. The multiple linear regression models yielded a fairly good correlation coefficient for the prediction of compressive strength at 7, 28 and 40 days of curing. The results indicate that the proposed regression models are capable of evaluating the compressive strength of concrete containing Portland Pozzolana Cement. The derived formulas are simple and straightforward, and provide an effective analysis tool accessible to practicing engineers.

Comparison of National Occupational Accident Fatality Rates using Statistical Analysis on Economic and Social Indicators

  • 김경훈;이수동
    • 한국안전학회지
    • /
    • Vol. 37, No. 6
    • /
    • pp.128-135
    • /
    • 2022
  • The comparative evaluation of occupational accident fatality rates (OAFRs) of different countries is complicated owing to the differences in their level of socio-economic development. However, such evaluation is necessary to assess the national occupational safety and health system of a country. This study proposes a statistical method to compare the OAFRs of countries, taking into consideration the difference in their level of socio-economic development. We first collected data on the socio-economic indicators and OAFRs of 11 countries over a 30-year period. Next, based on a literature survey and statistical correlation analysis, we selected the significant independent variables and built multiple linear regression models to predict OAFR. We also determined the groups of countries having heterogeneous relationships between the independent variables and OAFRs, which are represented by the regression models. The proposed method is demonstrated by comparing the OAFR of Korea with the OAFRs of 10 other developed countries.

MapReduce-based Localized Linear Regression for Electricity Price Forecasting

  • 한진주;이인규;온병원
    • 전기학회논문지P
    • /
    • Vol. 67, No. 4
    • /
    • pp.183-190
    • /
    • 2018
  • Accurately predicting electricity prices is an important task in the electricity trading market. To address the electricity price forecasting problem, various approaches have been proposed so far, and linear regression-based approaches are known to be the best. However, the use of such linear regression-based methods is limited by low accuracy and performance. In traditional linear regression methods, it is not practical to find a nonlinear regression model that explains the training data well. If the training data is complex (i.e., small-sized individual data and large-sized features), it is difficult to find a polynomial function with n terms as a model that fits the training data. On the other hand, when a linear regression model is used to approximate a nonlinear regression model, the accuracy of the model drops considerably because it does not accurately reflect the characteristics of the training data. To cope with this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple split datasets and finds the best linear regression models, each of which is the optimal model for its dataset. To improve the performance of the proposed method, we also adapt the proposed localized linear regression method to MapReduce, a framework for parallel processing of data stored in a Hadoop distributed file system. Our experimental results show that the proposed model outperforms the existing linear regression model. Specifically, the accuracy of the proposed method is improved by 45%, and it runs 5 times faster than the existing linear regression-based model.
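
The idea of splitting the data and fitting one linear model per split can be sketched in a map/reduce style as below. This is a single-process illustration in plain Python, not Hadoop code and not the authors' implementation; the partitioning rule (fixed-width bins on a single predictor), the function names, and the toy data are all assumptions.

```python
from collections import defaultdict

def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope

def map_phase(records, bin_width):
    """Emit (partition_key, record); the key is the x-bin index (assumed rule)."""
    for x, y in records:
        yield int(x // bin_width), (x, y)

def reduce_phase(keyed_records):
    """Group records by key and fit one local line per group."""
    groups = defaultdict(list)
    for key, record in keyed_records:
        groups[key].append(record)
    return {key: fit_line(points) for key, points in groups.items()}

def predict_local(models, x, bin_width):
    """Predict with the local model of the bin that x falls into."""
    intercept, slope = models[int(x // bin_width)]
    return intercept + slope * x

def sse(predict_fn, records):
    """In-sample sum of squared errors."""
    return sum((y - predict_fn(x)) ** 2 for x, y in records)

# Toy nonlinear curve standing in for a price series: y = x^2 on [0, 10)
records = [(i / 10, (i / 10) ** 2) for i in range(100)]
bin_width = 2.0
local_models = reduce_phase(map_phase(records, bin_width))
global_model = fit_line(records)

local_sse = sse(lambda x: predict_local(local_models, x, bin_width), records)
global_sse = sse(lambda x: global_model[0] + global_model[1] * x, records)
```

On this toy curve the five local lines give a smaller in-sample error than the single global line. In a real MapReduce job the grouping by key would be done by the framework's shuffle step rather than by the `defaultdict` used here.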

Heat Demand Forecasting for Local District Heating

  • 송기범;박진수;김윤배;정철우;박찬민
    • 산업공학
    • /
    • Vol. 24, No. 4
    • /
    • pp.373-378
    • /
    • 2011
  • A high level of accuracy in forecasting the heat demand of each district is required to operate and manage district heating efficiently. Because heat demand is closely connected with the demands of the previous days and the temperature, general demand forecasting methods may be used to forecast it. However, there are exceptional situations in which general methods are hard to apply, such as the exceptionally low demand on weekends or during vacation periods. To overcome these situations, we introduce a new method to forecast the heat demand, using the linear relationships between the demand and some other factors. Our method uses the temperature and the past 7 days' demands as the factors that determine the future demand. The model consists of daily and hourly models, both of which are multiple linear regression models. Applying these two models to historical data, we confirmed that our method can forecast the heat demand with reasonable errors.
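
A daily model of the kind described, regressing demand on the temperature and the previous 7 days' demands, might look like the sketch below. Only the choice of regressors comes from the abstract; the least-squares solver, the function names, and the synthetic noise-free data are assumptions made for illustration, and the hourly model is not covered.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def features(temps, demands, t, n_lags=7):
    """Regressors for day t: intercept, temperature, previous n_lags demands."""
    return [1.0, temps[t]] + [demands[t - lag] for lag in range(1, n_lags + 1)]

# Synthetic, noise-free daily series (made up for this example)
rng = random.Random(0)
temps = [rng.uniform(-5.0, 15.0) for _ in range(120)]
demands = [75.0] * 7
for t in range(7, 120):
    demands.append(20.0 - 1.0 * temps[t]
                   + 0.5 * demands[t - 1] + 0.3 * demands[t - 7])

# Fit on days 7..118 and forecast the held-out day 119
train_days = range(7, 119)
beta = fit_ols([features(temps, demands, t) for t in train_days],
               [demands[t] for t in train_days])
forecast = sum(b * f for b, f in zip(beta, features(temps, demands, 119)))
```

Because the synthetic series is exactly linear in the regressors, the forecast for the held-out day matches the generated value up to rounding; real demand data would leave a residual error. The normal equations are used here only to keep the sketch dependency-free; a QR-based solver is generally more stable numerically.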

Development of Accident Density Model in Korea

  • 박나영;김태양;박병호
    • 한국안전학회지
    • /
    • Vol. 32, No. 3
    • /
    • pp.130-135
    • /
    • 2017
  • This study deals with traffic accidents. Its purpose is to develop accident density models that reflect transportation and socioeconomic characteristics, based on 230 zones in Korea. Models that test as statistically significant are developed through multiple linear regression analysis. The main results are as follows. First, in the transportation-based model, road length, avenue ratio, and the numbers of intersections and tunnels have positive effects, whereas school zone has a negative effect. Second, in the socioeconomic-based model, population density, transportation vulnerable ratio, children ratio and truck ratio have positive effects. Finally, in the integrated model, road ratio, population density, transportation vulnerable ratio, children ratio, truck ratio and number of companies have positive effects, whereas school zone has a negative effect. These results are expected to provide useful implications for accident-reduction policy-making.