• Title/Summary/Keyword: multiple regression analysis model

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers / v.22 no.3 / pp.75-87 / 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydrogeological factors. A significant correlation with runoff in a river basin can therefore be anticipated, and the effect of each causal or associated factor (the independent variables: present-month rainfall, previous-month runoff, evapotranspiration, relative humidity, etc.) on present-month runoff (the dependent variable) can be determined by multiple regression analysis. The function relating the independent and dependent variables should be refitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. Before the first-estimated multiple regression model for the historical sequence is adopted, its reliability should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients. Some error between observed and estimated runoff nevertheless remains. It arises because the model is an inadequate description of the system and because the record is only a sample from the population of monthly discharge observations, so estimates of the model parameters are subject to sampling error. Since this error, a deviation from the multiple regression plane, cannot be explained by the first-estimated regression equation, it can be treated as a random error governed by chance. This variance left unexplained by the regression equation can be handled by a stochastic approach: the random error is simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly runoff in a non-historical sequence is obtained by combining the deterministic component (the multiple regression equation) with the stochastic component (the random errors).
Monthly runoff at Naju station in the Yong-San river basin is estimated with the multiple regression model and the hybrid model, and the estimates are compared with observed runoff and with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows.
(1) The optimal function for estimating monthly runoff in a historical sequence is a multiple linear regression equation in overall-month units: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$, where $P_n$ is present-month rainfall, $Q_{n-1}$ is previous-month runoff, and $E_n$ is evapotranspiration. About 85% of the total variance of monthly runoff is explained by this equation, and its coefficient of determination ($R^2$) is 0.843. Monthly runoff in a historical sequence can therefore be estimated with high significance from a short observation record.
(2) The optimal function for estimating monthly runoff in a non-historical sequence is a hybrid model combining the overall-month multiple linear regression equation with a stochastic component: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$, where $S_y$ is the standard error of estimate and $t$ is a random normal variate. The remaining 15% of the variance of monthly runoff is accounted for by the added stochastic process, and somewhat more reliable statistical characteristics of monthly runoff in a non-historical sequence are obtained. Because of the random component, the runoff estimated in a non-historical sequence shows extreme values (maxima and minima) that do not appear in the observed runoff.
(3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that both the multiple linear regression equation and the Gajiyama formula are theoretically reasonable functions.
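The procedure described in this abstract (fit a multiple regression, check it with ANOVA, $R^2$ and coefficient t-tests, then add a stochastic term equal to the standard error of estimate times a random normal variate) can be sketched with NumPy. This is a minimal illustration, not the authors' code: the fitting routine is exercised on synthetic data, only the four coefficients of the reported equation come from the abstract, and any value passed as `s_y` is hypothetical.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """OLS with an intercept, plus the reliability statistics named in the abstract."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                          # n months, k independent variables
    A = np.column_stack([X, np.ones(n)])    # last column is the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)              # residual sum of squares (off the regression plane)
    sst = float(((y - y.mean()) ** 2).sum())
    df_res = n - k - 1
    s_y = np.sqrt(sse / df_res)             # standard error of estimate
    se_beta = np.sqrt(np.diag(s_y ** 2 * np.linalg.inv(A.T @ A)))
    return {
        "beta": beta,
        "r2": 1.0 - sse / sst,                       # coefficient of determination
        "F": ((sst - sse) / k) / (sse / df_res),     # ANOVA F statistic
        "t": beta / se_beta,                         # t statistic of each coefficient
        "s_y": s_y,
    }

# Reported overall-month equation: Qn = 0.788 Pn + 0.130 Qn-1 - 0.273 En - 0.10
COEF_P, COEF_Q, COEF_E, INTERCEPT = 0.788, 0.130, -0.273, -0.10

def deterministic_runoff(p_n, q_prev, e_n):
    return COEF_P * p_n + COEF_Q * q_prev + COEF_E * e_n + INTERCEPT

def hybrid_series(rain, evap, q0, s_y, seed=0):
    """Hybrid model: deterministic component plus S_y * t with t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    q_prev, out = q0, []
    for p_n, e_n in zip(rain, evap):
        q_n = deterministic_runoff(p_n, q_prev, e_n) + s_y * rng.standard_normal()
        out.append(q_n)
        q_prev = q_n                        # becomes Qn-1 for the next month
    return out

# Synthetic check of the fitting routine (not Naju data).
rng = np.random.default_rng(0)
n = 120
P, Q_prev, E = rng.uniform(0, 300, n), rng.uniform(0, 200, n), rng.uniform(20, 150, n)
y = 0.8 * P + 0.1 * Q_prev - 0.3 * E + rng.normal(0, 10, n)
fit = fit_multiple_regression(np.column_stack([P, Q_prev, E]), y)
```

With `s_y=0` the hybrid series reduces to the purely deterministic regression. The sketch does not truncate negative runoff; a real application would need to decide how to handle that.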

The Development of the DEA-AR Model using Multiple Regression Analysis and Efficiency Evaluation of Regional Corporation in Korea (다중회귀분석을 이용한 DEA-AR 모형 개발 및 국내 지방공사의 효율성 평가)

  • Sim, Gwang-Sic;Kim, Jae-Yun
    • Journal of the Korean Operations Research and Management Science Society / v.37 no.1 / pp.29-43 / 2012
  • We design a DEA-AR model using multiple regression analysis with a new method for restricting weights. The model can be used when there are multiple input variables and a single output variable, and the weights of the input variables are derived from the regression coefficients and the coefficient of determination. To verify the effectiveness of the new model, we evaluate the efficiency of regional corporations in Korea. Statistical analysis showed no difference between the efficiency values from the DEA-AR model using AHP and those from our DEA-AR model. In future research, our model can substitute for DEA-AR models that rely on AHP.

Water Demand Forecasting by Characteristics of City Using Principal Component and Cluster Analyses

  • Choi, Tae-Ho;Kwon, O-Eun;Koo, Ja-Yong
    • Environmental Engineering Research / v.15 no.3 / pp.135-140 / 2010
  • Because each city has different urban characteristics, the existing approach to water demand prediction, which uses average liters per capita per day, cannot give accurate predictions: it fails to consider several relevant variables. This study therefore considered the social and industrial factors of 164 local cities, in addition to population and other directly influential factors, and used principal component and cluster analyses to develop a more efficient water demand prediction model that reflects the unique characteristics of each city. After clustering, multiple regression models were developed: the $R^2$ of the inclusive multiple regression model was 0.59, whereas those of Clusters A and B were 0.62 and 0.74, respectively. The cluster-specific multiple regression models were therefore considered more reasonable and valid than the inclusive model. In summary, the water demand prediction model that uses principal component and cluster analyses to classify localities achieves higher coefficients of determination than the inclusive multiple regression model, which does not consider locality.
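The comparison in this abstract (classify cities with principal component and cluster analyses, then compare an inclusive regression with cluster-specific regressions by $R^2$) can be sketched with NumPy alone. This is an illustration under assumptions: PCA is done by SVD of standardized variables, the clustering is plain k-means (the abstract does not say which clustering algorithm was used), and the "cities" are synthetic, with two built-in groups that differ in per-capita demand.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize each column, then project onto the leading principal components (SVD)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T

def kmeans(S, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start; returns cluster labels."""
    centers = [S[0]]
    for _ in range(1, k):
        dist = np.min([((S - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(S[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([S[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Synthetic cities: two hidden groups differing in social/industrial indices
# and in per-capita demand. None of these numbers come from the study.
rng = np.random.default_rng(1)
n = 80
group = np.repeat([0, 1], n // 2)
population = rng.uniform(1, 10, n)[:, None]
indices = np.column_stack([rng.normal(group * 5.0, 0.5), rng.normal(group * 5.0, 0.5)])
demand = np.where(group == 0, 1.0, 3.0) * population[:, 0] + rng.normal(0, 0.5, n)

labels = kmeans(pca_scores(indices, 1), k=2)
pooled_r2 = r_squared(population, demand)
cluster_r2 = [r_squared(population[labels == j], demand[labels == j]) for j in range(2)]
```

Because the synthetic groups have different demand slopes, a single pooled regression fits poorly while each cluster's regression fits its own slope; the gap is exaggerated relative to the figures reported in the abstract.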

Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
  • In this article we extend the discrete Weibull regression model to the presence of missing data. Discrete Weibull regression models can be adapted to data with various types of dispersion; however, they are not widely used. Recently, Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and compare the results. The purpose of this study is to demonstrate the merit of multiple imputation for discrete count data with missing values. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) from 2016 to assess the factors influencing one-month hospital stay, and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analysis. The discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression with multiple imputation under both under- and over-dispersed distributions and with varying missing rates and sample sizes. Sensitivity analysis showed the influence of misspecification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data can increase explanatory power and is widely applicable to various types of dispersion with a unified model.
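The two ingredients of this abstract, a discrete Weibull likelihood and the pooling of results across multiple imputations, can be sketched in plain Python. This is a minimal sketch under assumptions the abstract does not state: it uses the type I discrete Weibull distribution $P(Y=y) = q^{y^\beta} - q^{(y+1)^\beta}$, the link $\log(-\log q) = \eta$ (one common choice in discrete Weibull regression), and Rubin's rules for pooling; the estimates fed to the pooling function are made up.

```python
import math

def dw_pmf(y, q, beta):
    """Type I discrete Weibull pmf for y = 0, 1, 2, ...; requires 0 < q < 1 and beta > 0."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)

def q_from_eta(eta):
    """Assumed link log(-log q) = eta, i.e. q = exp(-exp(eta))."""
    return math.exp(-math.exp(eta))

def dw_loglik(ys, etas, beta):
    """Log-likelihood of counts ys given linear predictors etas (eta_i = x_i' theta)."""
    return sum(math.log(dw_pmf(y, q_from_eta(eta), beta)) for y, eta in zip(ys, etas))

def rubin_pool(estimates, variances):
    """Combine one coefficient estimated on m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                  # pooled estimate
    w = sum(variances) / m                                      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)      # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b                           # total variance

# Hypothetical coefficient estimates and squared standard errors from m = 5 imputations.
pooled_est, pooled_var = rubin_pool([0.42, 0.38, 0.45, 0.40, 0.44],
                                    [0.010, 0.012, 0.011, 0.010, 0.013])
```

In practice `dw_loglik` would be maximized over the regression coefficients and $\beta$ separately on each imputed data set, and `rubin_pool` applied to each resulting coefficient.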

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems / v.18 no.2 / pp.268-281 / 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used to solve a wide range of real-world problems. Analyzing and utilizing data are essential steps in applying machine learning to such problems. As a method of processing data in machine learning, we propose applying multiple linear regression models built from interlaced data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data, and a combination of these models is then used as the prediction model for classifying similar software. Experiments comparing the proposed approach with conventional linear regression show that the proposed method classifies similar software more accurately than the conventional model. We anticipate that the proposed approach can be applied to various kinds of classification problems to improve on the accuracy of conventional linear regression.
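The abstract does not define "interlacing" precisely, so the following NumPy sketch adopts one possible reading as an explicit assumption: training samples are dealt round-robin into several parts (sample `i` goes to part `i % n_parts`), one linear regression is fitted per part, and the averaged output is thresholded to decide "similar" versus "not similar". The `interlace` helper, the 0.5 threshold and the synthetic two-feature data are all hypothetical, not the author's method.

```python
import numpy as np

def interlace(X, y, n_parts):
    """Hypothetical interlacing: sample i is assigned to part i % n_parts."""
    return [(X[j::n_parts], y[j::n_parts]) for j in range(n_parts)]

def fit_ols(X, y):
    A = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(models, X):
    """Average the outputs of the individual linear regression models."""
    A = np.column_stack([X, np.ones(len(X))])
    return np.mean([A @ beta for beta in models], axis=0)

def classify(models, X, threshold=0.5):
    """Label 1 ("similar") when the combined score reaches the threshold."""
    return (predict(models, X) >= threshold).astype(int)

# Synthetic pairwise-similarity features; label 1 means "similar software".
rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(400, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

models = [fit_ols(Xp, yp) for Xp, yp in interlace(X[:300], y[:300], n_parts=4)]
accuracy = float((classify(models, X[300:]) == y[300:]).mean())
```

Averaging several regressions trained on disjoint interlaced subsets is a simple ensemble; whether it matches the paper's combination rule would need to be checked against the full text.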

Comparison of Genetic Parameter Estimates of Total Sperm Cells of Boars between Random Regression and Multiple Trait Animal Models

  • Oh, S.-H.;See, M.T.
    • Asian-Australasian Journal of Animal Sciences / v.21 no.7 / pp.923-927 / 2008
  • The objective of this study was to compare random regression model and multiple trait animal model estimates of the (co)variance of total sperm cells over the active lifetime of artificial insemination (AI) boars. Data were provided by Smithfield Premium Genetics (Rose Hill, NC). The random regression analysis used 19,629 records from 1,736 animals. Data for the multiple trait animal model analyses were edited to include only records produced at 9, 12, 15, 18, 21, 24, and 27 months of age. Under the multiple trait method, estimates of genetic and residual variance for total sperm cells were heterogeneous among age classes. When the multiple trait method was compared with random regression, heritability estimates were similar except for total sperm cells at 24 months of age. The multiple trait method also gave higher heritability estimates of total sperm cells at every age than random regression did. Random regression analysis provided more detail on how variance components change with age. Random regression methods are the most appropriate way to analyze semen traits, because these are longitudinal data measured over the lifetime of boars.
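How a random regression model yields the age-specific heritabilities being compared can be sketched with NumPy. This assumes the common set-up of normalized Legendre polynomials on age rescaled to $[-1, 1]$, with genetic variance at age $t$ equal to $\phi(t)^\top K_a \phi(t)$; the covariance matrices `K_a` and `K_pe` and the residual variance are hypothetical, and only the seven age classes come from the abstract.

```python
import numpy as np
from numpy.polynomial import legendre

AGES = np.array([9, 12, 15, 18, 21, 24, 27])   # age classes (months) named in the abstract

def legendre_covariates(ages, order, age_min=9, age_max=27):
    """Normalized Legendre polynomials evaluated at ages rescaled to [-1, 1]."""
    x = -1.0 + 2.0 * (np.asarray(ages, dtype=float) - age_min) / (age_max - age_min)
    phi = legendre.legvander(x, order)          # columns P_0(x) ... P_order(x)
    return phi * np.sqrt((2 * np.arange(order + 1) + 1) / 2.0)

def rr_heritability(ages, K_a, K_pe, sigma2_e):
    """Age-specific h^2 = additive / (additive + permanent environment + residual)."""
    phi = legendre_covariates(ages, K_a.shape[0] - 1)
    g = np.einsum("ti,ij,tj->t", phi, K_a, phi)   # phi(t)' K_a phi(t)
    pe = np.einsum("ti,ij,tj->t", phi, K_pe, phi)
    return g / (g + pe + sigma2_e)

# Hypothetical (co)variance matrices of the random regression coefficients.
K_a = np.array([[1.00, 0.10, -0.05],
                [0.10, 0.30, 0.02],
                [-0.05, 0.02, 0.10]])
K_pe = np.diag([0.80, 0.20, 0.05])
h2 = rr_heritability(AGES, K_a, K_pe, sigma2_e=2.0)
```

The multiple trait model, by contrast, estimates a separate additive and residual variance for each age class directly, so its heritability at each age is simply $\sigma^2_a / (\sigma^2_a + \sigma^2_e)$ for that class; the random regression curve is smooth across ages by construction.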

A Study on Forecast of Oyster Production using Time Series Models (시계열모형을 이용한 굴 생산량 예측 가능성에 관한 연구)

  • Nam, Jong-Oh;Noh, Seung-Guk
    • Ocean and Polar Research / v.34 no.2 / pp.185-195 / 2012
  • This paper focuses on forecasting the short-term production of oysters farmed in Korea, whose production shows distinct yearly periodicity and differs in level by month. To forecast short-term oyster production, the paper uses monthly data (260 observations) from January 1990 to August 2011 and adopts several econometric methods: a Multiple Regression Analysis Model (MRAM), a Seasonal Autoregressive Integrated Moving Average (SARIMA) model, and a Vector Error Correction Model (VECM). First, the short-term oyster production forecast by the multiple regression analysis model was 1,337 tons, with a prediction error of 246 tons. Second, the SARIMA I and II models forecast 12,423 tons and 12,442 tons, with prediction errors of 11,404 tons and 11,423 tons, respectively. Third, the VECM estimated 10,425 tons, with a prediction error of 9,406 tons. In conclusion, based on the Theil inequality coefficient criterion, short-term oyster prediction by the VECM exhibited a better fit than that of the SARIMA I and II models and the multiple regression analysis model.
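The model ranking in this abstract rests on the Theil inequality coefficient. The abstract does not say which variant was used; the plain-Python sketch below assumes the bounded form $U = \sqrt{\tfrac{1}{n}\sum (A_i - F_i)^2} \,/\, \big(\sqrt{\tfrac{1}{n}\sum A_i^2} + \sqrt{\tfrac{1}{n}\sum F_i^2}\big)$, where 0 means a perfect forecast. The production series below are hypothetical, not the paper's data.

```python
import math

def theil_u(actual, forecast):
    """Theil inequality coefficient (bounded form): 0 = perfect fit, 1 = worst possible."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    scale = (math.sqrt(sum(a * a for a in actual) / n)
             + math.sqrt(sum(f * f for f in forecast) / n))
    return rmse / scale

# Hypothetical monthly production (tons) and two competing forecasts.
observed = [1200.0, 950.0, 800.0, 1100.0]
model_a = [1150.0, 990.0, 760.0, 1130.0]
model_b = [1500.0, 700.0, 1100.0, 900.0]
```

Because $U$ is scaled by the magnitudes of both series, it ranks models over a whole evaluation window rather than by the error of a single period's forecast.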

COST PERFORMANCE PREDICTION FOR INTERNATIONAL CONSTRUCTION PROJECTS USING MULTIPLE REGRESSION ANALYSIS AND STRUCTURAL EQUATION MODEL: A COMPARATIVE STUDY

  • D.Y. Kim;S.H. Han;H. Kim;H. Park
    • International Conference on Construction Engineering and Project Management / 2007.03a / pp.653-661 / 2007
  • Overseas construction projects tend to be more complex than domestic projects: they are exposed to more external risks (political, economic, social, and cultural) as well as more internal risks from the project itself. It is crucial to understand the project's condition early in order to be well prepared for its various phases. This study compares a structural equation model and multiple regression analysis in their capacity to predict the cost performance of international construction projects. The structural equation model predicts cost performance more accurately than regression analysis does, owing to its intrinsic capability of considering various cost factors systematically.

A Comparison of Construction Cost Estimation Using Multiple Regression Analysis and Neural Network in Elementary School Project

  • Cho, Hong-Gyu;Kim, Kyong-Gon;Kim, Jang-Young;Kim, Gwang-Hee
    • Journal of the Korea Institute of Building Construction / v.13 no.1 / pp.66-74 / 2013
  • In the early stages of a construction project, the most important task is to predict construction costs rationally. For this reason, many studies have estimated construction costs for apartment housing and office buildings at the early stage using artificial intelligence, statistics, and similar methods. In this study, cost data held by a provincial Office of Education on elementary schools constructed from 2004 to 2007 were used to compare a multiple regression model with an artificial neural network model. The 96 historical cases were divided into 76 for constructing the models and 20 for comparing the constructed regression model with the artificial neural network model. Analysis of the predicted construction costs showed that the error rate of the artificial neural network model was lower than that of the multiple regression model.
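The regression side of the 76/20 holdout comparison can be sketched with NumPy. Assumptions: the "error rate" is taken to be the mean absolute percentage error (the abstract does not define it), the two cost drivers are hypothetical, and the 96 projects are synthetic; the neural network side is omitted.

```python
import numpy as np

def fit_ols(X, y):
    A = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([X, np.ones(len(X))]) @ beta

def mape(actual, predicted):
    """Mean absolute percentage error (assumed definition of 'error rate')."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# 96 synthetic projects standing in for the historical cases; columns are
# hypothetical cost drivers (gross floor area in m^2, number of classrooms).
rng = np.random.default_rng(42)
n = 96
X = np.column_stack([rng.uniform(3000, 12000, n), rng.integers(12, 49, n)])
cost = 1.5 * X[:, 0] + 40.0 * X[:, 1] + rng.normal(0, 500, n)

order = rng.permutation(n)
train, test = order[:76], order[76:]          # 76 cases to build, 20 to compare
beta = fit_ols(X[train], cost[train])
error_rate = mape(cost[test], predict(beta, X[test]))
```

A neural network trained on the same `train` indices and scored with the same `mape` on `test` would give the second number in the comparison.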

Multivariate Analysis for Clinicians (임상의를 위한 다변량 분석의 실제)

  • Oh, Joo Han;Chung, Seok Won
    • Clinics in Shoulder and Elbow / v.16 no.1 / pp.63-72 / 2013
  • In medical research, multivariate analysis, especially multiple regression analysis, is used to analyze the influence of multiple variables on an outcome. Because many variables are involved, multiple regression analysis must address which variables to include in the model and the problem of multicollinearity, in addition to the basic assumptions of regression analysis. The fit of a multiple regression model is expressed by the coefficient of determination, $R^2$, and the influence of each independent variable on the outcome by a regression coefficient, $\beta$. According to the type of dependent variable (continuous, categorical (binary logit), or time-to-event), multiple regression analysis can be divided into multiple linear regression, multiple logistic regression, and Cox regression analysis, and the influence of each variable on the outcome is evaluated by the regression coefficient $\beta$, the odds ratio, and the hazard ratio, respectively. Knowledge of multivariate analysis enables clinicians to analyze results accurately and to design further research efficiently.
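Two points from this abstract can be made concrete in a short NumPy sketch: multicollinearity is commonly screened with the variance inflation factor $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the others, and a logistic or Cox coefficient $\beta$ converts to an odds ratio or hazard ratio as $e^{\beta}$. The data are synthetic, and the function names are illustrative rather than taken from any particular package.

```python
import math
import numpy as np

def vif(X):
    """Variance inflation factor 1 / (1 - R_j^2) for each column j of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        A = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])   # other predictors + intercept
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        r2_j = 1.0 - (resid @ resid) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2_j))
    return out

def ratio_from_beta(beta):
    """exp(beta): odds ratio (logistic) or hazard ratio (Cox) per one-unit increase."""
    return math.exp(beta)

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                   # unrelated predictor
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here the two nearly collinear predictors get very large VIFs while the unrelated one stays close to 1, and a coefficient of $\beta = \ln 2 \approx 0.693$ corresponds to a doubling of the odds (or hazard) per unit increase.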