• Title/Summary/Keyword: Non-linear regression model

Prediction Models of Residual Chlorine in Sediment Basin to Control Pre-chlorination in Water Treatment Plant (정수장 전염소 공정 제어를 위한 침전지 잔류 염소 농도 예측모델 개발)

  • Lee, Kyung-Hyuk;Kim, Ju-Hwan;Lim, Jae-Lim;Chae, Seon Ha
    • Journal of Korean Society of Water and Wastewater / v.21 no.5 / pp.601-607 / 2007
  • To maintain a constant residual chlorine concentration in the sedimentation basin, a real-time prediction model of residual chlorine is needed that accounts for water treatment plant data such as water quality, weather, and plant operating conditions. Using operation data acquired from K water treatment plant, prediction models of residual chlorine in the sedimentation basin were developed. The input parameters were water temperature, turbidity, pH, conductivity, flow rate, alkalinity, and pre-chlorination dosage. Linear and non-linear multiple regression models were established with 5,448 data records. The correlation coefficients (R) of the linear and non-linear models were 0.39 and 0.374, respectively. These low correlation coefficients indicate that the multiple regression models cannot represent residual chlorine with input parameters that vary independently over time in response to weather conditions. Artificial neural network (ANN) models were then applied under three different conditions. Their input parameters consist of water quality data observed in the treatment process, arranged in an auto-regressive structure that accounts for a time lag. The ANN models predicted residual chlorine in the sedimentation basin better than the conventional linear and non-linear multiple regression models: the coefficients of determination of the three models in the verification process were 0.742, 0.754, and 0.869, respectively. Comparing the results, the neural networks simulate residual chlorine in the sedimentation basin better than the mathematical regression models in terms of prediction performance. These results are expected to contribute to the automated control of water treatment processes.
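The abstract contrasts static multiple regression with models whose inputs follow an auto-regressive structure with a time lag. The sketch below illustrates only that input arrangement and how a correlation coefficient R is computed for a least-squares fit. It assumes evenly spaced records held in NumPy arrays and a hypothetical lag of one step; the function names are illustrative, and this is not the authors' neural network.

```python
import numpy as np

def make_lagged_inputs(series, target, lag=1):
    """Auto-regressive input layout (hypothetical): each input variable at time t,
    the same variables at t - lag, and the target's own value at t - lag."""
    series = np.asarray(series, dtype=float)   # shape (T, k): k input variables over T steps
    target = np.asarray(target, dtype=float)   # shape (T,): e.g. residual chlorine
    current = series[lag:]                      # inputs at time t
    previous = series[:-lag]                    # inputs at time t - lag
    target_prev = target[:-lag, None]           # target at time t - lag, as a column
    X = np.hstack([current, previous, target_prev])
    y = target[lag:]                            # value to predict at time t
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns coefficients and fitted values."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

def correlation_r(y, y_hat):
    """Pearson correlation coefficient R between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1])
```

The same lagged matrix `X` could be fed to any non-linear learner; the linear fit is kept here only because it is the baseline the abstract compares against.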

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers / v.22 no.3 / pp.75-87 / 1980
  • Most hydrologic phenomena are complex, interrelated products of multiple causes such as climatic and hydro-geological factors. A significant correlation with runoff in a river basin can therefore be expected in advance, and the effect of each causal or associated factor (independent variables: present-month rainfall, previous-month runoff, evapotranspiration, relative humidity, etc.) on present-month runoff (dependent variable) can be determined by multiple regression analysis. The functions relating the independent and dependent variables should be refitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. Before the first estimated multiple regression model for the historical sequence is adopted, the reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients. Even so, some error remains between observed and estimated runoff. It arises because the model is an inadequate description of the system, and because the record is only a sample from the population of monthly discharge observations, so that estimates of the model parameters are subject to sampling error. Since this error, a deviation from the multiple regression plane, cannot be explained by the first estimated regression equation, it can be treated as a random error governed by chance. This variance left unexplained by the regression equation can be handled by a stochastic approach: the random error is simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly runoff in a non-historical sequence is obtained by combining the deterministic component (the multiple regression equation) with the stochastic component (the random errors).
Monthly runoff at Naju station in the Yong-San river basin is estimated with the multiple regression model and the hybrid model, and the estimates are compared with observed runoff and with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly runoff in the historical sequence is a multiple linear regression equation fitted over all months: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly runoff is explained by this equation, and its coefficient of determination ($R^2$) is 0.843. This means that monthly runoff in the historical sequence can be estimated with high significance from a short observation record using this equation. (2) The optimal function for estimating monthly runoff in a non-historical sequence is the hybrid model that combines the same overall-month regression equation with a stochastic component: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$. The remaining 15% of unexplained variance is accounted for by adding the stochastic process, and somewhat more reliable statistical characteristics of monthly runoff in the non-historical sequence are obtained. Because of the random component, the runoff estimated in the non-historical sequence also shows extreme values (maxima and minima) that do not appear in the observed runoff. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that both the multiple linear regression equation and the Gajiyama formula are theoretically reasonable functions.
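The hybrid model described above adds a stochastic term, a standard normal variate times the standard error of estimate, to the deterministic regression plane. A minimal sketch using the reported coefficients follows. It assumes $P_n$ is present-month rainfall, $Q_{n-1}$ previous-month runoff, and $E_n$ evapotranspiration, all in consistent (unspecified) units. The value of `s_y` is a hypothetical input, and negative simulated runoff is not clipped.

```python
import random

# Coefficients of the overall-month regression equation reported in the abstract.
B_P, B_Q, B_E, B_0 = 0.788, 0.130, -0.273, -0.10

def deterministic_runoff(p_n, q_prev, e_n):
    """Deterministic component: multiple linear regression estimate of Q_n."""
    return B_P * p_n + B_Q * q_prev + B_E * e_n + B_0

def hybrid_runoff(p_n, q_prev, e_n, s_y, rng=random):
    """Hybrid estimate: regression plane plus S_y * t, t drawn from N(0, 1)."""
    t = rng.gauss(0.0, 1.0)
    return deterministic_runoff(p_n, q_prev, e_n) + s_y * t

def simulate_sequence(rain, evap, q0, s_y, seed=None):
    """Generate a non-historical monthly sequence, feeding each simulated Q_n
    back in as Q_{n-1} for the following month (no clipping at zero)."""
    rng = random.Random(seed)
    q_prev, out = q0, []
    for p_n, e_n in zip(rain, evap):
        q_prev = hybrid_runoff(p_n, q_prev, e_n, s_y, rng)
        out.append(q_prev)
    return out
```

With `s_y=0` the sequence reduces to the purely deterministic regression estimate; a positive `s_y` adds the month-to-month scatter that the regression plane alone cannot reproduce.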

Finite-Sample, Small-Dispersion Asymptotic Optimality of the Non-Linear Least Squares Estimator

  • So, Beong-Soo
    • Journal of the Korean Statistical Society / v.24 no.2 / pp.303-312 / 1995
  • We consider the following type of general semi-parametric non-linear regression model: $y_i = f_i(\theta) + \epsilon_i,\ i = 1, \cdots, n$, where $\{f_i(\cdot)\}$ represents the set of non-linear functions of the unknown parameter vector $\theta' = (\theta_1, \cdots, \theta_p)$ and $\{\epsilon_i\}$ represents the set of measurement errors with unknown distribution. Under a suitable finite-sample, small-dispersion asymptotic framework, we derive a general lower bound for the asymptotic mean squared error (AMSE) matrix of Gauss-consistent estimators of $\theta$. We then prove the fundamental result that the general non-linear least squares estimator (NLSE) is optimal within the class of all regular Gauss-consistent estimators, irrespective of the distribution of the measurement errors.
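The NLSE discussed above minimizes $\sum_i (y_i - f_i(\theta))^2$. One standard way to compute it is Gauss-Newton iteration, which repeatedly linearizes $f$ around the current $\theta$. A minimal sketch follows, assuming a hypothetical exponential model $f_i(\theta) = \theta_1 e^{\theta_2 x_i}$ and a user-supplied Jacobian. The paper's result concerns the estimator's optimality, not this particular algorithm.

```python
import numpy as np

def gauss_newton(f, jac, theta0, y, n_iter=100, tol=1e-12):
    """Non-linear least squares: minimise sum_i (y_i - f_i(theta))^2 by Gauss-Newton."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(theta)                               # residual vector
        J = jac(theta)                                 # n x p Jacobian of f at theta
        step, *_ = np.linalg.lstsq(J, r, rcond=None)   # least-squares solve of J @ step = r
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Hypothetical example model: f_i(theta) = theta_1 * exp(theta_2 * x_i)
x = np.linspace(0.0, 1.0, 20)

def f(theta):
    return theta[0] * np.exp(theta[1] * x)

def jac(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])     # d f / d theta_1, d f / d theta_2
```

Gauss-Newton converges only locally, so a reasonable starting value matters. Damped variants such as Levenberg-Marquardt are the usual safeguard.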

Least absolute deviation estimator based consistent model selection in regression

  • Shende, K.S.;Kashid, D.N.
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.273-293 / 2019
  • We consider the problem of model selection in multiple linear regression with outliers and non-normal error distributions. In this article, a robust model selection criterion is proposed based on the robust least absolute deviation (LAD) estimation method. The proposed criterion is shown to be consistent. We also propose algorithms based on this criterion that are suitable when the model has a large number of predictors; for large sample sizes, these algorithms select only the relevant predictor variables with probability one. An exhaustive simulation study shows that the criterion performs well, and the proposed criterion is also applied to a real data set to examine its applicability. The simulation results show the proficiency of the algorithms in the presence of outliers, non-normal distributions, and multicollinearity.
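The criterion above is built on the LAD fit, which minimizes $\sum_i |y_i - x_i'\beta|$ instead of squared residuals. The sketch below computes that fit by iteratively reweighted least squares (IRLS). IRLS is a common substitute technique and is not necessarily what the authors used. The selection criterion itself is not reproduced, and the weight floor `eps` is an assumed safeguard against division by zero.

```python
import numpy as np

def lad_fit(X, y, n_iter=200, eps=1e-8):
    """Least absolute deviation regression via IRLS (intercept column added).
    Each pass solves a weighted least-squares problem with weights 1/|r_i|."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)       # OLS starting value
    for _ in range(n_iter):
        r = np.abs(y - A @ beta)
        sw = np.sqrt(1.0 / np.maximum(r, eps))          # square-root weights, floored near zero
        new_beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        done = np.max(np.abs(new_beta - beta)) < 1e-10
        beta = new_beta
        if done:
            break
    return beta

def sum_abs_dev(X, y, beta):
    """LAD objective: sum of absolute residuals for coefficients beta."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.sum(np.abs(y - A @ beta)))
```

With a single gross outlier, the LAD line stays on the bulk of the data, whereas ordinary least squares is pulled toward the outlier. This robustness is what motivates a LAD-based criterion.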

Seismic damage vulnerability of empirical composite material structure of adobe and timber

  • Si-Qi Li
    • Earthquakes and Structures / v.25 no.6 / pp.429-442 / 2023
  • To study the seismic vulnerability of composite adobe-and-timber structures, we collected and statistically analysed empirical observation samples totalling 542,214,937 m² and 467,177 buildings that were significantly affected by the 179 earthquakes that occurred in mainland China from 1976 to 2010. For multi-intensity regions, combining numerical analysis with a probability model, we established a non-linear continuous regression model of vulnerability that considers the empirical seismic damage area (number of buildings) and the seismic damage ratio. A probability matrix model of the mean empirical seismic damage was also provided. Considering the coupling effect of the annual and seismic fortification factors, an empirical seismic vulnerability curve model was constructed for the multiple-intensity regions. A probability matrix model of the mean vulnerability index (MVI) was proposed and validated with the reconnaissance sample data described above. Finally, a matrix model of the MVI for the regions (19 provinces in mainland China) was established based on this parameter.

Bayes Prediction Density in Linear Models

  • Kim, S.H.
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.797-803 / 2001
  • This paper derives the Bayes prediction density for the spatial linear model with a non-informative prior. The results show that predictive inference is completely unaffected by departures from the normality assumption in the direction of the elliptical family, and that the structure of the prediction density is unchanged when more than one additional future observation is predicted.
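For orientation, the sketch below covers the simplest special case. For $y = X\beta + \epsilon$ with independent $N(0, \sigma^2)$ errors and the non-informative prior $p(\beta, \sigma^2) \propto 1/\sigma^2$, the predictive density of one future observation at $x_f$ is Student-t with $n - p$ degrees of freedom, location $x_f'\hat\beta$, and squared scale $s^2(1 + x_f'(X'X)^{-1}x_f)$. This case assumes uncorrelated errors, whereas the paper treats the spatial (correlated-error) model and elliptical departures. The function names are illustrative.

```python
import math
import numpy as np

def predictive_t_params(X, y, x_new):
    """Location, scale and degrees of freedom of the Student-t predictive density
    for one future observation (independent normal errors assumed,
    non-informative prior p(beta, sigma^2) proportional to 1/sigma^2)."""
    X, y, x_new = (np.asarray(a, dtype=float) for a in (X, y, x_new))
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                      # posterior mean of beta
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)                      # residual variance estimate
    loc = float(x_new @ beta_hat)
    scale = math.sqrt(s2 * (1.0 + x_new @ XtX_inv @ x_new))
    return loc, scale, n - p

def t_pdf(z, loc, scale, df):
    """Density of a location-scale Student-t distribution."""
    u = (z - loc) / scale
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df)) / scale
```

The extra $1$ inside the scale reflects the variance of the new observation itself, on top of the uncertainty in $\hat\beta$.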

Determination of Regression Model for Estimating Root Fresh Weight Using Maximum Leaf Length and Width of Root Vegetables Grown in Reclaimed Land (간척지 재배 근채류의 최대 엽장과 엽폭을 이용한 지하부 생체중 추정용 회귀 모델 결정)

  • Jung, Dae Ho;Yi, Pyoung Ho;Lee, In-Bog
    • Korean Journal of Environmental Agriculture / v.39 no.3 / pp.204-213 / 2020
  • BACKGROUND: Because the amount of crops cultivated on reclaimed land is huge, it is very difficult to quantify total crop production, so a non-destructive method for predicting crop production is needed. Salt-tolerant root vegetables such as red beet and sugar beet are suitable for cultivation on reclaimed land, and predicting their underground biomass helps estimate crop productivity. The objectives of this study are to investigate the maximum leaf length and width of red beet, sugar beet, and turnip grown on reclaimed land, and to determine the optimal model by regression analysis with linear and allometric growth models. METHODS AND RESULTS: Maximum leaf length, maximum leaf width, and root fresh weight of red beets, sugar beets, and turnips were measured. Ten linear models and six allometric growth models were selected for estimating root fresh weight, and non-linear regression analysis was conducted. The allometric growth model, which has a variable multiplied by the square of maximum leaf length and maximum leaf width, showed the highest $R^2$ values: 0.67, 0.70, and 0.49 for red beets, sugar beets, and turnips, respectively. In validation, the models for red beets and sugar beets gave $R^2$ values of 0.63 and 0.65, respectively, whereas the model for turnips gave 0.48. The allometric growth model was therefore suitable for estimating the root fresh weight of red beets and sugar beets, but its accuracy for turnips was relatively low. CONCLUSION: The regression models established in this study may be useful for estimating the total production of root vegetables cultivated on reclaimed land and could serve as a non-destructive method for predicting crop information.
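Allometric growth models relate biomass to a size measure through a power law, $y = a x^b$. The sketch below fits that form and reports $R^2$ on the original scale. It assumes `x` is a single hypothetical size predictor, for example one built from leaf length and width; the paper's exact model forms are not reproduced. It also fits by ordinary least squares on $\log y = \log a + b \log x$, which is a simplification. The study used non-linear regression on the original scale, which can give somewhat different coefficients.

```python
import numpy as np

def fit_allometric(x, y):
    """Fit y = a * x**b by least squares on log(y) = log(a) + b * log(x).
    Requires strictly positive x and y."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)   # polyfit returns [slope, intercept]
    return float(np.exp(log_a)), float(b)

def r_squared(y, y_hat):
    """Coefficient of determination on the original (untransformed) scale."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - np.asarray(y_hat, dtype=float)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Usage: `a, b = fit_allometric(size, root_weight)` followed by `r_squared(root_weight, a * size ** b)`, where `size` and `root_weight` are hypothetical measurement arrays.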

Modeling of Solar Radiation Using Silicon Solar Module

  • Kim, Joon-Yong;Yang, Seung-Hwan;Lee, Chun-Gu;Kim, Young-Joo;Kim, Hak-Jin;Cho, Seong-In;Rhee, Joong-Yong
    • Journal of Biosystems Engineering / v.37 no.1 / pp.11-18 / 2012
  • Purpose: The short-circuit current of a solar module, widely used as a power source for wireless environmental sensors, is proportional to solar radiation, although many other factors also affect the short-circuit current. The objective of this study is to develop a model for estimating solar radiation so that the solar module can serve as both a power source and an irradiance sensor. Methods: An experimental system collected data on the short-circuit current and environmental factors (ambient temperature, cloud cover, and solar radiation) over 65 days. Based on these data, two linear regression models and one non-linear regression model were developed and evaluated. Results: The best model was a linear regression model using short-circuit current, angle of incidence, and cloud cover, with an overall RMSE (root mean square error) of 66.671 $W/m^2$. The other linear model (RMSE 69.038 $W/m^2$) was also acceptable when cloud cover data are not available.
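The models above are compared by RMSE in $W/m^2$. The sketch below shows a multiple linear regression of irradiance on several predictors and the RMSE used to score it. It assumes the predictor columns are short-circuit current, angle of incidence, and cloud cover entered linearly. How the paper actually transformed these inputs (for example, a cosine of the incidence angle) is not stated in the abstract, and the function names are illustrative.

```python
import numpy as np

def fit_ols(X, y):
    """Multiple linear regression with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted irradiance for rows of predictors X (hypothetical column order:
    short-circuit current, angle of incidence, cloud cover)."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def rmse(y, y_hat):
    """Root mean square error, in the units of y (W/m^2 for irradiance)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

Comparing two candidate models then amounts to fitting each on the same data and comparing their `rmse` values, ideally on held-out days rather than the fitting days.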

Application of a Non-stationary Frequency Analysis Method for Estimating Probable Precipitation in Korea (전국 확률강수량 산정을 위한 비정상성 빈도해석 기법의 적용)

  • Kim, Gwang-Seob;Lee, Gi-Chun
    • Journal of The Korean Society of Agricultural Engineers / v.54 no.5 / pp.141-153 / 2012
  • In this study, we estimated probable precipitation amounts for target years (2020, 2030, 2040) at 55 weather stations in Korea, using 24-hour annual maximum precipitation data from 1973 through 2009; such estimates should be useful for managing agricultural reservoirs. Trend tests and non-stationarity tests were performed, and non-stationary frequency analysis was conducted for all 55 sites. The Gumbel distribution was chosen, and the probability weighted moment method was used to estimate its parameters. The behavior of the mean of the extreme precipitation data, the scale parameter, and the location parameter was analyzed. The probable precipitation amount in each target year was estimated by non-stationary frequency analysis, using linear regression of the mean of the extreme precipitation data, the scale parameter, and the location parameter. Overall, the probable precipitation amounts from the non-stationary frequency analysis were overestimated. Probable precipitation amounts increased substantially in the central part of Korea and decreased at several sites in the southern part. Non-stationary frequency analysis with a linear model should be applicable to relatively short projection periods.
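The procedure above combines probability weighted moment (PWM) estimation of the Gumbel distribution with linear trends in its parameters. For a Gumbel fit, $\lambda_2 = 2b_1 - b_0$, the scale is $\alpha = \lambda_2 / \ln 2$, the location is $\xi = b_0 - 0.5772\,\alpha$, and the $T$-year quantile is $x_T = \xi - \alpha \ln(-\ln(1 - 1/T))$. The sketch below implements these formulas. The moving-window scheme used to obtain a parameter time series is a hypothetical choice for illustration, since the abstract does not say how the trends were derived. The paper also regresses the sample mean, which this sketch omits.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649

def gumbel_pwm(sample):
    """Gumbel location (xi) and scale (alpha) from probability weighted moments."""
    x = np.sort(np.asarray(sample, dtype=float))   # ascending order statistics
    n = len(x)
    b0 = x.mean()
    b1 = np.sum(np.arange(n) / (n - 1) * x) / n    # weights (i-1)/(n-1), i = 1..n
    l2 = 2.0 * b1 - b0                              # second L-moment
    alpha = l2 / math.log(2.0)
    xi = b0 - EULER_GAMMA * alpha
    return xi, alpha

def gumbel_quantile(xi, alpha, T):
    """Depth with return period T years (non-exceedance probability 1 - 1/T)."""
    return xi - alpha * math.log(-math.log(1.0 - 1.0 / T))

def trend_extrapolate(years, values, target_year):
    """Linear regression of a parameter on time, evaluated at target_year."""
    slope, intercept = np.polyfit(years, values, 1)
    return float(slope * target_year + intercept)

def nonstationary_quantile(years, annual_max, target_year, T, window=20):
    """Hypothetical scheme: estimate (xi, alpha) in moving windows, fit a linear
    trend to each parameter, and evaluate the T-year quantile at target_year."""
    years = np.asarray(years, dtype=float)
    annual_max = np.asarray(annual_max, dtype=float)
    centers, xis, alphas = [], [], []
    for start in range(len(years) - window + 1):
        xi, alpha = gumbel_pwm(annual_max[start:start + window])
        centers.append(years[start:start + window].mean())
        xis.append(xi)
        alphas.append(alpha)
    xi_t = trend_extrapolate(centers, xis, target_year)
    alpha_t = trend_extrapolate(centers, alphas, target_year)
    return gumbel_quantile(xi_t, alpha_t, T)
```

Extrapolating a linear trend decades ahead amplifies any short-term drift in the record, which is consistent with the abstract's caution that this approach suits relatively short projection periods.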

A Causational Study for Urban 4-legged Signalized Intersections using Structural Equation Method (구조방정식을 이용한 도시부 4지 신호교차로의 사고원인 분석)

  • Oh, Jutaek;Lee, Sangkyu;Heo, Taeyoung;Hwang, Jeongwon
    • International Journal of Highway Engineering / v.14 no.6 / pp.121-129 / 2012
  • PURPOSES: Traffic accidents at intersections have increased annually, so their causes must be examined in order to reduce them. However, existing accident models have mainly been developed with non-linear regression models such as Poisson models. Although these non-linear regression methods are appropriate for capturing the randomness and non-linearity of accidents, they cannot reveal the complicated causes of traffic accidents. Therefore, to reveal these complicated causes, this study used structural equation modeling (SEM). METHODS: SEM, as used in this study, is a statistical technique for estimating causal relations from a combination of statistical data and qualitative causal assumptions. SEM allows exploratory modeling, which makes it suited to theory development. The model is tested against the measured data to determine how well it fits. Among the strengths of SEM is the ability to construct latent variables: variables that are not measured directly but are estimated in the model from several measured variables. This allows the modeler to explicitly capture measurement unreliability, so that the structural relations between latent variables can be estimated accurately. RESULTS: The causal factors could be grouped into three factors. Factor 1 includes traffic variables, Factor 2 contains turning-traffic variables, and Factor 3 consists of other road-element variables such as speed limits and signal cycles. CONCLUSIONS: Non-linear regression models can be used to develop accident prediction models. However, they are limited in estimating causal factors, because they retain only a few significant variables to improve model accuracy. Compared with regression, SEM is better suited to estimating the causal factors affecting accidents, because it models the structural relations between latent variables.
Therefore, this study used SEM to estimate the causal factors affecting accidents at urban signalized intersections.