• Title/Summary/Keyword: regression to the mean

A Score test for Detection of Outliers in Nonlinear Regression

  • Kahng, Myung-Wook
    • Journal of the Korean Statistical Society / v.22 no.2 / pp.201-208 / 1993
  • Given the specific mean shift outlier model, the score test for multiple outliers in nonlinear regression is discussed as an alternative to the likelihood ratio test. The geometric interpretation of the score statistic is also presented.

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.149-161 / 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible candidates for outliers using properties of an intercept estimator in a difference-based regression model, and the outlier information is reflected in the multiple regression model by adding mean shift parameters. Second, we select the best model from the models that include the outlier candidates as predictors using stochastic search variable selection. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, we need to develop our method further to make robust estimates. We will also extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.

Object Tracking with the Multi-Templates Regression Model Based MS Algorithm

  • Zhang, Hua;Wang, Lijia
    • Journal of Information Processing Systems / v.14 no.6 / pp.1307-1317 / 2018
  • To deal with the problems of occlusion, pose variations and illumination changes in the object tracking system, a regression model weighted multi-templates mean-shift (MS) algorithm is proposed in this paper. Target templates and occlusion templates are extracted to compose a multi-templates set. Then, the MS algorithm is applied to the multi-templates set for obtaining the candidate areas. Moreover, a regression model is trained to estimate the Bhattacharyya coefficients between the templates and candidate areas. Finally, the geometric center of the tracked areas is considered as the object's position. The proposed algorithm is evaluated on several classical videos. The experimental results show that the regression model weighted multi-templates MS algorithm can track an object accurately in terms of occlusion, illumination changes and pose variations.
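
The Bhattacharyya coefficient mentioned in the abstract above measures the overlap between two histograms. The following is a minimal sketch of the standard direct computation, not the authors' method (the paper trains a regression model to estimate the coefficient instead); the function name and the sample histograms are hypothetical.

```python
import math

def bhattacharyya_coefficient(hist_p, hist_q):
    """Overlap between two histograms; 1.0 means identical, 0.0 means disjoint."""
    if len(hist_p) != len(hist_q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(hist_p), sum(hist_q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    # Normalize each histogram to a probability distribution, then sum sqrt(p_i * q_i).
    return sum(
        math.sqrt((p / total_p) * (q / total_q))
        for p, q in zip(hist_p, hist_q)
    )

# Hypothetical colour histograms of a template and a candidate area.
template = [4, 10, 6, 0]
candidate = [5, 9, 5, 1]
similarity = bhattacharyya_coefficient(template, candidate)
```

A value close to 1 indicates that the candidate area resembles the template.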

Estimation of long memory parameter in nonparametric regression

  • Cho, Yeoyoung;Baek, Changryong
    • Communications for Statistical Applications and Methods / v.26 no.6 / pp.611-622 / 2019
  • This paper considers the estimation of the long memory parameter in nonparametric regression with strongly correlated errors. The key idea is to minimize a unified mean squared error of the long memory parameter to select both the kernel bandwidth and the number of frequencies used in exact local Whittle estimation. A unified mean squared error framework is more natural because it provides both goodness of fit and a measure of strong dependence. The block bootstrap is applied to evaluate the mean squared error. In Monte Carlo simulations, the finite sample performance of the proposed method is the closest to that of the oracle. The proposed method outperforms existing methods, especially when dependency and sample size increase. The proposed method is also illustrated with the volatility of the exchange rate between the Korean Won and the US dollar.

Optimal Restrictions on Regression Parameters For Linear Mixture Model

  • Ahn, Jung-Yeon;Park, Sung-Hyun
    • Journal of the Korean Statistical Society / v.28 no.3 / pp.325-336 / 1999
  • Collinearity among independent variables can have severe effects on the precision of response estimation for some region of interest in mixture experiments. A method of finding the optimal linear restriction on the regression parameters of a linear model for mixture experiments, in the sense of minimizing the integrated mean squared error, is studied. We use the formulation of optimal restrictions on regression parameters for estimating responses proposed by Park (1981), transforming the mixture components to mathematically independent variables.

Prediction of Blast Vibration in Quarry Using Machine Learning Models (머신러닝 모델을 이용한 석산 개발 발파진동 예측)

  • Jung, Dahee;Choi, Yosoon
    • Tunnel and Underground Space / v.31 no.6 / pp.508-519 / 2021
  • In this study, a model was developed to predict the peak particle velocity (PPV) that affects people and the surrounding environment during blasting. Four machine learning models using the k-nearest neighbors (kNN), classification and regression tree (CART), support vector regression (SVR), and particle swarm optimization (PSO)-SVR algorithms were developed and compared with each other to predict the PPV. Mt. Yogmang located in Changwon-si, Gyeongsangnam-do was selected as a study area, and 1048 blasting data were acquired to train the machine learning models. The blasting data consisted of hole length, burden, spacing, maximum charge per delay, powder factor, number of holes, ratio of emulsion, monitoring distance and PPV. To evaluate the performance of the trained models, the mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were used. The PSO-SVR model showed superior performance with MAE, MSE and RMSE of 0.0348, 0.0021 and 0.0458, respectively. Finally, a method was proposed to predict the degree of influence on the surrounding environment using the developed machine learning models.
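
As a minimal sketch of the three error measures named in the abstract above (not the authors' code), MAE, MSE and RMSE can be computed as follows. The function name and the sample values are hypothetical. Since RMSE is the square root of MSE, the reported values are mutually consistent: the square root of 0.0021 is approximately 0.0458.

```python
import math

def error_metrics(observed, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)

# Hypothetical PPV observations and model predictions (illustrative values only).
observed = [0.10, 0.25, 0.40, 0.15]
predicted = [0.12, 0.20, 0.43, 0.15]
mae, mse, rmse = error_metrics(observed, predicted)
```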

Analysis of Air Temperature Factors Related to Difference of Fruit Characteristics According to Cultivating Areas of Persimmon (Diospyros kaki Thunb.) (감 재배지 간 과실 품질 차이에 관계한 기온요인 분석)

  • Kim, Ho-Cheol;Jeon, Kyung-Soo;Kim, Tae-Choon
    • Journal of Bio-Environment Control / v.17 no.2 / pp.124-131 / 2008
  • To investigate the main air temperature factors correlated with differences in fruit characteristics among cultivating areas, the fruit and air temperature characteristics of eight cultivating areas of 'Fuyu' persimmon were analyzed by principal component analysis and multiple regression analysis. The first principal component extracted from 16 air temperature factors comprised annual mean temperature, mean temperature during October, annual mean minimum extreme temperature, mean temperature during the growing period, and so forth. The second principal component comprised mean temperature during May and June and so forth. The cumulative contribution was 91.4%. Five of the eight cultivating areas clearly differed in their main factors or in the direction of correlation. In the multiple regression analysis between the extracted main factors and fruit characteristics, fruit height was highly correlated with mean temperature during the growing period ($X_8$) and cumulative temperature ($X_6$), and the regression equation was $Y=150.55-5.375X_8+0.014X_6$ ($r^2=0.843$). This regression equation was also affected by mean minimum temperature during the growing period, cumulative temperature, and mean temperature during August. Fruit diameter was negatively correlated with mean temperature during the growing period. Flesh browning rate and Hunter a value of peel color were positively correlated with mean minimum temperature during the growing period and annual minimum air temperature, respectively.
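
The regression equation reported in the abstract above can be evaluated directly. The sketch below only encodes the published coefficients; the function name and the input values are hypothetical, and the abstract does not state the units of $Y$, $X_8$ or $X_6$, so none are assumed here.

```python
def predicted_fruit_height(mean_temp_growing_period, cumulative_temp):
    """Evaluate Y = 150.55 - 5.375*X8 + 0.014*X6 from the abstract.

    X8: mean temperature during the growing period.
    X6: cumulative temperature.
    """
    return 150.55 - 5.375 * mean_temp_growing_period + 0.014 * cumulative_temp

# Hypothetical inputs chosen only to illustrate the arithmetic:
# 150.55 - 5.375*20 + 0.014*5000 = 150.55 - 107.5 + 70, about 113.05.
example = predicted_fruit_height(20.0, 5000.0)
```

The negative coefficient on $X_8$ means that, with cumulative temperature held fixed, a warmer growing period lowers the predicted fruit height.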

A Study on Statistical Forecasting Models of PM10 in Pohang Region by the Variable Transformation (변수변환을 통한 포항지역 미세먼지의 통계적 예보모형에 관한 연구)

  • Lee, Yung-Seop;Kim, Hyun-Goo;Park, Jong-Seok;Kim, Hee-Kyung
    • Journal of Korean Society for Atmospheric Environment / v.22 no.5 / pp.614-626 / 2006
  • Using data from three environmental monitoring sites in the Pohang area (KME112, KME113, and KME114), statistical forecasting models of the daily maximum and mean values of PM10 have been developed. Since the distributions of the daily maximum and mean PM10 values are skewed, resembling the Weibull distribution, these values were log-transformed to increase prediction accuracy by approximating the normal distribution. Three statistical forecasting models, namely regression, neural networks (NN) and support vector regression (SVR), were built using the log-transformed response variables, i.e., log(max(PM10)) or log(mean(PM10)). The forecasting models were validated using RMSE, CORR, and IOA for model comparison and accuracy. The improvement in IOA after the log-transformation in the daily maximum PM10 prediction was 12.7% for regression and 22.5% for NN. In particular, the SVR method improved by 42.7%. For the daily mean PM10 prediction, the IOA value improved by 5.1% for regression, 6.5% for NN, and 6.3% for SVR. In conclusion, the SVR method was found to perform better than the other methods in terms of model accuracy and fit.
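
The workflow described in the abstract above (log-transform a skewed response, fit a model, then score it with RMSE, CORR and IOA) can be sketched as follows. This is an illustration on synthetic data, not the authors' models: it uses simple least-squares regression with one hypothetical predictor, and it assumes that IOA refers to Willmott's index of agreement, which the abstract does not define.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def corr(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

def ioa(obs, pred):
    """Willmott's index of agreement (assumed definition of IOA)."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Synthetic right-skewed response standing in for daily PM10 values.
random.seed(0)
x = [random.uniform(0.0, 1.0) for _ in range(200)]
y = [math.exp(3.0 + 1.5 * a + random.gauss(0.0, 0.4)) for a in x]

# Fit on the log scale, then back-transform predictions to the original scale.
slope, intercept = fit_line(x, [math.log(v) for v in y])
pred = [math.exp(intercept + slope * a) for a in x]
scores = {"RMSE": rmse(y, pred), "CORR": corr(y, pred), "IOA": ioa(y, pred)}
```

Fitting on the log scale keeps the skewed upper tail from dominating the least-squares fit; the scores are computed after back-transforming so that they are comparable with a model fitted on the original scale.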

Estimation of Ridge Regression Under the Integrate Mean Square Error Criterion

  • Yong B. Lim;Park, Chi H.;Park, Sung H.
    • Journal of the Korean Statistical Society / v.9 no.1 / pp.61-77 / 1980
  • In response surface experiments, a polynomial model is often used to fit the response surface by the method of least squares. However, if the vectors of predictor variables are multicollinear, least squares estimates of the regression parameters have a high probability of being unsatisfactory. Hoerl and Kennard have demonstrated that these undesirable effects of multicollinearity can be reduced by using "ridge" estimates in place of the least squares estimates. Ridge regression theory in the literature has been mainly concerned with the selection of k for the first order polynomial regression model and the precision of $\hat{\beta}(k)$, the ridge estimator of the regression parameters. The problem considered in this paper is that of selecting k in ridge regression for a given polynomial regression model of arbitrary order. A criterion is proposed for the selection of k in the context of the integrated mean square error of fitted responses, and illustrated with an example. Also, a type of admissibility condition is established and proved for the proposed criterion.
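
For readers unfamiliar with the ridge estimator $\hat{\beta}(k)$ named in the abstract above, the sketch below computes the standard Hoerl and Kennard form $(X'X + kI)^{-1}X'y$ for two nearly collinear predictors without an intercept. It does not implement the paper's integrated mean square error criterion for choosing k; the function name and the data are hypothetical.

```python
import math

def ridge_two_predictors(x1, x2, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for two predictors, no intercept."""
    # Entries of the symmetric 2x2 matrix X'X + kI.
    a = sum(u * u for u in x1) + k
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + k
    # Entries of X'y.
    g1 = sum(u * t for u, t in zip(x1, y))
    g2 = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X'X + kI is singular; use a larger k")
    # Closed-form solution of the 2x2 linear system.
    return (d * g1 - b * g2) / det, (a * g2 - b * g1) / det

# Hypothetical centred data in which x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.2, -1.9, 0.3, 1.8, 4.1]

least_squares = ridge_two_predictors(x1, x2, y, k=0.0)
ridge = ridge_two_predictors(x1, x2, y, k=1.0)
norm_ls = math.hypot(*least_squares)
norm_ridge = math.hypot(*ridge)
```

With k = 0 the function reduces to ordinary least squares, whose coefficients are unstable under near collinearity; increasing k shrinks the length of the coefficient vector, which is the stabilizing effect the abstract refers to.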

TIME SERIES PREDICTION USING INCREMENTAL REGRESSION

  • Kim, Sung-Hyun;Lee, Yong-Mi;Jin, Long;Chai, Duck-Jin;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.635-638 / 2006
  • Among conventional prediction techniques in data mining, regression uses a model generated in the training step. This model is applied to new input data without any change. If such a model is applied directly to time series, prediction accuracy decreases. This paper proposes an incremental regression for time series prediction, such as typhoon track prediction. The technique takes into account that the characteristics of a time series may change over time. It is composed of two steps. The first step executes a fractional process for applying input data to the regression model. The second step updates the model by using its information as new data. Additionally, the model is maintained using only recent data kept in a queue. This approach has two advantages. It maintains the minimum information of the model by using a matrix, so space complexity is reduced. Moreover, it prevents the error rate from increasing by updating the model over time. The accuracy of the proposed method is measured by RME (Relative Mean Error) and RMSE (Root Mean Square Error). In a typhoon track prediction experiment, the proposed technique, IMLR (Incremental Multiple Linear Regression), is more efficient than MLR (Multiple Linear Regression) and SVR (Support Vector Regression).
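
The idea of keeping only recent data in a queue and updating a compact summary of the model can be sketched as below. This is not the authors' IMLR algorithm: it is a simplified stand-in with a single predictor, where running sums play the role of the summary matrix. The class name, method names and data are hypothetical.

```python
from collections import deque

class SlidingWindowRegression:
    """Least-squares line over the most recent `window` points."""

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.points = deque()
        # Sufficient statistics: sums of x, y, x*x and x*y over the window.
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def _accumulate(self, x, y, sign):
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y

    def update(self, x, y):
        """Add a new observation and evict the oldest one if the queue is full."""
        self.points.append((x, y))
        self._accumulate(x, y, +1.0)
        if len(self.points) > self.window:
            old_x, old_y = self.points.popleft()
            self._accumulate(old_x, old_y, -1.0)

    def coefficients(self):
        """Return (slope, intercept) fitted to the points currently in the queue."""
        n = len(self.points)
        denom = n * self.sxx - self.sx ** 2
        if n < 2 or denom == 0:
            raise ValueError("need at least two points with distinct x values")
        slope = (n * self.sxy - self.sx * self.sy) / denom
        return slope, (self.sy - slope * self.sx) / n

    def predict(self, x):
        slope, intercept = self.coefficients()
        return intercept + slope * x

# Hypothetical series whose trend changes at t = 5.
model = SlidingWindowRegression(window=3)
for t in range(10):
    value = 2.0 * t + 1.0 if t < 5 else 20.0 - t
    model.update(float(t), value)
slope, intercept = model.coefficients()
```

Because only the sums and the queue are stored, memory use is bounded by the window size, and the fitted line follows the recent trend rather than the whole history. With long streams of non-integer data the running sums can accumulate rounding error, so periodically recomputing them from the queue is a reasonable safeguard.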
