• Title/Abstract/Keywords: Robust regression estimation

Search results: 99 items (processing time: 0.02 s)

Robustness, Data Analysis, and Statistical Modeling: The First 50 Years and Beyond

  • Barrios, Erniel B.
    • Communications for Statistical Applications and Methods / Vol. 22, No. 6 / pp. 543-556 / 2015
  • We present a survey of contributions that defined the nature and extent of robust statistics over the last 50 years. Starting from the pioneering work of Tukey, Huber, and Hampel, which focused on robust estimation of a location parameter, we present various generalizations of these estimation procedures that cover a wide variety of models and data analysis methods. Among these extensions, we present linear models, clustered and dependent observations, time series data, binary and discrete data, models for spatial data, nonparametric methods, and forward search methods for outliers. We also present the current interest in robust statistics and conclude with suggestions on the possible future direction of this area for statistical science.

Robustness of Minimum Disparity Estimators in Linear Regression Models

  • Pak, Ro-Jin
    • Journal of the Korean Statistical Society / Vol. 24, No. 2 / pp. 349-360 / 1995
  • This paper deals with the robustness properties of minimum disparity estimation in linear regression models. The estimators are defined as the statistical quantities which minimize the blended weight Hellinger distance between a weighted kernel density estimator of the residuals and a smoothed model density of the residuals. It is shown that if the weights of the density estimator are appropriately chosen, the estimates of the regression parameters are robust.


Reexamination of Estimating Beta Coefficient as a Risk Measure in CAPM

  • Phuoc, Le Tan;Kim, Kee S.;Su, Yingcai
    • The Journal of Asian Finance, Economics and Business / Vol. 5, No. 1 / pp. 11-16 / 2018
  • This research examines alternative ways of estimating the coefficient of non-diversifiable risk, namely the beta coefficient, in the Capital Asset Pricing Model (CAPM) introduced by Sharpe (1964), which is an essential element in assessing the value of diverse assets. The non-parametric methods used in this research are the robust Least Trimmed Square (LTS) and Maximum likelihood type of M-estimator (MM-estimator). The Jackknife, a resampling technique, is also employed to validate the results. According to the finance literature and common practice, these coefficients have often been estimated using the Ordinary Least Square (LS) regression method and monthly return data. The empirical results of this research indicate that the robust LTS and MM-estimator performed much better than LS in terms of efficiency for large-cap stocks trading actively in the United States markets. Interestingly, the empirical results also show that daily return data give more accurate estimates than monthly return data in LS, LTS, and MM-estimator regressions alike.
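To make the contrast between LS and LTS concrete, the following is a minimal sketch, not the authors' procedure. It assumes a single-factor regression of stock returns on market returns, uses simulated (hypothetical) return data with a true beta of 1.2, and approximates LTS with random two-point starts followed by concentration steps. The names `ols_fit`, `lts_fit`, `market`, and `stock` are illustrative only.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def lts_fit(x, y, h=None, n_starts=50, n_csteps=10, seed=0):
    """Approximate least trimmed squares: minimise the sum of the h smallest
    squared residuals, using random two-point starts and concentration steps."""
    picker = random.Random(seed)
    n = len(x)
    h = h or (n + 3) // 2  # about half the sample, for two parameters
    best = None
    for _ in range(n_starts):
        i, j = picker.sample(range(n), 2)
        if x[i] == x[j]:
            continue  # a vertical line cannot serve as a start
        beta = (y[j] - y[i]) / (x[j] - x[i])
        alpha = y[i] - beta * x[i]
        for _ in range(n_csteps):
            # Concentration step: refit OLS on the h best-fitting points.
            order = sorted(range(n), key=lambda k: (y[k] - alpha - beta * x[k]) ** 2)
            keep = order[:h]
            alpha, beta = ols_fit([x[k] for k in keep], [y[k] for k in keep])
        sq = sorted((yk - alpha - beta * xk) ** 2 for xk, yk in zip(x, y))
        objective = sum(sq[:h])
        if best is None or objective < best[0]:
            best = (objective, alpha, beta)
    return best[1], best[2]


# Hypothetical simulated returns: 180 regular days plus 20 contaminated days
# on which a large market gain coincides with a large loss for the stock.
rng = random.Random(42)
TRUE_BETA = 1.2
market, stock = [], []
for t in range(200):
    if t < 180:
        m = rng.gauss(0.0, 0.01)
        s = 0.001 + TRUE_BETA * m + rng.gauss(0.0, 0.001)
    else:
        m = 0.05 + rng.gauss(0.0, 0.002)
        s = -0.10 + rng.gauss(0.0, 0.002)
    market.append(m)
    stock.append(s)

_, beta_ols = ols_fit(market, stock)
_, beta_lts = lts_fit(market, stock)
print(f"OLS beta: {beta_ols:.3f}   LTS beta: {beta_lts:.3f}   true beta: {TRUE_BETA}")
```

In this simulation the contaminated days carry high leverage, so the LS slope is pulled far from 1.2, while the trimmed fit ignores the worst-fitting half of the sample and stays close to it. Real return data will rarely be this clean-cut.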

Robust Regression and Stratified Residuals for Left-Truncated and Right-Censored Data

  • Kim, Chul-Ki
    • Journal of the Korean Statistical Society / Vol. 26, No. 3 / pp. 333-354 / 1997
  • Computational algorithms to calculate M-estimators and rank estimators of regression parameters from left-truncated and right-censored data are developed herein. In the case of M-estimators, new statistical methods are also introduced to incorporate leverage assessments and concomitant scale estimation in the presence of left truncation and right censoring on the observed response. Furthermore, graphical methods to examine the residuals from these data are presented. Two real data sets are used for illustration.


Identification of Regression Outliers Based on Clustering of LMS-residual Plots

  • Kim, Bu-Yong;Oh, Mi-Hyun
    • Communications for Statistical Applications and Methods / Vol. 11, No. 3 / pp. 485-494 / 2004
  • An algorithm is proposed to identify multiple outliers in linear regression. It is based on the clustering of residuals from the least median of squares estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. Comparisons of the effectiveness of the procedures are performed on the basis of the classic data and artificial data sets, and it is shown that the proposed algorithm is superior to the one that is based on the least squares estimation. In particular, the algorithm deals very well with the masking and swamping effects while the other does not.
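The idea of clustering LMS residuals can be sketched as follows, under stated assumptions. The LMS fit is approximated by trying the line through every pair of points (simple regression only). The clustering is single linkage on the one-dimensional residuals, which for a fixed cut height amounts to splitting the sorted residuals wherever a gap exceeds that height. The cut height of 2.0 is a hypothetical value chosen for the simulated data, not the criterion proposed in the paper, and all function names are illustrative.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import median


def lms_fit(x, y):
    """Approximate least median of squares line: among lines through every pair
    of points, keep the one with the smallest median squared residual."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = median((yk - a - b * xk) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]


def cluster_residuals(residuals, cut_height):
    """Single-linkage clusters of 1-D values cut at cut_height: sorted values
    are split wherever two neighbours are more than cut_height apart."""
    order = sorted(range(len(residuals)), key=lambda k: residuals[k])
    labels = [0] * len(residuals)
    current = 0
    for prev, k in zip(order, order[1:]):
        if residuals[k] - residuals[prev] > cut_height:
            current += 1
        labels[k] = current
    return labels


# Hypothetical data: 20 points on a line, the last three shifted upward by 10.
rng = random.Random(0)
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.1) for xi in x]
for k in (17, 18, 19):
    y[k] += 10.0

a, b = lms_fit(x, y)
residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
labels = cluster_residuals(residuals, cut_height=2.0)  # cut height is an assumption

# Treat the largest cluster as the regular observations and flag the rest.
main_label = Counter(labels).most_common(1)[0][0]
flagged = [k for k, lab in enumerate(labels) if lab != main_label]
print("flagged observations:", flagged)
```

Because the LMS line is determined by the majority of the points, the three shifted observations keep large residuals and form their own cluster. A least squares fit on the same data would be tilted toward them, which is the masking problem the abstract refers to.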

Least absolute deviation estimator based consistent model selection in regression

  • Shende, K.S.;Kashid, D.N.
    • Communications for Statistical Applications and Methods / Vol. 26, No. 3 / pp. 273-293 / 2019
  • We consider the problem of model selection in multiple linear regression with outliers and non-normal error distributions. In this article, a robust model selection criterion is proposed based on the least absolute deviation (LAD) estimation method. The proposed criterion is shown to be consistent. We suggest algorithms based on the proposed criterion that are suitable for a large number of predictors in the model. These algorithms select only relevant predictor variables with probability one for large sample sizes. An exhaustive simulation study shows that the criterion performs well. The proposed criterion is also applied to a real data set to examine its applicability. The simulation results show the proficiency of the algorithms in the presence of outliers, non-normal distributions, and multicollinearity.

On Confidence Intervals of Robust Regression Estimators

  • 이동희;박유성;김기환
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 19, No. 1 / pp. 97-110 / 2006
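  • Most data are contaminated by outliers arising from various causes, and obtaining reliable estimators and carrying out statistical inference with them in this situation is an important problem. However, the robust regression estimators proposed so far have suffered from computational difficulty and from lower efficiency than the least squares estimator under the normal error model, so the accuracy of statistical inference could not be assured. The recently proposed weighted self-tuning estimator (WSTE) of Lee (2004) offers, compared with other robust regression estimators, an exact computational procedure and, as a consequence, asymptotic normality and a high breakdown point. However, the weighted least squares estimator based on a robust estimator, which has been widely used for statistical inference, does not provide the same level of efficiency as the least squares estimator under the normal error model, even when it is based on the WSTE. In this paper, we propose another statistical inference method based on the WSTE and show through Monte Carlo simulation that this method yields more accurate results under the normal error model and in large samples.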

The Effect of COVID-19 Pandemic on the Philippine Stock Exchange, Peso-Dollar Rate and Retail Price of Diesel

  • CAMBA, Aileen L.;CAMBA, Abraham C. Jr.
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 10 / pp. 543-553 / 2020
  • This paper examines the effect of the COVID-19 pandemic on the Philippine stock exchange, the peso-dollar rate, and the retail price of diesel using robust least squares regression and vector autoregression (VAR). The robust least squares regression using the MM-estimation method shows that COVID-19 daily infections have a negative and statistically significant effect on the Philippine stock exchange index, the peso-dollar exchange rate, and the retail pump price of diesel. This is consistent with the results of correlation diagnostics. As for the VAR model, the lagged values of the independent variable are significant in explaining the Philippine stock exchange index, the peso-dollar exchange rate, and the retail pump price of diesel. Moreover, in the short run, the impulse response function confirms a relative effect of COVID-19 daily infections, and the variance decomposition shows that COVID-19 daily infections account for only a minor portion of the fluctuations in the Philippine stock exchange index, the peso-dollar exchange rate, and the retail pump price of diesel. In the long term, the influence levels off. The Granger causality test suggests that COVID-19 daily infections cause changes in the Philippine stock exchange index and the peso-dollar exchange rate in the short run. However, COVID-19 infections have no causal link with the retail pump price of diesel.

Axial load prediction in double-skinned profiled steel composite walls using machine learning

  • G., Muthumari G;P. Vincent
    • Computers and Concrete / Vol. 33, No. 6 / pp. 739-754 / 2024
  • This study presents an innovative AI-driven approach to assess the ultimate axial load in Double-Skinned Profiled Steel sheet Composite Walls (DPSCWs). Utilizing a dataset of 80 entries, seven input parameters were employed, and various AI techniques, including Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree Regression, Decision Tree with AdaBoost Regression, Random Forest Regression, Gradient Boost Regression Tree, Elastic Net Regression, Ridge Regression, and LASSO Regression, were evaluated. Decision Tree Regression and Random Forest Regression emerged as the most accurate models. The top three performing models were integrated into a hybrid approach, excelling in accurately estimating DPSCWs' ultimate axial load. This adaptable hybrid model outperforms traditional methods, reducing errors in complex scenarios. The validated Artificial Neural Network (ANN) model showcases less than 1% error, enhancing reliability. Correlation analysis highlights robust predictions, emphasizing the importance of steel sheet thickness. The study contributes insights for predicting DPSCW strength in civil engineering, suggesting optimization and database expansion. The research advances precise load capacity estimation, empowering engineers to enhance construction safety and explore further machine learning applications in structural engineering.

The Regional Homogeneity in the Presence of Heteroskedasticity

  • Chung, Kyoun-Sup;Lee, Sang-Yup
    • Korean System Dynamics Review (한국시스템다이내믹스연구) / Vol. 8, No. 2 / pp. 25-49 / 2007
  • An important assumption of the classical linear regression model is that the disturbances appearing in the population regression function are homoskedastic; that is, they all have the same variance. If we persist in using the usual testing procedures despite heteroskedasticity, whatever conclusions we draw or inferences we make may be very misleading. The contribution of this paper is a concrete procedure for proper estimation when heteroskedasticity exists in the data, because the quality of dependent variable predictions, i.e., the estimated variance of the dependent variable, can be improved by giving consideration to the issues of regional homogeneity and/or heteroskedasticity across the research area. With respect to estimation, specific attention should be paid to the selection of the appropriate strategy in terms of the auxiliary regression model. The paper shows that by testing for heteroskedasticity, and by using robust methods both with and without heteroskedasticity, more efficient statistical inferences are provided.
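As an illustration of testing for heteroskedasticity through an auxiliary regression, here is a minimal sketch. It assumes a Breusch-Pagan-type test in its studentized (Koenker) form: the squared least squares residuals are regressed on the predictor, and the statistic is n times the R-squared of that auxiliary regression, compared with a chi-square critical value with one degree of freedom. The abstract does not say which test the authors use, so this choice, the simulated data, and the function names are all assumptions.

```python
import random


def simple_ols(x, y):
    """Least squares fit of y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Studentized Breusch-Pagan statistic: n * R^2 from the auxiliary
    regression of squared residuals on the predictor."""
    a, b, _ = simple_ols(x, y)
    squared_residuals = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    _, _, r2_aux = simple_ols(x, squared_residuals)
    return len(x) * r2_aux


CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom

# Hypothetical data: the same regression line with two error structures.
rng = random.Random(7)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]
y_homo = [1.0 + 2.0 * xi + rng.gauss(0.0, 2.0) for xi in x]          # constant variance
y_hetero = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]   # spread grows with x

lm_homo = breusch_pagan_lm(x, y_homo)
lm_hetero = breusch_pagan_lm(x, y_hetero)
print(f"LM (homoskedastic errors):   {lm_homo:.2f}")
print(f"LM (heteroskedastic errors): {lm_hetero:.2f}")
print("reject homoskedasticity for the second sample:", lm_hetero > CHI2_1DF_5PCT)
```

When the statistic exceeds the critical value, the usual standard errors are not trustworthy, and a heteroskedasticity-robust approach of the kind the abstract describes becomes the safer basis for inference.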
