
Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son (Department of Applied Statistics, Konkuk University)
  • Yoon, Min (Department of Statistics, Pukyong National University)
  • Received : 2012.03.02
  • Accepted : 2012.04.17
  • Published : 2012.06.30

Abstract

Outlier detection in a linear regression model fails when a sequential procedure classifies similar observations differently. In such circumstances, first identifying clusters of observations and then applying an outlier detection method to the clustered data can prevent this failure. It is also computationally efficient, because clustering reduces the number of units to be examined. In this paper, we propose a clustering procedure for this purpose and provide examples that illustrate it applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.
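The abstract does not specify the clustering rule or the test statistics, so the following is only a minimal Python sketch of the general idea under stated assumptions. Observations of a simple linear regression are grouped by single-linkage clustering with a fixed distance `radius`. This is a stand-in chosen for illustration, and the paper's own clustering procedure may differ. A Gentleman-Wilk-style search then deletes whole clusters instead of individual observations and picks the set of k clusters whose removal gives the largest reduction Q in the residual sum of squares. The names `fit_rss`, `single_linkage_clusters`, `gw_cluster_search`, `radius`, and `max_deleted` are hypothetical, and no significance test is applied.

```python
from itertools import combinations
from math import dist


def fit_rss(points):
    """Fit y = a + b*x by ordinary least squares and return the residual sum of squares."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # If all x are equal the slope is not identified; b = 0 still gives the correct RSS.
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in points)


def single_linkage_clusters(points, radius):
    """Assumed clustering rule: points closer than `radius` are linked, and clusters
    are the connected components of that graph (single linkage), via union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < radius:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def gw_cluster_search(points, radius, max_deleted):
    """For k = 1..max_deleted clusters, return {k: (Q, deleted indices)} for the set of
    k clusters whose deletion most reduces the residual sum of squares."""
    clusters = single_linkage_clusters(points, radius)
    rss_full = fit_rss(points)
    best = {}
    for k in range(1, max_deleted + 1):
        for subset in combinations(clusters, k):
            drop = {i for c in subset for i in c}
            kept = [p for i, p in enumerate(points) if i not in drop]
            if len(kept) < 3:  # keep at least one residual degree of freedom
                continue
            q = rss_full - fit_rss(kept)
            if k not in best or q > best[k][0]:
                best[k] = (q, sorted(drop))
    return best


if __name__ == "__main__":
    # Hypothetical data: ten points on y = 2x + 1 plus a tight pair of outliers.
    pts = [(x, 2.0 * x + 1.0) for x in range(10)] + [(4.0, 30.0), (4.2, 30.5)]
    print(gw_cluster_search(pts, radius=1.0, max_deleted=1)[1][1])  # [10, 11]
```

In this toy example the two nearby outliers form one cluster, so a single deletion step (k = 1) removes both of them together, whereas with `radius=0.0` every observation is its own cluster and one step can remove only one of the pair. Searching over 11 clusters rather than 12 observations also shrinks the number of candidate subsets, which is the kind of data reduction the abstract refers to.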

References

  1. Ahn, B. J. and Seo, H. S. (2011). Outlier detection using dynamic plots, The Korean Journal of Applied Statistics, 24, 979-986. https://doi.org/10.5351/KJAS.2011.24.5.979
  2. Atkinson, A. C. (1994). Fast very robust methods for the detection of multiple outliers, Journal of the American Statistical Association, 89, 1329-1339. https://doi.org/10.1080/01621459.1994.10476872
  3. Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search, Springer, New York.
  4. Cormack, R. M. (1971). A review of classification, Journal of the Royal Statistical Society, Series A, 134, 321-367. https://doi.org/10.2307/2344237
  5. Gentleman, J. F. and Wilk, M. B. (1975). Detecting outliers. II. Supplementing the direct analysis of residuals, Biometrics, 31, 387-410. https://doi.org/10.2307/2529428
  6. Gray, J. B. and Ling, R. F. (1984). K-clustering as a detection tool for influential subsets in regression, Technometrics, 26, 305-318.
  7. Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272. https://doi.org/10.1080/01621459.1993.10476407
  8. Jajo, N. K. (2005). A review of robust regression and diagnostic procedures in linear regression, Acta Mathematicae Applicatae Sinica, 21, 209-224. https://doi.org/10.1007/s10255-005-0230-2
  9. Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.
  10. Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptively-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585. https://doi.org/10.2307/2531498
  11. Kianifard, F. and Swallow, W. H. (1990). A Monte Carlo comparison of five procedures for identifying outliers in linear regression, Communications in Statistics, 19, 1913-1938. https://doi.org/10.1080/03610929008830300
  12. Ling, R. F. (1972). On the theory and construction of k-clusters, Computer Journal, 15, 326-332. https://doi.org/10.1093/comjnl/15.4.326
  13. Marasinghe, M. G. (1985). A multistage procedure for detecting several outliers in linear regression, Technometrics, 27, 395-399. https://doi.org/10.1080/00401706.1985.10488078
  14. Paul, S. R. and Fung, K. Y. (1991). A generalized extreme studentized residual multiple-outlier-detection procedure in linear regression, Technometrics, 33, 339-348. https://doi.org/10.1080/00401706.1991.10484839
  15. Pena, D. and Yohai, V. J. (1999). A fast procedure for outlier diagnostics in linear regression problems, Journal of the American Statistical Association, 94, 434-445.
  16. Rousseeuw, P. J. (1984). Least median of squares regression, Journal of the American Statistical Association, 79, 871-880. https://doi.org/10.1080/01621459.1984.10477105