An Outlier Detection Method in Penalized Spline Regression Models

  • Seo, Han Son (Department of Applied Statistics, Konkuk University)
  • Song, Ji Eun (Department of Applied Statistics, Konkuk University)
  • Yoon, Min (Department of Statistics, Pukyong National University)
  • Received : 2013.07.15
  • Accepted : 2013.08.21
  • Published : 2013.08.31

Abstract

The detection and examination of outliers are important parts of data analysis because outliers can distort the results of a statistical analysis. Many authors have discussed outlier detection methods. In this article, we propose applying the method of Hadi and Simonoff (1993) to a penalized spline regression model to detect multiple outliers. Simulated and real data sets are used to illustrate the proposed procedure and to compare it with penalized spline regression and robust penalized spline regression.
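
The abstract does not spell out the procedure, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a truncated-linear spline basis with a ridge penalty on the knot coefficients, and it grows a "clean" subset in the spirit of a forward search. Several simplifications are assumptions made here for brevity: raw residuals are used instead of studentized ones, a fixed `cutoff` replaces t-based critical values, and the initial subset is taken from a fit to all the data. All function and parameter names are hypothetical.

```python
import math

def basis(x, knots):
    """Truncated-linear spline basis: [1, x, (x - k1)+, ..., (x - kK)+]."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit_pspline(xs, ys, knots, lam):
    """Minimise sum (y - B beta)^2 + lam * sum(knot coefficients^2)."""
    B = [basis(x, knots) for x in xs]
    p = len(knots) + 2
    A = [[sum(row[i] * row[j] for row in B) for j in range(p)] for i in range(p)]
    for j in range(2, p):  # intercept and slope are left unpenalised
        A[j][j] += lam
    rhs = [sum(row[i] * y for row, y in zip(B, ys)) for i in range(p)]
    return solve(A, rhs)

def predict(beta, x, knots):
    return sum(b * v for b, v in zip(beta, basis(x, knots)))

def forward_search(xs, ys, knots, lam=0.1, cutoff=3.0):
    """Return sorted indices flagged as outliers (simplified, see assumptions).

    Assumes lam > 0 and that len(xs) is well above len(knots) + 2.
    """
    n, p = len(xs), len(knots) + 2

    def abs_resid(beta, i):
        return abs(ys[i] - predict(beta, xs[i], knots))

    # Assumed initialisation: rank points by residuals from a fit to all data.
    beta = fit_pspline(xs, ys, knots, lam)
    order = sorted(range(n), key=lambda i: abs_resid(beta, i))
    size = max(n // 2 + 1, p + 1)
    while True:
        clean = order[:size]
        beta = fit_pspline([xs[i] for i in clean], [ys[i] for i in clean], knots, lam)
        # Crude residual scale estimated from the clean subset only.
        scale = math.sqrt(sum(abs_resid(beta, i) ** 2 for i in clean) / (size - p))
        order = sorted(range(n), key=lambda i: abs_resid(beta, i))
        # Stop when the next candidate looks too extreme, or nothing is left.
        if size >= n or abs_resid(beta, order[size]) > cutoff * scale:
            return sorted(order[size:])
        size += 1

# Usage on made-up data: a sine curve with two shifted observations.
n = 40
xs = [2.0 * math.pi * i / (n - 1) for i in range(n)]
ys = [math.sin(x) + 0.05 * math.cos(17.0 * i) for i, x in enumerate(xs)]
ys[10] += 5.0
ys[30] += 5.0
knots = [1.0, 2.0, 3.0, 4.0, 5.0]
flagged = forward_search(xs, ys, knots)
print(flagged)
```

In practice the fixed cutoff would be replaced by critical values that account for the subset size, and the smoothing parameter `lam` would be chosen by a data-driven criterion rather than fixed in advance.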

References

  1. Cantoni, E. and Ronchetti, E. (2001). Resistant selection of the smoothing parameter for smoothing splines, Statistics and Computing, 11, 141-146. https://doi.org/10.1023/A:1008975231866
  2. Davies, L. and Gather, U. (1993). The identification of multiple outliers (with discussion), Journal of the American Statistical Association, 88, 782-801. https://doi.org/10.1080/01621459.1993.10476339
  3. Gentleman, J. F. and Wilk, M. B. (1975). Detecting outliers. II. Supplementing the direct analysis of residuals, Biometrics, 31, 387-410. https://doi.org/10.2307/2529428
  4. Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272. https://doi.org/10.1080/01621459.1993.10476407
  5. Hoeting, J., Raftery, A. E. and Madigan, D. (1996). A method for simultaneous variable selection and outlier identification in linear regression, Computational Statistics & Data Analysis, 22, 251-270. https://doi.org/10.1016/0167-9473(95)00053-4
  6. Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo, Annals of Statistics, 1, 799-821. https://doi.org/10.1214/aos/1176342503
  7. Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptively-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585. https://doi.org/10.2307/2531498
  8. Kovac, A. and Silverman, B. W. (2000). Extending the scope of wavelet regression methods by coefficient dependent thresholding, Journal of the American Statistical Association, 95, 172-183. https://doi.org/10.1080/01621459.2000.10473912
  9. Lee, T. C. M. and Oh, H.-S. (2007). Robust penalized regression spline fitting with application to additive mixed modeling, Computational Statistics, 22, 159-171. https://doi.org/10.1007/s00180-007-0031-6
  10. Marasinghe, M. G. (1985). A multistage procedure for detecting several outliers in linear regression, Technometrics, 27, 395-399. https://doi.org/10.1080/00401706.1985.10488078
  11. Neter, J., Wasserman, W. and Kutner, M. H. (1990). Applied Linear Statistical Models (3rd ed.), Richard D. Irwin, Homewood.
  12. Rousseeuw, P. J. (1984). Least median of squares regression, Journal of the American Statistical Association, 79, 871-880. https://doi.org/10.1080/01621459.1984.10477105
  13. Ruppert, D. and Wand, M. P. (2003). Semiparametric Regression, Cambridge University Press, Cambridge.