MULTIPLE OUTLIER DETECTION IN LOGISTIC REGRESSION BY USING INFLUENCE MATRIX

  • Lee, Gwi-Hyun (Department of Statistics, Seoul National University)
  • Park, Sung-Hyun (Department of Statistics, Seoul National University)
  • Published : 2007.12.31

Abstract

Many procedures are available for identifying a single outlier or an isolated influential point in linear regression and logistic regression. Detecting multiple outliers or influential subsets is more difficult, however, owing to masking and swamping. Direct procedures for multiple outlier detection in logistic regression have not yet been studied. In this paper we consider direct methods for logistic regression by extending the influence matrix algorithm of Peña and Yohai (1995). We define the influence matrix in logistic regression by using Cook's distance for logistic regression, and test for multiple outliers by using the mean shift model. To assess the accuracy of the proposed multiple outlier detection algorithm, we simulate artificial data that include multiple outliers subject to masking and swamping.
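As a rough illustration of the kind of object the abstract describes, the sketch below builds one plausible logistic analogue of the Peña and Yohai influence matrix. It is an assumption, not the paper's definition: it takes the linear-regression form and substitutes Pearson residuals $r_i$ and the weighted hat matrix $H = W^{1/2}X(X^{T}WX)^{-1}X^{T}W^{1/2}$, giving $m_{ij} = r_i r_j h_{ij} / \{k(1-h_{ii})(1-h_{jj})\}$, whose diagonal is a Pregibon-style one-step Cook's distance. The function names `fit_logistic` and `influence_matrix`, the small ridge term, and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit logistic regression by Newton-Raphson; X must include the intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X' (y - p); the tiny ridge is an
        # assumed safeguard against a numerically singular Hessian.
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta


def influence_matrix(X, y, beta):
    """Assumed logistic analogue of the Peña-Yohai influence matrix.

    Entry (i, j) is r_i * r_j * h_ij / (k * (1 - h_ii) * (1 - h_jj)), so the
    diagonal holds one-step Cook's distances.
    """
    k = X.shape[1]
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    r = (y - p) / np.sqrt(w)                     # Pearson residuals
    Xw = X * np.sqrt(w)[:, None]                 # W^{1/2} X
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)    # weighted hat matrix
    d = r / (1.0 - np.diag(H))
    return (d[:, None] * H * d[None, :]) / k


# Simulated data (hypothetical): intercept plus two standard normal covariates.
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
prob = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, 1.0, -1.0]))))
y = (rng.random(n) < prob).astype(float)

beta_hat = fit_logistic(X, y)
M = influence_matrix(X, y, beta_hat)

# In the spirit of the influence matrix approach, inspect the eigenvector of
# the largest eigenvalue; large absolute components mark candidate cases.
eigvals, eigvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
candidates = np.argsort(-np.abs(eigvecs[:, -1]))[:5]
print(candidates)
```

The candidate set found this way would still need a formal check, for example with the mean shift model mentioned above; that testing step is not sketched here.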

References

  1. BECKMAN, R. J. AND COOK, R. D. (1983). 'Outlier ... s', Technometrics, 25, 119-163 https://doi.org/10.2307/1268541
  2. BIANCO, A. M. AND YOHAI, V. J. (1996). 'Robust estimation in the logistic regression model', In Robust Statistics, Data Analysis, and Computer Intensive Methods; Lecture Notes in Statistics 109 (Rieder, H. ed.), 17-34, Springer-Verlag, New York
  3. CANTONI, E. AND RONCHETTI, E. (2001). 'Robust inference for generalized linear models', Journal of the American Statistical Association, 96, 1022-1030 https://doi.org/10.1198/016214501753209004
  4. CHATTERJEE, S. AND HADI, A. S. (1986). 'Influential observations, high leverage points, and outliers in linear regression', Statistical Science, 1, 379-416 https://doi.org/10.1214/ss/1177013622
  5. COOK, R. D. (1979). 'Influential observations in linear regression', Journal of the American Statistical Association, 74, 169-174 https://doi.org/10.2307/2286747
  6. CROUX, C. AND HAESBROECK, G. (2003). 'Implementing the Bianco and Yohai estimator for logistic regression', Computational Statistics & Data Analysis, 44, 273-295 https://doi.org/10.1016/S0167-9473(03)00042-2
  7. PREGIBON, D. (1981). 'Logistic regression diagnostics', The Annals of Statistics, 9, 705-724 https://doi.org/10.1214/aos/1176345513
  8. PEÑA, D. AND YOHAI, V. J. (1995). 'The detection of influential subsets in linear regression by using an influence matrix', Journal of the Royal Statistical Society, Ser. B, 57, 145-156
  9. RYAN, T. P. (1996). Modern Regression Methods, John Wiley & Sons, New York
  10. WILLIAMS, D. A. (1987). 'Generalized linear model diagnostics using the deviance and single case deletions', Journal of the Royal Statistical Society, Ser. C, 36, 181-191