V-mask Type Criterion for Identification of Outliers in Logistic Regression

  • Kim Bu-Yong (Department of Statistics, Sookmyung Women's University)
  • Published : 2005.12.01

Abstract

A procedure is proposed to identify multiple outliers in logistic regression. It first detects leverage points by hierarchical clustering of robust distances based on the minimum covariance determinant (MCD) estimator. It then applies a V-mask type criterion to the scatter plot of robust residuals against robust distances, classifying the observations into vertical outliers, bad leverage points, good leverage points, and regular points. The effectiveness of the procedure is evaluated on classic and artificial data sets, and the results show that it handles the masking and swamping effects very well.
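
The four-way classification named in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the V-mask criterion, so the sketch below swaps in the simpler rectangular rule familiar from robust regression diagnostic plots: one fixed cutoff on the robust distance and one on the absolute robust residual. The function name `classify_observations`, both cutoff values, and the sample data are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import List, Tuple


def classify_observations(
    points: List[Tuple[float, float]],
    dist_cutoff: float,
    resid_cutoff: float = 2.5,
) -> List[str]:
    """Label each (robust distance, robust residual) pair.

    Assumption: fixed rectangular cutoffs stand in for the paper's
    V-mask type criterion, whose exact shape is not given in the abstract.
    """
    labels = []
    for distance, residual in points:
        is_leverage = distance > dist_cutoff          # far from the bulk in x-space
        is_large_resid = abs(residual) > resid_cutoff  # poorly fitted response
        if is_leverage and is_large_resid:
            labels.append("bad leverage point")
        elif is_leverage:
            labels.append("good leverage point")
        elif is_large_resid:
            labels.append("vertical outlier")
        else:
            labels.append("regular point")
    return labels


if __name__ == "__main__":
    # Hypothetical (distance, residual) pairs, not data from the paper.
    sample = [(1.0, 0.3), (1.2, -3.4), (4.1, 0.5), (5.0, 3.8)]
    # 2.716 is roughly the square root of the 0.975 chi-square quantile
    # with 2 degrees of freedom (an assumed cutoff for two predictors).
    for point, label in zip(sample, classify_observations(sample, 2.716)):
        print(point, label)
```

In practice the robust distances would come from an MCD fit of the predictors and the residuals from a robust logistic fit; both are taken as given inputs here to keep the sketch self-contained.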
