Algorithm for the Robust Estimation in Logistic Regression

  • Kim, Bu-Yong (Department of Statistics, Sookmyung Women's University)
  • Kahng, Myung-Wook (Department of Statistics, Sookmyung Women's University)
  • Choi, Mi-Ae (Computer Systems Division, Samsung Electronics Co.)
  • Published: 2007.11.30

Abstract

Maximum likelihood estimation, the method generally used in logistic regression, is not robust against outliers. We therefore propose an algorithm for robust estimation in the logistic regression model. The algorithm identifies bad leverage points and vertical outliers by a V-mask type criterion and then, on the basis of this identification, reduces the influence of the outliers. Our main finding is that, with an appropriate selection of weights and adjustment factors, the algorithm yields logistic regression estimates with a high breakdown point. The proposed algorithm is evaluated by the correct classification rate on real-life and artificial data sets, and the results indicate that it classifies more accurately than maximum likelihood estimation.
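The abstract describes the estimator only at a high level, so the following is a minimal NumPy sketch of the general "identify, then downweight" idea under stated assumptions. It is not the authors' algorithm: the V-mask criterion and the paper's specific weights and adjustment factors are not reproduced. Instead, leverage is measured by a median/MAD robust distance of the covariates (a simple stand-in for MCD-based robust distances), vertical outliers are flagged through Pearson residuals, and both are turned into Huber-type weights for a weighted IRLS refit. The cutoffs `c=3.0` and `c=2.0`, the 0.5 classification threshold, the synthetic data, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def fit_logistic(X, y, w=None, n_iter=100, tol=1e-8):
    """Case-weighted logistic regression by iteratively reweighted least squares.

    X includes an intercept column; with w=None every case has weight 1,
    which gives the ordinary maximum likelihood estimate.
    """
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = sigmoid(eta)
        v = np.clip(mu * (1.0 - mu), 1e-10, None)   # binomial variance
        z = eta + (y - mu) / v                      # working response
        XtW = X.T * (w * v)                         # X^T diag(case weight * variance)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def leverage_weights(Z, c=3.0):
    """Huber-type weights from a median/MAD robust distance of the covariates Z
    (no intercept column). Illustrative stand-in for an MCD-based distance."""
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)
    d = np.sqrt(np.mean(((Z - med) / mad) ** 2, axis=1))
    return np.minimum(1.0, c / np.maximum(d, 1e-12))

def residual_weights(X, y, beta, c=2.0):
    """Huber-type weights from Pearson residuals; a large |r| suggests a vertical outlier."""
    mu = sigmoid(X @ beta)
    r = (y - mu) / np.sqrt(np.clip(mu * (1.0 - mu), 1e-10, None))
    return np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))

def robust_fit(X, y, n_steps=3):
    """Start from the ML fit, then refit a few times using
    (leverage weight) * (residual weight) as case weights."""
    wx = leverage_weights(X[:, 1:])
    beta = fit_logistic(X, y)
    for _ in range(n_steps):
        w = wx * residual_weights(X, y, beta)
        beta = fit_logistic(X, y, w)
    return beta, w

def correct_classification_rate(X, y, beta, threshold=0.5):
    """Share of cases whose predicted class (p_hat >= threshold) equals y."""
    return np.mean((sigmoid(X @ beta) >= threshold) == (y == 1))

# Synthetic demo: clean data plus a cluster of bad leverage points (x = 10, y = 0).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < sigmoid(0.5 + 2.0 * x)).astype(float)
x_all = np.concatenate([x, np.full(10, 10.0)])
y_all = np.concatenate([y, np.zeros(10)])
X_all = np.column_stack([np.ones_like(x_all), x_all])

beta_ml = fit_logistic(X_all, y_all)
beta_rob, w_final = robust_fit(X_all, y_all)
X_clean = np.column_stack([np.ones_like(x), x])
ccr_ml = correct_classification_rate(X_clean, y, beta_ml)
ccr_rob = correct_classification_rate(X_clean, y, beta_rob)
print("ML:    ", beta_ml, ccr_ml)
print("robust:", beta_rob, ccr_rob)
```

Because the weights never reach zero, each weighted fit is still a well-defined logistic MLE as long as the data are not separable. Running a fixed number of reweighting steps (`n_steps=3`) rather than iterating to convergence is a simplification; it is also why this sketch makes no claim about the breakdown point reported in the paper.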
