Principal Components Logistic Regression based on Robust Estimation

로버스트추정에 바탕을 둔 주성분로지스틱회귀

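  • Kim, Bu-Yong (Department of Statistics, Sookmyung Women's University) ;
  • Kahng, Myung-Wook (Department of Statistics, Sookmyung Women's University) ;
  • Jang, Hye-Won (Integrated Risk Business Team, FIST Global Co., Ltd.)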
  • Published : 2009.06.30

Abstract

Logistic regression is widely used as a data mining technique for customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both the multicollinearity and outlier problems. A procedure based on the condition index is suggested for selecting the principal components: when a condition index exceeds the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.
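The component selection rule described in the abstract can be illustrated with a minimal sketch. It assumes the common definition of the condition index, the square root of the largest eigenvalue divided by the j-th eigenvalue of the regressors' correlation matrix; the paper's exact scaling is not stated on this page. The cutoff of 30 is only a placeholder rule of thumb, not the value the authors derive from their conjoint-analysis model, and all function names and the toy data are hypothetical.

```python
import math

def correlation_matrix(X):
    """Correlation matrix of the columns of X (a list of equal-length rows)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # Root sum of squared deviations, so that Z'Z is the correlation matrix.
    norms = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X)) for j in range(p)]
    Z = [[(row[j] - means[j]) / norms[j] for j in range(p)] for row in X]
    return [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]

def symmetric_eigenvalues(A, sweeps=50, tol=1e-22):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    A = [row[:] for row in A]
    p = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if A[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes the (i, j) entry.
                theta = 0.5 * math.atan2(2.0 * A[i][j], A[i][i] - A[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # A <- A R
                    aki, akj = A[k][i], A[k][j]
                    A[k][i], A[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # A <- R^T A
                    aik, ajk = A[i][k], A[j][k]
                    A[i][k], A[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((A[i][i] for i in range(p)), reverse=True)

def condition_indices(X):
    """Assumed definition: eta_j = sqrt(lambda_max / lambda_j)."""
    lam = symmetric_eigenvalues(correlation_matrix(X))
    floor = 1e-15 * lam[0]  # guard against a zero or slightly negative eigenvalue
    return [math.sqrt(lam[0] / max(l, floor)) for l in lam]

def retained_components(X, cutoff):
    """Indices (by decreasing eigenvalue) of components whose index is within the cutoff."""
    return [j for j, eta in enumerate(condition_indices(X)) if eta <= cutoff]

# Toy data: the second column nearly duplicates the first, the third is unrelated.
signs = [1, -1, -1, 1, 1, -1, -1, 1]
X = [[i, i + 0.01 * (-1) ** i, s] for i, s in zip(range(1, 9), signs)]
eta = condition_indices(X)
kept = retained_components(X, cutoff=30.0)  # placeholder cutoff, not the paper's
```

In this toy example the near-duplicate column produces one very small eigenvalue, so the last component has a large condition index and is the one dropped; the logistic model would then be fitted on the scores of the retained components.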

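Logistic regression is widely used in data mining for customer relationship management. During model specification in this field, highly correlated explanatory variables are often included in the model together, which causes multicollinearity; moreover, when the regression data contain outliers, the maximum likelihood estimator becomes seriously deficient. Robust principal components logistic regression can be applied to resolve both problems at once. In this paper we develop a model for determining the selection criterion for the principal components and propose a robust estimation method that reduces the influence of outliers on the estimates in the principal components model. A simulation study confirms that the proposed estimation method adequately resolves the problems caused by multicollinearity and outliers.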
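The idea of reducing the influence of outliers through case weights can be sketched as a weighted maximum likelihood fit. This is not the authors' algorithm: the set of flagged observations stands in for the output of an outlier criterion such as the V-mask type criterion, the weight 0.1 is an arbitrary illustrative value, and the function name and data are hypothetical. The model has one regressor plus an intercept so that the Newton step can be solved in closed form.

```python
import math

def fit_weighted_logistic(x, y, w, iters=100, tol=1e-10):
    """Newton iterations for a one-regressor logistic model with case weights w."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data: the last observation is a high-leverage point with an unexpected label.
x = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1, 0]
flagged = {8}  # assumed to come from an outlier-identification step

ml = fit_weighted_logistic(x, y, [1.0] * len(x))
robust = fit_weighted_logistic(
    x, y, [0.1 if i in flagged else 1.0 for i in range(len(x))]
)
```

Down-weighting the flagged point lets the remaining observations determine the slope, so the weighted fit recovers a steeper positive slope than the unweighted maximum likelihood fit on the same data.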

References

  1. Aguilera, A. M., Escabias, M. and Valderrama, M. J. (2006). Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924 https://doi.org/10.1016/j.csda.2005.03.011
  2. Carroll, R. J. and Pederson, S. (1993). On robustness in the logistic regression model, Journal of the Royal Statistical Society, Series B, 55, 693-706
  3. Copas, J. B. (1988). Binary regression models for contaminated data, Journal of the Royal Statistical Society, Series B, 50, 225-265
  4. Croux, C. and Haesbroeck, G. (2003). Implementing the Bianco and Yohai estimator for logistic regression, Computational Statistics & Data Analysis, 44, 273-295 https://doi.org/10.1016/S0167-9473(03)00042-2
  5. Hadi, A. S. (1994). A modification of a method for the detection of outliers in multivariate samples, Journal of the Royal Statistical Society, Series B, 56, 393-396
  6. Hardin, J. and Rocke, D. M. (2004). Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator, Computational Statistics & Data Analysis, 44, 625-638 https://doi.org/10.1016/S0167-9473(02)00280-3
  7. Kim, B. Y. (2005). V-mask type criterion for identification of outliers in logistic regression, The Korean Communications in Statistics, 12, 625-634 https://doi.org/10.5351/CKSS.2005.12.3.625
  8. Kim, B. Y. and Kahng, M. W. (2008). Principal components regression in logistic model, The Korean Journal of Applied Statistics, 21, 571-580 https://doi.org/10.5351/KJAS.2008.21.4.571
  9. Kim, B. Y., Kahng, M. W. and Choi, M. A. (2007). Algorithm for the robust estimation in logistic regression, The Korean Journal of Applied Statistics, 20, 551-559 https://doi.org/10.5351/KJAS.2007.20.3.551
  10. Kordzakhia, N., Mishra, G. D. and Reiersolmoen, L. (2001). Robust estimation in the logistic regression model, Journal of Statistical Planning and Inference, 98, 211-223 https://doi.org/10.1016/S0378-3758(00)00312-8
  11. Mason, R. L. and Gunst, R. F. (1985). Selecting principal components in regression, Statistics & Probability Letters, 3, 299-301 https://doi.org/10.1016/0167-7152(85)90059-8
  12. Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41, 212-223 https://doi.org/10.2307/1270566
  13. Rousseeuw, P. J. and Leroy, A. M. (2003). Robust Regression and Outlier Detection, John Wiley & Sons, New York
  14. Schaefer, R. L. (1986). Alternative estimators in logistic regression when the data are collinear, Journal of Statistical Computation and Simulation, 25, 75-91 https://doi.org/10.1080/00949658608810925
  15. Woodruff, D. L. and Rocke, D. M. (1994). Computable robust estimation of multivariate location and shape in high dimension using compound estimators, Journal of the American Statistical Association, 89, 888-896 https://doi.org/10.2307/2290913

Cited by

  1. Diet and Lifestyle Factors Affecting Obesity: A Korea National Health and Nutrition Survey Analysis vol.16, no.2, 2011, https://doi.org/10.3746/jfn.2011.16.2.117