Principal Components Regression in Logistic Model

  • 김부용 (Department of Statistics, Sookmyung Women's University)
  • 강명욱 (Department of Statistics, Sookmyung Women's University)
  • Published: 2008.08.31

Abstract

Logistic regression analysis is widely used in customer relationship management and credit risk management. It is well known that maximum likelihood estimation is not appropriate when multicollinearity exists among the regressors. We therefore propose logistic principal components regression to deal with the multicollinearity problem. In particular, a new method is proposed for selecting which principal components to retain. The selection method is based on the condition index instead of the eigenvalue. When a condition index is larger than the upper cutoff value, the principal component corresponding to that index is removed from the estimation. When a condition index lies between the upper and lower cutoff values, a hypothesis test is applied sequentially to decide whether to eliminate the corresponding principal component. The two cutoff values are obtained from a linear model constructed on the basis of conjoint analysis. The proposed method is evaluated in terms of the variance of the estimates and the correct classification rate. The results indicate that the proposed method is superior to the existing method in terms of efficiency and goodness of fit.
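
The selection rule described above can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: the hypothetical function `logistic_pcr` takes the lower and upper limits as plain inputs (the abstract derives them from a conjoint-based linear model whose coefficients are not given here), fits the logistic model by iteratively reweighted least squares, and uses a two-sided Wald test at an assumed level `alpha = 0.05` as the sequential hypothesis test. Condition indices are computed as sqrt(λ_max / λ_j) from the eigenvalues of the regressors' correlation matrix.

```python
import math

import numpy as np


def fit_logistic(A, y, n_iter=50, tol=1e-10):
    """ML logistic regression by IRLS (A already includes an intercept column).

    Returns the estimates and their estimated covariance (inverse Fisher information)."""
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        info = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    info = A.T @ (A * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)


def logistic_pcr(X, y, lower, upper, alpha=0.05):
    """Logistic PCR with condition-index-based component selection (hypothetical sketch)."""
    # Standardize the regressors and take PCs of their correlation matrix,
    # ordered from largest to smallest eigenvalue.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
    order = np.argsort(eigval)[::-1]
    eigval = np.maximum(eigval[order], np.finfo(float).tiny)  # guard against round-off
    eigvec = eigvec[:, order]
    cond_index = np.sqrt(eigval[0] / eigval)

    # Rule 1: components whose condition index exceeds the upper limit are removed.
    keep = cond_index <= upper

    # Rule 2 (assumed order): components between the limits are tested one at a
    # time, largest condition index first, refitting after each removal.
    for j in np.argsort(cond_index)[::-1]:
        if not lower < cond_index[j] <= upper:
            continue
        kept = np.flatnonzero(keep)
        A = np.column_stack([np.ones(len(y)), Xs @ eigvec[:, kept]])
        beta, cov = fit_logistic(A, y)
        pos = 1 + int(np.searchsorted(kept, j))  # column of PC j in A
        z = beta[pos] / math.sqrt(cov[pos, pos])
        if math.erfc(abs(z) / math.sqrt(2.0)) > alpha:  # two-sided Wald p-value
            keep[j] = False

    # Components at or below the lower limit are always kept.
    kept = np.flatnonzero(keep)
    A = np.column_stack([np.ones(len(y)), Xs @ eigvec[:, kept]])
    beta, _ = fit_logistic(A, y)
    # Back-transform PC coefficients to slopes on the standardized regressors.
    return beta[0], eigvec[:, kept] @ beta[1:], cond_index, kept
```

For example, with two nearly identical regressors the last component has a very large condition index and is dropped outright, and the back-transformed slopes on the two collinear regressors come out nearly equal. Testing one component at a time and refitting after each removal is only one reading of "sequentially"; the abstract does not specify the order or the stopping rule.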

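Logistic regression analysis is widely used in fields such as customer relationship management and credit risk management. Logistic regression models in these fields often include many highly correlated explanatory variables, which can give rise to multicollinearity. It is well known that the maximum likelihood estimator has serious deficiencies when multicollinearity is present. To address this problem, we study logistic principal components regression and propose a new method for selecting the principal components, a key step in the analysis. We measure the condition index value that minimizes the variance of the estimator, identify the main factors affecting this value by conjoint analysis, and build a model that determines the criterion for selecting principal components. A simulation study confirms that the proposed method adequately resolves the multicollinearity problem while also improving the fit of the model.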
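
One way to approximate, in a simulation, the condition index value that minimizes the spread of the estimator is a grid search over candidate cutoffs. The sketch below is illustrative only: the helper names (`fit_logistic`, `pcr_slopes`, `best_cutoff`) and the user-supplied `simulate` function are hypothetical, and each cutoff is scored by the mean squared error of the estimated slopes around known true coefficients (variance plus squared bias). That criterion is a swapped-in choice on our part, since retaining fewer components generally lowers the variance on its own. The conjoint-analysis step that models how the optimal value depends on the simulation factors is not reproduced.

```python
import numpy as np


def fit_logistic(A, y, n_iter=50, tol=1e-10):
    """ML logistic regression by IRLS; A already contains an intercept column."""
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        info = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pcr_slopes(X, y, cutoff):
    """Logistic PCR keeping each component whose condition index is <= cutoff.

    Returns slopes on the original (unstandardized) scale of X."""
    sd = X.std(axis=0)
    Xs = (X - X.mean(axis=0)) / sd
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
    eigval = np.maximum(eigval, np.finfo(float).tiny)  # guard against round-off
    cond_index = np.sqrt(eigval.max() / eigval)
    V = eigvec[:, cond_index <= cutoff]
    A = np.column_stack([np.ones(len(y)), Xs @ V])
    return (V @ fit_logistic(A, y)[1:]) / sd


def best_cutoff(simulate, true_slopes, cutoffs, n_rep=50, seed=0):
    """Pick the cutoff with the smallest simulated MSE of the slope estimates.

    simulate(rng) must return (X, y) drawn from a model whose true slopes
    are true_slopes; every cutoff is scored on the same replicates."""
    rng = np.random.default_rng(seed)
    cutoffs = np.asarray(cutoffs, dtype=float)
    mse = np.zeros(len(cutoffs))
    for _ in range(n_rep):
        X, y = simulate(rng)
        for k, c in enumerate(cutoffs):
            err = pcr_slopes(X, y, c) - true_slopes
            mse[k] += err @ err / n_rep
    return cutoffs[int(np.argmin(mse))], mse
```

Scoring every cutoff on the same simulated replicates keeps the comparison paired, so differences in MSE reflect the cutoff rather than sampling noise between data sets.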
