
A Study on Applying Shrinkage Method in Generalized Additive Model

  • Ki, Seung-Do (Department of Statistics, Hankuk University of Foreign Studies; Korea Insurance Research Institute)
  • Kang, Kee-Hoon (Department of Statistics, Hankuk University of Foreign Studies)
  • Received : 20100100
  • Accepted : 20100200
  • Published : 2010.02.28

Abstract

The generalized additive model (GAM) is a statistical model that resolves most of the problems of the traditional linear regression model. However, overfitting can arise when no method is applied to reduce the number of independent variables, so variable selection methods for the generalized additive model are needed. Recently, Lasso-type approaches have become popular for variable selection in regression analysis. In this paper we consider the Group Lasso and Elastic net models for variable selection in the GAM and propose an algorithm for finding their solutions. We compare the proposed methods through a Monte Carlo simulation and an application to auto insurance data from fiscal year 2005. The results show that a GAM with variable selection by the proposed Group Lasso and Elastic net methods performs better than the existing approach.
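
For orientation, the two penalties the abstract refers to can be written as follows. The notation here is ours, not necessarily the paper's: l(beta) denotes the model log-likelihood, each smooth GAM component is expanded in a basis as f_j(x_j) = B_j(x_j)' beta_j, and lambda > 0 and 0 <= alpha <= 1 are tuning constants.

```latex
% Additive model (Hastie and Tibshirani, 1986), components in basis form:
%   g(E[Y \mid X]) = \beta_0 + \sum_{j=1}^{p} f_j(x_j),
%   \qquad f_j(x_j) = B_j(x_j)^\top \beta_j
% Group Lasso (Yuan and Lin, 2006): one \ell_2 penalty per variable's
% coefficient block, so a whole smooth term is kept or dropped at once.
\hat{\beta}_{GL} = \arg\min_{\beta}\Big\{ -\ell(\beta)
    + \lambda \sum_{j=1}^{p} \lVert \beta_j \rVert_2 \Big\}
% Elastic net (Zou and Hastie, 2005): a convex mix of \ell_1 and squared \ell_2.
\hat{\beta}_{EN} = \arg\min_{\beta}\Big\{ -\ell(\beta)
    + \lambda \big( \alpha \lVert \beta \rVert_1
    + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \big) \Big\}
```

The Python sketch below illustrates the Group Lasso side of this on a Gaussian additive model, fitted by proximal gradient descent with blockwise soft-thresholding. It is a minimal illustration under simplifying assumptions, not the authors' algorithm: the truncated power basis, the identity link (an insurance application would more likely use a non-Gaussian family), the step size, and the tuning values are all choices made here only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def spline_basis(x, n_knots=5):
    """Degree-1 truncated power basis for one predictor, standardized
    so the group penalty treats every variable on a comparable scale."""
    knots = np.quantile(x, np.linspace(0.1, 0.9, n_knots))
    B = np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])
    return (B - B.mean(axis=0)) / B.std(axis=0)

def fit_penalized_gam(X, y, lam, alpha=1.0, n_iter=3000):
    """Proximal gradient descent (ISTA) for
        (1/2n)||y - sum_j B_j beta_j||^2
        + lam * ( alpha * sum_j ||beta_j||_2 + (1 - alpha)/2 * ||beta||_2^2 ).
    alpha = 1 is the Group Lasso; alpha < 1 adds an Elastic-net-style
    ridge term. Returns the coefficient block of each variable."""
    y = y - y.mean()                                  # absorb the intercept
    groups = [spline_basis(X[:, j]) for j in range(X.shape[1])]
    Z = np.hstack(groups)
    edges = np.cumsum([0] + [B.shape[1] for B in groups])
    beta = np.zeros(Z.shape[1])
    # step size 1/L, with L bounding the Lipschitz constant of the smooth part
    step = 1.0 / (np.linalg.norm(Z, 2) ** 2 / len(y) + lam * (1 - alpha))
    for _ in range(n_iter):
        grad = Z.T @ (Z @ beta - y) / len(y) + lam * (1 - alpha) * beta
        b = beta - step * grad
        for g in range(len(groups)):                  # blockwise soft-threshold
            s, e = edges[g], edges[g + 1]
            nrm = np.linalg.norm(b[s:e])
            b[s:e] *= max(0.0, 1.0 - step * lam * alpha / nrm) if nrm > 0 else 0.0
        beta = b
    return [beta[edges[g]:edges[g + 1]] for g in range(len(groups))]

# Toy check: only x0 and x1 carry signal; x2..x4 are pure noise variables.
n, p = 400, 5
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=n)
for j, bj in enumerate(fit_penalized_gam(X, y, lam=0.1)):
    norm = np.linalg.norm(bj)
    print(f"x{j}: block norm = {norm:.3f}", "kept" if norm > 1e-8 else "dropped")
```

Setting alpha below 1 mixes in a ridge term in the spirit of the Elastic net; in practice lambda (and alpha) would be chosen by a criterion such as cross-validation rather than fixed by hand as above.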

Keywords

References

  1. Bakin, S. (1999). Adaptive regression and model selection in data mining problems, Ph.D. Dissertation, The Australian National University, Canberra.
  2. Fu, W. (1998). Penalized regressions: The Bridge versus the Lasso, Journal of Computational and Graphical Statistics, 7, 397-416. https://doi.org/10.2307/1390712
  3. Genkin, A., Lewis, D. D. and Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization, Technometrics, 49, 291-304. https://doi.org/10.1198/004017007000000245
  4. Hastie, T. and Tibshirani, R. (1986). Generalized additive models (with discussion), Statistical Science, 1, 297-318. https://doi.org/10.1214/ss/1177013604
  5. Kim, Y., Kim, J. and Kim, Y. (2006). Blockwise sparse regression, Statistica Sinica, 16, 375-390.
  6. Krishnapuram, B., Carin, L., Figueiredo, M. A. and Hartemink, A. J. (2005). Sparse multinomial logistic regression: Fast algorithms and generalization bounds, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 957-968. https://doi.org/10.1109/TPAMI.2005.127
  7. Lokhorst, J. (1999). The Lasso and generalized linear models, Honors Project, University of Adelaide, Adelaide.
  8. Meier, L., van de Geer, S. and Bühlmann, P. (2008). The Group Lasso for logistic regression, Journal of the Royal Statistical Society, Series B, 70, 53-71. https://doi.org/10.1111/j.1467-9868.2007.00627.x
  9. Roth, V. (2004). The generalized Lasso, IEEE Transactions on Neural Networks, 15, 16-28. https://doi.org/10.1109/TNN.2003.809398
  10. Shevade, S. and Keerthi, S. (2003). A simple and efficient algorithm for gene selection using sparse logistic regression, Bioinformatics, 19, 2246-2253. https://doi.org/10.1093/bioinformatics/btg308
  11. Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, Series B, 68, 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
  12. Zhao, P., Rocha, G. and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties, Technical Report, University of California at Berkeley, Department of Statistics.
  13. Zhou, N. and Zhu, J. (2007). Group variable selection via hierarchical Lasso and its oracle property, manuscript.
  14. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the Elastic net, Journal of the Royal Statistical Society, Series B, 67, 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x