Variable selection in partial linear regression using the least angle regression

  • Seo, Han Son (Department of Applied Statistics, Konkuk University) ;
  • Yoon, Min (Department of Applied Mathematics, Pukyong National University) ;
  • Lee, Hakbae (Department of Applied Statistics, Yonsei University)
  • 서한손 (건국대학교 응용통계학과) ;
  • 윤민 (부경대학교 응용수학과) ;
  • 이학배 (연세대학교 응용통계학과)
  • Received : 2021.08.10
  • Accepted : 2021.08.16
  • Published : 2021.12.31

Abstract

This paper addresses variable selection in partial linear regression. Model selection for partial linear models is difficult because it combines nonparametric estimation, including smoothing parameter selection, with estimation of the coefficients of the linear explanatory variables. Several variable selection procedures are proposed that build on least angle regression (LARS), a fast forward selection algorithm. The proposed procedures apply t-tests, all-possible-regressions comparisons, or stepwise selection to the variables selected by LARS. An example based on real data and a simulation study of the performance of the proposed procedures are presented.
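
The abstract does not spell out the procedures, so the following is only a rough sketch of one way a "LARS screening, then t-test" step could look for a partial linear model y = Xβ + f(t) + ε. Everything specific here is an assumption made for illustration and is not taken from the paper: the kernel partialling-out of f(t), the bandwidth of 0.1, the |t| > 2 cutoff, the simulated data, and the hypothetical function names `kernel_smooth`, `lars_order`, and `select_variables`.

```python
import numpy as np

def kernel_smooth(t, v, bandwidth=0.1):
    """Nadaraya-Watson smooth of v (shape (n,) or (n, p)) on t, Gaussian kernel."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ v

def lars_order(X, y, max_vars):
    """Order in which plain LARS (no lasso drop step) brings in columns of X.

    Assumes the columns of X are centered with unit norm and y is centered.
    """
    n, p = X.shape
    mu = np.zeros(n)                                  # current fitted vector
    active = [int(np.argmax(np.abs(X.T @ y)))]        # most correlated column
    while len(active) < min(max_vars, p):
        c = X.T @ (y - mu)                            # current correlations
        C = np.max(np.abs(c[active]))
        s = np.sign(c[active])
        XA = X[:, active] * s
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())
        u = XA @ (A * g)                              # equiangular direction
        a = X.T @ u
        best_gamma, best_j = np.inf, None
        for j in range(p):
            if j in active:
                continue
            # Smallest positive step at which column j ties with the active set.
            for gamma in ((C - c[j]) / (A - a[j]), (C + c[j]) / (A + a[j])):
                if 1e-12 < gamma < best_gamma:
                    best_gamma, best_j = gamma, j
        mu = mu + best_gamma * u
        active.append(best_j)
    return active

def select_variables(X, t, y, max_vars=5, t_cut=2.0, bandwidth=0.1):
    # Step 1 (assumed): remove the smooth part by residualizing y and X on t.
    y_res = y - kernel_smooth(t, y, bandwidth)
    X_res = X - kernel_smooth(t, X, bandwidth)
    Xc = X_res - X_res.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)              # unit-norm columns for LARS
    yc = y_res - y_res.mean()
    # Step 2: LARS screening of candidate variables.
    cand = lars_order(Xs, yc, max_vars)
    # Step 3: ordinary least squares t-statistics on the screened variables.
    # The degrees of freedom ignore the smoothing step, so this is approximate.
    Z = Xc[:, cand]
    beta, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    resid = yc - Z @ beta
    sigma2 = resid @ resid / (len(yc) - len(cand))
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    keep = [j for j, tv in zip(cand, beta / se) if abs(tv) > t_cut]
    return cand, sorted(keep)

# Simulated data (hypothetical): only columns 0 and 1 carry signal.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
t = rng.uniform(size=n)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + np.sin(2 * np.pi * t) + 0.5 * rng.normal(size=n)

candidates, selected = select_variables(X, t, y)
print("LARS candidates:", candidates)
print("kept after t-test:", selected)
```

The screening step keeps the later test cheap: LARS orders the variables, and the t-test (or, in the same spirit, an all-possible-regressions or stepwise comparison) is then run only on that short candidate list.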

Acknowledgement

This paper was supported by Konkuk University's research support program for faculty on sabbatical leave in 2021.
