A note on standardization in penalized regressions

  • Lee, Sangin (Department of Clinical Sciences, University of Texas Southwestern Medical Center)
  • Received : 2015.01.20
  • Accepted : 2015.02.16
  • Published : 2015.03.31

Abstract

We consider sparse high-dimensional linear regression models. Penalized regression is an effective method for variable selection and estimation in such models. In practice, it is common to standardize the variables, fit a penalized model to the standardized variables, and then transform the estimated coefficients back to the original scale. However, this procedure yields a solution that differs slightly from that of the corresponding penalized problem on the original variables. In this paper, we investigate issues related to the standardization of variables in penalized regression and formulate a definition of the standardized penalized estimator. In addition, we compare the original penalized estimator with the standardized penalized estimator through simulation studies and a real data analysis.
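
To illustrate why the two procedures can differ, here is a minimal sketch using the lasso; the notation below is illustrative and is not taken from the paper. On the original variables, the penalized problem is

\[
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| .
\]

The common practice instead standardizes each column, \(\tilde{x}_j = (x_j - \bar{x}_j \mathbf{1})/s_j\) with \(s_j\) the sample standard deviation of \(x_j\), solves

\[
\hat{\gamma} = \arg\min_{\gamma} \; \frac{1}{2n}\,\lVert y - \tilde{X}\gamma \rVert_2^2 + \lambda \sum_{j=1}^{p} |\gamma_j| ,
\]

and reports \(\hat{\beta}_j = \hat{\gamma}_j / s_j\) on the original scale. Substituting \(\gamma_j = s_j \beta_j\) (with the centering absorbed into the intercept) shows that this is equivalent to solving the original-scale problem with the weighted penalty \(\lambda \sum_j s_j |\beta_j|\), so the two estimators coincide only when every \(s_j = 1\).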

References

  1. Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics, 5, 232-253. https://doi.org/10.1214/10-AOAS388
  2. Chiang, A. P., Beck, J. S., Yen, H. J., Tayeh, M. K., Scheetz, T. E., Swiderski, R. E., Sheffield, V. C. et al. (2006). Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet-Biedl syndrome gene (BBS11). Proceedings of the National Academy of Sciences, 103, 6287-6292. https://doi.org/10.1073/pnas.0600158103
  3. Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407-499. https://doi.org/10.1214/009053604000000067
  4. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360. https://doi.org/10.1198/016214501753382273
  5. Fan, J. and Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. Annals of Statistics, 30, 74-99.
  6. Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432-441. https://doi.org/10.1093/biostatistics/kxm045
  7. Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
  8. Kim, Y., Choi, H. and Oh, H. S. (2008). Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association, 103, 1665-1673. https://doi.org/10.1198/016214508000001066
  9. Kim, Y. and Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika, 99, 315-325. https://doi.org/10.1093/biomet/asr084
  10. Kwon, S., Han, S. and Lee, S. (2013). A small review and further studies on the lasso. Journal of the Korean Data & Information Science Society, 24, 1077-1088. https://doi.org/10.7465/jkdi.2013.24.5.1077
  11. Park, C. (2013). Simple principal component analysis using lasso. Journal of the Korean Data & Information Science Society, 24, 533-541. https://doi.org/10.7465/jkdi.2013.24.3.533
  12. Scheetz, T. E., Kim, K. Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dibona, G. F., Stone, E. M. et al. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences, 103, 14429-14434. https://doi.org/10.1073/pnas.0602562103
  13. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58, 267-288.
  14. Tseng, P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109, 475-494. https://doi.org/10.1023/A:1017501703105
  15. Van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Annals of Statistics, 36, 614-645. https://doi.org/10.1214/009053607000000929
  16. Yuille, A. L. and Rangarajan, A. (2003). The concave-convex procedure. Neural Computation, 15, 915-936. https://doi.org/10.1162/08997660360581958
  17. Zhang, C. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894-942. https://doi.org/10.1214/09-AOS729

Cited by

  1. Estimation for misclassified data with ultra-high levels vol.27, pp.1, 2016, https://doi.org/10.7465/jkdi.2016.27.1.217
  2. Analysis of multi-center bladder cancer survival data using variable-selection method of multi-level frailty models vol.27, pp.2, 2016, https://doi.org/10.7465/jkdi.2016.27.2.499
  3. Variable selection in Poisson HGLMs using h-likelihood vol.26, pp.6, 2015, https://doi.org/10.7465/jkdi.2015.26.6.1513