Variable Selection in Frailty Models using FrailtyHL R Package: Breast Cancer Survival Data

Variable selection in frailty models using the frailtyHL statistical package: breast cancer survival data (translated from the Korean title)

  • Kim, Bohyeon (Department of Statistics, Pukyong National University)
  • Ha, Il Do (Department of Statistics, Pukyong National University)
  • Noh, Maengseok (Department of Statistics, Pukyong National University)
  • Na, Myung Hwan (Department of Statistics, Chonnam National University)
  • Song, Ho-Chun (Department of Nuclear Medicine, Chonnam National University Hospital)
  • Kim, Jahae (Department of Nuclear Medicine, Chonnam National University Hospital)
  • Received : 2015.07.27
  • Accepted : 2015.08.06
  • Published : 2015.10.31


Determining which variables are relevant is an important step in regression analysis. Recently, variable selection methods using a penalized likelihood with various penalty functions (e.g., LASSO and SCAD) have been widely studied in simple statistical models such as linear models and generalized linear models. The advantage of these methods is that they select important variables and estimate regression coefficients simultaneously: insignificant variables are removed because their coefficients are estimated as exactly zero. We study how to select relevant variables based on a penalized hierarchical likelihood (HL) in semi-parametric frailty models, allowing three penalty functions: LASSO, SCAD and HL. For the variable selection, we develop a new function in the "frailtyHL" R package. Our methods are illustrated with breast cancer survival data from the Medical Center at Chonnam National University in Korea. We compare the results from the three variable-selection methods and discuss their advantages and disadvantages.
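As a rough illustration of how a penalty can estimate a coefficient as exactly zero, the sketch below writes out the standard LASSO and SCAD penalty functions and their thresholding rules for the textbook case of a single coefficient under an orthonormal design. This is not the penalized h-likelihood procedure of the paper and does not use the frailtyHL package; the function names and the default a = 3.7 (the value commonly suggested for SCAD) are assumptions made for the example, and the HL penalty is omitted.

```python
import math

def lasso_penalty(beta, lam):
    """LASSO (L1) penalty: grows linearly in |beta| without bound."""
    return lam * abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: equals the L1 penalty near zero, then levels off
    to a constant so that large coefficients are not over-penalized."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def soft_threshold(z, lam):
    """LASSO solution for one coefficient (orthonormal design assumed):
    shrink the unpenalized estimate z toward zero by lam, and set it
    to exactly zero when |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def scad_threshold(z, lam, a=3.7):
    """SCAD solution for one coefficient (orthonormal design assumed):
    small estimates are soft-thresholded, large ones are left unshrunk."""
    b = abs(z)
    if b <= 2 * lam:
        return soft_threshold(z, lam)
    if b <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z

if __name__ == "__main__":
    lam = 0.5
    for z in (0.3, 1.5, 5.0):
        print(z, soft_threshold(z, lam), scad_threshold(z, lam))
```

Both rules map a small unpenalized estimate to zero, which is the "selection" step. They differ for large estimates: the LASSO rule keeps shrinking by the tuning parameter, while the SCAD rule returns the estimate unchanged.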


Supported by : National Research Foundation of Korea

