ELCIC: An R package for model selection using the empirical-likelihood based information criterion

  • Chixiang Chen (Division of Biostatistics and Bioinformatics, University of Maryland School of Medicine)
  • Biyi Shen (Regeneron Pharmaceuticals)
  • Ming Wang (Department of Population and Quantitative Health Sciences, Case Western Reserve University)
  • Received: 2022.10.05
  • Accepted: 2023.05.12
  • Published: 2023.07.31

Abstract

This article introduces the R package ELCIC (https://cran.r-project.org/web/packages/ELCIC/index.html), which implements an empirical-likelihood-based information criterion (ELCIC) for model selection, including but not limited to variable selection. Empirical likelihood is a semiparametric approach to statistical inference that requires no distributional assumptions about how the data are generated. As a result, ELCIC is more robust and versatile for model selection than existing information criteria. The paper illustrates several applications of ELCIC: generalized linear models, generalized estimating equations (GEE) for longitudinal data, and weighted GEE (WGEE) for longitudinal data with missing values under the missing-at-random and dropout mechanisms.
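
To make the distribution-free idea concrete, the following is a minimal sketch of an empirical likelihood ratio statistic for a single mean, the simplest estimating equation. It is not code from the ELCIC package and does not reproduce the criterion itself. The function name `el_log_ratio` and the bisection solver are assumptions chosen for illustration, and Python is used only to keep the example free of dependencies.

```python
import math

def el_log_ratio(x, mu):
    """Return -2 * log empirical likelihood ratio for the hypothesis E[X] = mu.

    Hypothetical helper for illustration; not part of the ELCIC package.
    The observation weights are p_i = 1 / (n * (1 + lam * (x_i - mu))),
    where lam solves sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
    """
    z = [xi - mu for xi in x]
    # If mu lies outside the range of the data, no valid weights exist.
    if min(z) >= 0 or max(z) <= 0:
        return math.inf

    # All weights are positive only for lam strictly inside this interval.
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        # Strictly decreasing in lam on (lo, hi), so bisection applies.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

if __name__ == "__main__":
    data = [1.2, 0.7, 2.5, 1.9, 3.1, 0.4]
    sample_mean = sum(data) / len(data)
    # The statistic is essentially zero at the sample mean and grows as mu moves away.
    print(el_log_ratio(data, sample_mean))
    print(el_log_ratio(data, sample_mean + 0.5))
```

No parametric family is specified anywhere in the sketch: the statistic depends only on the data and the estimating equation, which is the property the abstract credits for the robustness of ELCIC.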
