A procedure for simultaneous variable selection, variable transformation and outlier identification in linear regression

  • Seo, Han Son (Department of Applied Statistics, Konkuk University)
  • Yoon, Min (Department of Applied Mathematics, Pukyong National University)
  • Received : 2019.10.02
  • Accepted : 2019.12.18
  • Published : 2020.02.29

Abstract

We propose a unified procedure for variable selection, variable transformation, and outlier identification in the linear regression model. The procedure first detects and removes potential outliers with a sequential method, then applies a least trimmed squares estimator to estimate the variable transformation, and finally selects variables by comparing all possible subset regressions. Real data analyses and simulation results illustrate the efficiency of the procedure in terms of correct variable selection and the fit of the estimated model.
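The all possible subsets step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which selection criterion is used, so BIC is an assumption here, and the function names `solve`, `ols_rss`, and `best_subset_bic` are hypothetical. The outlier detection and least trimmed squares transformation steps are omitted; the sketch assumes they have already produced cleaned, transformed data.

```python
from itertools import combinations, product
import math

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        s = sum(M[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (M[c][n] - s) / M[c][c]
    return x

def ols_rss(X, y, cols):
    """Residual sum of squares from OLS of y on an intercept plus the chosen columns."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(ztz, zty)  # normal equations
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))

def best_subset_bic(X, y):
    """Fit every non-empty subset of predictors and return the one with the lowest BIC.

    BIC is an assumed criterion, chosen only for illustration.
    """
    n, k = len(y), len(X[0])
    best = None
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            rss = ols_rss(X, y, cols)
            bic = n * math.log(rss / n) + (size + 1) * math.log(n)
            if best is None or bic < best[0]:
                best = (bic, cols)
    return best[1]

# Toy data: a 2^3 factorial design in which y depends on columns 0 and 2 only.
# The small three-way interaction term acts as noise orthogonal to the main effects.
X = [list(map(float, row)) for row in product([-1, 1], repeat=3)]
y = [2 + 3 * a - 1.5 * c + 0.1 * a * b * c for a, b, c in X]
selected = best_subset_bic(X, y)
print(selected)  # (0, 2)
```

Exhaustive search fits 2^k - 1 models, so it is practical only for a modest number of candidate predictors.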
