Geometrical description based on forward selection & backward elimination methods for regression models

Geometric representation of forward selection and backward elimination in multiple regression models

  • Hong, Chong-Sun (Department of Statistics, Sungkyunkwan University) ;
  • Kim, Moung-Jin (Research Institute of Applied Statistics, Sungkyunkwan University)
  • 홍종선 (성균관대학교 통계학과) ;
  • 김명진 (성균관대학교 응용통계연구소)
  • Received : 2010.07.23
  • Accepted : 2010.09.23
  • Published : 2010.09.30

Abstract

Among the many variable selection methods for multiple regression models, a geometrical description is proposed for the forward selection and backward elimination procedures. The method draws the forward selection process in the first quadrant and the backward elimination process in the second quadrant of a half circle of unit radius. At each step, the regression sum of squares (SSR) is represented by the norm of a vector, and the extra SSR or the partial coefficient of determination is represented by the angle between two vectors. The line connecting the endpoints of successive vectors is drawn dotted when the partial F test is statistically significant, so the test results can be read directly from the plot. From this geometrical description, the final regression models chosen by forward selection and by backward elimination can be obtained, and the goodness of fit of the model can be examined.

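We propose a graphical method that geometrically represents the forward selection and backward elimination processes, two of the variable selection methods for multiple regression models. The forward selection process is drawn in the first quadrant of a half circle of radius 1, and the backward elimination process in the second quadrant. At each step, the regression sum of squares is represented as a vector, and the extra sum of squares or the partial coefficient of determination is represented by the angle between vectors. When the endpoints of the vectors are connected, the connecting line is dotted if the result is statistically significant, so that the outcome of the partial hypothesis tests can be recognized from the plot. With this method, the final models obtained by forward selection and backward elimination can be compared, and the overall goodness of fit of the model can be assessed.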
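The quantities that the plot encodes at each forward selection step (SSR, extra SSR, partial coefficient of determination, partial F statistic) can be computed as in the following minimal sketch. It is not the authors' implementation: the function names, the `f_in` F-to-enter threshold of 4.0, and the example data are hypothetical, and the mapping of these quantities to vector norms and angles is not specified in the abstract, so it is left out.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(v, basis):
    """Remove from v its projection onto each vector of an orthogonal basis."""
    out = list(v)
    for b in basis:
        coef = dot(out, b) / dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def forward_selection(columns, y, f_in=4.0):
    """Forward selection by sequential orthogonalization.

    columns: dict mapping a variable name to its list of observations.
    f_in:    hypothetical F-to-enter threshold (an assumption, not from the paper).
    Returns one record per step, including the step at which selection stops.
    """
    n = len(y)
    basis = [[1.0] * n]                      # intercept column
    y_centered = orthogonalize(y, basis)
    sst = dot(y_centered, y_centered)        # total sum of squares
    if sst == 0:
        raise ValueError("response is constant")
    ssr = 0.0                                # SSR of the current model
    remaining = dict(columns)
    steps = []
    while remaining and n - len(basis) - 1 > 0:
        best = None
        for name, col in remaining.items():
            x_res = orthogonalize(col, basis)    # part of x not explained so far
            norm2 = dot(x_res, x_res)
            if norm2 < 1e-12:                    # collinear with the current model
                continue
            extra = dot(y_centered, x_res) ** 2 / norm2   # extra SSR
            if best is None or extra > best[1]:
                best = (name, extra, x_res)
        if best is None:
            break
        name, extra, x_res = best
        sse_old = sst - ssr
        sse_new = sse_old - extra
        df = n - len(basis) - 1                  # residual df after adding the variable
        f_stat = extra / (sse_new / df) if sse_new > 1e-12 else float("inf")
        entered = f_stat >= f_in
        steps.append({
            "variable": name,
            "ssr": ssr + extra,
            "extra_ssr": extra,
            "partial_r2": extra / sse_old if sse_old > 0 else 0.0,
            "r2": (ssr + extra) / sst,
            "partial_f": f_stat,
            "entered": entered,
        })
        if not entered:
            break
        basis.append(x_res)
        ssr += extra
        del remaining[name]
    return steps


# Hypothetical example data: y depends on x1 only.
example_x = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, 1, -1, 1, -1],
}
example_y = [2.1, 3.8, 6.05, 8.1, 9.9, 12.15, 13.95, 15.95]
for step in forward_selection(example_x, example_y):
    print(step)
```

Backward elimination can be sketched the same way by starting from the full model and, at each step, dropping the variable whose removal costs the smallest extra SSR, as long as its partial F statistic falls below a removal threshold.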
