Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data

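  • 김정환 (Water Resources System Research Institute, Inha University)
  • 박지현 (Department of Statistics, Inha University)
  • 최창현 (Department of Civil Engineering, Inha University)
  • 김형수 (Department of Social Infrastructure Engineering, Inha University)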
  • Received : 2018.09.04
  • Accepted : 2018.11.13
  • Published : 2018.12.01

Abstract

Fitting a linear regression model is generally stable under the assumption that the sample size is sufficiently larger than the number of explanatory variables and that there is no serious multicollinearity among the explanatory variables. In this study, we analyzed real heavy rain damage data to examine how model fitting becomes difficult when these assumptions are violated, and we proposed integrating the data and then using a principal component regression model or a ridge regression model to overcome the difficulty. We evaluated the predictive performance of the proposed models on test data independent of the training data and confirmed that they predicted better than the linear regression model.
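
The comparison described in the abstract can be illustrated with a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration and is not taken from the paper: the simulated data (two nearly collinear predictors and a small training sample), the helper names `simulate`, `center_scale`, `fit_ols`, `fit_ridge`, `fit_pcr`, and `rmse`, the ridge penalty `lam=1.0`, and the choice of `k=2` principal components. The data integration step and the real heavy rain damage data are not reproduced here.

```python
import numpy as np


def simulate(n, rng):
    """Toy data (hypothetical): x1 and x2 are nearly collinear, x3 is independent."""
    z = rng.normal(size=n)
    X = np.column_stack([
        z + 0.01 * rng.normal(size=n),
        z + 0.01 * rng.normal(size=n),
        rng.normal(size=n),
    ])
    y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)
    return X, y


def center_scale(X_train, X_test):
    """Standardize both sets with the training means and standard deviations."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sd, (X_test - mu) / sd


def fit_ols(X, y):
    """Least-squares coefficients (minimum-norm solution)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def fit_ridge(X, y, lam):
    """Ridge coefficients: solve (X'X + lam * I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def fit_pcr(X, y, k):
    """Regress y on the first k principal component scores, then map back."""
    Vt = np.linalg.svd(X, full_matrices=False)[2]  # rows: principal directions
    V = Vt[:k].T
    gamma = np.linalg.lstsq(X @ V, y, rcond=None)[0]
    return V @ gamma


def rmse(X, y, beta, intercept):
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - intercept - X @ beta) ** 2)))


rng = np.random.default_rng(0)
X_train, y_train = simulate(15, rng)   # small training sample
X_test, y_test = simulate(200, rng)    # independent test sample
Z_train, Z_test = center_scale(X_train, X_test)
intercept = y_train.mean()             # predictors are centered, so this is the intercept
yc = y_train - intercept

fits = {
    "linear": fit_ols(Z_train, yc),
    "ridge (lam=1.0)": fit_ridge(Z_train, yc, 1.0),
    "PCR (k=2)": fit_pcr(Z_train, yc, 2),
}
for name, beta in fits.items():
    err = rmse(Z_test, y_test, beta, intercept)
    print(name, np.round(beta, 3), "test RMSE:", round(err, 3))
```

In practice the penalty and the number of components would be chosen by cross-validation on the training data rather than fixed in advance. Which model predicts best on the test sample depends on the data, so the sketch makes no claim about the ordering of the three errors.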

Fig. 1. Sample Sizes of Sigungu

Fig. 2. Correlation Among Explanatory Variables

Fig. 3. Cross-Validation Plot for 

Table 1. Principal Loadings

Table 2. Predictive Performances (Unit: 1,000 KW)

Table 3. Estimated Regression Coefficients
