Application of machine learning models for estimating single-family house prices

  • Lee, Chang Ro (이창로; Institute for Korean Regional Studies, Seoul National University) ;
  • Park, Key Ho (박기호; Department of Geography and Institute for Korean Regional Studies, Seoul National University)
  • Received : 2016.04.18
  • Published : 2016.04.30

Abstract

In social science research that uses mathematical or quantitative models, analysis has focused mainly on clarifying the relationship between the dependent variable and the explanatory variables, so explanatory modeling has been the mainstream approach. Analyses aimed at improving predictive performance, in contrast, have been rare. This study therefore examines prediction-oriented non-parametric models, which aim to reduce prediction error for new observations, rather than explanatory models that test theories and hypotheses or clarify relationships among variables. Gangnam-gu, Seoul was selected as the study area, and house prices were estimated from single-family house transaction prices reported between 2011 and 2014. The non-parametric models applied were drawn from the machine learning literature: the generalized additive model (GAM), random forest, multivariate adaptive regression splines (MARS), and support vector machines (SVM). The relatively recently developed MARS and SVM showed superior predictive power for house price estimation. Finally, additionally accounting for spatial autocorrelation in these non-parametric models further improved their price prediction. We hope this study prompts property price estimation methodology, which has so far concentrated on parametric models, to expand and diversify toward non-parametric models.
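The abstract's two central ideas, judging a model by its error on new observations and improving it by accounting for spatial autocorrelation, can be illustrated with a minimal sketch. It is not the authors' method. The data are synthetic, and the base model is a one-predictor least-squares line swapped in for GAM, random forest, MARS and SVM to keep the example dependency-free. The spatial correction (adding the mean residual of the k nearest training sales) is only one simple way to use spatial dependence. All names (`make_sales`, `compare_rmse`, `k`, the price formula) are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def make_sales(n, rng):
    """Synthetic sales as (x, y, floor_area, price); assumed data, not real."""
    sales = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        area = rng.uniform(50, 300)
        # Price = area effect + smooth location effect + random noise.
        price = 100 + 3 * area + 40 * x + 40 * y + rng.gauss(0, 5)
        sales.append((x, y, area, price))
    return sales


def neighbour_mean_residual(x, y, train_xy, train_resid, k):
    """Mean residual of the k training sales closest to location (x, y)."""
    order = sorted(
        range(len(train_xy)),
        key=lambda i: (train_xy[i][0] - x) ** 2 + (train_xy[i][1] - y) ** 2,
    )
    return sum(train_resid[i] for i in order[:k]) / k


def compare_rmse(seed=0, n_train=200, n_test=100, k=5):
    """Return hold-out RMSE (base model, base model + spatial correction)."""
    rng = random.Random(seed)
    train = make_sales(n_train, rng)
    test = make_sales(n_test, rng)  # "new observations" never used in fitting

    a, b = fit_line([s[2] for s in train], [s[3] for s in train])
    train_xy = [(s[0], s[1]) for s in train]
    train_resid = [s[3] - (a + b * s[2]) for s in train]

    actual = [s[3] for s in test]
    base_pred = [a + b * s[2] for s in test]
    spatial_pred = [
        p + neighbour_mean_residual(s[0], s[1], train_xy, train_resid, k)
        for p, s in zip(base_pred, test)
    ]
    return rmse(actual, base_pred), rmse(actual, spatial_pred)


base_rmse, spatial_rmse = compare_rmse()
print(f"hold-out RMSE, base model:         {base_rmse:.2f}")
print(f"hold-out RMSE, spatial correction: {spatial_rmse:.2f}")
```

Because the synthetic prices contain a smooth location effect that the base model ignores, residuals of nearby sales are correlated, and borrowing them lowers the hold-out error. Real transaction data would need a proper validation design and a tuned neighbourhood size.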
