A Study on the Clustering Method of Row and Multiplex Housing in Seoul Using K-Means Clustering Algorithm and Hedonic Model

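  • 권순재 (Department of Business Administration, Daegu University)
  • 김성현 (Big Data Center, National Information Society Agency)
  • 탁온식 (Data Research Team, 케이앤컴퍼니)
  • 정현희 (Department of Business Administration, Daegu University)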
  • Received : 2017.07.31
  • Accepted : 2017.09.20
  • Published : 2017.09.30

Abstract

Recently, transactions of row housing and multiplex housing have become more active, centered on downtown areas, and platform services such as Zigbang and Dabang are growing. Row and multiplex housing nevertheless remains a blind spot for real estate information: as demand shifts and the market grows, information asymmetry has caused social problems. In addition, the 5-zone and 25-district divisions used by the Seoul Metropolitan Government and the Korea Appraisal Board (hereafter, KAB) were drawn along administrative boundaries and have been used in existing real estate studies. Because they were zoned for urban planning, they are not district classifications designed for real estate research. Building on existing studies, this study concludes that Seoul's spatial structure needs to be redefined for estimating future housing prices. Simple division by existing administrative districts has made analysis inefficient, so this study attempts to delineate spatially homogeneous areas by reflecting the price characteristics of row and multiplex housing, clustering Seoul into new areas for more efficient real estate analysis. A hedonic model was applied to real transaction price data for row and multiplex housing, and the K-Means clustering algorithm was used to cluster Seoul's spatial structure. The data were the real transaction prices of row and multiplex housing in Seoul from January 2014 to December 2016 and the 2016 official land values, provided by the Ministry of Land, Infrastructure and Transport (hereafter, MOLIT). Preprocessing consisted of the following steps: removal of underground transactions, standardization of price per unit area, and removal of transactions whose standardized value was 5 or above or -5 or below.
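The three preprocessing steps can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the record fields (`area_m2`, `floor`, `price`), the coding of underground transactions as a negative floor number, and the reading of the "standardized" value as a z-score of price per m² are all assumptions.

```python
from statistics import mean, pstdev

def preprocess(records, z_cutoff=5.0):
    """Apply the three cleaning steps to a list of transaction dicts.

    Assumed (hypothetical) fields: 'area_m2', 'floor', 'price'.
    """
    # Step 1: drop underground transactions (assumed to be coded as floor < 0).
    above_ground = [r for r in records if r["floor"] >= 0]
    if not above_ground:
        return []
    # Step 2: price per unit area, standardized as z-scores.
    ppa = [r["price"] / r["area_m2"] for r in above_ground]
    mu, sigma = mean(ppa), pstdev(ppa)
    if sigma == 0:
        z = [0.0] * len(ppa)  # all prices identical: nothing is an outlier
    else:
        z = [(p - mu) / sigma for p in ppa]
    # Step 3: remove cases with z >= 5 or z <= -5, i.e. keep -5 < z < 5.
    return [dict(r, price_per_m2=p, z=zi)
            for r, p, zi in zip(above_ground, ppa, z)
            if -z_cutoff < zi < z_cutoff]
```

The population standard deviation (`pstdev`) is used here; whether the study used the population or sample form is not stated.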
Preprocessing reduced the data from 132,707 to 126,759 cases. The analysis was carried out in R. After preprocessing, the data model was constructed: K-means clustering was performed first, a regression analysis was then conducted with a hedonic model, and finally a cosine similarity analysis was carried out. Using this data model, we clustered Seoul on the basis of longitude and latitude and compared the result with the existing areas. The goodness of fit of the model was above 75%, and the variables used in the hedonic model were significant. The existing 5 zones or 25 districts, which follow administrative boundaries, were thus redrawn into 16 districts. In short, this study derived a clustering method for row and multiplex housing in Seoul that reflects property price characteristics, using the K-Means clustering algorithm and a hedonic model. The study also presents academic and practical implications, its limitations, and directions for future research. The first academic implication is that the clustering reflects property price characteristics, addressing the problems of the areas used by the Seoul Metropolitan Government, KAB, and existing real estate research. The second is that, whereas existing real estate research has focused mainly on apartments, this study proposes a method of classifying areas in Seoul using public information released under Government 3.0 (i.e., MOLIT's real transaction data). The first practical implication is that the results can serve as basic data for real estate research on row and multiplex housing. The second is that the study is expected to stimulate research on row and multiplex housing and to improve the accuracy of real transaction price models.
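The spatial clustering step can be illustrated with Lloyd's k-means on (longitude, latitude) pairs. The study reports using fixed initial values so that the centroids come out the same on every run; this sketch mimics that by taking caller-supplied starting centroids instead of random ones. The function name and the toy coordinates are hypothetical, and plain Euclidean distance on degrees is only an approximation of ground distance.

```python
def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means on (longitude, latitude) pairs.

    Starting from fixed, caller-supplied centroids makes every run
    reproducible: the same input always yields the same clusters.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: (p[0] - centroids[k][0]) ** 2
                                        + (p[1] - centroids[k][1]) ** 2)
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        # Converged: neither assignments nor centroids changed.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

With 16 starting centroids instead of 2, the same function would produce a 16-zone partition of the kind reported; how the authors chose their starting values is not stated.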
Future research should conduct a wider range of analyses to overcome the limitations of this study, and deeper research is needed.

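Recently, transactions of row and multiplex housing have become active, centered on downtown areas, and platform services such as Zigbang and Dabang are growing. Row and multiplex housing is a blind spot in real estate information: along with market expansion driven by changes in demand, information asymmetry has given rise to social problems. In addition, the 5-zone or 25-district divisions used by the Seoul Metropolitan Government or the Korea Appraisal Board were set around administrative boundaries and have been used in existing real estate research. Because these divisions were made for urban planning, they are not divisions designed for real estate research. Building on prior studies, this study therefore argues that Seoul's spatial structure needs to be redefined for future housing price estimation. Accordingly, this study applied real transaction price data of row and multiplex housing to a hedonic model and re-clustered Seoul's spatial structure using the K-Means clustering algorithm. The study used MOLIT's real transaction price data for row and multiplex housing in Seoul over the three years from January 2014 to December 2016, together with the 2016 official land prices. The real transaction data were preprocessed by removing records: underground transactions were removed, price per area was standardized, and transactions with values of 5 or above or -5 or below were removed. After preprocessing, K-means clustering was performed with fixed initial values so that the resulting centroids were the same on every run; a regression analysis using the hedonic model was then conducted for each cluster, and cosine similarity was computed to analyze similarity. The model fit averaged above 75%, and the variables used in the hedonic model were significant. That is, instead of compiling real estate price statistics such as the real transaction price index over Seoul's 25 administrative districts or 5 zones, the study grouped areas where the attributes have similar influence into 16 zones. Thus, this study derived a clustering method for row and multiplex housing based on real transaction prices by applying a hedonic model to real transaction data together with the K-Means clustering algorithm. It also presents academic and practical implications, the limitations of the study, and directions for future research.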
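The per-cluster hedonic regression and the cosine similarity analysis might look like the sketch below. The abstract does not give the model specification, so the log-linear form ln(price per m²) = b0 + b1·x1 + b2·x2 + …, the features, and the choice to compare clusters by the cosine similarity of their slope coefficients are assumptions made for illustration; `ols`, `hedonic_by_cluster`, and `cosine_similarity` are hypothetical helper names. OLS is solved from the normal equations with Gauss-Jordan elimination to keep the example dependency-free.

```python
import math

def ols(X, y):
    """Least squares coefficients b solving (X'X) b = X'y.

    Each row of X must start with 1.0 for the intercept. Uses Gauss-Jordan
    elimination with partial pivoting on the augmented matrix [X'X | X'y].
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def hedonic_by_cluster(rows, labels):
    """Fit ln(price per m2) on hypothetical attributes, one model per cluster.

    rows: list of (feature_tuple, price_per_m2); labels: cluster id per row.
    """
    coefs = {}
    for k in sorted(set(labels)):
        sub = [r for r, lab in zip(rows, labels) if lab == k]
        X = [[1.0] + list(feats) for feats, _ in sub]
        y = [math.log(price) for _, price in sub]
        coefs[k] = ols(X, y)
    return coefs

def cosine_similarity(u, v):
    """Cosine of the angle between two coefficient vectors (nonzero assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

Under these assumptions, a similarity near 1 between two clusters' slope vectors would suggest that the attributes affect prices in nearly the same proportions in both areas.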

