Predicting Crime Risky Area Using Machine Learning

머신러닝기반 범죄발생 위험지역 예측

  • HEO, Sun-Young (Engineering Research Institute(ERI), Gyeongsang National University) ;
  • KIM, Ju-Young (Dong Myeong Engineering Consultants & Architecture, Urban Development, Urban Planning) ;
  • MOON, Tae-Heon (Dept. of Urban Engineering, Gyeongsang National University)
  • 허선영 (경상대학교 공학연구원) ;
  • 김주영 ((주)동명기술공단종합건축사사무소 도시사업본부 도시계획부) ;
  • 문태헌 (경상대학교 도시공학과)
  • Received : 2018.10.30
  • Accepted : 2018.11.26
  • Published : 2018.12.31

Abstract

In Korea, citizens have access only to general information about crime, so it is difficult for them to know how exposed they are to it. If the police could predict crime risk areas, they could respond to crime efficiently despite limited police and enforcement resources. However, Korea has no such prediction system, and related research is scarce. Against this background, the ultimate goal of this study is to develop an automated crime prediction system. As a first step, we built a big data set consisting of actual local crime information and physical and non-physical urban data. We then developed a crime prediction model using machine learning methods. Finally, we assumed several possible scenarios, calculated the probability of crime, and visualized the results on a map to make them easier to understand. From the factors affecting crime occurrence identified in previous studies and cases, the following data were processed into a big data set for machine learning: actual crime information, weather information (temperature, rainfall, wind speed, humidity, sunshine, insolation, snowfall, cloud cover), and local information (average building coverage ratio, average floor area ratio, average building height, number of buildings, average appraised land value, average area of residential buildings, average number of above-ground floors). Among supervised machine learning algorithms, the decision tree, random forest, and support vector machine (SVM) models, which are known to be powerful and accurate in various fields, were used to construct crime prediction models. The decision tree model, which had the lowest root mean square error (RMSE), was selected as the optimal prediction model. Based on this model, several scenarios were set for theft and violence, the most frequent crimes in case city J, and the probability of crime was estimated on a $250 \times 250$ m grid. The results show that high crime risk areas occur in three patterns in city J. The probability of crime was divided into three classes and visualized on a map at the $250 \times 250$ m grid level. In sum, we developed a crime prediction model using machine learning algorithms and visualized crime risk areas on a map; the model can be recalculated and the results re-visualized as time and urban conditions change.

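Citizens in Korea can learn only general facts about crime and find it difficult to gauge how exposed they are to crime risk. From the police's standpoint, being able to predict where crime will occur would allow an efficient response despite limited police resources, but Korea does not yet have a prediction system and related research is very scarce. As the first step toward an automated system for predicting crime risk areas, this study therefore aims to develop a Korean-style prediction model of crime risk areas through machine learning, based on crime information and urban area data that can be assembled as big data. Scenarios were also assumed and the probability of crime was visualized on a map to improve users' understanding. Among the factors that previous studies and cases found to affect crime occurrence, those that can be built as big data, namely crime information, weather information (temperature, precipitation, wind speed, humidity, sunshine, insolation, snowfall, total cloud cover), and local information (average building coverage ratio, average floor area ratio, average height, total number of buildings, average officially assessed land price, average residential-use area, average number of above-ground floors), were preprocessed for use in machine learning. As machine learning algorithms, the decision tree, random forest, and Support Vector Machine (SVM) models, supervised learning models that are used in various fields and known for high accuracy, were used to build crime prediction models, which were then compared. As a result, the decision tree model, which had a low Root Mean Square Error (RMSE) and thus high predictive power, was selected as the optimal model. Based on this model, scenarios were drawn up for theft and violent crime, the most frequent offenses, and crime risk areas were predicted; in the case city J, risk areas appeared in three patterns, and the probability of occurrence for each was divided into three classes and visualized as a map at the $250 \times 250$ m level. In the future, this study is to be developed into an automated system that visualizes and provides prediction results in real time as urban conditions change from moment to moment, thereby contributing to an urban environment safer from crime.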
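
The abstract names three mechanical steps: choosing the model with the lowest RMSE, aggregating locations to a 250 m grid, and dividing crime probabilities into three classes. The following is a minimal Python sketch of those steps under stated assumptions, not the authors' implementation. The function names, the placeholder model names, the sample numbers, and the equal-width class cut-offs are all hypothetical; the abstract does not report the actual class boundaries or any model outputs.

```python
import math

CELL_SIZE_M = 250  # grid resolution stated in the abstract


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_best_model(observed, predictions_by_model):
    """Return (name, rmse) for the candidate with the lowest RMSE."""
    scores = {
        name: rmse(observed, preds)
        for name, preds in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates in metres to a (column, row) grid index."""
    return math.floor(x_m / cell_size), math.floor(y_m / cell_size)


def risk_class(probability, cutoffs=(1 / 3, 2 / 3)):
    """Assign a probability to class 1 (lowest) to 3 (highest).

    The equal-width cut-offs are an assumption made for illustration.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    low, high = cutoffs
    if probability < low:
        return 1
    if probability < high:
        return 2
    return 3


# Invented hold-out values for three placeholder models (not study data).
observed = [0.0, 1.0, 0.0, 1.0]
predictions = {
    "model_a": [0.1, 0.9, 0.2, 0.8],
    "model_b": [0.3, 0.7, 0.4, 0.6],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
best_name, best_score = select_best_model(observed, predictions)
print(best_name, round(best_score, 4))
print(grid_cell(612.5, 249.9), risk_class(0.7))
```

In a real pipeline the predictions would come from fitted decision tree, random forest, and SVM models evaluated on held-out data, and the class boundaries would be chosen from the distribution of predicted probabilities rather than fixed in advance.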

Cited by

  1. Machine learning-based fine dust prediction model using meteorological data and fine dust data, vol.24, pp.1, 2021, https://doi.org/10.11108/kagis.2021.24.1.092