시·공간 데이터를 활용한 머신러닝 기반 범죄예측모형 비교

Comparison of Crime Forecasting Models based on Spatio-Temporal Data and Machine Learning

  • Received: 2020.01.10
  • Reviewed: 2020.12.28
  • Published: 2021.01.30

Abstract

Advances in computing performance and data analysis techniques have spurred research using big data and machine learning across many fields. In Korea, however, machine learning research on crime prediction remains limited: disclosure of crime data is restricted, and most existing studies either predict crime over coarse units of analysis or rely on only a few variables. Allocating police resources effectively requires predicting both when and where crimes will occur. In this study, we therefore train machine learning models on 9,413 actual theft records with spatio-temporal attributes such as the time and date of the crime, buildings, land use, and CCTV. By comparing the results of the prediction models, we aim to provide a basis for future research and practical support for crime prevention activities. Using GIS, we divided the study area into 100 m square grid cells and attached crime and spatio-temporal variables to each cell. We then trained widely used machine learning models (random forest, decision tree, SVC, and K-NN), predicted crime, and compared the results of the models. Crime data are generally highly imbalanced, because places where no crime occurred far outnumber places where a crime occurred. Such imbalance can introduce noise and lead to inaccurate predictions, so it must be addressed. We therefore propose resampling as a way to mitigate the imbalance and improve prediction accuracy. In the comparison of prediction performance, the random forest model trained with the SMOTE method achieved a high F1 score.
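The gridding step and the choice of the F1 score can be illustrated with a minimal sketch. It assumes crime locations are already in a projected coordinate system measured in metres and that each cell is 100 m on a side; the function names, the sample coordinates, and the 5 × 5 study area are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

CELL_SIZE_M = 100.0  # assumed side length of one square grid cell, in metres


def cell_of(x_m, y_m, origin=(0.0, 0.0), cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    col = math.floor((x_m - origin[0]) / cell_size)
    row = math.floor((y_m - origin[1]) / cell_size)
    return (col, row)


def label_cells(crime_points, n_cols, n_rows):
    """Count crimes per cell and give every cell a binary crime label."""
    counts = Counter(cell_of(x, y) for x, y in crime_points)
    labels = {}
    for col in range(n_cols):
        for row in range(n_rows):
            labels[(col, row)] = 1 if counts[(col, row)] > 0 else 0
    return counts, labels


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical theft locations in metres (illustrative values only).
crime_points = [(12.0, 40.0), (95.5, 99.9), (130.0, 20.0), (250.0, 310.0)]
counts, labels = label_cells(crime_points, n_cols=5, n_rows=5)

positives = sum(labels.values())
negatives = len(labels) - positives
print(f"cells with crime: {positives}, cells without crime: {negatives}")

# A trivial model that always predicts "no crime" looks accurate but has F1 = 0.
y_true = list(labels.values())
y_pred = [0] * len(y_true)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(f"accuracy: {accuracy:.2f}, F1: {f1_score(y_true, y_pred):.2f}")
```

Even in this toy area most cells contain no crime, so a model that never predicts crime scores well on accuracy while its F1 score is zero. This is one reason F1 is a common choice for comparing models on imbalanced data.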
The random forest model with SMOTE may have performed well because SMOTE discards less data than under-sampling and because random forest, as an ensemble model, is well suited to data with many variables. We compared the influence of each variable using feature importance. Overall, the time-related variables were highly influential; among them, "crimes occurred within one month" was the most influential. Among the physical environment variables, "first neighborhood living facility," "retail store," and "detached house" were highly influential.
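The data-loss argument can be made concrete with a small sketch that contrasts random under-sampling with a simplified SMOTE-style over-sampler. This is an illustration written in plain Python, not the implementation used in the study; the function names, the neighbour count `k=3`, and the synthetic two-feature data are all assumptions.

```python
import random


def random_undersample(majority, minority, rng):
    """Drop majority rows at random until both classes have the same size."""
    return rng.sample(majority, len(minority)), list(minority)


def smote_like_oversample(majority, minority, rng, k=3):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    while len(minority) + len(synthetic) < len(majority):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sq_dist(base, m),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return list(majority), list(minority) + synthetic


rng = random.Random(0)
# Hypothetical feature rows: 40 no-crime cells and 5 crime cells.
majority = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(40)]
minority = [(rng.uniform(6, 9), rng.uniform(6, 9)) for _ in range(5)]

under_maj, under_min = random_undersample(majority, minority, rng)
over_maj, over_min = smote_like_oversample(majority, minority, rng)

print("rows after under-sampling:", len(under_maj) + len(under_min))
print("rows after SMOTE-style over-sampling:", len(over_maj) + len(over_min))
```

Under-sampling balances the classes by discarding 35 of the 40 majority rows, while the SMOTE-style approach keeps every original row and adds interpolated minority rows, which is the sense in which it loses less data.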
