Comparison of Crime Forecasting Models based on Spatio-Temporal Data and Machine Learning

Kim, Dongyoung; Jung, Sungwon

doi:10.5659/JAIK.2021.37.1.135

Journal of the Architectural Institute of Korea (대한건축학회논문집)

Volume 37, Issue 1 / Pages 135-143 / 2021 / pISSN 2733-6239 / eISSN 2733-6247

Architectural Institute of Korea (대한건축학회)

시·공간 데이터를 활용한 머신러닝 기반 범죄예측모형 비교

Kim, Dongyoung (Department of Architecture, Graduate School, Sejong University);
Jung, Sungwon (Department of Architecture, Sejong University)

Received: 2020.01.10
Reviewed: 2020.12.28
Published: 2021.01.30

https://doi.org/10.5659/JAIK.2021.37.1.135

Abstract

With advances in computing performance and data analysis techniques, research using big data and machine learning is actively underway in many fields. However, domestic (Korean) research on machine-learning-based crime prediction remains limited: disclosure of crime data is restricted, and most existing studies predict crime over broad units of analysis or from only a few variables. To allocate police resources effectively, crime prediction must be practical, which means predicting both when and where crime will occur. In this study, we therefore train machine learning models on 9,413 actual theft records combined with spatio-temporal variables such as the time and date of the crime, buildings, land use, and CCTV. By comparing the results of the prediction models, we aim to provide a basis for future research and practical support for crime prevention activities.

We divided the study area into 100 m square grid cells using GIS and assigned the crime data and the spatio-temporal variables to each cell. We then trained widely used machine learning models (random forest, decision tree, SVC, and K-NN), predicted crime with each, and compared the results. Crime data are typically highly imbalanced: places where no crime occurred far outnumber places where crime occurred. Such imbalance can introduce noise and lead to inaccurate predictions, so it must be addressed. We therefore propose resampling as a way to mitigate the imbalance and improve prediction accuracy.

Comparing the prediction performance of the models showed that the random forest model trained with the SMOTE method achieved a high F1 score. This could be because the SMOTE method loses less data than the under-sampling method, and because random forest, as an ensemble model, has an advantage in predicting data with many variables. We compared the influence of each variable using the feature importance function. Overall, the temporal variables showed high influence; among them, "crimes occurred within one month" showed the highest influence. Among the physical environment variables, "first neighborhood living facility," "retail store," and "detached house" were found to have high influence.
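
The abstract does not describe how the grid assignment, resampling, or scoring were implemented, so the following is only a minimal, standard-library sketch of three ideas it names: mapping incidents to 100 m grid cells, SMOTE-style interpolation of minority samples, and the F1 score. All function names, parameters, and sample coordinates here are hypothetical and are not taken from the paper; a real study would more likely rely on GIS software and established library implementations.

```python
import math
import random
from collections import Counter

CELL_SIZE_M = 100.0  # grid cell edge length given in the abstract


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the square cell containing a point.

    Coordinates are assumed to be metres in a projected coordinate system.
    """
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples.

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical incident coordinates (metres), counted per grid cell.
incidents = [(12.0, 250.0), (99.9, 201.0), (100.0, 0.0)]
counts = Counter(grid_cell(x, y) for x, y in incidents)

# One crime cell among ten: predicting "no crime" everywhere is 90% accurate
# but scores an F1 of 0 for the crime class.
y_true = [1] + [0] * 9
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_all_negative = f1_score(y_true, y_pred)
```

The last lines illustrate why a metric such as F1 is more informative than accuracy on imbalanced data: a model that never predicts crime looks accurate yet finds no crime cells. Because SMOTE-style oversampling adds interpolated minority samples rather than discarding majority samples, no original records are lost, which is consistent with the abstract's remark about data loss under under-sampling.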

