A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs

  • Won, Ha-Ram (Graduate School of Business IT, Kookmin University) ;
  • Shim, Jae-Seung (Graduate School of Business IT, Kookmin University) ;
  • Ahn, Hyunchul (Graduate School of Business IT, Kookmin University)
  • Received : 2019.01.28
  • Accepted : 2019.03.28
  • Published : 2019.03.31

Abstract

Recidivism prediction has been studied by experts since the early 1970s, and it has become more important as crimes committed by recidivists have steadily increased. In particular, research on recidivism prediction became more active in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening. In the same period, empirical studies on recidivism factors also began in Korea. Most recidivism prediction studies so far have focused on analyzing the factors of recidivism or on improving prediction accuracy. However, because recidivism prediction has an asymmetric error cost structure, it is also important to minimize the misclassification cost. In general, the cost of misclassifying a person who will not reoffend as a likely recidivist is lower than the cost of misclassifying a person who will reoffend as a non-recidivist: the former only adds monitoring costs, while the latter incurs substantial social and economic costs from the resulting crime. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, which is recognized as a high-performance ensemble method in the field of data mining, is applied, and its results are compared with those of other prediction models such as LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, which is the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset. The results confirmed that the XGB model not only showed better prediction accuracy than the other prediction models but also reduced the misclassification cost most effectively.
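The threshold optimization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that FNE and FPE are error rates (missed recidivists over all actual recidivists, and false alarms over all actual non-recidivists), that the total cost is their weighted average under hypothetical weights `fn_weight` and `fp_weight`, and that candidate thresholds are searched on a simple grid. The function names and the toy data are illustrative only.

```python
from typing import Sequence, Tuple


def total_cost(probs: Sequence[float], labels: Sequence[int],
               threshold: float, fn_weight: float, fp_weight: float) -> float:
    """Weighted average of FNE and FPE at a given threshold (assumed definition)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    # False negative: an actual recidivist scored below the threshold.
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    # False positive: a non-recidivist scored at or above the threshold.
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    fne = fn / positives if positives else 0.0
    fpe = fp / negatives if negatives else 0.0
    return (fn_weight * fne + fp_weight * fpe) / (fn_weight + fp_weight)


def optimize_threshold(probs: Sequence[float], labels: Sequence[int],
                       fn_weight: float, fp_weight: float,
                       steps: int = 100) -> Tuple[float, float]:
    """Grid-search the threshold in [0, 1] that minimizes the total cost."""
    candidates = [i / steps for i in range(steps + 1)]
    best = min(candidates,
               key=lambda t: total_cost(probs, labels, t, fn_weight, fp_weight))
    return best, total_cost(probs, labels, best, fn_weight, fp_weight)


# Toy scores standing in for a classifier's predicted recidivism probabilities.
example_probs = [0.10, 0.30, 0.35, 0.60, 0.80, 0.45]
example_labels = [0, 0, 1, 1, 1, 0]

# Hypothetical weights: a false negative is treated as five times as costly.
fixed_cost = total_cost(example_probs, example_labels, 0.5, 5.0, 1.0)
best_threshold, best_cost = optimize_threshold(example_probs, example_labels, 5.0, 1.0)
print(f"fixed 0.5 cost={fixed_cost:.4f}, "
      f"optimized threshold={best_threshold:.2f} cost={best_cost:.4f}")
```

With a heavier weight on false negatives, the search tends to move the threshold below 0.5, accepting more false alarms in exchange for fewer missed recidivists. In practice the threshold would be chosen on validation data and the cost reported on a separate hold-out set.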

Figure 1. Flow Chart of the Research Model

Figure 2. Comparison of Total Social Cost Using Fixed and Optimized Thresholds

Table 1. Candidate Independent Variables

Table 2. Selected Independent Variables Applied to the Model

Table 3. Experimental Results for Each Classification Method

Table 4. Two-Sample Test for Proportions (Z-values)

Table 5. Comparison of Results of Fixed and Optimized Classification Thresholds
