Bike Insurance Fraud Detection Model Using Balanced Randomforest Algorithm

Development of a Bike Insurance Fraud Detection Model Using Balanced Random Forest

  • Kim, Seunghoon (Korea Research Institute for Human Settlements) ;
  • Lee, Soo Il (Division of Transportation Safety, Coupang Corporation) ;
  • Kim, Tae ho (Division of Transportation Safety Planning, Coupang Corporation)
  • Received : 2021.12.06
  • Accepted : 2022.02.20
  • Published : 2022.02.28

Abstract

Due to the COVID-19 pandemic, bike insurance fraud is expected to surge as 'untact' (contactless) services expand and household finances become unstable. Fraud methods are also becoming more complicated. However, no fraud detection model exists for bike insurance. We address the skewed class distribution and reflect the criteria of fraud detection experts, using a balanced random-forest algorithm to develop an efficient bike insurance fraud detection model. The results show that the predictive performance of the balanced random-forest model is superior to that of the non-balanced model, while there is no significant difference between the variables used by the experts and those of the confirmatory models. The important variables for detecting fraud turn out to be the age and gender of the driver, whether the insured and the driver are the same person, the amount of the self-repair claim, and the amount of bodily injury liability.

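In the wake of COVID-19, the growth of contactless services and of household financial instability is expected to increase bike insurance fraud. At the same time, fraud schemes are becoming increasingly sophisticated. However, research on bike traffic accidents related to contactless delivery demand, and on insurance fraud detection models, remains very limited. This study therefore uses a balanced random forest algorithm to address the sample imbalance in insurance fraud data, and includes variables that reflect the qualitative judgment criteria of insurance fraud investigation experts, in order to improve applicability and develop a bike insurance fraud model with high detection power. The results confirm that the balanced random forest is statistically superior to the conventional non-balanced random forest model in classifying insurance fraud suspects. In particular, the model that applied an exploratory combination of variables drawn from all 26 variables had the highest predictive performance. The confirmatory model that used only a subset of the variables did not perform much worse, whereas the confirmatory model that used only the variables selected by the qualitative judgment of insurance fraud experts showed lower predictive power. In addition, among the 26 variables, driver gender, driver age, whether the driver and the insured are the same person, the claimed amount for unrepaired damage, and the bodily injury insurance payout were identified as important, and these should be used in active measures to screen bike insurance fraud suspects.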
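
The balancing step that separates a balanced random forest from an ordinary one can be illustrated with a short sketch. It is not the authors' implementation. It assumes the common formulation in which each tree is trained on a bootstrap sample of the minority class plus an equally sized sample, drawn with replacement, from the majority class. The function name `balanced_bootstrap`, the toy labels, and the 2% fraud rate are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return row indices for one tree's training sample.

    Each class contributes the same number of rows (the size of the
    smallest class), drawn with replacement, so no class dominates.
    """
    rows_by_class = {}
    for idx, label in enumerate(labels):
        rows_by_class.setdefault(label, []).append(idx)

    # Size of the rarest class sets the per-class sample size.
    n_minority = min(len(rows) for rows in rows_by_class.values())

    sample = []
    for rows in rows_by_class.values():
        sample.extend(rng.choices(rows, k=n_minority))  # with replacement
    rng.shuffle(sample)
    return sample


# Hypothetical toy data: 1 = suspected fraud (rare), 0 = legitimate claim.
labels = [1] * 20 + [0] * 980
rng = random.Random(0)

indices = balanced_bootstrap(labels, rng)
class_counts = Counter(labels[i] for i in indices)
print(class_counts)  # both classes appear equally often in the sample
```

In a full forest this sampling would be repeated once per tree, so that across many trees most majority-class rows are still seen while each individual tree learns from a balanced sample.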
