Semi-Supervised Learning to Predict Default Risk for P2P Lending

A Study on Predicting P2P Loan Default Risk Based on Semi-Supervised Learning

  • Kim, Hyun-jung (Department of IT Finance, College of Business Administration, Jeonju University)
  • 김현정 (전주대학교 경영대학 IT금융학과)
  • Received : 2022.01.19
  • Accepted : 2022.04.02
  • Published : 2022.04.28

Abstract

This study investigates the effect of semi-supervised learning (SSL) on predicting the default risk of peer-to-peer (P2P) loans. Despite its proven performance, supervised learning (SL) requires labeled data, which can take considerable effort and resources to collect. With the rapid growth of P2P platforms, the number of loans issued each year that have no clear final resolution keeps increasing, producing an abundance of unlabeled data. An SSL model is therefore needed that predicts default risk using information not only from labeled loans (fully paid or defaulted) but also from unlabeled loans. The P2P loan data used in this study were collected from the LendingClub platform. The results show that, despite using only a small amount of labeled data, the SSL method achieved much better default risk prediction performance than the SL method trained on a much larger set of labeled data.

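This study aims to develop a semi-supervised learning (SSL) model for predicting the default risk of peer-to-peer (P2P) loans. Despite its proven performance, the supervised learning (SL) method needs a large amount of data whose labels are already determined, such as fully paid or defaulted, and collecting enough labeled data takes considerable resources and time. As P2P platforms have grown rapidly, the number of loans has surged every year, and the amount of unlabeled data has kept increasing as well. This study uses data collected from LendingClub, a P2P lending platform. An SSL model was developed that predicts default risk using information extracted not only from loans with determined labels but also from loans whose labels are not yet determined. The results show that, despite using a small amount of labeled data, the model built with the SSL method achieved better default risk prediction performance than the model built with the SL method trained on a large amount of labeled data.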
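The abstract does not state which SSL algorithm the study uses. Purely as an illustration of how labeled and unlabeled loans can be combined, the sketch below shows self-training, one common SSL technique: a model fitted on the few labeled loans assigns pseudo-labels to the unlabeled loans it is most confident about, then is refitted on the enlarged set. Everything in it is an assumption made for the example and not taken from the paper: the synthetic two-feature data, the nearest-centroid base classifier, the margin threshold, and all function names.

```python
import random

def make_loans(rng, n_per_class):
    """Synthetic 2-feature loans: label 0 = fully paid, 1 = defaulted (made-up data)."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit(X, y):
    """Nearest-centroid classifier: store one mean vector per class label."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict_with_margin(model, x):
    """Return (label, margin); margin = gap in squared distance between the two nearest centroids."""
    ranked = sorted((sq_dist(x, c), label) for label, c in model.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]

def self_train(X_lab, y_lab, X_unlab, margin_threshold=4.0, max_rounds=10):
    """Self-training loop: repeatedly pseudo-label confident unlabeled points and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    n_pseudo = 0
    for _ in range(max_rounds):
        model = fit(X_lab, y_lab)
        remaining = []
        added = 0
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= margin_threshold:   # confident: accept the pseudo-label
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:                            # not confident: keep it unlabeled
                remaining.append(x)
        pool = remaining
        n_pseudo += added
        if added == 0:                       # nothing new was learned; stop early
            break
    return fit(X_lab, y_lab), n_pseudo

def accuracy(model, X, y):
    hits = sum(predict_with_margin(model, x)[0] == t for x, t in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
X_lab, y_lab = make_loans(rng, 3)       # a few resolved (labeled) loans
X_unlab, _ = make_loans(rng, 200)       # many unresolved loans; labels discarded
X_test, y_test = make_loans(rng, 200)   # held-out loans for evaluation

sl_model = fit(X_lab, y_lab)                             # supervised baseline
ssl_model, n_pseudo = self_train(X_lab, y_lab, X_unlab)  # self-training
acc_sl = accuracy(sl_model, X_test, y_test)
acc_ssl = accuracy(ssl_model, X_test, y_test)
print(f"pseudo-labeled loans: {n_pseudo}")
print(f"SL accuracy: {acc_sl:.3f}, SSL accuracy: {acc_ssl:.3f}")
```

On toy data this cleanly separated, both models will typically score well, so the sketch demonstrates only the mechanics of using unlabeled loans and does not reproduce the study's comparison. The threshold controls the trade-off in self-training: a higher value admits fewer but more reliable pseudo-labels.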

Acknowledgement

This research was supported by the Research Grant of Jeonju University in 2021.
