Utility of Synthetic Data in Finances: An Application of Online P2P Lending Loan Default Analysis

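Korean title (translated): An Analysis of the Utility of Synthetic Data in the Financial Industry: Focusing on Online P2P Loan Default Analysis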

  • 송민채 (NH NongHyup Financial Group, NH Financial Research Institute)
  • Received : 2024.06.07
  • Accepted : 2024.07.16
  • Published : 2024.08.31


To promote AI applications in the financial industry, the financial sector has recently been paying attention to synthetic data technology. Synthetic data is generated by a purpose-built mathematical model or algorithm, with the aim of solving a set of data science tasks. This study evaluates the utility of synthetic data by analyzing heterogeneous tabular data that combines discrete, categorical, and continuous variables and is imbalanced, as is common in the financial sector. The TGAN and CTGAN models are applied as synthetic data generation techniques, chosen in consideration of the characteristics of tabular data. When utility is evaluated in terms of resemblance and machine learning efficiency, the synthetic data generated by TGAN scores high on both, while the quality of the data generated by CTGAN is relatively poor. This gap is attributed in particular to how categorical variables are generated, which suggests that the way a synthetic data generation model handles categorical variables is a major factor in determining the utility of the generated synthetic data.
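The abstract does not state which resemblance measures the study uses. As a minimal illustration of what a resemblance check on a categorical variable can look like, the sketch below computes the total variation distance between the category frequencies of a real column and a synthetic column. The choice of metric, the function names, and the toy loan-status values are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in a column."""
    if not values:
        raise ValueError("column must contain at least one value")
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(real_column, synthetic_column):
    """Total variation distance between two categorical columns.

    0.0 means the category frequencies are identical;
    1.0 means the two columns share no categories at all.
    """
    real_freq = category_frequencies(real_column)
    synth_freq = category_frequencies(synthetic_column)
    categories = set(real_freq) | set(synth_freq)
    return 0.5 * sum(
        abs(real_freq.get(c, 0.0) - synth_freq.get(c, 0.0)) for c in categories
    )


# Hypothetical toy columns (assumed values, not data from the study):
# an imbalanced real column and a synthetic column that over-generates defaults.
real_status = ["paid"] * 8 + ["default"] * 2
synthetic_status = ["paid"] * 6 + ["default"] * 4

tvd = total_variation_distance(real_status, synthetic_status)
print(f"Total variation distance: {tvd:.3f}")
```

A value near 0 would indicate that the generator reproduces the category proportions of the real column, while a larger value would flag the kind of categorical mismatch that the abstract points to as a driver of low utility. A full evaluation would also need measures for continuous variables and a machine learning efficiency test, which this sketch does not cover.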



