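Pseudonymization's effect on data quality: A study under personal information protection act (Korean title, translated: The effect of data loss caused by pseudonymization under the Personal Information Protection Act on the accuracy of data analysis)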

  • Minjeong Kim; Jae Keun Yoo
    • The Korean Journal of Applied Statistics, v.37 no.3, pp.381-393, 2024
  • This study investigates how pseudonymizing personal information affects the accuracy of data analysis. We quantitatively evaluated the relationship between the degree of pseudonymization and analysis accuracy using logistic regression models, decision trees, and random forests. The results confirm that pseudonymizing sensitive information can protect personal information without significantly damaging data quality. The study has limitations, however: it relies on a single sample dataset and applies the pseudonymization ratio uniformly. Additional research on diverse datasets is needed to strengthen the generalizability of the results. We also propose developing and applying methodologies for finding the optimal pseudonymization ratio for each individual variable. These results provide new insights into maintaining data usability while achieving regulatory compliance and personal information protection.
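
The kind of experiment the abstract describes, measuring accuracy as the pseudonymization ratio grows, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, pseudonymization is modeled as replacing a random fraction of values with a placeholder token, and a one-level decision tree (majority label per feature value) stands in for the logistic regression, decision tree, and random forest models the authors used. The names `MASK`, `pseudonymize`, `fit_stump`, `accuracy_at_ratio`, and `make_synthetic` are hypothetical.

```python
import random
from collections import Counter, defaultdict

MASK = "*"  # hypothetical placeholder standing in for a pseudonymized value


def pseudonymize(values, ratio, rng):
    """Replace a randomly chosen fraction `ratio` of the values with MASK."""
    n_masked = round(len(values) * ratio)
    masked_idx = set(rng.sample(range(len(values)), n_masked))
    return [MASK if i in masked_idx else v for i, v in enumerate(values)]


def fit_stump(features, labels):
    """Fit a one-level decision tree: majority label per feature value."""
    by_value = defaultdict(Counter)
    for x, y in zip(features, labels):
        by_value[x][y] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    table = {x: c.most_common(1)[0][0] for x, c in by_value.items()}
    return lambda x: table.get(x, default)


def accuracy_at_ratio(features, labels, ratio, seed=0):
    """Pseudonymize one variable at `ratio`, then train and test on halves."""
    rng = random.Random(seed)
    masked = pseudonymize(features, ratio, rng)
    split = len(masked) // 2
    model = fit_stump(masked[:split], labels[:split])
    hits = sum(model(x) == y for x, y in zip(masked[split:], labels[split:]))
    return hits / (len(masked) - split)


def make_synthetic(n, seed=0):
    """Synthetic data (an assumption): the label mostly follows the age band."""
    rng = random.Random(seed)
    features = [rng.choice(["20s", "30s", "40s", "50s"]) for _ in range(n)]
    labels = [
        int(f in ("40s", "50s")) if rng.random() < 0.9 else rng.randint(0, 1)
        for f in features
    ]
    return features, labels


if __name__ == "__main__":
    X, y = make_synthetic(2000)
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"ratio={ratio:.2f}  accuracy={accuracy_at_ratio(X, y, ratio):.3f}")
```

Running the script prints one accuracy figure per ratio. A per-variable search of the kind the authors propose could reuse `accuracy_at_ratio` by scanning ratios separately for each column, though how the paper would define "optimal" is not stated in the abstract.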