Privacy-Constrained Relational Data Perturbation: An Empirical Evaluation

  • Deokyeon Jang (Dept. of Computer Science & Engineering, Korea University) ;
  • Minsoo Kim (Dept. of Computer Science & Engineering, Korea University) ;
  • Yon Dohn Chung (Dept. of Computer Science & Engineering, Korea University)
  • Received : 2023.09.26
  • Accepted : 2023.11.12
  • Published : 2024.08.31

Abstract

The release of relational data containing personal sensitive information poses a significant risk of privacy breaches. To preserve privacy while publishing such data, techniques that protect sensitive information must be applied. Data perturbation is a popular choice for privacy-preserving data release owing to its simplicity and efficiency; however, it has limitations that hinder its practical application, and alternative solutions are needed to overcome them. In this study, we propose a novel approach to preserving privacy in the release of relational data containing personal sensitive information. The approach consists of an intuitive, syntactic privacy criterion for data perturbation and two perturbation methods for relational data release. Through experiments on synthetic and real data, we evaluate the performance of the proposed methods.
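As background, the sketch below illustrates the general idea of data perturbation on a relational table: numeric attributes are distorted with random noise before release, so that individual records no longer expose their exact sensitive values while aggregate statistics remain approximately preserved. This is a minimal, generic illustration only; the column names, noise scale, and the choice of Gaussian additive noise are assumptions for the example and do not reproduce the privacy criterion or the two perturbation methods proposed in the paper.

```python
import numpy as np
import pandas as pd

def perturb_numeric(df: pd.DataFrame, columns, scale: float, seed: int = 0) -> pd.DataFrame:
    """Return a copy of df with zero-mean Gaussian noise added to the given numeric columns.

    Generic additive-noise perturbation: a larger `scale` means stronger
    distortion (more protection) but lower utility of the released table.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in columns:
        noise = rng.normal(loc=0.0, scale=scale * df[col].std(), size=len(df))
        out[col] = df[col] + noise
    return out

# Hypothetical relational table with a sensitive numeric attribute.
original = pd.DataFrame({
    "age": [29, 41, 35, 52, 47],
    "income": [32000, 58000, 44000, 71000, 63000],
})

released = perturb_numeric(original, columns=["income"], scale=0.1)
print(released)                                               # individual incomes are distorted
print(original["income"].mean(), released["income"].mean())   # the mean stays approximately preserved
```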

Keywords

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (No. IITP-2023-2020-0-01819, IITP-2021-0-00634) and the National Research Foundation of Korea (No. NRF-2020R1A2C2013286, NRF-2021R1A6A1A13044830).
