A Study on a Differentially Private Model for Financial Data

  • Received : 2017.09.18
  • Accepted : 2017.10.24
  • Published : 2017.12.31

Abstract

Data de-identification is one of the essential techniques for preserving the privacy of individuals in a dataset while still providing analysts with useful information. However, traditional de-identification techniques such as k-anonymity are fundamentally vulnerable to attacks that exploit an adversary's background knowledge. In contrast, differential privacy offers strong privacy guarantees together with useful utility, and it has therefore been studied intensively in recent years. In this paper, we analyze various techniques based on differential privacy and formalize a differentially private model for financial data, and we show that the resulting model provides both strong privacy guarantees and good utility.
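
For context, a core building block of differential privacy is the Laplace mechanism: a query answer is perturbed with Laplace noise whose scale is the query's sensitivity divided by the privacy budget epsilon. The following is a minimal Python sketch of this mechanism for a counting query over financial-style records; the field names, the predicate, and the budget epsilon = 0.5 are illustrative assumptions, not details taken from the paper.

import numpy as np

def laplace_count(records, predicate, epsilon):
    """Answer a counting query under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the true count by at most 1), so Laplace noise with scale
    1/epsilon is sufficient.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical financial-style records; the fields are illustrative only.
records = [
    {"credit_amount": 5000, "defaulted": False},
    {"credit_amount": 12000, "defaulted": True},
    {"credit_amount": 800, "defaulted": False},
]

# Noisy count of customers who defaulted, with an assumed budget epsilon = 0.5.
print(laplace_count(records, lambda r: r["defaulted"], epsilon=0.5))

A single noisy count of this kind does not reveal whether any one individual's record is present, but each additional query consumes more of the budget; in practice the total epsilon spent across all released statistics bounds the overall privacy loss.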
