Development of a Privacy-Preserving Big Data Publishing System in Hadoop Distributed Computing Environments

Korean title: Development of a Privacy-Preserving Big Data Publishing System Based on a Hadoop Distributed Environment

  • Received : 2017.09.04
  • Accepted : 2017.10.18
  • Published : 2017.11.30

Abstract

Big data generally contains sensitive information about individuals, so releasing it directly for public use may violate existing privacy requirements. Privacy-preserving data publishing (PPDP) has therefore been actively researched as a way to share big data containing personal information for public use while protecting the privacy of individuals with minimal data modification. As demand for big data sharing grows in various areas, interest is also growing in software that supports privacy-preserving data publishing. In this paper, we develop a system that aims to support privacy-preserving data publishing effectively and efficiently. In particular, the system enables data owners to select an appropriate anonymization level by providing them with an information loss matrix. Furthermore, the system achieves high performance in data anonymization by using distributed Hadoop clusters.
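The idea of choosing an anonymization level from reported information loss can be illustrated with a small, single-machine sketch. It is not the system described in this paper: the toy records, the two quasi-identifiers (age and ZIP code), the generalization levels, and the loss formula are all assumptions chosen for illustration, and the Hadoop-based distributed execution is not shown. For each candidate level, the sketch reports the k achieved (the size of the smallest group of indistinguishable records) next to a simple loss score between 0 and 1.

```python
from collections import Counter

# Hypothetical toy records (age, zip_code); not taken from the paper.
RECORDS = [(23, "13053"), (27, "13068"), (29, "13053"), (34, "14850"),
           (36, "14853"), (38, "14850"), (41, "14850"), (45, "14853")]

AGE_MIN, AGE_MAX = 20, 49  # assumed domain of the age attribute
ZIP_LEN = 5                # assumed ZIP code length


def generalize(record, age_width, zip_masked):
    """Coarsen one record: bucket the age into a range and mask trailing ZIP digits."""
    age, zip_code = record
    low = AGE_MIN + ((age - AGE_MIN) // age_width) * age_width
    age_range = (low, low + age_width - 1)
    zip_gen = zip_code[:ZIP_LEN - zip_masked] + "*" * zip_masked
    return (age_range, zip_gen)


def smallest_group(generalized):
    """Return the k actually achieved: the size of the smallest equivalence class."""
    return min(Counter(generalized).values())


def information_loss(age_width, zip_masked):
    """Assumed loss score: average fraction of each attribute's domain that is hidden."""
    age_loss = (age_width - 1) / (AGE_MAX - AGE_MIN)
    zip_loss = zip_masked / ZIP_LEN
    return (age_loss + zip_loss) / 2


def loss_table(records, levels):
    """Build one row per candidate level so a data owner can compare k against loss."""
    rows = []
    for age_width, zip_masked in levels:
        generalized = [generalize(r, age_width, zip_masked) for r in records]
        rows.append({
            "age_width": age_width,
            "zip_masked": zip_masked,
            "k": smallest_group(generalized),
            "loss": round(information_loss(age_width, zip_masked), 3),
        })
    return rows


# Candidate levels as (age bucket width, number of masked ZIP digits), least to most general.
LEVELS = [(1, 0), (10, 2), (30, 5)]
TABLE = loss_table(RECORDS, LEVELS)

for row in TABLE:
    print(row)
```

Under these assumptions, a data owner with a target k would pick the row with the lowest loss whose k meets that target. In the toy data, the middle level gives groups of at least two records while hiding roughly a third of the attribute detail, whereas the most general level makes all records indistinguishable at the cost of total loss.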
