Development of a Privacy-Preserving Big Data Publishing System in Hadoop Distributed Computing Environments

Kim, Dae-Ho;Kim, Jong Wook;

doi:10.9717/kmms.2017.20.11.1785

Journal of Korea Multimedia Society (한국멀티미디어학회논문지)

Volume 20, Issue 11, pp. 1785-1792, 2017
pISSN 1229-7771, eISSN 2384-0102

Korea Multimedia Society (한국멀티미디어학회)



Kim, Dae-Ho (Dept. of Computer Science, Sangmyung University)
Kim, Jong Wook (Dept. of Computer Science, Sangmyung University)


Received : 2017.09.04
Accepted : 2017.10.18
Published : 2017.11.30

https://doi.org/10.9717/kmms.2017.20.11.1785

Abstract

Big data often contains sensitive information about individuals, so releasing it directly for public use may violate existing privacy requirements. Privacy-preserving data publishing (PPDP) has therefore been actively researched as a way to share big data containing personal information for public use while protecting the privacy of individuals with minimal data modification. With the increasing demand for big data sharing in various areas, interest in software that supports privacy-preserving data publishing is also growing. In this paper, we develop a system that aims to support privacy-preserving data publishing effectively and efficiently. In particular, the system enables data owners to select an appropriate anonymization level by providing them with the information loss matrix. Furthermore, the system achieves high performance in data anonymization by using distributed Hadoop clusters.


