A Hybrid K-anonymity Data Relocation Technique for Privacy Preserved Data Mining in Cloud Computing

  • S.Aldeen, Yousra Abdul Alsahib (Department of Computer Science, College of Education Ibn Rushd, Baghdad University)
  • Salleh, Mazleena (Department of Computer Science, Universiti Teknologi Malaysia (UTM))
  • Received : 2016.05.09
  • Accepted : 2016.06.05
  • Published : 2016.10.31

Abstract

The unprecedented power of cloud computing (CC), which enables free sharing of confidential data records for further analysis and mining, has given rise to various security threats. Strong cyberspace security and mitigation of adversarial attacks during data mining have therefore become essential. Privacy-preserving data mining has emerged as a precise and efficient solution, and various algorithms have been developed to anonymize the data to be mined. Despite the wide use of the generalization-based K-anonymity approach, its protection and truthfulness remain limited to a tiny output space with unacceptable utility loss. To overcome this limitation, we propose a hybrid K-anonymity data relocation algorithm that combines L-diversity and $(\alpha, k)$-anonymity. Data relocation, which is a tradeoff between truthfulness and utility, acts as a control input parameter. The performance of each K-anonymity iteration is measured for data relocation. Data rows are changed into small groups of indistinguishable tuples to create anonymizations of finer granularity with an assured privacy standard. Experimental results demonstrate considerable utility enhancement for a relatively small number of group relocations.
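
The three privacy notions named in the abstract can be illustrated with a minimal Python sketch. It is not the hybrid relocation algorithm proposed in the paper; it only checks the commonly stated definitions of K-anonymity, distinct L-diversity, and $(\alpha, k)$-anonymity on an already generalized table. The toy table, the column names (`age`, `zip`, `disease`), and all function names are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, qi_columns):
    """Group rows into equivalence classes that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in qi_columns)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, qi_columns, k):
    """Every equivalence class must contain at least k indistinguishable tuples."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(rows, qi_columns, sensitive, l):
    """Every equivalence class must contain at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


def is_alpha_k_anonymous(rows, qi_columns, sensitive, sensitive_value, alpha, k):
    """K-anonymity plus a cap: the chosen sensitive value may make up
    at most a fraction alpha of any equivalence class."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    for g in groups.values():
        if len(g) < k:
            return False
        frequency = sum(1 for r in g if r[sensitive] == sensitive_value) / len(g)
        if frequency > alpha:
            return False
    return True


# Hypothetical, already generalized toy table (assumption, not data from the paper).
ROWS = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "hiv"},
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
QI = ["age", "zip"]

if __name__ == "__main__":
    print(is_k_anonymous(ROWS, QI, k=2))
    print(is_distinct_l_diverse(ROWS, QI, "disease", l=2))
    print(is_alpha_k_anonymous(ROWS, QI, "disease", "hiv", alpha=0.5, k=2))
```

In this toy table the two equivalence classes have sizes 3 and 2, so the table satisfies the check for k = 2 but not for k = 3. Tightening `alpha` below 1/3 makes the $(\alpha, k)$ check fail, because one class contains the value `hiv` in one of its three tuples.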

