Experiment and Implementation of a Machine-Learning Based k-Value Prediction Scheme in a k-Anonymity Algorithm

  • Received : 2019.10.11
  • Accepted : 2019.12.31
  • Published : 2020.01.31

Abstract

The k-anonymity scheme has been widely used to protect private information when Big Data are distributed to a third party for research purposes. When the scheme is applied, determining an optimal value of k is one of the difficult problems to resolve, because many factors must be considered. Currently, k is determined almost entirely by hand, relying on the intuition of human experts. This degrades anonymization performance and costs the experts considerable time and money. To overcome this problem, a simple machine-learning-based approach has been proposed. This paper describes the implementation of that idea and the experiments conducted with it. In this work, a deep neural network (DNN) is implemented using the TensorFlow library, then trained and tested on an input dataset. The experimental results show that the training error follows the pattern typical of DNN training, whereas the validation error exhibits a pattern different from the one seen in a typical training process. The advantage of the proposed approach is that, because k can be determined semi-automatically, it reduces the time and cost experts spend on this task.
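To make the quantity being chosen concrete: under the standard definition, a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records, so the k a table actually achieves is the size of its smallest equivalence class. The minimal sketch below, using only the Python standard library, measures that k and shows how generalizing a quasi-identifier raises it. The column names, sample records, and helper names (`anonymity_level`, `generalize`, `age_band`) are hypothetical illustrations, not taken from the paper, and the sketch does not reproduce the paper's DNN, which predicts a suitable k rather than measuring one.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


def anonymity_level(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """Largest k for which the table is k-anonymous w.r.t. the quasi-identifiers.

    Records sharing the same quasi-identifier values form an equivalence
    class; the table is k-anonymous iff the smallest class has >= k records.
    """
    if not records:
        raise ValueError("cannot measure anonymity of an empty table")
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


def is_k_anonymous(records: Sequence[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every equivalence class contains at least k records."""
    return anonymity_level(records, quasi_identifiers) >= k


def generalize(records: Sequence[Record], column: str,
               fn: Callable[[object], object]) -> List[Record]:
    """Return a copy of the table with one column replaced by a coarser value."""
    return [{**r, column: fn(r[column])} for r in records]


def age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '20-29' (hypothetical hierarchy)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical toy table: "age" and "zip" are the quasi-identifiers.
table = [
    {"age": 23, "zip": "130**", "disease": "flu"},
    {"age": 27, "zip": "130**", "disease": "cold"},
    {"age": 25, "zip": "130**", "disease": "flu"},
    {"age": 34, "zip": "148**", "disease": "asthma"},
    {"age": 38, "zip": "148**", "disease": "flu"},
]
qi = ["age", "zip"]

print(anonymity_level(table, qi))   # 1: every exact (age, zip) pair is unique
coarse = generalize(table, "age", age_band)
print(anonymity_level(coarse, qi))  # 2: classes of size 3 ("20-29") and 2 ("30-39")
```

Raising k strengthens protection but requires coarser generalization and therefore loses more information, which is one reason the choice of k depends on many factors.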
