Privacy-Preserving K-means Clustering using Homomorphic Encryption in a Multiple Clients Environment

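  • 권희용 (Department of Computer Engineering, Inha University)
  • 임종혁 (Department of Computer Engineering, Inha University)
  • 이문규 (Department of Computer Engineering, Inha University)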
  • Received : 2019.06.25
  • Accepted : 2019.08.13
  • Published : 2019.08.31

Abstract

Machine learning is one of the most accurate techniques for predicting and analyzing various phenomena. K-means clustering is a machine learning technique that groups given data into clusters of similar data and is used in a variety of fields. Because analysis based on as much data as possible is desirable for better performance, K-means clustering can be performed in a model with a server that calculates the centroids of the clusters and a number of clients that provide data to the server. However, in this model, if the clients' data contain sensitive information, the server can infringe on the clients' privacy. To solve this problem in a model with multiple clients, this paper proposes a privacy-preserving K-means clustering method that uses homomorphic encryption to perform machine learning while protecting the clients' private information.
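
The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea, not the method proposed in the paper. It assumes a toy Paillier-style additively homomorphic scheme, publicly known current centroids, and a key holder that is distinct from the aggregating server. All function names (`keygen`, `client_report`, `server_aggregate`, `update_centroids`) and the sample data are hypothetical. Each client encrypts its per-cluster coordinate sums and counts, the server combines the ciphertexts without decrypting them, and only the aggregated totals are decrypted to update the centroids.

```python
import math
import secrets

# Toy Paillier-style scheme (additively homomorphic). The primes are
# well-known Mersenne primes chosen for readability; they are far too
# small for real security. Plaintexts must be non-negative integers < n.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1      # random r in [1, n-1]
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    # With g = n + 1, g^m = 1 + m*n (mod n^2).
    return (1 + m * n) * pow(r, n, n2) % n2

def add(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def local_stats(points, centroids):
    # Per-cluster [sum_x, sum_y, count] over one client's points only.
    stats = [[0, 0, 0] for _ in centroids]
    for x, y in points:
        j = min(range(len(centroids)),
                key=lambda i: (x - centroids[i][0]) ** 2
                              + (y - centroids[i][1]) ** 2)
        stats[j][0] += x
        stats[j][1] += y
        stats[j][2] += 1
    return stats

def client_report(n, points, centroids):
    # The client sends only ciphertexts of its local statistics.
    return [[encrypt(n, v) for v in row]
            for row in local_stats(points, centroids)]

def server_aggregate(n, reports):
    # The server combines ciphertexts element-wise; it never decrypts.
    total = reports[0]
    for rep in reports[1:]:
        total = [[add(n, a, b) for a, b in zip(ra, rb)]
                 for ra, rb in zip(total, rep)]
    return total

def update_centroids(n, priv, total, centroids):
    # Assumption: run by a key holder that sees only aggregated totals.
    new = []
    for row, old in zip(total, centroids):
        sx, sy, cnt = (decrypt(n, priv, c) for c in row)
        new.append((sx / cnt, sy / cnt) if cnt else old)
    return new

n, priv = keygen()
centroids = [(0, 0), (10, 10)]                  # hypothetical initial centroids
clients = [[(1, 1), (2, 2), (9, 9)],            # hypothetical client data
           [(1, 3), (10, 10), (11, 8)]]
reports = [client_report(n, pts, centroids) for pts in clients]
new_centroids = update_centroids(n, priv, server_aggregate(n, reports), centroids)
print(new_centroids)
```

In this sketch one round of the centroid update is shown; a full run would repeat the report, aggregate, and update steps until the centroids stop changing. The split of roles between server and key holder is a design assumption made here for illustration, since whoever holds the private key can read whatever it decrypts.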
