A Density Peak Clustering Algorithm Based on Information Bottleneck

  • Yongli Liu (School of Computer Science and Technology, Henan Polytechnic University)
  • Congcong Zhao (School of Computer Science and Technology, Henan Polytechnic University)
  • Hao Chao (School of Computer Science and Technology, Henan Polytechnic University)
  • Received : 2022.08.04
  • Accepted : 2023.01.08
  • Published : 2023.12.31

Abstract

Although density peak clustering often yields excellent results with little effort, there is still room for improvement on complex, high-dimensional datasets. One of the algorithm's main limitations is its reliance on geometric distance as the sole similarity measure. To address this limitation, we draw on information bottleneck theory and propose a novel density peak clustering algorithm that uses this theory as the basis of its similarity measure. Specifically, our algorithm uses the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting a similarity method, but also improves performance on datasets with multiple centers and high dimensionality. To evaluate the algorithm, we conducted experiments on ten carefully selected datasets and compared the results with those of three other algorithms. The experimental results demonstrate that our information bottleneck-based density peak clustering (IBDPC) algorithm consistently achieves high accuracy, highlighting its potential as a valuable tool for data clustering tasks.
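
The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pairwise measure is the mutual-information loss of merging two objects as defined in the agglomerative information bottleneck of Slonim and Tishby, and it plugs that measure into the standard density peak procedure of Rodriguez and Laio with a cutoff kernel. The function names `kl`, `ib_merge_cost`, and `density_peaks`, the cutoff `d_c`, the number of clusters `k`, and the toy matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p == 0 contribute 0."""
    mask = p > 0
    val = float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
    return max(val, 0.0)  # guard against tiny negative rounding errors

def ib_merge_cost(X):
    """Pairwise loss of mutual information if objects i and j were merged.

    Assumes X is a non-negative object-by-feature matrix in which every
    row has a positive sum, so it can be normalized into a joint p(x, y).
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                  # joint distribution p(x, y)
    px = P.sum(axis=1)               # marginal p(x)
    cond = P / px[:, None]           # conditional p(y | x)
    n = len(px)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = px[i] + px[j]
            pi_i, pi_j = px[i] / w, px[j] / w
            mix = pi_i * cond[i] + pi_j * cond[j]
            # weighted Jensen-Shannon divergence between p(y|i) and p(y|j)
            js = pi_i * kl(cond[i], mix) + pi_j * kl(cond[j], mix)
            D[i, j] = D[j, i] = w * js
    return D

def density_peaks(D, d_c, k):
    """Standard density peak clustering on a precomputed dissimilarity matrix."""
    n = D.shape[0]
    rho = (D < d_c).sum(axis=1) - 1          # neighbours within cutoff, minus self
    order = np.argsort(-rho, kind="stable")  # decreasing local density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = D.max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                # points treated as denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf                 # sketch choice: densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:k]
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)
    for i in order:                          # inherit label from nearest denser point
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy data (made up): rows 0-2 load on features 0-1, rows 3-5 on features 2-3.
X = np.array([[8, 2, 0.1, 0.1], [7, 3, 0.1, 0.1], [9, 1, 0.1, 0.1],
              [0.1, 0.1, 5, 5], [0.1, 0.1, 6, 4], [0.1, 0.1, 4, 6]])
D = ib_merge_cost(X)
d_c = np.percentile(D[np.triu_indices(len(X), 1)], 40)  # illustrative cutoff
labels = density_peaks(D, d_c, k=2)
print(labels)
```

On this toy matrix the two groups of rows have nearly disjoint feature distributions, so merging objects across groups loses far more mutual information than merging within a group. The density peak step then separates them without any geometric distance being computed.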
