A Density Peak Clustering Algorithm Based on Information Bottleneck

Yongli Liu, Congcong Zhao, Hao Chao

doi:10.3745/JIPS.04.0294

Journal of Information Processing Systems

Volume 19, Issue 6, pp. 778-790, 2023
pISSN 1976-913X; eISSN 2092-805X

Korea Information Processing Society (한국정보처리학회)

Yongli Liu (School of Computer Science and Technology, Henan Polytechnic University) ;
Congcong Zhao (School of Computer Science and Technology, Henan Polytechnic University) ;
Hao Chao (School of Computer Science and Technology, Henan Polytechnic University)

Received: 2022.08.04
Accepted: 2023.01.08
Published: 2023.12.31

Abstract

Although density peak clustering often yields excellent results with little effort, there is still room for improvement on complex, high-dimensional datasets. One of the main limitations of the algorithm is its reliance on geometric distance as the sole similarity measure. To address this limitation, we draw on information bottleneck theory and propose a novel density peak clustering algorithm that uses this theory to measure similarity. Specifically, our algorithm uses the joint probability distribution between data objects and their features, and takes the loss of mutual information as the measure of dissimilarity. This approach not only eliminates the subjective error that can arise when choosing a similarity measure, but also improves performance on datasets with multiple centers and high dimensionality. To evaluate the algorithm, we conducted experiments on ten carefully selected datasets and compared the results with those of three other algorithms. The experimental results show that our information bottleneck-based density peak clustering (IBDPC) algorithm consistently achieves high accuracy, highlighting its potential as a valuable tool for data clustering tasks.
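The abstract does not spell out how IBDPC turns mutual-information loss into densities, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a non-negative object-by-feature matrix (for example, term counts) normalised into p(x, y), and measures the dissimilarity of two objects as the loss of I(X;Y) incurred by merging them, using the merge cost from Slonim and Tishby's agglomerative information bottleneck: (p(x_i) + p(x_j)) times the Jensen-Shannon divergence of p(y|x_i) and p(y|x_j), weighted by p(x_i)/(p(x_i)+p(x_j)) and p(x_j)/(p(x_i)+p(x_j)). That distance matrix is then fed into standard Rodriguez-Laio density peak steps. The Gaussian kernel, the 2% cutoff quantile, the rule that forces the densest point to be a center, and the function names `ib_distance_matrix`, `density_peaks`, and `ibdpc_sketch` are all hypothetical choices made for illustration.

```python
import numpy as np


def _kl(p, q):
    """KL divergence D(p || q) in nats; terms with p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def ib_distance_matrix(X):
    """Pairwise loss of mutual information I(X;Y) when two objects are merged.

    Assumption: X is a non-negative (n_objects, n_features) matrix with no
    all-zero rows, and p(x, y) is simply X normalised to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    joint = X / X.sum()                # p(x, y)
    px = joint.sum(axis=1)             # p(x)
    py_x = joint / px[:, None]         # p(y | x), one row per object
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = px[i] + px[j]
            pi_i, pi_j = px[i] / w, px[j] / w
            m = pi_i * py_x[i] + pi_j * py_x[j]   # p(y | merged object)
            js = pi_i * _kl(py_x[i], m) + pi_j * _kl(py_x[j], m)
            D[i, j] = D[j, i] = w * js            # weighted Jensen-Shannon divergence
    return D


def density_peaks(D, n_clusters, dc_quantile=0.02):
    """Density peak clustering on a precomputed distance matrix D."""
    n = D.shape[0]
    dc = np.quantile(D[np.triu_indices(n, k=1)], dc_quantile)
    dc = max(float(dc), np.finfo(float).eps)        # guard against duplicate points
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # Gaussian kernel, self excluded
    order = np.argsort(-rho, kind="stable")         # decreasing local density
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()
    for k in range(1, n):
        i = order[k]
        higher = order[:k]                          # points ranked denser than i
        j = higher[np.argmin(D[i, higher])]         # nearest denser point
        delta[i], parent[i] = D[i, j], j
    gamma = rho * delta                             # decision-graph score
    gamma[order[0]] = np.inf                        # densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                                 # parents are labelled before children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


def ibdpc_sketch(X, n_clusters):
    """Hypothetical pipeline: IB distance matrix, then density peak clustering."""
    return density_peaks(ib_distance_matrix(X), n_clusters)
```

For instance, stacking three rows that concentrate their mass on the first two features with their column-reversed copies, and calling `ibdpc_sketch(X, 2)`, should place the original rows and the reversed rows in two different clusters. Because the distance depends only on the conditional distributions p(y|x) and the object weights p(x), it is unaffected by rescaling of the whole matrix, which is one way an information-theoretic measure sidesteps the choice of a geometric metric.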
