A new clustering algorithm based on the connected region generation

  • Feng, Liuwei (Institute of Information Science, Beijing Jiaotong University)
  • Chang, Dongxia (Institute of Information Science, Beijing Jiaotong University)
  • Zhao, Yao (Institute of Information Science, Beijing Jiaotong University)
  • Received : 2017.07.06
  • Accepted : 2018.01.24
  • Published : 2018.06.30

Abstract

In this paper, a new clustering algorithm based on connected region generation (CRG-clustering) is proposed. It is an effective and robust approach that clusters data according to the connectivity of points and their neighbors. A connected region generating (CRG) algorithm is first used to obtain the connected regions and an isolated point set. Each connected region corresponds to a homogeneous cluster, which theoretically ensures the separability of an arbitrary data set. A region expansion strategy and a consensus criterion are then used to handle the points in the isolated point set. Experimental results on synthetic and real-world datasets show that the proposed algorithm achieves high performance and is insensitive to noise.
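The two-stage idea in the abstract (connected regions first, isolated points afterwards) can be illustrated with a minimal sketch. This is not the authors' CRG algorithm, whose details are not given here. It assumes, purely for illustration, that two points are "connected" when their Euclidean distance is at most a hypothetical threshold `eps`, and that regions smaller than a hypothetical `min_size` form the isolated point set. The function names `connected_regions` and `attach_isolated` are also illustrative. The second function uses a plain nearest-member rule as a stand-in for region expansion; the consensus criterion is not modeled.

```python
from collections import defaultdict
from math import dist


def connected_regions(points, eps, min_size=2):
    """Split points into connected regions and an isolated point set.

    Assumption (not from the paper): two points are connected when their
    Euclidean distance is at most eps. Components with fewer than min_size
    points are reported as isolated.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points that lie within eps of each other.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Collect point indices by the root of their component.
    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)

    regions = [g for g in groups.values() if len(g) >= min_size]
    isolated = [i for g in groups.values() if len(g) < min_size for i in g]
    return regions, isolated


def attach_isolated(points, regions, isolated):
    """Label each isolated point with the region of its nearest region member.

    Illustrative stand-in for a region expansion step; the paper's consensus
    criterion is not implemented here.
    """
    members = [(i, r) for r, region in enumerate(regions) for i in region]
    labels = dict(members)
    if not members:
        return labels  # no regions to attach to
    for p in isolated:
        _, r = min(members, key=lambda m: dist(points[p], points[m[0]]))
        labels[p] = r
    return labels


# Two tight groups plus one stray point between them.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (4, 4)]
regions, isolated = connected_regions(pts, eps=1.5)
labels = attach_isolated(pts, regions, isolated)
print(regions, isolated, labels)
```

The pairwise loop is quadratic in the number of points, which keeps the sketch short; a spatial index would be the usual choice for larger data sets.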
