Comprehensive review on Clustering Techniques and its application on High Dimensional Data

  • Alam, Afroj (Department of Computer Application Integral University) ;
  • Muqeem, Mohd (Department of Computer Application Integral University) ;
  • Ahmad, Sultan (Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University)
  • Received : 2021.06.05
  • Published : 2021.06.30

Abstract

Clustering is one of the most powerful unsupervised machine learning techniques for dividing instances into homogeneous groups, called clusters. It is mainly used to generate good-quality clusters through which hidden patterns and knowledge can be discovered in large datasets. It has wide application in fields such as medicine, healthcare, gene expression, image processing, agriculture, fraud detection, and profitability analysis. The goal of this paper is to explore both hierarchical and partitioning clustering and to understand their problems along with various approaches to solving them. Among the clustering methods, K-means is preferred over the others because of its linear time complexity. The paper also focuses on data mining for high-dimensional datasets, the problems they pose, and the existing approaches and their relevance.

Acknowledgement

The authors would like to thank the Deanship of Scientific Research at Prince Sattam Bin Abdulaziz University, Alkharj, Saudi Arabia for the assistance.
