Impact Analysis of Partition Utility Score in Cluster Analysis

  • Gye Sung Lee (Department of Software, Dankook University)
  • Received : 2021.06.25
  • Accepted : 2021.07.30
  • Published : 2021.08.31

Abstract

Machine learning algorithms adopt a criterion function as a key component for measuring the quality of the model they derive from data. Cluster analysis likewise uses a criterion function to rate a clustering result. In general, every criterion function exhibits some form of favoritism when producing high-quality clusters, which are then described by attributes and their values. Category utility and partition utility play an important role in cluster analysis. This research analyzes both functions, particularly how they relate to the favoritism observed in the final results. Several data sets are selected and analyzed to show how these criterion functions induce different results.

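Machine learning algorithms adopt a criterion function to process data and induce a learning model. The criterion function used in cluster analysis inevitably carries some form of preference; it groups similar data together, and each cluster is then defined by identifying the attributes and values that characterize it. This study examines how the category utility and partition utility scores used in cluster analysis affect the clustering results and analyzes what bias they introduce into those results. To determine how the characteristics of the criterion function influence the outcome, experiments are run on several data sets and the results are evaluated.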
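The abstract does not spell out the two criterion functions, so the following is only a minimal sketch of how they are commonly defined in the conceptual clustering literature (Fisher's COBWEB line of work), for nominal attributes. It assumes that category utility of a cluster is CU(C_k) = P(C_k) [Σ P(A_i = V_ij | C_k)² − Σ P(A_i = V_ij)²] and that partition utility is the average CU over the K clusters. The function names, the tuple-based data layout, and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def _sum_sq_probs(rows):
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `rows`.

    `rows` is assumed to be a non-empty list of equal-length tuples
    of nominal values.
    """
    n = len(rows)
    total = 0.0
    for i in range(len(rows[0])):
        counts = Counter(row[i] for row in rows)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(cluster, data):
    """CU(C_k) = P(C_k) * (within-cluster score - whole-data score)."""
    p_cluster = len(cluster) / len(data)
    return p_cluster * (_sum_sq_probs(cluster) - _sum_sq_probs(data))


def partition_utility(clusters):
    """PU = average category utility over the K clusters of a partition."""
    data = [row for cluster in clusters for row in cluster]
    return sum(category_utility(c, data) for c in clusters) / len(clusters)


# Toy data (hypothetical): two attributes, colour and shape.
red, blue = ("red", "round"), ("blue", "square")

pure = [[red, red], [blue, blue]]          # clusters match the structure
mixed = [[red, blue], [red, blue]]         # clusters ignore the structure
singletons = [[red], [red], [blue], [blue]]  # over-split partition

pu_pure = partition_utility(pure)
pu_mixed = partition_utility(mixed)
pu_singletons = partition_utility(singletons)
print(pu_pure, pu_mixed, pu_singletons)
```

On this toy data the pure two-cluster partition scores highest, the mixed partition scores zero, and the over-split partition scores lower than the pure one because the sum of category utilities is divided by the number of clusters. This is one simple way a criterion function can favor certain partitions over others, under the assumed definitions.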
