Finding the Number of Clusters and Various Experiments Based on ASA Clustering Method

ASA 군집화를 이용한 군집수 결정 및 다양한 실험

  • 윤복식 (Applied Mathematics, Department of Basic Science, Hongik University)
  • Published: 2006.06.01

Abstract

In cluster analysis, we are often forced to perform clustering without any prior knowledge of the number of clusters. However, some clustering methods, such as the k-means algorithm, require the number of clusters to be specified beforehand. In this study, we focus on the problem of determining the number of clusters in the given data. We follow the two-stage approach of the ASA clustering algorithm and mainly try to improve the performance of its first stage. We verify the usefulness of the method by applying it to various kinds of simulated data. We also apply the method to clustering two kinds of real-life qualitative data.
