A Comparative Study of Determining the Number of Clusters with a Method Proposed

Proposal and Comparison of a Method for Predicting the Number of Clusters (군집수의 예측에 관한 방법의 제안 및 비교)

  • Chae, Seong-San (채성산; Department of Information and Statistics, Daejeon University)
  • Lim, Nam-Kyoo (임남규; Department of Information and Statistics, Daejeon University)
  • Published : 2005.07.01


A method for determining the number of clusters is proposed, based on asymptotic results for Rand's (1971) statistic $C_k$, $k = 2, 3, \ldots, N - 1$. A simulation study compares the proposed method with those of Chae and Warde (1991) and Huh and Lee (2004).
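As a minimal illustration of the statistic the abstract refers to, the sketch below computes Rand's (1971) $C$ for two partitions of the same objects: the fraction of object pairs on which the two partitions agree (both place the pair together, or both place it apart). It assumes that $C_k$ denotes this quantity evaluated for two clusterings that each contain $k$ clusters; the abstract does not spell that out, and the function name `rand_statistic` is hypothetical. It does not implement the proposed method for choosing the number of clusters.

```python
from itertools import combinations


def rand_statistic(labels_a, labels_b):
    """Rand's (1971) C: share of object pairs on which two partitions agree.

    labels_a and labels_b are equal-length sequences of cluster labels for
    the same N objects. Assumption: in the abstract's notation, C_k would be
    this value when both partitions have k clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(n), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when both partitions treat it alike.
        if together_a == together_b:
            agreements += 1
        pairs += 1
    return agreements / pairs
```

The value depends only on which objects are grouped together, not on the label names, so `rand_statistic([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0. The pairwise loop is quadratic in N, which is adequate for illustration but not for large data sets.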


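  1. 이석훈, 박래현, 김응환 (1995). Cluster analysis using a Coulomb network, The Korean Journal of Applied Statistics, 8, 39-50
  2. 채성산 [Chae, S. S.] (1997). Predicting the number of clusters through resampling and testing, Daejeon University, Natural Science, 8, 73-88
  3. 허명회, 이용구 [Huh and Lee] (2004). Reproducibility assessment of K-means clustering and its applications, The Korean Journal of Applied Statistics, 17, 135-144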
  4. Becker, H. W. and Riordan, J. (1948). The arithmetic of Bell and Stirling numbers, American Journal of Mathematics, 70, 385-394
  5. Chae, S. S., DuBien, J. L. and Warde, W. D. (2004). A method of predicting the number of clusters using asymptotic results on $C_k$, Computational Statistics & Data Analysis, (in revision)
  6. Chae, S. S. and Warde, W. D. (1991). A method to predict the number of clusters, Journal of the Korean Statistical Society, 20, 162-176
  7. Chae, S. S. and Warde, W. D. (2005). Effect of using principal coordinates and principal components on retrieval of clusters, Computational Statistics & Data Analysis, (in press)
  8. DuBien, J. L. and Warde, W. D. (1987). A comparison of agglomerative clustering methods with respect to noise, Communications in Statistics - Theory and Methods, 16, 1433-1460
  9. DuBien, J. L., Warde, W. D. and Chae, S. S. (2004). Moments of Rand's C statistic in cluster analysis, Statistics & Probability Letters, 69, 243-252
  10. Fowlkes, E. B. and Mallows, C. L. (1983). A method for comparing two hierarchical clusterings, Journal of the American Statistical Association, 78, 553-569
  11. Idrissi, A. (2000). Contribution à l'Unification de Critères d'Association pour Variables Qualitatives, Ph.D. thesis, Paris: Université Pierre et Marie Curie
  12. Lance, G. N. and Williams, W. T. (1967). A general theory of classificatory sorting strategies. 1. Hierarchical systems, The Computer Journal, 9, 373-380
  13. Lengyel, T. (1984). On a recurrence involving Stirling numbers, European Journal of Combinatorics, 5, 313-321
  14. Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846-850