A Comparative Study of Determining the Number of Clusters with a Method Proposed

(Korean title, translated: A Proposed Method for Predicting the Number of Clusters, with Comparisons)

  • Chae, Seong-San (채성산), Department of Information and Statistics, Daejeon University
  • Lim, Nam-Kyoo (임남규), Department of Information and Statistics, Daejeon University
  • Published : 2005.07.01

Abstract

A method for determining the number of clusters is proposed, based on asymptotic results for Rand's (1971) $C_k$ statistic, $k = 2, 3, \ldots, N - 1$, which is used to compare clustering methods. A simulation study compares the proposed method with the methods of Chae and Warde (1991) and Huh and Lee (2004), which predict the number of clusters from the pattern of change in the $C_k$ statistic. For real data, where practical constraints apply, the bootstrap is used to generate the repeated resamples.
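
As a minimal illustration of the statistic the abstract builds on, the sketch below computes Rand's (1971) agreement measure between two partitions of the same N objects: the fraction of object pairs that both partitions treat the same way (together in both, or apart in both). It assumes that $C_k$ is this measure evaluated on two partitions with k clusters each. The function name `rand_index` and the example labels are hypothetical, and the sketch does not reproduce the proposed method or its asymptotic results.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree (Rand, 1971)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]  # pair together in partition A?
        same_b = labels_b[i] == labels_b[j]  # pair together in partition B?
        if same_a == same_b:                 # both together, or both apart
            agree += 1
        total += 1
    return agree / total


# Hypothetical example: two partitions of six objects into three clusters.
partition_a = [0, 0, 1, 1, 2, 2]
partition_b = [0, 0, 1, 2, 2, 2]
print(rand_index(partition_a, partition_b))
```

The measure depends only on which objects share a cluster, so relabeling the clusters of either partition leaves the value unchanged, and identical partitions give a value of 1.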

References

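  1. 이석훈, 박래현, 김응환 (1995). Cluster analysis using a Coulomb network (in Korean), The Korean Journal of Applied Statistics (응용통계연구), 8, 39-50
  2. 채성산 (Chae, S. S.) (1997). Predicting the number of clusters through resampling and testing (in Korean), Daejeon University, Natural Science (자연과학), 8, 73-88
  3. 허명회, 이용구 (Huh and Lee) (2004). Reproducibility assessment of K-means clustering and its applications (in Korean), The Korean Journal of Applied Statistics (응용통계연구), 17, 135-144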
  4. Becker, H. W. and Riordan, J. (1948). The arithmetic of Bell and Stirling numbers, American Journal of Mathematics, 70, 385-394 https://doi.org/10.2307/2372336
  5. Chae, S. S., DuBien, J. L. and Warde, W. D. (2004). A method of predicting the number of clusters using asymptotic results on $C_k$, Computational Statistics & Data Analysis, (in revision)
  6. Chae, S. S. and Warde, W. D. (1991). A method to predict the number of clusters, Journal of the Korean Statistical Society, 20, 162-176
  7. Chae, S. S. and Warde, W. D. (2005). Effect of using principal coordinates and principal components on retrieval of clusters, Computational Statistics & Data Analysis, (in press)
  8. DuBien, J. L. and Warde, W. D. (1987). A comparison of agglomerative clustering methods with respect to noise, Communications in Statistics - Theory and Methods, 16, 1433-1460 https://doi.org/10.1080/03610928708829447
  9. DuBien, J. L., Warde, W. D. and Chae, S. S. (2004). Moments of Rand's C statistic in cluster analysis, Statistics & Probability Letters, 69, 243-252 https://doi.org/10.1016/j.spl.2004.06.009
  10. Fowlkes, E. B. and Mallows, C. L. (1983). A method for comparing two hierarchical clusterings, Journal of the American Statistical Association, 78, 553-569 https://doi.org/10.2307/2288117
  11. Idrissi, A. (2000). Contribution à l'Unification de Critères d'Association pour Variables Qualitatives, Ph.D. thesis, Paris: Université Pierre et Marie Curie
  12. Lance, G. N. and Williams, W. T. (1967). A general theory of classificatory sorting strategies. 1. Hierarchical systems, The Computer Journal, 9, 373-380 https://doi.org/10.1093/comjnl/9.4.373
  13. Lengyel, T. (1984). On a recurrence involving Stirling numbers, European Journal of Combinatorics, 5, 313-321 https://doi.org/10.1016/S0195-6698(84)80035-9
  14. Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66, 846-850 https://doi.org/10.2307/2284239