Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회 (Department of Statistics, College of Political Science and Economics, Korea University)
  • Published: 2000.09.01

Abstract

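K-means clustering is a nonhierarchical clustering method known to be efficient for clustering observations in large data sets. However, it often makes the error of handing part of a relatively homogeneous large cluster over to a small cluster. This study identifies that phenomenon precisely and proposes the "double K-means clustering" method as a remedy. The new method is also applied to an empirical example and discussed.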

In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm:

  • Step I: Carry out ordinary K-means clustering and obtain k temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$ and pooled covariance matrix S.
  • Step II-1: Allocate the observation $x_i$ to the cluster F if it satisfies ..... where N is the total number of observations, for $i = 1, \ldots, N$.
  • Step II-2: Update cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$ and pooled covariance matrix S.
  • Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible.

Double K-means clustering is nearly "optimal" under a mixture of k multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is not strictly required. The method is demonstrated numerically on Fisher's iris data.
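
The two-stage procedure in the abstract can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the author's implementation. The allocation criterion of Step II-1 is elided in the abstract, so the code assumes the rule that is optimal for a normal mixture with a common covariance matrix: assign $x_i$ to the cluster j that minimizes $(x_i - c_j)^T S^{-1} (x_i - c_j) - 2 \log(n_j / N)$. The function names `kmeans` and `double_kmeans`, the random initialization, and the stopping rule (labels stop changing) are likewise assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Step I: ordinary K-means (Lloyd iterations, Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def double_kmeans(X, k, n_iter=50, seed=0):
    """Steps II-1 to II-3 under an ASSUMED allocation rule (see lead-in)."""
    N, p = X.shape
    labels = kmeans(X, k, seed=seed)
    for _ in range(n_iter):
        # Step II-2: cluster sizes, centroids, pooled covariance matrix S.
        sizes = np.bincount(labels, minlength=k)
        centroids = {}
        W = np.zeros((p, p))
        for j in np.flatnonzero(sizes):
            members = X[labels == j]
            centroids[j] = members.mean(axis=0)
            resid = members - centroids[j]
            W += resid.T @ resid
        S = W / max(N - len(centroids), 1)
        S_inv = np.linalg.inv(S)  # assumes S is nonsingular

        # Step II-1 (assumed criterion): Mahalanobis distance with the
        # pooled S, penalized by -2 log(n_j / N). Empty clusters stay empty.
        scores = np.full((N, k), np.inf)
        for j, c in centroids.items():
            diff = X - c
            d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
            scores[:, j] = d2 - 2.0 * np.log(sizes[j] / N)
        new_labels = scores.argmin(axis=1)

        # Step II-3: stop once the allocation no longer changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Calling `double_kmeans(X, k)` on an `(N, p)` array returns one integer label per observation. Because the second stage measures distance with the pooled covariance matrix, rescaling the variables affects the result only through the initial Step I partition, which is consistent with the abstract's remark that the method is nearly, rather than exactly, affine invariant.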
