A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls

  • Kim, Jae-Hee (김재희), Department of Statistics, Duksung Women's University
  • Ko, Yoon-Sil (고윤실), Department of Statistics, Duksung Women's University
  • Published: 2009.08.31

Abstract

Cluster analysis, a multivariate statistical technique widely used to uncover natural groupings, is a data-driven exploratory method: the automated search for groups of related observations in a data set. Many techniques based on different clustering principles have been proposed to group observations into clusters, and a variety of measures for validating the results of a cluster analysis have also been developed. In this paper, we apply two hierarchical methods (complete linkage and Ward's method), one non-hierarchical method (K-means) and model-based clustering, which exploits information about the underlying probability distribution, to data simulated from multivariate distributions. We then compare the validity of each method using three measures: connectivity, the Dunn index and the silhouette. Finally, we apply cluster analysis to Korean consumers' palatability scores for Hanwoo bull beef cooked by the BBQ method, selecting a clustering algorithm and determining the optimal number of consumer clusters.
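The three validity measures named above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' code: it uses Euclidean distance and follows the definitions as popularised by the clValid package (Brock et al., 2008). Connectivity adds 1/j whenever the j-th nearest neighbour of a point (j up to L) lies in a different cluster, so lower is better. The Dunn index divides the smallest distance between points in different clusters by the largest cluster diameter, and the mean silhouette averages (b - a)/max(a, b) over all points; for both, higher is better. The function names, the neighbourhood size `L` (an assumed default of 10) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def connectivity(points, labels, L=10):
    """Sum over points of 1/j for each of the L nearest neighbours (rank j)
    that lies in a different cluster. Lower is better."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # Other points ordered by distance to point i (ties keep index order).
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: math.dist(points[i], points[j]))
        for rank, j in enumerate(neighbours[:L], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster
    diameter. Assumes >= 2 clusters and one cluster with >= 2 distinct points."""
    min_between, max_diam = math.inf, 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            max_diam = max(max_diam, d)      # within-cluster: candidate diameter
        else:
            min_between = min(min_between, d)  # between-cluster separation
    return min_between / max_diam


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b); s(i) = 0 for singleton clusters.
    Assumes at least two clusters."""
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        # Summed distance and count from point i to every cluster (excluding i).
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster contributes s(i) = 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        total += (b - a) / max(a, b)
    return total / n


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    good = [0, 0, 0, 1, 1, 1]  # labels that match the groups
    bad = [0, 1, 0, 1, 0, 1]   # labels that mix the groups
    for name, lab in (("good", good), ("bad", bad)):
        print(name, connectivity(pts, lab, L=2),
              round(dunn_index(pts, lab), 3),
              round(mean_silhouette(pts, lab), 3))
```

On the toy data the labelling that matches the two groups scores zero connectivity (with L = 2), a much larger Dunn index and a higher mean silhouette than the mixed labelling, which is the direction each measure is meant to reward. The pairwise loops are O(n²) and are meant only to make the definitions concrete.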

References

  1. Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and Non-Gaussian clustering, Biometrics, 49, 803-821 https://doi.org/10.2307/2532201
  2. Brock, G., Pihur, V., Datta, S. and Datta, S. (2008). clValid: An R package for cluster validation, Journal of Statistical Software, 25, 1-21
  3. Datta, S. and Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data, Bioinformatics, 19, 459-466 https://doi.org/10.1093/bioinformatics/btg025
  4. Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions, Journal of Cybernetics, 4, 95-104 https://doi.org/10.1080/01969727408546059
  5. Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis, The Computer Journal, 41, 578-588
  6. Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association, 97, 611-631 https://doi.org/10.1198/016214502760047131
  7. Handl, J., Knowles, J. and Kell, D. B. (2005). Computational cluster validation in post-genomic data analysis, Bioinformatics, 21, 3201-3212 https://doi.org/10.1093/bioinformatics/bti517
  8. Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm, Applied Statistics, 28, 100-108 https://doi.org/10.2307/2346830
  9. Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York
  10. Pollard, D. (1981). Strong consistency of K-means clustering, Annals of Statistics, 9, 135-140 https://doi.org/10.1214/aos/1176345339
  11. Pollard, D. (1982). Central limit theorems for K-means clustering, Annals of Statistics, 10, 919-926
  12. Rousseeuw, P. J. (1987). Silhouettes: Graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65 https://doi.org/10.1016/0377-0427(87)90125-7
  13. Scott, A. J. and Symons, M. (1971). Clustering methods based on likelihood ratio criteria, Biometrics, 27, 387-397 https://doi.org/10.2307/2529003
  14. Ward, Jr., J. H. (1963). Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236-244 https://doi.org/10.2307/2282967
  15. Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E. and Ruzzo, W. L. (2001). Model-based clustering and data transformations for gene expression data, Bioinformatics, 17, 977-987 https://doi.org/10.1093/bioinformatics/17.10.977

Cited by

  1. Gene Screening and Clustering of Yeast Microarray Gene Expression Data, vol. 24, no. 6, 2011, https://doi.org/10.5351/KJAS.2011.24.6.1077