A Comparative Study on Clustering Methods for Grouping Related Tags

  • 한승희 (Department of Library and Information Science, College of Social Sciences, Seoul Women's University)
  • Published: 2009.09.30


In this study, clustering methods for grouping related tags were compared with the aim of improving search and exploration in the tag space. The experiments used 10 Delicious tags and the strongly related tags extracted from 300 documents for each of them, and hierarchical and non-hierarchical clustering methods were applied based on tag co-occurrences. Cluster relevance was measured to evaluate the results. Ward's method with the cosine coefficient, which is known to perform well in term clustering, performed best and showed a consistent clustering tendency. Further analysis showed that cluster membership among related tags reflects users' tagging purposes or interests and can help disambiguate word senses. These findings suggest that tag clusters can help improve search and exploration in the tag space.
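The abstract says that clustering was based on tag co-occurrences and that the cosine coefficient worked best. The following is a minimal sketch, assuming each tag is represented by a vector counting how many documents it shares with every other tag, of how such vectors can be built and compared with the cosine coefficient. The sample bookmarks and the function names `cooccurrence_vectors` and `cosine` are hypothetical, and the study's actual matrix construction may differ.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_vectors(tagged_docs):
    """Count, for every tag, how many documents it shares with each other tag."""
    vectors = {}
    for tags in tagged_docs:
        unique_tags = sorted(set(tags))
        for tag in unique_tags:
            vectors.setdefault(tag, Counter())
        # Every unordered pair of tags on the same document co-occurs once.
        for a, b in combinations(unique_tags, 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine coefficient between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so only u's keys need visiting.
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical bookmarks: each list holds the tags assigned to one document.
docs = [
    ["python", "programming", "code"],
    ["python", "snake", "animal"],
    ["programming", "code", "software"],
    ["snake", "animal", "reptile"],
]
vectors = cooccurrence_vectors(docs)
print(round(cosine(vectors["code"], vectors["software"]), 3))  # → 0.577
print(round(cosine(vectors["code"], vectors["snake"]), 3))     # → 0.167
```

In this toy data the tag "python" co-occurs with both programming and animal tags, the kind of ambiguity the abstract reports that clustering related tags can help separate.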
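Ward's method merges, at each step, the pair of clusters whose union least increases the within-cluster sum of squares. The sketch below assumes one common way to combine it with the cosine coefficient: each co-occurrence profile is first scaled to unit length, so that the squared Euclidean distance between two profiles equals 2(1 − cos θ). This pairing is an illustrative assumption, not the study's exact configuration, and the profiles and the helpers `l2_normalize`, `ward_merge_cost`, and `ward_clusters` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so squared distance equals 2 * (1 - cosine)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b were merged."""
    (_, size_a, cen_a), (_, size_b, cen_b) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(cen_a, cen_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_clusters(profiles, k):
    """Agglomerate tags with Ward's criterion until k clusters remain."""
    # Each cluster is (member tags, size, centroid of unit-length profiles).
    clusters = [([tag], 1, l2_normalize(vec)) for tag, vec in profiles.items()]
    while len(clusters) > k:
        # Pick the pair whose merge raises the within-cluster sum of squares least.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: ward_merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        members_i, n_i, c_i = clusters[i]
        members_j, n_j, c_j = clusters[j]
        centroid = [(n_i * x + n_j * y) / (n_i + n_j) for x, y in zip(c_i, c_j)]
        # Delete j first (j > i) so that index i still points at cluster i.
        del clusters[j]
        clusters[i] = (members_i + members_j, n_i + n_j, centroid)
    return sorted(sorted(members) for members, _, _ in clusters)


# Hypothetical co-occurrence counts of five tags with four context tags
# (columns: food, kitchen, web, code); these are not data from the study.
profiles = {
    "recipe": [8, 5, 1, 0],
    "cooking": [6, 7, 0, 1],
    "javascript": [0, 1, 9, 7],
    "css": [1, 0, 6, 8],
    "programming": [0, 0, 7, 9],
}

print(ward_clusters(profiles, k=2))
# → [['cooking', 'recipe'], ['css', 'javascript', 'programming']]
```

Because the loop rescans every pair after each merge, its cost grows roughly with the cube of the number of tags, which is acceptable for small tag sets like this one; larger vocabularies would usually call for a library implementation that updates distances incrementally.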

