Proceedings of the Korea Information Processing Society Conference (한국정보처리학회:학술대회논문집)
- 2004.05a / Pages 673-676 / 2004 / 2005-0011 (pISSN) / 2671-7298 (eISSN)
A Study on Cluster Hierarchy Depth in Hierarchical Clustering
계층적 클러스터링에서 분류 계층 깊이에 관한 연구
- Jin, Hai-Nan (Dept. of Computer Engineering, Chonbuk National University) ;
- Lee, Shin-won (Dept. of Computer Engineering, Chonbuk National University) ;
- An, Dong-Un (Dept. of Computer Engineering, Chonbuk National University) ;
- Chung, Sung-Jong (Dept. of Computer Engineering, Chonbuk National University)
- Published : 2004.05.14
Abstract
Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provides a view of the data at different levels, which lets users browse large document collections intuitively and according to their interests. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents but is thought to produce inferior clusters. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system, CONDOR [10], which builds a hierarchical structure on top of document clustering with the K-means algorithm to "get the best of both worlds". The performance of the CONDOR system is compared with that of the VIVISIMO hierarchical clustering system [9], and is analyzed with respect to feature-word selection for specific topics and the optimum hierarchy depth.
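The general idea of building a hierarchy from K-means can be illustrated with a minimal sketch. It is not the CONDOR implementation, whose details the abstract does not give; it assumes documents are already represented as numeric feature vectors, and the names `kmeans`, `build_hierarchy`, `tree_depth`, `k`, and `max_depth` are hypothetical. The sketch applies K-means recursively (a top-down splitting scheme chosen here for illustration) and stops at a fixed hierarchy depth.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means on lists of floats; returns one cluster index per vector."""
    rng = random.Random(seed)
    # Initialize centroids from k randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_hierarchy(ids, vectors, k=2, max_depth=2, depth=0):
    """Recursively split documents with K-means until max_depth is reached."""
    node = {"docs": list(ids), "children": []}
    # Stop at the depth limit or when there are too few documents to split.
    if depth >= max_depth or len(ids) <= k:
        return node
    labels = kmeans(vectors, k)
    for c in range(k):
        sub = [(i, v) for i, v, lab in zip(ids, vectors, labels) if lab == c]
        # Skip empty clusters and degenerate splits that keep every document.
        if sub and len(sub) < len(ids):
            sub_ids, sub_vecs = zip(*sub)
            node["children"].append(
                build_hierarchy(list(sub_ids), list(sub_vecs), k, max_depth, depth + 1)
            )
    return node


def tree_depth(node):
    """Number of levels below this node (0 for a leaf)."""
    if not node["children"]:
        return 0
    return 1 + max(tree_depth(child) for child in node["children"])
```

Each K-means pass is linear in the number of documents it receives, so limiting `max_depth` bounds the total work, which is one reason the hierarchy depth is a natural parameter to study.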
Keywords