Journal of the Korean Data and Information Science Society
- Volume 14 Issue 4, Pages 853-861, 2003, 1598-9402 (pISSN)
An Improved K-means Document Clustering using Concept Vectors
Abstract
An improved K-means document clustering method is presented, in which a concept vector is computed for each cluster on the basis of the cosine similarity of text documents. The concept vectors are unit vectors, that is, they are normalized to lie on the unit sphere in n-dimensional space. Because the standard K-means method is sensitive to its initial starting conditions, our improvement focuses on choosing the starting conditions by estimating the modes of a distribution. The improved K-means clustering algorithm was applied to a set of text documents, called Classic3, to test the efficiency and correctness of the clustering result, and it showed a 7% improvement in its worst case.
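To make the role of the concept vectors concrete, the following is a minimal sketch of K-means with cosine similarity, assuming documents are already given as numeric term vectors. It is not the authors' implementation: the function names (`normalize_rows`, `concept_vectors`, `cosine_kmeans`) are hypothetical, and the initial centers are simply passed in by the caller, because the abstract does not specify how the mode-based starting condition is computed.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit Euclidean length (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero
    return X / norms


def concept_vectors(X, labels, k):
    """Concept vector of a cluster: the sum of its member vectors, rescaled to unit length.

    An empty cluster yields an all-zero row (a simplification made for this sketch).
    """
    C = np.zeros((k, X.shape[1]))
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            C[j] = members.sum(axis=0)
    return normalize_rows(C)


def cosine_kmeans(X, k, init_centers, n_iter=20):
    """Alternate cosine-similarity assignment and concept-vector updates.

    init_centers is supplied by the caller; how to pick it well is the
    paper's contribution and is NOT reproduced here.
    """
    X = normalize_rows(np.asarray(X, dtype=float))
    C = normalize_rows(np.asarray(init_centers, dtype=float))
    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):
        # For unit vectors, the dot product equals the cosine similarity.
        sims = X @ C.T
        new_labels = sims.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        C = concept_vectors(X, labels, k)
    return labels, C
```

A call such as `cosine_kmeans(doc_vectors, 3, init_centers)` would return one cluster label per document together with the final concept vectors; here `doc_vectors` and `init_centers` stand for arrays the caller must provide. Because each run depends on `init_centers`, different starting points can lead to different clusterings, which is the sensitivity the abstract refers to.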