Proceedings of the Korean Information Science Society Conference (한국정보과학회:학술대회논문집)
- Proceedings of the Korean Information Science Society 2006 Fall Conference, Vol.33 No.2 (B)
- Pages 233-237
- 2006
- 1598-5164 (pISSN)
The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)
- Jo, Tae-Ho (School of Information Technology and Engineering, University of Ottawa)
- Published: 2006.10.20
Abstract
This study proposes a measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters is not predictable. Given labeled documents, the result of text clustering with the K-means algorithm or Kohonen Networks can be evaluated by setting the number of clusters to the number of target categories, mapping each cluster to a target category, and applying the evaluation measures of text categorization. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures cannot be used to evaluate the result. This study therefore proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, called CI (Clustering Index) in this article.
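The abstract does not give the formula for CI, so the following is only a minimal sketch of the general idea: score a clustering by how similar documents are within clusters and how dissimilar they are across clusters, without needing the number of clusters to match the number of target categories. The function names, the use of cosine similarity, the treatment of singleton clusters, and the harmonic-mean combination are all assumptions made for illustration and should not be read as the paper's definition.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def intra_cluster_similarity(clusters):
    """Mean, over clusters, of the average pairwise similarity inside a cluster.

    Assumption: a cluster with a single document scores 1.0.
    """
    scores = []
    for docs in clusters:
        pairs = list(combinations(docs, 2))
        if pairs:
            scores.append(sum(cosine(u, v) for u, v in pairs) / len(pairs))
        else:
            scores.append(1.0)
    return sum(scores) / len(scores)


def inter_cluster_similarity(clusters):
    """Average similarity over document pairs drawn from different clusters."""
    sims = [
        cosine(u, v)
        for a, b in combinations(clusters, 2)
        for u in a
        for v in b
    ]
    return sum(sims) / len(sims) if sims else 0.0


def clustering_index(clusters):
    """Hypothetical combination: harmonic mean of cohesion and separation.

    This is NOT taken from the paper; it only illustrates an index that
    rewards high intra-cluster and low inter-cluster similarity.
    """
    intra = intra_cluster_similarity(clusters)
    separation = 1.0 - inter_cluster_similarity(clusters)
    if intra + separation == 0:
        return 0.0
    return 2 * intra * separation / (intra + separation)


# Toy document vectors; the number of clusters is free to vary.
good = [[(1, 0), (1, 0)], [(0, 1), (0, 1)]]
bad = [[(1, 0), (0, 1)], [(1, 0), (0, 1)]]
three = [[(1, 0)], [(0, 1)], [(1, 0), (0, 1)]]
print(clustering_index(good), clustering_index(bad), clustering_index(three))
```

Because the score is computed from the document vectors alone, the same function applies whether a run produces two clusters or three, which is the situation the single pass algorithm creates.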
Keywords