Proceedings of the Society of Korea Industrial and System Engineering Conference (한국산업경영시스템학회:학술대회논문집)
- 2002.05a
- Pages 119-124
- 2002
Development of Similarity-Based Document Clustering System
유사성 계수에 의한 문서 클러스터링 시스템 개발
- Published : 2002.05.01
Abstract
Clustering of data is of great interest in many data mining applications. In document clustering, each document is represented as a data point in a high-dimensional space, so documents can be clustered with general data clustering techniques. In this paper, we introduce a document clustering system based on the similarity among documents. The developed system performs three functions: 1) gathering documents with a search agent; 2) determining similarity coefficients between any two documents from term frequencies; 3) clustering documents using the similarity coefficients. In particular, the document clustering is accomplished by a hybrid algorithm that combines genetic and K-Means methods.
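As an illustration of the second function, the following minimal Python sketch derives pairwise similarity coefficients from term frequencies. The abstract does not say which coefficient the system uses, so cosine similarity is an assumption here, and the function names (`term_frequencies`, `cosine_similarity`, `similarity_matrix`), the whitespace tokenizer, and the sample documents are all hypothetical. The search agent and the hybrid genetic/K-Means step are not sketched, since the abstract gives no details about them.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two term-frequency vectors (assumed coefficient)."""
    # Counter returns 0 for terms missing from tf_b, so no key check is needed.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Similarity coefficient between every pair of documents."""
    tfs = [term_frequencies(doc) for doc in documents]
    return [[cosine_similarity(a, b) for b in tfs] for a in tfs]


# Hypothetical sample documents, for illustration only.
documents = [
    "genetic algorithm clustering",
    "k-means clustering algorithm",
    "stock market prices",
]
sim = similarity_matrix(documents)
```

A matrix such as `sim` could then serve as the input to a clustering step: documents that share many terms receive coefficients near 1, and documents with no terms in common receive 0.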
Keywords