한국산업경영시스템학회:학술대회논문집 (Proceedings of the Society of Korea Industrial and System Engineering Conference)
- Society of Korea Industrial and System Engineering, 2002 Spring Conference
- Pages 119-124
- 2002
유사성 계수에 의한 문서 클러스터링 시스템 개발
Development of Similarity-Based Document Clustering System
Abstract
Clustering of data is of great interest in many data mining applications. In document clustering, each document is represented as a data point in a high-dimensional space, so document clustering can be carried out with general data clustering techniques. In this paper, we introduce a document clustering system based on similarity among documents. The developed system performs three functions: 1) gathering documents with a search agent; 2) determining similarity coefficients between every pair of documents from term frequencies; 3) clustering the documents using the similarity coefficients. In particular, the document clustering is accomplished by a hybrid algorithm that combines genetic and K-Means methods.
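The second function, deriving pairwise similarity coefficients from term frequencies, can be illustrated with a minimal sketch. The abstract does not state which coefficient the system uses, so cosine similarity is assumed here purely for illustration. The whitespace tokenizer, the function names (`term_frequencies`, `cosine_similarity`, `similarity_matrix`), and the sample documents are likewise hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count occurrences of each lowercase, whitespace-separated term."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two term-frequency vectors (assumed coefficient).

    Returns 0.0 when either document has no terms.
    """
    # Counter returns 0 for terms missing from tf_b, so only shared terms contribute.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Similarity coefficient for every pair of documents, as a nested list."""
    tfs = [term_frequencies(doc) for doc in documents]
    n = len(tfs)
    return [[cosine_similarity(tfs[i], tfs[j]) for j in range(n)] for i in range(n)]


# Hypothetical sample documents, for illustration only.
docs = [
    "data mining clustering",
    "document clustering data",
    "genetic algorithm search",
]
sim = similarity_matrix(docs)
```

A matrix of this kind, or the underlying term-frequency vectors, would then serve as input to the clustering step. The hybrid genetic and K-Means procedure itself is not sketched, since the abstract gives no details of how the two methods are combined.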
Keywords