Society for e-Business Studies: Conference Proceedings (Proceedings of the CALSEC Conference)
- Society for e-Business Studies, e-Biz World Conference 2005
- Pages 189-194
- 2005
Clustering Techniques for XML Data Using Data Mining
Abstract
Many studies have sought to classify documents and to extract useful information from them. However, most search engines still rely on keyword-based methods, which do not search or classify documents effectively. Exploiting the fact that XML documents are structured, this paper identifies the structure of XML documents using the set-theoretic approach suggested by Broder, and tests the clustering of XML documents by applying a k-nearest neighbor algorithm. To measure similarity, the study compares documents on a clause basis rather than as vectors, and it evaluates the effectiveness of this clustering technique on large-scale data against the existing bitmap method.
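The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each XML document is represented as the set of its root-to-element tag paths, that similarity is the set resemblance |A ∩ B| / |A ∪ B| associated with Broder, and that a document is assigned to a group by majority vote among its k most similar labeled neighbors. The function names `path_set`, `resemblance`, and `knn_label` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_set(xml_text):
    """Return the set of root-to-element tag paths (an assumed structural summary)."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def resemblance(a, b):
    """Set resemblance |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_label(query, labeled, k=3):
    """Return the majority label among the k labeled sets most similar to query."""
    ranked = sorted(labeled, key=lambda item: resemblance(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sample documents, used only to illustrate the calls.
book_a = "<book><title/><author/></book>"
book_b = "<book><title/><year/></book>"
order = "<order><item/><price/></order>"

labeled = [(path_set(book_a), "book"), (path_set(order), "order")]
label = knn_label(path_set(book_b), labeled, k=1)
```

Comparing sets of paths rather than fixed-length vectors avoids allocating a dimension for every possible path across the whole collection, which is one plausible reason a set-based measure could scale better than a bitmap representation on large data.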
Keywords