Proceedings of the CALSEC Conference (한국전자거래학회:학술대회논문집)
- 2005.03a / Pages 189-194 / 2005
Clustering Techniques for XML Data Using Data Mining
- Kim, Chun-Sik (Department of Digital Media Engineering, Anyang University)
- Published : 2005.03.23
Abstract
Many studies have sought to classify documents and to extract useful information from them. Most search engines, however, rely on keyword-based methods, which do not search or classify documents effectively. Exploiting the fact that XML documents are structured, this paper identifies the structure of XML documents using the set-theoretic approach suggested by Broder, and tests the clustering of XML documents by applying a k-nearest neighbor algorithm. In addition, the study investigates how effective the clustering technique is on large-scale data in comparison with the existing bitmap method. To measure similarity, the test examines the differences between clause-based documents rather than relying on a vector representation.
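As an illustration of the general idea only, the following minimal sketch compares XML documents by the overlap of their structural sets, in the spirit of Broder's resemblance measure (size of the intersection divided by size of the union), and assigns a document to a group by a k-nearest neighbor vote. The abstract does not say how the paper represents document structure, so the use of root-to-element tag paths, the function names `path_set`, `resemblance`, and `nearest_label`, and the sample documents are all assumptions made for this example, not the author's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, invented for this sketch.
BOOK_A = "<book><title/><author/><chapter><section/></chapter></book>"
BOOK_B = "<book><title/><author/><chapter/></book>"
ORDER = "<order><customer/><item><price/></item></order>"


def path_set(xml_text):
    """Return the set of root-to-element tag paths in an XML document.

    Using tag paths as the structural unit is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def resemblance(a, b):
    """Set resemblance: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_label(query, labeled, k=1):
    """Return the majority label among the k most similar labeled sets."""
    ranked = sorted(
        labeled,
        key=lambda item: resemblance(query, item[0]),
        reverse=True,
    )
    votes = {}
    for _paths, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

For example, `resemblance(path_set(BOOK_A), path_set(BOOK_B))` compares two documents that share four of five distinct paths, while `BOOK_A` and `ORDER` share none. Because the comparison works directly on sets, no fixed-length vector has to be built for each document.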
Keywords