• Title/Summary/Keyword: XML clustering (XML 클러스터링)

XML Document Clustering Based on Sequential Pattern (순차패턴에 기반한 XML 문서 클러스터링)

  • Hwang, Jeong-Hee;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.10D no.7 / pp.1093-1102 / 2003
  • As internet use grows, the amount of information is increasing rapidly, and XML, a standard for web data, offers flexible data representation. Web-based electronic document systems such as EDMS (Electronic Document Management System) and ebXML (e-business extensible Markup Language) have therefore been adopting XML as their standard for document exchange. Research is thus needed on methods that can manage and search structured XML documents effectively. In this paper we propose a clustering method based on structural similarity among many XML documents; in a pre-clustering step, it uses sequential pattern mining to extract typical structures from each document. The proposed algorithm improves clustering accuracy by computing a cost that considers cluster cohesion and inter-cluster similarity.

k-Bitmap Clustering Method for XML Data based on Relational DBMS (관계형 DBMS 기반의 XML 데이터를 위한 k-비트맵 클러스터링 기법)

  • Lee, Bum-Suk;Hwang, Byung-Yeon
    • The KIPS Transactions:PartD / v.16D no.6 / pp.845-850 / 2009
  • The use of XML data has increased with the growth of the Web 2.0 environment. XML's advantages are recognized through its use as the base technology of RSS and ATOM, which transfer information from blogs and news feeds. Bitmap clustering is a method that keeps an index in main memory on top of a relational DBMS, and it performed better than other XML indexing methods in evaluation. However, the existing method generates too many clusters, which degrades search quality. This paper proposes a k-Bitmap clustering method that generates a user-defined number k of clusters to solve this problem. The proposed method also keeps an additional inverted index for searching terms excluded from the representative bits of the k-Bitmap. Our evaluation shows that users can control the number of clusters. The method also has high recall in single-term search, and by keeping two indices it guarantees that the search result includes all documents related to the query.

Clustering of MPEG-7 Data for Efficient Management (MPEG-7 데이터의 효율적인 관리를 위한 클러스터링 방법)

  • Ahn, Byeong-Tae;Kang, Byeong-Shoo;Diao, Jianhua;Kang, Hyun-Syug
    • Journal of Korea Multimedia Society / v.10 no.1 / pp.1-12 / 2007
  • Using multimedia data within the restricted resources of a mobile environment requires a method for managing MPEG-7 documents. Existing XML clustering methods can be used for this purpose, but improving performance further requires a new clustering method that exploits the characteristics of MPEG-7 documents. Such a method improves query processing speed in multimedia search and makes it possible to store documents in a way suited to various applications. In this paper, we suggest a new clustering method for MPEG-7 documents, aimed at effective management of large-capacity multimedia data, that uses semantic relationships among the elements of MPEG-7 documents. We also compare it with existing clustering methods.

Design and Implementation of MPEG-7 Document Management System Based on Native Embedded XML Database (순수 내장형 XML 데이터베이스 기반의 MPEG-7 문서 관리 시스템의 설계 및 구현)

  • Ahn, Byeong-Tae;Kang, Byeong-Shoo;Diao, Jianhua;Kang, Hyun-Syug
    • Journal of Korea Multimedia Society / v.10 no.2 / pp.170-178 / 2007
  • In a resource-restricted mobile environment, embedded database technology can be used to manage MPEG-7 data. Existing XML clustering methods can be applied, but storing MPEG-7 documents effectively and improving performance further requires a new clustering method. In this paper, we design and implement an MPEG-7 document management system that stores MPEG-7 documents effectively on mobile devices such as PDAs. The system uses Berkeley DB XML as a native embedded XML database, based on the clustering method for MPEG-7 data.

Path Similarity Calculation for Clustering of XML Documents (XML 문서 클러스터링을 위한 경로 유사도의 계산)

  • Lee, Bum-Suk;Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2006.11a / pp.325-328 / 2006
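  • Recently, the use of XML documents that do not include a DTD (Document Type Definition) has been increasing. Accordingly, research is under way on various indexing techniques for managing large numbers of XML documents with differing structures more efficiently, for example by storing them in a relational DBMS or mapping them with an index. Among these, the path bitmap indexing technique generates three-dimensional bitmap clusters based on path construction similarity and performs searches at the cluster level, which gave fast search speeds. However, the technique has several problems: of the two paths being compared, the shorter one always becomes the reference path, and even for two paths with the same node composition, the similarity varies greatly depending on node position. Resolving these problems and clustering accurately requires a reasonable path similarity formula. In this paper, we propose a new path similarity formula that resolves the problems of the existing method and enables more accurate clustering.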

A Minimization Technique of XML Path Comparison Based on Signature (시그니쳐를 이용한 XML 경로 비교의 최소화 기법)

  • Jang, Kyung-Hoon;Hwang, Byung-Yeon
    • The Journal of Society for e-Business Studies / v.17 no.3 / pp.61-72 / 2012
  • Because XML allows users to define arbitrary tags, XML documents with a wide variety of structures have been created. Accordingly, many studies have addressed clustering and searching XML documents based on path similarity in order to manage the documents efficiently. To retrieve XML documents with similar structures, the three-dimensional bitmap indexing technique uses a path as the unit when it creates an index. If a path structure changes, the technique treats it as a new path. Another technique was therefore proposed to measure the similarity of paths. To compute the similarity between two paths, that technique compares every node of the paths. This causes unnecessary comparisons of nodes that the two paths do not have in common. In this paper, we propose a new technique that minimizes the comparisons by using signatures, and we present performance evaluation results. The proposed technique performed comparisons 20 percent faster than the existing technique.
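
The abstract does not say how the signatures are built, so the following is only a minimal sketch of one common way to use them as a pre-filter. It assumes a signature is a fixed-width bitmask with one hashed bit per node name; `SIGNATURE_BITS`, `signature`, `common_nodes`, and `compare` are hypothetical names, and the paper's actual scheme may differ. If two signatures share no bit, the paths cannot share a node name, so the node-by-node comparison is skipped.

```python
import zlib

SIGNATURE_BITS = 64  # assumed signature width, not taken from the paper


def split_path(path):
    """Split '/book/author/name' into ['book', 'author', 'name']."""
    return [name for name in path.split("/") if name]


def signature(path):
    """OR together one hashed bit per node name (order-independent)."""
    sig = 0
    for name in split_path(path):
        # crc32 is deterministic across runs, unlike the built-in hash().
        sig |= 1 << (zlib.crc32(name.encode("utf-8")) % SIGNATURE_BITS)
    return sig


def common_nodes(path_a, path_b):
    """Stand-in for the expensive node-by-node comparison."""
    nodes_b = set(split_path(path_b))
    return [name for name in split_path(path_a) if name in nodes_b]


def compare(path_a, path_b):
    """Run the full comparison only when the signatures overlap."""
    if signature(path_a) & signature(path_b) == 0:
        return []  # no shared bit means no shared node name
    return common_nodes(path_a, path_b)
```

Hash collisions can make two unrelated paths share a bit, so the filter may let some unnecessary comparisons through, but it never skips a pair that really has a node in common.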

An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT (함수 변환과 FFT에 기반한 조정자가 없는 XML 문서 클러스터링 기법)

  • Lee, Ho-Suk
    • The KIPS Transactions:PartD / v.14D no.2 / pp.169-180 / 2007
  • This paper discusses a new unsupervised XML document clustering technique based on a function transform and the FFT (Fast Fourier Transform). An XML document is transformed into a discrete function based on the hierarchical nesting structure of its elements. The discrete function is then transformed into a vector using the FFT. The vectors of two documents are compared using a weighted Euclidean distance metric. If the distance is lower than a prespecified threshold, the two documents are considered structurally similar and are grouped into the same cluster. XML clustering can be useful for storing and searching XML documents. The experiments were conducted with 800 synthetic documents and with 520 real documents. They showed that the function transform and FFT are effective for the incremental, unsupervised clustering of structurally similar XML documents.
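
The pipeline described above (document, discrete function, FFT vector, weighted distance, threshold) can be sketched roughly as follows. The abstract does not define the function transform, so this sketch assumes the simplest encoding, the nesting depth of each element in document order, and compares FFT magnitudes; the padding size, uniform default weights, and all function names are likewise assumptions.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def depth_sequence(xml_text):
    """Nesting depth of each element in document order (assumed encoding)."""
    depths = []

    def walk(node, depth):
        depths.append(depth)
        for child in node:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 1)
    return depths


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft(values[0::2]), fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def structure_vector(xml_text, size=16):
    """Truncate or zero-pad the depth sequence to `size`, return FFT magnitudes."""
    seq = depth_sequence(xml_text)[:size]
    seq = seq + [0] * (size - len(seq))
    return [abs(c) for c in fft(seq)]


def weighted_distance(u, v, weights=None):
    """Weighted Euclidean distance; weights default to 1.0 (an assumption)."""
    weights = weights or [1.0] * len(u)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def cluster(docs, threshold):
    """Incremental clustering: join the first cluster whose first member is close."""
    clusters = []  # list of (representative vector, member indices)
    for i, doc in enumerate(docs):
        vec = structure_vector(doc)
        for rep, members in clusters:
            if weighted_distance(rep, vec) < threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


docs = ["<a><b/><c/></a>", "<x><y/><z/></x>", "<a><b><c><d><e/></d></c></b></a>"]
print(cluster(docs, threshold=1.0))  # [[0, 1], [2]]
```

The first two documents use different tag names but have the same nesting shape, so they land in one cluster; the deeply nested third document starts a new one.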

Sketch Map System using Clustering Method of XML Documents (XML 문서의 클러스터링 기법을 이용한 스케치맵 시스템)

  • Kim, Jung-Sook;Lee, Ya-Ri;Hong, Kyung-Pyo
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.19-30 / 2009
  • Map services that have recently come into the spotlight take the map as the entry point and then provide various mash-up results through its interface. Such services can give users precise information, but the maps are barely reusable. Unlike existing large map systems, the sketch-map system in this paper represents specific spots and routes in XML documents and then clusters the sketch maps. The map service is designed to show the optimum route to a destination on a simple outline map, by reworking the spots presented on the map into optimum content. By analyzing, splitting, and clustering the input sketch-map XML documents, the system creates a valid sketch map. It uses the LCS (Longest Common Subsequence) algorithm to split and merge sketch maps during query processing. In addition, a simulation of the system's expected effects is provided. It shows how maps that share information and knowledge assemble into a large map, and thus presents the system's capability and role as a new research portal.
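
For readers unfamiliar with LCS, here is a minimal sketch of the standard dynamic-programming algorithm, applied to two routes written as lists of spot names. The route data and the idea of using the shared spots as merge points are illustrative assumptions; the abstract does not describe how the system applies LCS in detail.

```python
def lcs(a, b):
    """Longest common subsequence of two sequences via dynamic programming."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    result = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical routes taken from two sketch maps.
route_a = ["station", "bank", "park", "library", "school"]
route_b = ["station", "cafe", "park", "school"]
print(lcs(route_a, route_b))  # ['station', 'park', 'school']
```

The spots both routes visit in the same order are natural candidates for joining two sketch maps or for cutting one map into segments.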

MD-TIX: Multidimensional Type Inheritance Indexing for Efficient Execution of XML Queries (MD-TIX: XML 질의의 효율적 처리를 위한 다차원 타입상속 색인기법)

  • Lee, Jong-Hak
    • Journal of Korea Multimedia Society / v.10 no.9 / pp.1093-1105 / 2007
  • This paper presents a multidimensional type inheritance indexing technique (MD-TIX) for XML databases. We use a multidimensional file organization as the index structure. Conventional XML database indexing techniques that use one-dimensional index structures do not efficiently handle complex queries involving both nested elements and type inheritance hierarchies. We extend a two-dimensional type hierarchy indexing technique (2D-THI) to index the nested elements of XML databases. 2D-THI is an indexing scheme for a simple element in a type hierarchy; it addresses the problem of clustering elements in a two-dimensional domain space consisting of the key value domain and the type identifier domain. Our extended scheme handles the clustering of index entries in a multidimensional domain space consisting of a key value domain and multiple type identifier domains, one per type hierarchy on a path expression. This scheme efficiently supports queries that involve search conditions on a nested element represented by an extended path expression. An extended path expression is a path expression in which every type hierarchy on the path can be substituted by an individual type or a subtype hierarchy.

A Space Compression of Three-Dimensional Bitmap Indexing using Linked List (연결 리스트를 이용한 3차원 비트맵 인덱싱의 공간 축약)

  • Lee, Jae-Min;Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1519-1522 / 2003
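  • Various studies on metadata have been carried out to overcome the expressive limitations of existing web documents and content, and XML is the most representative outcome. Because XML can describe the structure of a document as well as its content, it is expected to play a key role in future information exchange, and various studies on storing and retrieving XML documents efficiently are accordingly in progress. BitCube is an indexing technique that uses three-dimensional bitmap indexing, which supports bit-wise operations, to cluster XML documents by structural similarity and to process user queries, and it has demonstrated fast performance. However, because BitCube's clustering focuses on the paths of XML documents, there is no correlation between a cluster and the actual words contained in its paths, so every plane of the three-dimensional bitmap index except one becomes a sparse matrix with very high space usage. In this paper, to prevent performance degradation caused by the growing volume of documents and to maintain stable performance, we design a linked list that minimizes space without degrading the performance of the existing operations, and we present a method for reconstructing the three-dimensional bitmap index as linked lists.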
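
As a rough illustration of the idea, the sketch below stores one sparse bitmap row as a sorted singly linked list of the positions of its 1-bits, and implements bit-wise AND as a merge of two such lists. The layout and the names `Node` and `SparseBitRow` are assumptions made for illustration; the abstract does not specify the paper's actual linked-list design.

```python
class Node:
    """One set bit: its position and a link to the next set bit."""
    __slots__ = ("index", "next")

    def __init__(self, index, next=None):
        self.index = index
        self.next = next


class SparseBitRow:
    """Sorted singly linked list of the positions of 1-bits in a bitmap row."""

    def __init__(self):
        self.head = None

    def set(self, index):
        """Set bit `index`, keeping the list sorted and free of duplicates."""
        prev, cur = None, self.head
        while cur is not None and cur.index < index:
            prev, cur = cur, cur.next
        if cur is not None and cur.index == index:
            return  # bit already set
        node = Node(index, cur)
        if prev is None:
            self.head = node
        else:
            prev.next = node

    def indices(self):
        """Positions of all set bits, in ascending order."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.index)
            cur = cur.next
        return out

    def and_(self, other):
        """Bit-wise AND, computed by merging the two sorted lists."""
        result, tail = SparseBitRow(), None
        a, b = self.head, other.head
        while a is not None and b is not None:
            if a.index == b.index:
                node = Node(a.index)
                if tail is None:
                    result.head = node
                else:
                    tail.next = node
                tail = node
                a, b = a.next, b.next
            elif a.index < b.index:
                a = a.next
            else:
                b = b.next
        return result


row_a, row_b = SparseBitRow(), SparseBitRow()
for i in (5, 1, 9, 5):
    row_a.set(i)
for i in (9, 2, 1):
    row_b.set(i)
print(row_a.and_(row_b).indices())  # [1, 9]
```

Space grows with the number of set bits rather than with the full row width, which is the property that matters when most planes of the index are sparse.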
