• Title/Summary/Keyword: XML index

Search Result 117, Processing Time 0.024 seconds

FiST: XML Document Filtering by Sequencing Twig Patterns (가지형 패턴의 시퀀스화를 이용한 XML 문서 필터링)

  • Kwon Joon-Ho;Rao Praveen;Moon Bong-Ki;Lee Suk-Ho
    • Journal of KIISE:Databases / v.33 no.4 / pp.423-436 / 2006
  • In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribers express their interests as profiles written in the XPath language, and each new document is matched against these profiles so that it is delivered only to interested subscribers. Because the number of subscribers and their profiles can grow very large, scalability is critical to the success of pub-sub services. In this paper, we propose a novel scalable filtering system called FiST (Filtering by Sequencing Twigs) that transforms twig patterns expressed in XPath and XML documents into sequences using Prüfer's method. As a consequence, instead of matching the linear paths of twig patterns individually and merging the matches during post-processing, FiST performs holistic matching of twig patterns against incoming documents. FiST organizes the sequences into a dynamic hash-based index for efficient filtering. We demonstrate that our holistic matching approach yields lower filtering cost and good scalability under various situations.
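
The abstract above does not spell out how a tree becomes a sequence, so the following is only a minimal sketch of the classical Prüfer construction, assuming nodes are numbered in postorder and that each removed leaf contributes its parent's tag and number. The `(tag, children)` tuple representation and the function names are hypothetical, and the exact variant used by FiST may differ.

```python
def postorder_number(root):
    """Number nodes 1..n in postorder; return (parent, label) maps keyed by number."""
    parent, label = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        tag, children = node
        child_ids = [visit(child) for child in children]
        counter += 1
        label[counter] = tag
        for child_id in child_ids:
            parent[child_id] = counter
        return counter

    visit(root)
    return parent, label


def prufer_sequence(root):
    """Repeatedly delete the smallest-numbered leaf and record its parent.

    Returns a list of (parent_tag, parent_number) pairs of length n - 1.
    """
    parent, label = postorder_number(root)
    child_count = {node_id: 0 for node_id in label}
    for p in parent.values():
        child_count[p] += 1

    alive = set(label)
    sequence = []
    for _ in range(len(label) - 1):
        leaf = min(i for i in alive if child_count[i] == 0)
        p = parent[leaf]
        sequence.append((label[p], p))
        alive.remove(leaf)
        child_count[p] -= 1
    return sequence


# Hypothetical twig a[b/c]/d written as nested (tag, children) tuples.
twig = ("a", [("b", [("c", [])]), ("d", [])])
print(prufer_sequence(twig))
```

Once both a twig pattern and a document are reduced to sequences like this, matching can be treated as a comparison between sequences rather than a join over separately matched paths, which is the idea the abstract calls holistic matching.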

A Search Method for Components Based-on XML Component Specification (XML 컴포넌트 명세서 기반의 컴포넌트 검색 기법)

  • Park, Seo-Young;Shin, Yoeng-Gil;Wu, Chi-Su
    • Journal of KIISE:Software and Applications / v.27 no.2 / pp.180-192 / 2000
  • Recently, component technology has played a major role in software reuse. It has shifted reuse from code-based to binary-code-based, because components can be combined into the software under development through their interfaces alone. As the numbers of components and component users have increased rapidly, users need to search for the most suitable components for HTML among the enormous number of components on the Internet. It is desirable to use web-document-style specifications for component specifications on the Internet. This paper proposes using XML component specifications instead of HTML specifications, because HTML cannot represent the semantics of contexts. We also propose an XML context-search method based on XML component specifications. In their queries, component users use contexts for component properties and terms for the values of those properties. The index structure for the context-based search method is an inverted file indexing structure of term-context-component specification. For the convenience of users, we provide not only the XML context-based search method but also a variety of search methods built on it, such as keyword search, faceted search, and browsing search. For an efficient index scheme, the search engine uses a 3-layer architecture consisting of an interface layer, a query expansion layer, and an XML search engine layer. In this paper, an XML DTD (Document Type Definition) for component specification is defined, and experimental results comparing the search performance of XML with that of HTML are discussed.
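
One way to picture the term-context-component inverted file mentioned above is a map from a (term, context) pair to the set of component specifications containing that term in that context. The sketch below assumes each specification is already parsed into a flat property dictionary; the component IDs, property names, and function names are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_index(specs):
    """Build an inverted file keyed by (term, context).

    specs maps a component id to {context: text value}.
    """
    index = defaultdict(set)
    for component_id, properties in specs.items():
        for context, value in properties.items():
            for term in value.lower().split():
                index[(term, context)].add(component_id)
    return index


def search(index, context, term):
    """Return the ids of components whose given context contains the term."""
    return sorted(index.get((term.lower(), context), set()))


# Hypothetical component specifications, flattened from XML.
specs = {
    "c1": {"name": "Chart Viewer", "category": "graphics"},
    "c2": {"name": "Graphics Toolkit", "category": "ui"},
}
index = build_index(specs)
print(search(index, "category", "graphics"))
print(search(index, "name", "graphics"))
```

Keying on the context as well as the term is what separates this from a plain keyword index: the same word matches different components depending on which property it appears in.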

Multimedia data search method for User (사용자 중심의 멀티미디어 데이터 검색 방안)

  • 정성주;박희숙;김성록;조우현
    • Proceedings of the Korea Multimedia Society Conference / 2003.05b / pp.202-205 / 2003
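  • With the spread of the Internet, users can now search not only for ordinary documents but also for multimedia data. Search on existing portal sites is provided mainly for HTML documents, and it relies mostly on the words and phrases in those documents. Searches for multimedia data are likewise based on search phrases supplied by the data provider. This paper proposes an XML tree-structured search system that maintains and organizes, as an index, the supplementary multimedia data information that is of interest to the user.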

Branching Path Query Processing for XML Documents using the Prefix Match Join (프리픽스 매취 조인을 이용한 XML 문서에 대한 분기 경로 질의 처리)

  • Park Young-Ho;Han Wook-Shin;Whang Kyu-Young
    • Journal of KIISE:Databases / v.32 no.4 / pp.452-472 / 2005
  • We propose XIR-Branching, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques and novel instance join techniques. A partial match query is defined as one having the descendant-or-self axis '//' in its path expression. In its general form, a partial match query has branch predicates forming branching paths. The objective of XIR-Branching is to efficiently support this type of query for large-scale documents of heterogeneous schemas. XIR-Branching is based on the conventional schema-level methods using relational tables (e.g., XRel, XParent, XIR-Linear[21]) and significantly improves their efficiency and scalability using two techniques: an inverted index technique and a novel prefix match join. The former supports linear path expressions in the same way as XIR-Linear[21]. The latter supports branching path expressions and finds the result nodes more efficiently than the containment joins used in the conventional methods. XIR-Linear is efficient for linear path expressions but does not handle branching path expressions, which are needed for more detailed and more general queries. The paper presents a novel method for handling branching path expressions. XIR-Branching first reduces the candidate set for a query as a schema-level method and then efficiently finds the final result set by using the novel prefix match join as an instance-level method. We compare the efficiency and scalability of XIR-Branching with those of XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Branching is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.
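
To make the notion of a partial match query concrete, the sketch below checks a linear path expression containing '//' against a list of root-to-node label paths by translating it into a regular expression. This is a plain regex stand-in for the schema-level filtering step, not the inverted index or the prefix match join of XIR-Branching; branch predicates are not handled, and the label paths and function names are hypothetical.

```python
import re


def path_to_regex(query):
    """Translate a linear path such as '/site//name' into a compiled regex.

    '//' is allowed to span zero or more intermediate element steps.
    """
    parts = [re.escape(part) for part in query.split("//")]
    return re.compile("^" + "(?:/[^/]+)*/".join(parts) + "$")


def matching_label_paths(query, label_paths):
    """Return the label paths (candidate set) that satisfy the query."""
    pattern = path_to_regex(query)
    return [path for path in label_paths if pattern.match(path)]


# Hypothetical label paths collected from heterogeneous documents.
label_paths = [
    "/site/people/person/name",
    "/site/name",
    "/catalog/item/name",
    "/site/people/person/address",
]
print(matching_label_paths("/site//name", label_paths))
print(matching_label_paths("//name", label_paths))
```

Filtering on label paths first keeps the candidate set small before any instance-level work is done, which mirrors the two-phase structure described in the abstract.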

A Study on Layout Extraction from Internet Documents Through Xpath (Xpath에 의한 인터넷 문서의 레이아웃 추출 방법에 관한 연구)

  • Han Kwang-Rok;Sun Bok-Keun
    • The Journal of the Korea Contents Association / v.5 no.4 / pp.237-244 / 2005
  • Currently, most Internet documents, including news data, are built from predefined templates, but the templates are usually designed only for the main data and do not help information retrieval deal with indexes, advertisements, header data, and so on. Templates in such forms are not appropriate when Internet documents are used as data for information retrieval. To process Internet documents in various areas of information retrieval, it is necessary to detect additional information such as advertisements and page indexes. This study therefore proposes a method of detecting the layout of web pages by identifying the characteristics and structure of the block tags that affect page layout and by calculating distances between web pages. In the experiment, we successfully extracted 640 documents from 1000 samples, a recall rate of 64%. The method is intended to reduce the cost and improve the efficiency of automatic web document processing by applying it to the document preprocessing stage of information retrieval tasks such as data extraction and document summarization.

Resource Scheduling Framework based on Resource Parameter Graph (자원인자 기반 스케줄링 프레임워크)

  • 배재환;권성호;김덕수;이강우
    • Journal of Korea Society of Industrial Information Systems / v.8 no.3 / pp.19-31 / 2003
  • To implement large-scale GRID systems, the performance scalability of resource scheduling must be addressed. In this research, we analyze existing scheduling frameworks from the viewpoint of performance and propose a novel resource scheduling framework called resource parameter based scheduling. The proposed framework consists of three components. The first is the resource parameter graph, which expresses resource information through inter-resource relations and composition based on a hierarchical structure. The second is the resource parameter tree, which is used to implement a memory-based index of resource information. The third is the resource information repository, which consists mostly of static data used for general resource information services. This paper presents the details of the framework.

An Indexing Scheme for Efficient Retrieval and Update of Structured Documents Based on GDIT (GDIT를 기반으로 한 구조적 문서의 효율적 검색과 갱신을 위한 인덱스 설계)

  • Kim, Young-Ja;Bae, Jong-Min
    • The Transactions of the Korea Information Processing Society / v.7 no.2 / pp.411-425 / 2000
  • Information retrieval systems for structured documents written in SGML or XML support partial retrieval of documents. To process queries based on document structure efficiently, such systems require low memory overhead for indexing, quick response times for queries, support for powerful types of user queries, and minimal updates to the index structure when documents are updated. This paper introduces the Global Document Instance Tree (GDIT) and proposes an effective indexing scheme and query processing algorithms based on the GDIT. The indexing scheme maintains indexing and retrieval efficiency and also guarantees minimal updates to the index structure when document structures are updated.

Context-based Incremental Preference Analysis Method in Ubiquitous Commerce (유비쿼터스 상거래 환경의 컨텍스트 기반 점진적 선호 분석 기법)

  • Ku Mi Sug;Hwang Jeong Hee;Choi Nam Kyu;Jung Doo Young;Ryu Keun Ho
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1417-1426 / 2004
  • As ubiquitous commerce emerges, personalization services are attracting interest, and recommendation methods that offer useful information to customers are becoming more important. However, most of them depend on a specific method and are restricted to e-commerce. To apply these recommendation methods to U-commerce, it is first necessary to extend context modeling and to connect the methods systematically so that they complement one another's strengths and weaknesses in each commercial transaction. Therefore, we propose a modeling technique for context information related to personal activation in commercial transactions and present an incremental preference analysis method using a preference tree that is closely connected to the recommendation method at each step. We also use an XML indexing technique to efficiently extract recommendation information from the preference tree.

Design and Implementation of a Structure and Content-based Multimedia Document Retrieval System (구조 및 내용-기반 멀티미디어 문서검색 시스템의 설계 및 구현)

  • Jin, Du-Seok;Lee, Jeong-Jae;Chang, Jae-Woo
    • The Transactions of the Korea Information Processing Society / v.7 no.11 / pp.3341-3355 / 2000
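  • As the number of multimedia documents has recently grown exponentially, it has become necessary to develop a multimedia document retrieval system that can store and retrieve the documents users need more effectively. In this paper, we design and implement a system that retrieves documents defined in XML more efficiently, based on document structure and image content. To support efficient structure-based retrieval, we implement a structure index using the o2store storage system. To support content-based retrieval, we implement an efficient high-dimensional index structure based on the X-tree. Finally, we evaluate the performance of the implemented multimedia document retrieval system in terms of retrieval time, storage time, and additional storage space.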

Development of a Metadata Tool for LIO Learning Object Model on the Distributed Environments (분산 환경에서의 LIO 학습 객체 모델을 위한 메타데이터 도구 개발)

  • Shin, Haeng-Ja;Park, Keuyng-Hwan
    • Proceedings of the Korea Information Processing Society Conference / 2003.11b / pp.697-700 / 2003
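  • Metadata is data about data; it provides information about content by describing the attributes of each element that makes up a content model. Such metadata is written as indexed labels so that content can be used or searched more easily, and the metadata elements must be precise for the description to be accurate. In this paper, based on LOM, a metadata standardization technology for e-learning systems, we design and develop a tool that creates, updates, and stores the metadata essential to virtual education systems for the LIO learning object model, which is reusable across different systems, and we bind the metadata to XML documents so that it can be used effectively in a distributed computing environment.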
