• Title/Summary/Keyword: Document Indexing

Search Result 105, Processing Time 0.023 seconds

Design of Efficient Storage Structure and Indexing Model of XML Document (XML 문서의 효율적인 저장구조와 색인 모델의 설계)

  • 김은정
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10c
    • /
    • pp.301-303
    • /
    • 2002
  • XML 문서는 문서의 내용뿐 아니라, 의미를 가지는 구조 정보, 그리고 다양한 의미를 부과할 수 있는 링크 정보를 가지고 있다. 본 논문에서는 XML 문서를 보다 효율적으로 관리하기 위하여 DTD와 XML 문서에 대한 새로운 저장 방법과 이를 이용한 색인 모델을 제안한다. 이를 위해 하나의 XML 문서를 저장함에 있어, 엘리먼트 구조 정보, 애트리뷰트 정보, 링크 정보의 구성 방법을 제시하고, 이를 바탕으로 링크 정보론 이용한 내용 검색 색인 모델과 구조 검색, 애트리뷰트 검색을 위한 색인 모델을 설계한다. 또한 제안된 모델에서의 사용자들의 다양한 질의 유형의 처리 과정을 설명한다.

  • PDF

A Study on XML Document Repository Management System using ODMG Object Model (ODMG 객체 모델 기반의 XML 문서 저장 관리 시스템에 관한 연구)

  • 박준범;박경우;오수열
    • Journal of the Korea Society of Computer and Information
    • /
    • v.8 no.2
    • /
    • pp.16-23
    • /
    • 2003
  • In most organizations, the relational DBMS was used for store and manage XML documents due to the pre-established relational DBMS. But, relational DBMS has some problems. They has the possibilities of informational loses in the process of mapping the structure of XML documents to RDB, and they require expensive cost to reflect all XML properties. Thus, this paper was intended to design and implement XML document management system which utilize O2 DBMS-object oriented DBMS in order to overcome the existing problem and reflect all XML Properties. The XML document management system purposed in this thesis has multiple function, such as the library service function for XML documents-check-in/check-out, versioning, user access management-, dynamic indexing and retrieval, and publishing function using style sheet.

  • PDF

XML Repository System Using DBMS and IRS

  • Kang, Hyung-Il;Yoo, Jae-Soo;Lee, Byoung-Yup
    • International Journal of Contents
    • /
    • v.3 no.3
    • /
    • pp.6-14
    • /
    • 2007
  • In this paper, we design and implement a XML Repository System(XRS) that exploits the advantages of DBMSs and IRSs. Our scheme uses BRS to support full text indexing and content-based queries efficiently, and ORACLE to store XML documents, multimedia data, DTD and structure information. We design databases to manage XML documents including audio, video, images as well as text. We employ the non-composition model when storing XML documents into ORACLE. We represent structured information as ETID(Element Type Id), SORD(Sibling ORDer) and SSORD(Same Sibling ORDer). ETID is a unique value assigned to each element of DTD. SORD and SSORD represent an order information between sibling nodes and an order information among the sibling nodes with the same element respectively. In order to show superiority of our XRS, we perform various experiments in terms of the document loading time, document extracting time and contents retrieval time. It is shown through experiments that our XRS outperforms the existing XML document management systems. We also show that it supports various types of queries through performance experiments.

A Digital Library Prototype for Access to Diverse Collections (다양한 장서 접근을 위한 디지털 도서관의 프로토타입 구축)

  • Choi Won-Tae
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.32 no.2
    • /
    • pp.295-307
    • /
    • 1998
  • This article is an overview of the digital library project, indicating what roles Koreas diverse digital collections may play. Our digital library prototype has simple architecture, consisting of digital repositories, filters, indexing and searching, and clients. Digital repositories include various types of materials and databases. The role of filters is to recognize a format of a document collection and mark the structural components of each of its documents. We are using a database management system (ORACLE and ConText) supporting user-defined functions and access methods that allows us to easily incorporate new object analysis, structuring, and indexing technology into a repository. Clients can be considered browsers or viewers designed for different document data types, such as image, audio, video, SGML, PDF, and KORMARC. The combination of navigational tools supports a variety of approaches to identifying collections and browsing or searching for individual items. The search interface was implemented using HTML forms and the World Wide Web's CGI mechanism.

  • PDF

Data Model, Query Language, and Indexing Scheme for Structured Video Documents (구조화된 비디오 문서의 데이터 모델 및 질의어와 색인 기법)

  • 류은숙;이규철
    • Journal of Korea Multimedia Society
    • /
    • v.1 no.1
    • /
    • pp.1-17
    • /
    • 1998
  • Video information is an important component of multimedia systems such as Digital Library, World-Wide Web (WWW), and Video-On-Demand (VOD) service system. Video information has hierarchical document structure inherently, so it is named "structure video document" in this paper. This paper proposes a data model, a query language, and an indexing scheme for structured video documents in order to store, retrieve, and share video documents efficiently. In representing structured video documents, the object-oriented data modeling technique is used since the hierarchical structure information can be modeled as complex objects. We also define object types for the structure information. Our query language supports not only content-based retrieval, which means the queries based on the structure of video documents, and spatial/temporal relation for video documents. In order to perform structure queries efficiently, as well as to reduce the storage overhead of indices, an optimized inverted index structure is proposed.

  • PDF

Measurement of Document Similarity using Term/Term-pair Features and Neural Network (단어/단어쌍 특징과 신경망을 이용한 두 문서간 유사도 측정)

  • Kim Hye Sook;Park Sang Cheol;Kim Soo Hyung
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.12
    • /
    • pp.1660-1671
    • /
    • 2004
  • This paper proposes a method for measuring document similarity between two documents. One of the most significant ideas of the method is to estimate the degree of similarity between two documents based on the frequencies of terms and term-pair, existing in both the two documents. In contrast to conventional methods which takes only one feature into account, the proposed method considers several features at the same time and meatures the similarity using a neural network. To prove the superiority of our method, two experiments have been conducted. One is to verify whether the two input documents are from the same document or not. The other is a problem of information retrieval with a document as the query against a large number of documents. In both the two experiments, the proposed method shows higher accuracy than two conventional methods, Cosine similarity measurement and a term-pair method.

Performance Evaluation on Structure-based Retrievals of XML Documents (XML 문서의 구조기반 검색성능 평가)

  • Kim, Su-Hee
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.2
    • /
    • pp.396-406
    • /
    • 2009
  • In extension to our previous study, we develop metadata that specify elements' structural orders, to increase the efficiency level of XML document's retrieval process. Then, we proposed a structure-based indexing model. We expect the model to generate a more efficient retrieval process of horizontally and vertically related elements. To evaluate the model's performance level, we developed an experimental prototype and conducted an experiment on an XML corpus. On average, descendant, ancestor and sibling retrievals were approximately twelve percent faster than the ETID model. And retrievals specifying structural orders of particular element types were approximately twenty-five percent faster than the ETID model. In conclusion, metadata, such as Etype, Asso and Lsso, may make a meaningful contribution to retrieval processes that specify elements' order.

A Study of Designing the Intelligent Information Retrieval System by Automatic Classification Algorithm (자동분류 알고리즘을 이용한 지능형 정보검색시스템 구축에 관한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society
    • /
    • v.39 no.4
    • /
    • pp.283-304
    • /
    • 2008
  • This is to develop Intelligent Retrieval System which can automatically present early query's category terms(association terms connected with knowledge structure of relevant terminology) through learning function and it changes searching form automatically and runs it with association terms. For the reason, this theoretical study of Intelligent Automatic Indexing System abstracts expert's index term through learning and clustering algorism about automatic classification, text mining(categorization), and document category representation. It also demonstrates a good capacity in the aspects of expense, time, recall ratio, and precision ratio.

  • PDF

A Fast and Powerful Question-answering System using 2-pass Indexing and Rule-based Query Processing Method (2-패스 색인 기법과 규칙 기반 질의 처리기법을 이용한 고속, 고성능 질의 응답 시스템)

  • 김학수;서정연
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.11
    • /
    • pp.795-802
    • /
    • 2002
  • We propose a fast and powerful Question-answering (QA) system in Korean, which uses a predictive answer indexer based on 2-pass scoring method. The indexing process is as follows. The predictive answer indexer first extracts all answer candidates in a document. Then, using 2-pass scoring method, it gives scores to the adjacent content words that are closely related with each answer candidate. Next, it stores the weighted content words with each candidate into a database. Using this technique, along with a complementary analysis of questions which is based on lexico-syntactic pattern matching method, the proposed QA system saves response time and enhances the precision.

An Index Method for Storing and Extracting XML Documents (XML 문서의 저장과 추출을 위한 색인 기법)

  • Kim Woosaeng;Song Jungsuk
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.2
    • /
    • pp.154-163
    • /
    • 2005
  • Because most researches that were studied so far on XML documents used an absolute coordinate system in most of the index techniques, the update operation makes a large burden. To express the structural relations between elements, attributes and text, we need to reconstruct the structure of the coordinates. As the reconstruction process proceeds through out the entire XML document in a cascade manner, which is not limited to the current changing node, a serious performance problem may be caused by the frequent update operations. In this paper, we propose an index technique based on extensible index that does not cause serious performance degradations. It can limit the number of node to participate in reconstruction process and improve lots of performance capacities on the whole. And extensible index performs the containment relationship query by the simple expression using SQL statement.

  • PDF