• Title/Summary/Keyword: Document Indexing

Search Result 105, Processing Time 0.026 seconds

Concept Design for Inhouse Database Construction (사내(社內) 데이터베이스 구축(構築)을 위한 개념설계(槪念設計))

  • Lee, Chang-Han
    • Journal of Information Management
    • /
    • v.23 no.2
    • /
    • pp.40-56
    • /
    • 1992
  • The inhouse database can be used as an effective tool for managing inhouse/corporaterelated informations, which efficiently support business activities as they are shared by all the inhouse members. The inhouse database construction processes can be classified into a concept design, various kinds of DB format designs, abstracting/indexing, and DB operation/maintenance. This article presents a method of systematizing the concept design elements such as information flow support system, usable resources, and so on.

  • PDF

Efficient Indexing Technique for Retrieval of an XML Document and Design of Query Language (TQL) (XML 문서의 검색을 위한 효율적인 색인 기법과 질의 언어(TQL)의 설계)

  • 이계준;신동욱;권택근
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10a
    • /
    • pp.57-59
    • /
    • 1999
  • 현재 WWW(World Wide Web), 사무 자동화 시스템(Office Information System), 전자 도서관(Digital Library) 등의 빠른 발전으로 인하여 정보가 기하급수적으로 증가하였다. 이러한 방대한 양의 정보를 처리하기 위하여 많은 인터넷 기반의 문서 표준들이 출현하였고, 대표적으로 XML(eXtensible Markup Language)이 차세대 인터넷 전자 문서의 표준으로 많은 곳에 응용되고 있다. 이에 따라 XML 문서의 정보들을 효율적이고 정확하게 저장하고 이용, 검색 할 수 있는 기능을 요구되어졌다. 현재 대부분의 연구들은 XML 문서에 대한 구조적인 정보만을 저장하고 검색하는 기능만을 지원 할 뿐 검색된 결과에 대한 재사용이나 재구성에 대한 기능의 제공은 미흡한 실정이다. 본 논문에서는 현재 검색기들이 제공하는 XML 문서에 대한 구조적인 검색 기능을 확장하여 XML 문서를 보다 효율적으로 검색하기 위하여 새로운 색인 기법을 제안하고, 데이터베이스 내에 저장된 XML문서에 대해 구조적인 검색과 이것을 바탕으로 문서를 재구성하고 재사용하는 기능을 수행할 수 있도록 새로운 질의어(TQL)을 설계하였다.

  • PDF

Techniques of XML Fragment Stream Organization for Efficient XML Query Processing in Mobile Clients (이동 클라이언트에서 효율적인 XML 질의 처리를 위한 XML 조각 스트림 구성 기법)

  • Ryu, Jeong-Hoon;Kang, Hyun-Chul
    • The Journal of Society for e-Business Studies
    • /
    • v.14 no.4
    • /
    • pp.75-94
    • /
    • 2009
  • Since XML emerged as a standard for data exchange on the web, it has been established as a core component in e-Commerce and efficient query processing over XML data in ubiquitous computing environment has been also receiving much attention. Recently, the techniques were proposed whereby an XML document is fragmented into XML fragments to be streamed and the mobile clients receive the stream while processing queries over it. In processing queries over an XML fragment stream, the average access time significantly depends on the order of fragments in the stream. As such, for query performance, an efficient organization of XML fragment stream is required as well as the indexing for energy-efficient query processing due to the reduction of tuning time. In this paper, a technique of XML fragment stream organization based on query frequencies, fragment size, fragment access frequencies, and an active XML-based indexing scheme are proposed. Through implementation and performance experiments, our techniques were shown to be efficient compared with the conventional XML fragment stream organizations.

  • PDF

A Labeling Methods for Keyword Search over Large XML Documents (대용량 XML 문서의 키워드 검색을 위한 레이블링 기법)

  • Sun, Dong-Han;Hwang, Soo-Chan
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.699-706
    • /
    • 2014
  • As XML documents are getting bigger and more complex, a keyword-based search method that does not require structural information is needed to search these large XML documents. In order to use this method, not only all keywords expressed as nodes in the XML document must be labeled for indexing but also structural information should be well represented. However, the existing labeling methods either have very simple information of XML documents for index or represent the structural information which is difficult to deal with the increase of XML documents' size. As the size of XML documents is getting larger, it causes either the poor performance of keyword search or the exponential increase of space usage. In this paper, we present the Repetitive Prime Labeling Scheme (RPLS) in order to improve the problem of the existing labeling methods for keyword-based search of large XML documents. This method is based on the existing prime number labeling method and allows a parent's prime number to be used at a lower level repeatedly so that the number of prime numbers being generated can be reduced. Then, we show an experimental result of the comparison between our methods and the existing methods.

A Semantic Text Model with Wikipedia-based Concept Space (위키피디어 기반 개념 공간을 가지는 시멘틱 텍스트 모델)

  • Kim, Han-Joon;Chang, Jae-Young
    • The Journal of Society for e-Business Studies
    • /
    • v.19 no.3
    • /
    • pp.107-123
    • /
    • 2014
  • Current text mining techniques suffer from the problem that the conventional text representation models cannot express the semantic or conceptual information for the textual documents written with natural languages. The conventional text models represent the textual documents as bag of words, which include vector space model, Boolean model, statistical model, and tensor space model. These models express documents only with the term literals for indexing and the frequency-based weights for their corresponding terms; that is, they ignore semantical information, sequential order information, and structural information of terms. Most of the text mining techniques have been developed assuming that the given documents are represented as 'bag-of-words' based text models. However, currently, confronting the big data era, a new paradigm of text representation model is required which can analyse huge amounts of textual documents more precisely. Our text model regards the 'concept' as an independent space equated with the 'term' and 'document' spaces used in the vector space model, and it expresses the relatedness among the three spaces. To develop the concept space, we use Wikipedia data, each of which defines a single concept. Consequently, a document collection is represented as a 3-order tensor with semantic information, and then the proposed model is called text cuboid model in our paper. Through experiments using the popular 20NewsGroup document corpus, we prove the superiority of the proposed text model in terms of document clustering and concept clustering.

Weighting of XML Tag using User's Query (사용자 질의를 이용한 XML 태그의 가중치 결정)

  • Woo Seon-Mi;Yoo Chun-Sik;Kim Yong-Sung
    • The KIPS Transactions:PartD
    • /
    • v.12D no.3 s.99
    • /
    • pp.439-446
    • /
    • 2005
  • XML is the standard that can manage systematically WWW documents and increase retrieval efficiency. Because XML documents have the information of contents and that of structure in single document, users can get more suitable retrieval result by retrieving the information of content as well as that of logical structure. In this paper, we will propose a method to calculate the weights of XML tags so that the information of XML tag is used to index decision. A proposed method creates term vector and weight vector for XML tags, and calculates weight of tag by reflecting user's retrieval behavior (user's query). And it decides the weights of index terms of XML document by reflecting the weights of tags. And we will perform an evaluation of proposed method by comparison with existing researches using weights of paragraphs.

Implementation of a Journal's Table of Contents Separation System based on Contents Analysis (내용분석을 통한 논문지의 목차분류 시스템의 구현)

  • Kwon, Young-Bin
    • The KIPS Transactions:PartB
    • /
    • v.14B no.7
    • /
    • pp.481-492
    • /
    • 2007
  • In this paper, a method for automatic indexing of contents to reduce effort for inputting paper information and constructing index is considered. Existing document analysis methods can't analyse various table of contents of journal paper formats efficiently because they have many exceptions. In this paper, various contents formats for journals, which have different features from those for general documents, are analysed and described. The principal elements that we want to represent are titles, authors, and pages for each papers. Thus, the three principal elements are modeled according to the order of their arrangement, and their features are extracted. And a table of content recognition system of journal is implemented, based on the proposed modeling method. The accuracy of exact extraction ratio of 91.5% on title, author, and page type on 660 published papers of various journals is obtained.

A Study on Ontology-based Keywords Structuring for Efficient Information Retrieval (연구.학술정보 효율적 검색을 위한 온톨로지 기반의 주제 색인어 구조화 방안 연구)

  • Song, In-Seok
    • Journal of Information Management
    • /
    • v.39 no.4
    • /
    • pp.121-154
    • /
    • 2008
  • In this paper, a ontology-based keyword structuring method is proposed to represent the knowledge structure of scholarly documents and to make inferences from the semantic relationships holding among them. The characteristics of thesaurus as a knowledge organization system(KOS) for subject heading is critically reviewed from the information retrieval point of view. The domain concepts are identified and classified by analysis of the information activities occurring in a general research process based on scholarly sensemaking model. The ontological structure of keyword set is defined in terms of the semantic relationship of the canonical concepts which constitute scholarly documents such as journal articles. As a result, each ontologically structured keyword set of a document represents the knowledge structure of the corresponding document as semantic index. By means of the axioms and inference rules defined for information needs, users can efficiently explore the scholarly communication network built on the semantic relationship among documents in an analytic way based on the scholarly sensemaking model in oder to efficiently retrieve the relevant information for problem solving.

A Study on Natural Language Document and Query Processor for Information Retrieval in Digital Library (디지털 도서관 환경에서의 정보 검색을 위한 자연어 문서 및 질의 처리기에 관한 연구)

  • 윤성희
    • Journal of the Korea Computer Industry Society
    • /
    • v.2 no.12
    • /
    • pp.1601-1608
    • /
    • 2001
  • Digital library is the most important database system that needs information retrieval engine for natural language documents and multimedia data. This paper describes the experimental results of information retrieval engine and browser based on natural language processing. It includes lexical analysis, syntax processing, stemming, and keyword indexing for the natural language text. With the experimental database ‘Earth and Space Science’ that has lots of images and titles and their descriptive text in natural language, text-based search engine was tested. Combined with content-based image search engine, it is expected to be a multimedia information retrieval system in digital library

  • PDF

Design and Implementation of MPEG-2 Compressed Video Information Management System (MPEG-2 압축 동영상 정보 관리 시스템의 설계 및 구현)

  • Heo, Jin-Yong;Kim, In-Hong;Bae, Jong-Min;Kang, Hyun-Syug
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.6
    • /
    • pp.1431-1440
    • /
    • 1998
  • Video data are retrieved and stored in various compressed forms according to their characteristics, In this paper, we present a generic data model that captures the structure of a video document and that provides a means for indexing a video stream, Using this model, we design and implement CVIMS (the MPEG-2 Compressed Video Information Management System) to store and retrieve video documents, CVIMS extracts I-frames from MPEG-2 files, selects key-frames from the I -frames, and stores in database the index information such as thumbnails, captions, and picture descriptors of the key-frames, And also, CVIMS retrieves MPEG- 2 video data using the thumbnails of key-frames and v31ious labels of queries.

  • PDF