• Title/Summary/Keyword: Document information retrieval

A Signature Method for Efficient Preprocessing of XML Queries (XML 질의의 효율적인 전처리를 위한 시그너처 방법)

  • 정연돈;김종욱;김명호
    • Journal of KIISE:Databases / v.30 no.5 / pp.532-539 / 2003
  • This paper proposes a preprocessing method for efficiently processing XML queries in information retrieval systems that hold a large number of XML documents. The preprocessing uses a signature-based approach. In conventional (flat document-based) information retrieval systems, user queries consist of keywords and Boolean operators, so signatures are structured in a flat manner. In XML-based information retrieval systems, however, user queries take the form of path queries, so a flat signature cannot be effective for XML documents. We therefore propose a structured signature for XML documents and evaluate the performance of the proposed method through experiments.
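
The contrast between flat and path-aware signatures in the abstract above can be illustrated with a minimal sketch. This is not the paper's structured signature, whose construction the abstract does not give. It assumes generic superimposed coding, a 64-bit signature width, three hash bits per term, and hypothetical helper names (`term_bits`, `flat_signature`, `path_signature`, `may_match`).

```python
import hashlib

SIG_BITS = 64  # assumed signature width, not taken from the paper


def term_bits(term: str, k: int = 3) -> int:
    """Map a term to a bit mask with up to k bits set (superimposed coding)."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


def flat_signature(terms) -> int:
    """Flat signature: OR together the masks of individual keywords."""
    sig = 0
    for term in terms:
        sig |= term_bits(term)
    return sig


def path_signature(paths) -> int:
    """Path-aware signature: hash each whole path, e.g. 'book/author/name'."""
    sig = 0
    for path in paths:
        sig |= term_bits(path)
    return sig


def may_match(doc_sig: int, query_sig: int) -> bool:
    """A document can match only if every query bit is set in its signature.

    False positives are possible; false negatives are not.
    """
    return (doc_sig & query_sig) == query_sig


# Hypothetical document with two root-to-leaf paths.
doc_paths = ["book/author/name", "book/title"]
flat = flat_signature(["book", "author", "name", "title"])
structured = path_signature(doc_paths)

# The flat signature always passes the element names of the path
# 'book/title/name', although that path does not occur in the document.
flat_hit = may_match(flat, flat_signature(["book", "title", "name"]))

# The path-aware signature is tested against the path as a whole, so it
# passes only on an actual match or a hash collision.
path_hit = may_match(structured, path_signature(["book/title/name"]))
```

In a design like this the signature only filters candidates before the real query evaluation, so any document that passes still has to be checked against the query itself.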

Dynamic index storage and integrated searching service development (동적 색인 스토리지 및 통합 검색 서비스 개발)

  • Lee, Wang-Woo;Lee, Seok-Hyoung;Choe, Ho-Seop;Yoon, Hwa-Mook;Kim, Jong-Hwan;Hur, Yoon-Young
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.346-349 / 2007
  • This paper introduces an integrated search system built for a web news and review retrieval service. We built XSLTRobot, which extracts the title, date, author, and content from HTML documents such as news articles and reviews for the search service. XSLTRobot uses XSLT to extract the desired parts of an HTML page. The Integrated Information Retrieval System (IIRS) is suited to a variety of search data formats. We also introduce Dynamic Index Storage, a module of IIRS, which is used in environments such as news that need fast index updates. Its design focuses on retrieval performance because there were not many documents that had to be updated in real time.

Automatic Document Summary Technique Using Fuzzy Theory (퍼지이론을 이용한 자동문서 요약 기술)

  • Lee, Sanghoon;Moon, Seung-Jin
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.531-536 / 2014
  • With the very large quantity of information available on the Internet, techniques for dealing with the abundance of documents have become increasingly necessary, but processing the information in those documents is still technically challenging and remains under study. Automatic document summarization has been considered one of the critical solutions for processing documents so as to retain the important points and remove duplicated content from the original documents. In this paper, we propose a document summarization technique that uses fuzzy theory. The proposed technique addresses the ambiguity of the various features that determine the importance of a sentence, and the experimental results show that it generates better results than previous techniques.

Design of XQL Query Processing System for Structural information retrieval (구조적 정보 검색을 위한 XQL 질의 처리 시스템 설계)

  • 김상영;김철원;김광현;박종훈;정현철
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.892-896 / 2003
  • XML is used in a variety of fields, not simply as markup for display in web browsers but also as an interface format for exchanging data between applications and between different systems. Accordingly, many studies are under way on systems that can effectively manage and search XML documents, which offer structured information, reusability, ease of processing, durability, and portability. This paper describes XQL together with a document structure processor and a query language processor. The contents of an XML document are represented as a tree structure, and we present a method that, while parsing, uses XQL to find the XML document tree structure information that matches a query. On this basis, we describe the design and implementation of an efficient XML document search system that parses XML documents scattered across the web, organizes their structure information as a tree, and uses XQL, a proposed query language.

A Study on the Implementation of Information Extraction Agency for Ship Sale and Purchase using Content Based Retrieval (내용기반 검색을 이용한 선박매매 정보추출 에이전트의 구현에 관한 연구)

  • Ha, Chang-Seung;Jung, Lee-Sang
    • Journal of the Korea Society of Computer and Information / v.12 no.1 s.45 / pp.43-50 / 2007
  • Delays in information extraction (IE) are largely due to a failure to correctly recognize the user's information requirements for particular search factors. In particular, when wrapper rules are used in a search engine, the search generally fails to classify internet documents properly and efficiently, because applying the same wrapper rules does not extend across the various types of existing internet documents. When buying or selling a ship, if the price range, type, place of delivery, inspection site, and other information relevant to the sale were available on the internet for proper retrieval, the sale could succeed more readily by using an ontology of sale and purchase information and by selectively searching for the desired information through a content-based retrieval system. The proposed system aims to improve on the various wrapper systems in use across different internet sites and to eliminate unnecessary information tagged on existing internet documents, in order to create a more advanced information retrieval system.

Automatic indexing as a subject analysis technique (주제분석기법으로서의 자동색인)

  • 이영자
    • Journal of Korean Library and Information Science Society / v.12 / pp.61-96 / 1985
  • Human subject analysis of a document has some critical problems. It results in inconsistency in the analysis process and in a contradiction between the two objectives of subject analysis (one is identifying content for the retrieval of specific items, the other is identifying content for the grouping of related materials). Since mechanized subject analysis has been recognized as a possible way to alleviate the problems of manual analysis, various approaches to automatic indexing have been studied and tested. This study examines automatic indexing as one of the promising subject analysis techniques, covering statistical, syntactic, and semantic approaches. In conclusion, the appropriate time to apply automatic indexing should be decided on the basis of a thorough investigation of cost versus effectiveness, and automatic indexing systems should be developed in close relationship with online search, which is a good retrieval system for a society experiencing an information explosion. As machine-readable document text becomes more and more available with the rapid development of computer technology, more substantial research on automatic indexing will also become possible, which can lead to an increase in practical automatic indexing systems.

A Shared Inlining Method for Resolving the Overlapping Problem of Elements (엘리먼트의 중첩 문제를 해결한 Shared Inlining 저장 기법)

  • Hong, Eun-Il;Lee, Young-Ho
    • Journal of KIISE:Databases / v.35 no.5 / pp.411-420 / 2008
  • The number of XML documents, which are widely used as a standard means of expressing and exchanging information in web-based environments, is increasing rapidly, along with the production of large XML documents. Many studies have addressed storing and retrieving these XML documents in an RDBMS, and among them the Shared Inlining storage method offers a higher level of retrieval efficiency. Shared Inlining analyzes the DTD information and stores the XML document in the RDBMS by dividing it by node component. This study proposes a technique to resolve the overlapping problem that occurs for elements with several child nodes in the existing Shared Inlining method. The suggested method stores the XML document in Shared Inlining structures appropriate to the DTD definition and enhances the accuracy of retrieval.

A Study of using Emotional Features for Information Retrieval Systems (감정요소를 사용한 정보검색에 관한 연구)

  • Kim, Myung-Gwan;Park, Young-Tack
    • The KIPS Transactions:PartB / v.10B no.6 / pp.579-586 / 2003
  • In this paper, we propose a novel approach that applies emotional features to document retrieval systems. Fine emotional features, such as HAPPY, SAD, ANGRY, FEAR, and DISGUST, are used to represent Korean documents, and users can use these features to retrieve documents. The retrieved documents are then learned by classification methods such as cohesion factor, naive Bayesian, and k-nearest neighbor approaches. A voting method is used to combine the various approaches. In addition, k-means clustering is used in our experiments. Our approach proved more accurate than other methods, and it performed better on short texts than on large documents.
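
The abstract above says that voting combines the classifiers but does not say which voting scheme is used. As a minimal sketch, assuming simple plurality voting with ties broken in favor of the earliest classifier (both assumptions, as is the function name `majority_vote`), the combination step might look like this:

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    `predictions` holds one label per classifier, in a fixed order.
    Ties go to the tied label that appears first in that order.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three classifiers for a single document.
label = majority_vote(["HAPPY", "SAD", "HAPPY"])
```

A weighted vote, in which each classifier's label counts in proportion to its validation accuracy, is a common alternative when the classifiers differ widely in quality.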

Path Signatures : Path-oriented Query Processing System for XML document Retrieval (경로 서명 : XML문서 검색을 위한 경로-지향 질의처리 시스템)

  • Park, Hee-Sook;Park, Ju-Hyun;Cho, Woo-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.7 / pp.1311-1317 / 2007
  • Recently, with the popularity and explosive growth of the Internet, information exchange over the Internet has been increasing rapidly. XML is also becoming a standard and a major tool for data exchange on the Internet. We therefore propose a new indexing technique for evaluating path-oriented queries, and we design and implement a Path-oriented Query Processing System for users. The proposed indexing technique combines a binary trie structure with a path signature file to improve the performance of XML document retrieval.

XML based Software Architecture Specification Language for Reuse (재사용을 위한 XML 기반 소프트웨어 아키텍쳐 명세 언어)

  • Lee, Yun-Su;Yun, Gyeong-Seop;Wang, Chang-Jong
    • The Transactions of the Korea Information Processing Society / v.7 no.3 / pp.808-817 / 2000
  • Component specification languages designed with reuse in mind are an essential factor in the classification, verification, and retrieval of components. A number of legacy specification languages are already in use; however, they are complex, and their specifications include many elements needed for implementation. In this paper, we present an XML-based component specification language and software architecture specification language to solve these problems of legacy specification languages. The presented languages consist of a component specification, which is composed of a signature specification, an interface specification, and a message specification, and a software architecture specification that provides graphical and textual notations. The component specification supports component retrieval with behavioral matching and black-box reuse of components. In addition, being XML-based, it improves the efficiency of retrieval and document management. The software architecture specification supports structural reuse of architecture, which is white-box reuse, through message-based architecture specification.
