• Title/Summary/Keyword: structured document

A Study on the Performance of Structured Document Retrieval Using Node Information (노드정보를 이용한 문서검색의 성능에 관한 연구)

  • Yoon, So-Young
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.103-120 / 2007
  • A node is a semantic unit and a part of a structured document. Information retrieval from structured documents offers an opportunity to search below the document level for relevant information, making any element in a structured document a retrievable unit. Node-based document retrieval comprises several similarity calculation methods and the extended node retrieval method, which uses structure information. Retrieval performance was hardly influenced by the method used to determine document similarity. The extended node method outperformed the others overall.

XML Structured Model of Tree-type for Efficient Retrieval (효율적인 검색을 위한 Tree 형태의 XML 문서 구조 모델)

  • Kim Young-Ran
    • Journal of the Korea Society of Computer and Information / v.9 no.4 s.32 / pp.27-32 / 2004
  • An XML document may have an irregular structure, which is difficult for users to know exactly. In this paper, we propose an XML document model and a structure retrieval method for efficient management and structure retrieval of XML documents. We use a fixed-size LETID that holds element information, and describe a structured information retrieval algorithm for parent and child elements to represent the structured information of XML documents. Using this method, we represent the structured information of an XML document efficiently. We can directly access a specific element with a simple operation and process various queries. We expect the method to support various structured retrievals of specific elements such as parent, child, and sibling elements.

A Study of Distorted Document Image Restoration using Structured Light (Structured Light를 이용한 왜곡된 문서 영상 복원에 관한 연구)

  • 곽규섭;채옥삼
    • Proceedings of the IEEK Conference / 2000.11d / pp.235-238 / 2000
  • This paper describes the implementation of a document image restoration system that corrects geometric distortion using structured light. To get accurate document images, a bound book must be flattened by pressing it down with a glass plate. However, most ancient documents are too fragile to be pressed. The proposed system restores character images distorted by geometric distortion.

A Study on the Depth-Oriented Decomposition Indexing Method for Creating and Searching Structured Documents Based-on XML (XML을 이용한 구조적 문서 생성 및 탐색을 위한 깊이중심분할 색인기법에 관한 연구)

  • Yang, Ok-Yul;Lee, Yong-Ju
    • The KIPS Transactions: Part D / v.9D no.6 / pp.1025-1042 / 2002
  • The goal of this study is to generate structured documents that improve the performance of an information retrieval system by using a thesaurus, which holds information on relations between words (terms), and to study a technique for searching these structured documents. To accomplish this goal, we propose a DODI (Depth-Oriented Decomposition Index) technique for structured documents and an algorithm that searches for related information efficiently through this index technique, which uses a thesaurus. We establish a storage system in which the structured document generated by this index technique is saved in a database through OpenXML, and XML documents are generated through ForXML methods.

Design of Algorithm for Efficient Retrieve Pure Structure-Based Query Processing and Retrieve in Structured Document (구조적 문서의 효율적인 구조 질의 처리 및 검색을 위한 알고리즘의 설계)

  • 김현주
    • Journal of the Korea Computer Industry Society / v.2 no.8 / pp.1089-1098 / 2001
  • Structure information contained in a structured document supports various access paths to the document. To use this structure information, an index must be constructed on document structures. Content indexing and structure indexing per document incur high memory overhead. Therefore, processing pure structure queries based on document structure, such as relationships between elements or element order, requires low memory overhead for indexing. This paper suggests the GDIT (Global Document Instance Tree) data structure and an indexing scheme for document structure that supports low memory overhead for indexing and powerful types of user queries. The structure indexing scheme indexes only the lowest-level elements of a document and does not affect the number of documents containing the retrieval element. Based on the index structure, we propose a query processing algorithm for pure structure queries and prove that the indexing scheme remains efficient in terms of space. The proposed index structure is based on the GDR concept and uses an indexing technique based on GDIT.

Style Control of Structured Documents using DSSSL

  • Lee, Kyong-Ho;Lee, Jin-Ho;Choy, Yoon-Chul
    • Proceedings of the Korea Database Society Conference / 1997.10a / pp.455-462 / 1997
  • SGML (Standard Generalized Markup Language) is the ISO standard for describing the logical structure of documents and has also been adopted as the CALS standard for document description. Since then, there has been growing interest in applying SGML in a variety of fields. However, because SGML does not provide a standard method for describing processing information such as formatting and transformation, most applications have used system-dependent methods. Recently, ISO defined DSSSL (Document Style Semantics and Specification Language) as a standard mechanism to specify the formatting, transformation, and retrieval of structured documents. In this paper, we therefore present a DSSSL processing system for style control of structured documents such as SGML documents. The system processes a DSSSL style sheet that describes the layout of documents and displays the result of applying it to an SGML document. We have successfully conducted tests on many SGML documents and DSSSL style sheets. We are now developing an SGML document management system that supports creation, editing, storage, and retrieval of SGML documents, based on the DSSSL processor and the SGML parser that we have developed.

XML Document Repository System for structured retrieval (구조 검색을 위한 XML 문서 저장 시스템)

  • 임산송;현득창;정회경
    • The Journal of Information Technology / v.4 no.4 / pp.89-100 / 2001
  • XML (eXtensible Markup Language) has been selected and published by the W3C (World Wide Web Consortium) as a representative standard for electronic documents. Structured information can be created and transferred in XML documents. Compared with existing file-based information, XML lets you express meaningful units of information as a structure. With structured information, you can also manage, retrieve, and store documents. Accordingly, the purpose of this paper is to design and implement an XML document repository system that stores and retrieves documents using the structured information of XML documents. The model was designed to store documents by element, which is the basic unit of a document, and to retrieve the stored XML information by structural unit. In particular, it was designed to manage and store the structure of various documents effectively by creating schemas for the DTD (Document Type Definition) and the instance.

Usability Analysis of Structured Abstracts in Journal Articles for Document Clustering (문서 클러스터링을 위한 학술지 논문의 구조적 초록 활용성 연구)

  • Choi, Sang-Hee;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.29 no.1 / pp.331-349 / 2012
  • Structured abstracts have been regarded as an essential information factor for representing the topics of journal articles. This study aims to provide an unconventional view of how to use structured abstracts through an in-depth analysis of the subfields of a structured abstract. In this study, a structured abstract was segmented into four fields, namely purpose, design, findings, and values/implications. Each field was compared in a performance analysis of document clustering. As a result, the purpose statement of an abstract affected the performance of journal article clustering more than any other field. Furthermore, certain types of keywords were identified for exclusion from document clustering to improve clustering performance, especially with the within-group average clustering method. These keywords had a stronger relationship to a specific abstract field, such as research design, than to the topic of an article.

Design of SGML Document Storage Management System using GROVE (GROVE를 이용한 SGML 문서 저장 관리 시스템 설계)

  • 정회경;안성옥;오일덕
    • The Journal of Information Technology / v.2 no.2 / pp.269-279 / 1999
  • SGML (Standard Generalized Markup Language) is a documentation standard for creating and interchanging structured document information, and is well suited to viewing, modifying, and creating electronic documents. Accordingly, a study on the efficient storage and management of very large structured SGML document information is needed. This paper proposes a data modeling design based on GROVE (Graph Representation Of property ValuEs), defined in HyTime (Hypermedia Time-based Structuring Language), and describes the design of an SGML document storage management system.

Analysis of Indexing Schemes for Structure-Based Retrieval (구조 기반 검색을 위한 색인 구조에 대한 분석)

  • 김영자;김현주;배종민
    • Journal of Korea Multimedia Society / v.7 no.5 / pp.601-616 / 2004
  • Information retrieval systems for structured documents provide multiple levels of retrieval capability by supporting structure-based queries. To process structure-based queries for structured documents, information on the structural nesting relationships between elements and on element sequence must be maintained. This paper presents four index structures that can process various types of structure queries, such as queries on structural relationships between elements or on element occurrence order. The proposed algorithms are based on the concept of the Global Document Instance Tree.
