• Title/Summary/Keyword: Document information retrieval

Document Structuring and Text Retrieval Using SGML (SGML을 이용한 문헌의 구조화 및 텍스트 검색에 관한 연구)

  • 오민경;정영미
    • Proceedings of the Korean Society for Information Management Conference / 1995.08a / pp.29-32 / 1995
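  • In this paper, we built a text retrieval system using SGML (Standard Generalized Markup Language). SGML is a generalized markup language that treats a document as a set of object units called document elements and expresses the relationships among those elements. Using SGML in a text retrieval system therefore makes it possible to structure documents and to organize and retrieve full text efficiently.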

  • PDF

Design of XML Document Management System based on Schema (스키마 기반의 XML문서 관리 시스템 설계)

  • 조윤기;김영란
    • Journal of the Korea Society of Computer and Information / v.6 no.4 / pp.85-93 / 2001
  • With the rapid progress toward an information society and the sharp increase in the amount of information, many researchers have used XML to store and retrieve information effectively. However, many existing methods cannot support various structured retrieval methods for specific parent, child, and sibling nodes. In this paper, we propose (1) an effective method of representing structured information and an indexing mechanism using OETID (Ordered Element Type ID) for effective management and structured retrieval of XML documents. We also propose (2) a document integration mechanism for retrieval results and a storage technique for the structural information of XML documents. With our methods, we can effectively represent the structural information of XML documents, directly access specific elements, and process various queries with simple operations.

  • PDF

A Study on the Implementation and Performance Evaluation of Full-text Information Retrieval System based on Scientific Paper's Content Structure (학술논문의 내용구조에 의한 전문검색시스템 구현과 성능평가에 관한 연구)

  • 이두영;이병기
    • Journal of the Korean Society for Information Management / v.15 no.3 / pp.73-93 / 1998
  • Conventional full-text information retrieval systems have been shown to have high recall and low precision. One disadvantage of a full-text IR system is that it is not designed to reflect the user's information need. This is because full-text IR systems have been designed around the physical and logical structure of a document without considering its content. The purpose of this study is to develop a more effective full-text IR system by resolving these disadvantages of conventional systems. The study develops a new method of designing a full-text IR system that uses Content Structure Markup Language (CSML) rather than conventional SGML.

  • PDF

Design and Implementation of Document Filing and Retrieval System in an OSI Environment (OSI 환경에서 문서 파일링 및 검색 시스템의 설계 및 구현)

  • 임재홍;박용진
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.2 / pp.10-20 / 1994
  • This paper describes the design and implementation of the DFR (Document Filing and Retrieval) system, one of the applications of DOAM (Distributed Office Application Model), an international standard of ISO (International Organization for Standardization). On the basis of the international standard, the DFR system is implemented in C on a SUN workstation and a PC/386, and the implementation is verified by tracing the association descriptor and the primitives of the service elements while its operation is tested between client and server. The results of this study show that the DFR system can be implemented on the basis of the international standard, and they contribute to the establishment of functional standards for the DFR system.

  • PDF

A Study on the Design of a Knowledge Base for the Korean Retrieval (우리말 문헌정보검색을 위한 지식베이스 설계에 관한 연구)

  • Chang, Jae-Gyong
    • Journal of the Korean Society for Information Management / v.3 no.1 / pp.70-102 / 1986
  • This study is an attempt to develop a knowledge base with an inference mechanism for document retrieval, which is the core element of an expert system. The purpose of this study is to design the knowledge base so that user queries can be processed intelligently, eventually improving the effectiveness of information retrieval, under the assumption that a user who wants to search a certain subject generally lacks prior knowledge of that subject. In this paper, some characteristics of Korean complex nouns are structurally analyzed and represented in the knowledge base.

  • PDF

Query Processing Model Using Two-level Fuzzy Knowledge Base (2단계 퍼지 지식베이스를 이용한 질의 처리 모델)

  • Lee, Ki-Young;Kim, Young-Un
    • Journal of the Korea Society of Computer and Information / v.10 no.4 s.36 / pp.1-16 / 2005
  • When Web-based specialized retrieval systems for scientific fields severely restrict how users can express their information requests, the process of information content analysis and that of information acquisition become inconsistent. Accordingly, this study proposes a re-ranking retrieval model that reflects the content-based similarity between the user's query terms and index words by capturing the document knowledge structure. To accomplish this, the former constructs a thesaurus and a similarity relation matrix to provide the subject analysis mechanism, and the latter proposes an algorithm that establishes a search model, such as query expansion, in order to analyze the user's demands. The algorithm proposed in this study, which exploits the information structure of a retrieval system, can therefore serve as a content-based retrieval mechanism that establishes a two-step search model for preserving recall and improving accuracy, which was a weak point of the previous fuzzy retrieval model.

  • PDF

Semantic Similarity Measures Between Words within a Document using WordNet (워드넷을 이용한 문서내에서 단어 사이의 의미적 유사도 측정)

  • Kang, SeokHoon;Park, JongMin
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.11 / pp.7718-7728 / 2015
  • Semantic similarity between words can be applied in many fields, including computational linguistics, artificial intelligence, and information retrieval. In this paper, we present a weighted method for measuring the semantic similarity between words in a document. The method uses the edge distance and depth of WordNet and calculates the semantic similarity between words on the basis of document information. Document information consists of word term frequencies (TF) and word concept frequencies (CF), and each word's weight is calculated from its TF and CF in the document. The method combines the edge distance between words, the depth of the subsumer, and the word weight in the document. We compared our scheme with other methods in experiments, and the proposed method outperforms the other similarity measures. Other methods based simply on shortest distance or depth had difficulty representing or merging this information. This paper considers shortest distance, depth, and the information of words in the document, and thereby improves performance.
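
The abstract above names its ingredients (edge distance, depth of the subsumer, and a document-based word weight) but not the formula that combines them. As a rough illustration only, the sketch below swaps in the standard Wu-Palmer measure, which uses the depth of the lowest common subsumer, and scales it by a TF-based weight. The toy hierarchy `PARENT`, the function names, and the weighting rule are all hypothetical assumptions, not the paper's method and not real WordNet data; concept frequencies (CF) are omitted.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent); NOT real WordNet data.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}


def path_to_root(word: str) -> List[str]:
    """Return the chain of nodes from `word` up to the root, inclusive."""
    path = [word]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(word: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(word))


def lowest_common_subsumer(a: str, b: str) -> str:
    """Deepest node that is an ancestor of (or equal to) both words."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is deepest
        if node in ancestors_a:
            return node
    raise ValueError("words do not share a root")


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def weighted_similarity(a: str, b: str, tokens: List[str]) -> float:
    """Scale Wu-Palmer by an assumed TF weight (mean TF over max TF).

    `tokens` is the non-empty list of words in the document.
    """
    tf = Counter(tokens)
    weight = (tf[a] + tf[b]) / (2 * max(tf.values()))
    return weight * wu_palmer(a, b)


if __name__ == "__main__":
    doc = ["dog", "dog", "cat", "car"]
    print(weighted_similarity("dog", "cat", doc))
    print(weighted_similarity("dog", "car", doc))
```

In this toy setup, "dog" and "cat" share the deeper subsumer "animal", so they score higher than "dog" and "car", which meet only at the root.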

Improving Retrieval Effectiveness with Multiple Weighting Schemes (다중 가중치 기법을 이용한 검색 효과의 개선)

  • 이준호
    • Journal of the Korean Society for Information Management / v.12 no.2 / pp.213-223 / 1995
  • It is known that different representations of either queries or documents, or different retrieval techniques, retrieve different sets of documents. Recent work suggests that significant improvements in retrieval performance can be achieved by combining multiple representations or multiple retrieval techniques. In this paper, we propose a simple method for retrieving different documents within a single query representation, a single document representation, and a single retrieval technique. We classify the types of documents and describe the properties of weighting schemes. We then explain that weighting schemes with different properties may retrieve different types of documents. Experimental results show that significant improvements can be obtained by combining the retrieval results from weighting schemes with different properties.

  • PDF
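
The abstract above does not say how the results of the different weighting schemes are combined. One common way to merge ranked runs is CombSUM score fusion with min-max normalization, sketched below as an illustration rather than as the author's method. The function names and the sample scores are assumptions made for this example.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs: List[Dict[str, float]]) -> List[str]:
    """Fuse runs by summing normalized scores (CombSUM).

    Each run maps document IDs to scores from one weighting scheme.
    Documents missing from a run contribute 0 for that run.
    Ties are broken by document ID to keep the output deterministic.
    """
    fused: Dict[str, float] = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    # Hypothetical scores from two weighting schemes for the same query.
    run_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
    run_b = {"d2": 0.9, "d3": 0.8, "d4": 0.1}
    print(comb_sum([run_a, run_b]))
```

A document that scores moderately well under both schemes (here `d3`) can outrank one that tops a single scheme, which is the effect that combining runs is meant to capture.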

A study of the existing problems of digital libraries and their future environment (현존하는 디지털도서관의 문제점과 미래환경에 관한 연구)

  • 박일종
    • Journal of Korean Library and Information Science Society / v.27 / pp.391-421 / 1997
  • Information scientists need to answer not whether future libraries will be digital, but how they should be structured and how they can serve users effectively now. 'The library with walls' or 'the library as place' will need to exist in the future, but the 'digital library without walls' or 'virtual library' will need continued study. This study tries to reveal the existing problems of digital libraries and their future environment through a qualitative document study, after considering the ambiguous concepts of various types of electronic libraries, their library automation efforts, and the changes in information retrieval circumstances over the last 30 to 40 years. As a result, major findings and suggestions are presented. The library of the future will be part of local and national cooperative systems, will make intelligent use of old and new technologies, and will be able to support both a place with extensive collections and convenient, easy, and free access to remote intellectual resources. Also, information storage and retrieval (ISAR) in the future library system would readily give users any type of data retrieval system usable by anybody rather than only by an expert or specialist, the so-called 'A&E retrieval' of the coming 21st century. It is highly possible that future society will change into an information marketplace whose data may be recognized as an intangible asset.

  • PDF

Keyword Extraction in Korean Using Unsupervised Learning Method (비감독 학습 기법에 의한 한국어의 키워드 추출)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.6 / pp.1403-1408 / 2010
  • Korean information retrieval uses nouns as the index terms or keywords that represent a document, and noun and keyword extraction finds all nouns present in the document. In this paper, we propose a method of keyword extraction using a pre-built dictionary. This method reduces execution time by eliminating unnecessary operations. Nouns can be extracted even from large documents without significantly affecting accuracy. This paper proposes a noun extraction method using the appearance characteristics of nouns and a keyword extraction method using unsupervised learning techniques.