• Title/Summary/Keyword: Boolean retrieval


Sensitivity Analysis of Decision Tree's Learning Effectiveness in Boolean Query Reformulation (불리언 질의 재구성에서 의사결정나무의 학습 성능 감도 분석)

  • 윤정미;김남호;권영식
    • Journal of the Korean Operations Research and Management Science Society / v.23 no.4 / pp.141-149 / 1998
  • One of the difficulties in using current Boolean-based information retrieval systems is that it is hard for a user, especially a novice, to formulate an effective Boolean query. One solution to this problem is to let the system formulate a query for the user from his or her relevance feedback documents. In this research, an intelligent query reformulation mechanism based on ID3 is proposed, and the sensitivity of its retrieval effectiveness (recall, precision, and E-measure) to various input settings is analyzed. The parameter varied in the input settings is the number of relevant documents. Experiments conducted on the Medlars test set revealed that the effectiveness of the proposed system is in fact sensitive to the number of initial relevant documents. The case with two or more initial relevant documents outperformed the case with one initial relevant document with statistical significance. We conclude that formulating an effective query in the proposed system requires at least two relevant documents in its initial input set.

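The three effectiveness measures named in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function name and the `beta` parameter are assumptions, and the E-measure follows van Rijsbergen's standard definition, which reduces to 1 minus the F1 score when `beta` is 1.

```python
def precision_recall_e(retrieved, relevant, beta=1.0):
    """Return (precision, recall, E-measure) for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant
    beta:      relative weight of recall versus precision (assumed default 1.0)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = beta ** 2 * precision + recall
    # E = 1 - (1 + beta^2) * P * R / (beta^2 * P + R); lower is better.
    e_measure = 1.0 if denom == 0 else 1.0 - (1 + beta ** 2) * precision * recall / denom
    return precision, recall, e_measure


if __name__ == "__main__":
    print(precision_recall_e(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5]))
```

Because E is an error-style measure, a query reformulation that improves retrieval lowers E rather than raising it.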

An Experimental Study on Fuzzy Document Retrieval System (퍼지개념을 적용한 질의식의 분석과 문헌정보 검색에 관한 연구)

  • Lee Seung Chai
    • Journal of the Korean Society for Library and Information Science / v.21 / pp.249-290 / 1991
  • Theoretical developments in information retrieval have offered a number of alternatives to traditional Boolean retrieval. Probability theory and fuzzy set theory have played prominent roles here. Fuzzy set theory is an attempt to generalize traditional set theory by permitting partial membership in a set, which means recognizing different degrees to which a document can match a request. In this study, an experiment with a document retrieval system using the fuzzy relation matrix of the keywords is described and the results are offered. Queries composed of keywords and the Boolean operators AND, OR, NOT were processed by the retrieval method, and the method was implemented as an experimental system on a 32-bit PC (30 MHz). Measurements of the recall ratio and precision ratio verified the effectiveness of the proposed fuzzy relation matrix of keywords and retrieval method. Compared to the traditional crisp method on the same document database, the recall ratio increased by 10%, although the precision ratio decreased slightly. The problems remaining to be resolved in this experiment are, first, the design of automatic data input and fuzzy indexing modules, through which the system can become competitive and useful, and second, devising a systematic procedure for assigning fuzzy weights to keywords in documents and in queries.

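The idea of partial membership described above can be illustrated with the standard fuzzy set operators (AND as minimum, OR as maximum, NOT as complement). This sketch does not reproduce the paper's fuzzy relation matrix of keywords; the nested-tuple query format and the keyword weights are hypothetical.

```python
def fuzzy_score(query, weights):
    """Score one document against a Boolean query using fuzzy set operators.

    query:   a keyword string, or a tuple ("AND" | "OR" | "NOT", operand, ...)
    weights: assumed mapping of keyword -> membership degree in [0, 1]
    """
    if isinstance(query, str):
        # A keyword absent from the document has membership 0.
        return weights.get(query, 0.0)
    op, *operands = query
    scores = [fuzzy_score(operand, weights) for operand in operands]
    if op == "AND":
        return min(scores)
    if op == "OR":
        return max(scores)
    if op == "NOT":
        return 1.0 - scores[0]
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    doc = {"fuzzy": 0.8, "retrieval": 0.6, "probability": 0.3}
    q = ("AND", "fuzzy", ("OR", "retrieval", ("NOT", "probability")))
    print(fuzzy_score(q, doc))
```

With weights restricted to 0 and 1 the same function reproduces crisp Boolean matching, which is why the fuzzy model is described as a generalization.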

Cost-based Optimization of Extended Boolean Queries (확장 불리언 질의에 대한 비용 기반 최적화)

  • 박병권
    • Journal of the Korean Society for Information Management / v.18 no.3 / pp.29-40 / 2001
  • In this paper, we suggest a query optimization algorithm to select the optimal processing method for an extended Boolean query on inverted files. There can be many methods for processing an extended Boolean query, depending on the processing sequence of the keywords contained in the query. In this sense, the problem of optimizing an extended Boolean query is essentially that of optimizing the keyword sequence in the query. We show that the problem is basically analogous to the problem of finding the optimal join order in database query optimization, and apply ideas from that area to solve it. We establish a cost model for processing an extended Boolean query and develop an algorithm to find the optimal keyword-processing sequence based on the concept of keyword rank, using the keyword selectivity and the access costs of the inverted file. We prove that the method selected by the optimization algorithm is truly optimal, and show through experiments that the optimal method is superior to the others in performance. We believe that the suggested optimization algorithm will contribute to a significant enhancement of information retrieval performance.

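The abstract above does not give the paper's cost model or rank formula, so the following is only a sketch under stated assumptions: keywords are processed one after another, each keyword's access cost is paid only for the fraction of candidates that survived the earlier keywords, and selectivities are independent. Under that assumed model, the rank `(selectivity - 1) / cost` familiar from join-order optimization yields a cheapest sequence when keywords are sorted in ascending rank. All names and numbers are hypothetical.

```python
from itertools import permutations


def expected_cost(order):
    """Expected cost of processing keywords in the given order.

    Each item is (term, selectivity, access_cost). Assumed model: a keyword's
    cost is weighted by the fraction of candidates surviving earlier keywords.
    """
    total, surviving = 0.0, 1.0
    for _term, selectivity, cost in order:
        total += surviving * cost
        surviving *= selectivity
    return total


def order_by_rank(keywords):
    """Sort keywords by ascending rank = (selectivity - 1) / access_cost."""
    return sorted(keywords, key=lambda k: (k[1] - 1.0) / k[2])


if __name__ == "__main__":
    keywords = [
        ("boolean", 0.5, 4.0),
        ("query", 0.9, 1.0),
        ("optimization", 0.1, 6.0),
    ]
    best = order_by_rank(keywords)
    print([term for term, _, _ in best], expected_cost(best))
    # Brute-force check over all orders for this small example.
    print(min(expected_cost(p) for p in permutations(keywords)))
```

Sorting by rank takes O(n log n) time, whereas enumerating every keyword sequence grows factorially, which is the practical motivation for a rank-based ordering.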

Meta Information Retrieval using Sentence Analysis of Korean Dialogue Style (한국어 대화체 문장 분석을 이용한 메타 정보검색)

  • 박인철
    • Journal of the Korea Computer Industry Society / v.4 no.10 / pp.703-712 / 2003
  • With the development of communication networks, the number of documents on the Internet keeps increasing, and information retrieval systems that can efficiently acquire the necessary information are required. Most information retrieval systems retrieve documents using a simple keyword or a Boolean query of keywords. However, this method is not well suited to novice users and, compared with a user's dialogue-style query, has many difficulties in terms of convenience and precise understanding of the query. This paper therefore aims to suggest a method that copes with these problems, and to design and implement a meta query processing system for information retrieval using Korean dialogue sentences. The implemented system can generate a new Boolean query for a given Korean dialogue sentence and resolve lexical ambiguities through morphological analysis, syntactic analysis, and query expansion using a thesaurus.


Document ranking methods using term dependencies from a thesaurus (시소러스의 연관성 정보를 이용한 문서의 순위 결정 방법)

  • 이준호
    • Journal of the Korean Society for Information Management / v.10 no.2 / pp.3-22 / 1993
  • In recent years various document ranking methods such as Relevance, R-Distance and K-Distance have been developed which can be used in thesaurus-based Boolean retrieval systems. They give high quality document rankings in many cases by using term dependence information from a thesaurus. However, they suffer from several problems resulting from inefficient and ineffective evaluation of the Boolean operators AND, OR and NOT. In this paper we propose new thesaurus-based document ranking methods called KB-FSM and KB-EBM by exploiting the enhanced fuzzy set model and the extended Boolean model. The proposed methods overcome the problems of the previous methods and use term dependencies from a thesaurus effectively. We also show through performance comparison that KB-FSM and KB-EBM provide higher retrieval effectiveness than Relevance, R-Distance and K-Distance.

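For readers unfamiliar with the extended Boolean model mentioned above, the sketch below shows its standard p-norm scoring of AND and OR with unweighted query terms. It does not implement KB-FSM or KB-EBM, whose details are not given in the abstract; the function names and sample weights are assumptions.

```python
def pnorm_or(weights, p=2.0):
    """p-norm OR: ((w1^p + ... + wn^p) / n) ** (1/p)."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights, p=2.0):
    """p-norm AND: 1 - (((1-w1)^p + ... + (1-wn)^p) / n) ** (1/p)."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


if __name__ == "__main__":
    # Hypothetical term weights of one document for a two-term query.
    doc_weights = [0.8, 0.2]
    for p in (1.0, 2.0, 10.0):
        print(p, pnorm_or(doc_weights, p), pnorm_and(doc_weights, p))
```

With `p = 1` both operators reduce to the mean of the term weights, and as `p` grows OR approaches the maximum and AND the minimum, so the model interpolates between vector-style and strict fuzzy Boolean ranking.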

A Study on Korean Question Processing System Using Knowledge Base (지식(知識) 베이스를 이용한 한국어(韓國語) 질문 처리(處理) 시스템에 관한 연구)

  • Kim, Pan-Jun
    • Journal of Information Management / v.24 no.3 / pp.1-30 / 1993
  • To give users who wish to retrieve document information in Korean natural language direct access to retrieval systems, a Korean question processing system was developed that translates Korean natural language into Boolean search statements, the form most frequently used in current information retrieval systems.


A Study on the Improvement of Performance of Concept-Based Information Retrieval Model Using a Distributed Subject Knowledge Base (주제별 분산 지식베이스에 의한 개념기반 정보검색시스템의 성능향상에 관한 연구)

  • 노영희
    • Journal of the Korean Society for Information Management / v.19 no.1 / pp.47-69 / 2002
  • The concept-based retrieval model has shown higher performance than the simple matching function method or the P-norm retrieval method, which were introduced to compensate for the shortcomings of the Boolean retrieval model. However, it takes too long to create a semantic-net knowledge base, which is essential for concept exploration. To address this shortcoming, a method was devised that creates a knowledge base distributed by subject, reducing construction time without hindering retrieval performance.

A Study on the Retrieval Systems for Digital Information Resources : Focused on the University Libraries in Busan, Ulsan, Gyeongnam Districts (전자정보자원의 검색시스템에 관한 연구 - 부산.울산.경남지역 대학도서관을 중심으로 -)

  • Doh, Tae-Hyeon
    • Journal of Korean Library and Information Science Society / v.39 no.4 / pp.261-281 / 2008
  • This study surveyed the retrieval systems for digital information resources of the university libraries in the Busan, Ulsan, and Gyeongnam districts and of the vendors. The kinds of access points and the retrieval conditions (Boolean logic, methods of index term identification, and particularities of the retrieval) of the libraries' systems vary and differ from one another. The access points of the vendors' systems vary unnecessarily, but their retrieval conditions are rather elaborate. Based on the results, suggestions for the standardization of digital information retrieval systems are offered.


Survey and Suggestion for Standardization of Online Catalog Retrieval Systems: Focused on the University Library Catalogs in Busan, Ulsan, Gyeongnam District (자동화목록 검색시스템의 현황과 표준화 방안 - 부산.울산.경남지역 대학도서관 목록의 분석을 중심으로 -)

  • Doh, Tae-Hyeon
    • Journal of Korean Library and Information Science Society / v.38 no.4 / pp.357-376 / 2007
  • This study surveyed the online catalog retrieval systems of the university libraries in the Busan, Ulsan, and Gyeongnam districts. The types of library materials, the kinds of access points, and the retrieval conditions (Boolean logic, methods of index term identification, and particularities of the retrieval) of these systems vary and differ from one another. Based on the results of this survey, a suggestion for the standardization of online catalog retrieval systems is offered.


Web Information Retrieval based on Natural Language Query Analysis and Keyword Expansion (자연어 질의 분석과 검색어 확장에 기반한 웹 정보 검색)

  • 윤성희;장혜진
    • Journal of the Korean Society for Information Management / v.21 no.2 / pp.235-248 / 2004
  • For users of information retrieval systems, a natural language query is a more ideal interface than keyword and Boolean expressions. This paper proposes a retrieval technique that expands keywords from the syntactically analyzed structure of a natural language query given as user input. By combining or splitting compound nouns based on a syntactic tree traversal of the query, and by expanding variant or shortened forms into multiple keywords, it can enhance the precision and correctness of the retrieval system.