• Title/Summary/Keyword: Retrieved Documents


Shannon's Information Theory and Document Indexing (Shannon의 정보이론과 문헌정보)

  • Chung Young Mee
    • Journal of the Korean Society for Library and Information Science / v.6 / pp.87-103 / 1979
  • Information storage and retrieval is part of the general communication process. In Shannon's information theory, the information contained in a message is a measure of uncertainty about the information source, and the amount of information is measured by entropy. Indexing is a process of reducing the entropy of the information source, since the document collection is divided into many smaller groups according to the subjects the documents deal with. Significant concepts contained in every document are mapped into the set of all sets of index terms. Thus the index itself is formed by paired sets of index terms and documents. Without indexing, the entropy of a document collection consisting of N documents is $\log_2 N$, whereas the average entropy of the smaller groups $(W_1, W_2, \ldots, W_m)$ is as small as $\left(\sum_{i=1}^{m} H(W_i)\right)/m$. Retrieval efficiency is a measure of an information system's performance, which is largely affected by the quality of the index. If all and only the documents judged relevant to a user's query can be retrieved, the information system is said to be $100\%$ efficient. Document file W may be potentially classified into two sets, relevant and non-relevant documents, for a specific query. After retrieval, the document file W' is reclassified into four sets: relevant-retrieved, relevant-not retrieved, non-relevant-retrieved, and non-relevant-not retrieved. The paper shows that the difference between the entropies of document file W and document file W' is a proper measure of retrieval efficiency.

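The entropy comparison in the abstract above can be illustrated with a small numeric sketch. It assumes every document is equally likely to be the one sought, so that H(W_i) = log2 |W_i| for a group of |W_i| documents; this uniform assumption, the function names, and the sample collection are illustrative and not taken from the paper.

```python
import math

def collection_entropy(n_docs):
    """Entropy in bits of an unindexed collection of n equally likely documents."""
    return math.log2(n_docs)

def average_group_entropy(group_sizes):
    """Average of H(W_i) over the m groups, assuming H(W_i) = log2 |W_i|."""
    return sum(math.log2(size) for size in group_sizes) / len(group_sizes)

# Hypothetical collection: 1024 documents indexed into 4 subject groups of 256.
unindexed = collection_entropy(1024)
indexed = average_group_entropy([256, 256, 256, 256])
print(unindexed, indexed)
```

Under these assumptions the indexed collection has the lower average entropy, which is the reduction the abstract attributes to indexing.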

A Study on the Effect of the Searcher's Subject Background on the Result of Online Database Searches (탐색자의 주제배경이 탐색효과에 미치는 영향)

  • 이근봉
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.7 no.1 / pp.293-317 / 1994
  • The purpose of this study is to verify the effect of the searcher's subject background on the results of online database searches. To this end, an experimental method was adopted. 180 students performed online searches in the three libraries chosen for this study. The subjects were classified into two groups according to their test scores. Data on the processes, behavior, and results of searches performed by the subjects in real situations were gathered. Immediately following the searches, the extent of their subject background was assessed through interviews. Search effect consists of four elements: search efficiency (the number of terms used per unit time), the number of relevant documents, the number of relevant documents per unit time, and precision ratio. The major findings of this study are summarized below. 1. Searchers with a strong subject background show significantly higher search efficiency: Group A (those with a strong subject background) uses more search terms per unit time than Group B (those with a weak subject background). 2. Searches made by those with a strong subject background retrieve more relevant documents. 3. Searches made by those with a strong subject background retrieve more relevant documents per unit time. 4. Searchers with a strong subject background achieve a significantly higher precision ratio: a larger share of the documents they retrieve is relevant.


A Study on the Use of Description and keywords Meta Tags for the Content of WWW Resources (웹 정보자원의 내용기술을 위한 Keywords와 Description 메타테그 활용도에 관한 연구)

  • 최재황;조현양
    • Journal of Korean Library and Information Science Society / v.32 no.2 / pp.307-322 / 2001
  • The purpose of this study is to investigate how and which meta tags are used, which meta tags are used frequently, and what relationships exist between the retrieval of WWW documents and meta tags. For the study, 1,000 WWW documents were selected from OCLC NetFirst and examined. A total of 92 meta tags were found, and the "description" and "keywords" meta tags were analyzed intensively. In addition, analysis of the WWW documents showed no significant relationship in meta tag usage between documents retrieved at the beginning and documents retrieved at the end. A comparative study of general internet search engines and commercial databases such as NetFirst is suggested for further research.


(A Study of an Exact Match and a Partial Match as an Information Retrieval Technique) (완전 매치와 부분 매치 검색 기법에 관한 연구)

  • 김영귀
    • Journal of the Korean Society for Information Management / v.7 no.1 / pp.79-95 / 1990
  • A retrieval technique was defined as a technique for comparing document representations. This study therefore classified retrieval techniques in terms of the characteristics of the retrieved set of documents and the representations that are used. The distinction is whether the set of retrieved documents contains only documents whose representations are an exact match with the query, or also those that are a partial match with the query. For a partial match, the set of retrieved documents will also include those that are an exact match with the query. Boolean logic, one of the exact match retrieval techniques, is currently in use in most large operational information retrieval systems despite its problems and limitations. Partial match, as an alternative technique, also has various problems. Existing information retrieval systems succeed in assisting the user whose needs are well defined (e.g. Boolean logic) to retrieve relevant documents, but they should also succeed in providing retrieval assistance to the browser whose information requirements are ill-defined.

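The distinction drawn in the abstract above can be sketched as follows. The sketch assumes documents and queries are represented as sets of index terms, treats an exact match as a Boolean AND over the query terms, and ranks partial matches by the number of shared terms; the representation, the scoring rule, and the sample data are illustrative choices, not the paper's.

```python
# Hypothetical document representations: each document is a set of index terms.
docs = {
    "d1": {"boolean", "retrieval", "systems"},
    "d2": {"boolean", "logic"},
    "d3": {"browsing", "interfaces"},
}
query = {"boolean", "retrieval"}

def exact_match(query, docs):
    """Return only documents containing every query term (Boolean AND)."""
    return [doc_id for doc_id, terms in docs.items() if query <= terms]

def partial_match(query, docs):
    """Return documents sharing at least one query term, best overlap first."""
    scored = [(len(query & terms), doc_id) for doc_id, terms in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

exact = exact_match(query, docs)
partial = partial_match(query, docs)
print(exact, partial)
```

As the abstract notes, every exact match also appears in the partial-match result; the partial match additionally returns a ranked list, which an exact match does not provide.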

An Exploratory Study of Performances between a Subject Directory and Keyword Search Engine in the Network Databases (네트웍 데이터베이스에서의 주제별 디렉토리와 키워드 검색엔진의 검색효율에 관한 탐색적 연구)

  • Lee Myeong-Hee
    • Journal of the Korean Society for Library and Information Science / v.31 no.2 / pp.177-197 / 1997
  • The study measured whether two search engines retrieve different Web documents for 6 queries. Two search engines, Alta Vista as a keyword search engine and Yahoo as a subject directory, were compared using the following criteria: total number of documents retrieved, total number of relevant documents retrieved, and recall and precision ratios. In addition, Alta Vista was suitable for specific and technical terms, while Yahoo was effective for general and plain terms. However, more elaborate research is needed on query characteristics.

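Recall and precision ratios, used as criteria in several of the studies listed here, follow the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. A minimal sketch, with a hypothetical function name and made-up document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 3 relevant overall, 2 in common.
retrieved = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Returning 0.0 for an empty retrieved or relevant set is a convention chosen here to avoid division by zero; the studies themselves do not specify how that case was handled.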

Department of Computer Science, Chosun University

  • Kim, Young-cheon;Moon, You-Mi;Lee, Sung-joo
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.7 / pp.659-665 / 2001
  • Relevance feedback is the most popular query reformulation strategy. In a relevance feedback cycle, the user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant. In practice, only the top 10 (or 20) ranked documents need to be examined. The main idea consists of selecting important terms, or expressions, attached to the documents that have been identified as relevant by the user, and of enhancing the importance of these terms in a new query formulation. The expected effect is that the new query will be moved towards the relevant documents and away from the non-relevant ones. Local analysis techniques are interesting because they take advantage of the local context provided with the query. In this regard, they seem more appropriate than global analysis techniques. In a local strategy, the documents retrieved for a given query q are examined at query time to determine terms for query expansion. This is similar to a relevance feedback cycle but might be done without assistance from the user.


A Study on Improving the Effectiveness of Information Retrieval Through P-norm, RF, LCAF

  • Kim, Young-cheon;Lee, Sung-joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.2 no.1 / pp.9-14 / 2002
  • Boolean retrieval is simple and elegant. However, since there is no provision for term weighting, no ranking of the answer set is generated. As a result, the size of the output might be too large or too small. Relevance feedback is the most popular query reformulation strategy. In a relevance feedback cycle, the user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant. In practice, only the top 10 (or 20) ranked documents need to be examined. The main idea consists of selecting important terms, or expressions, attached to the documents that have been identified as relevant by the user, and of enhancing the importance of these terms in a new query formulation. The expected effect is that the new query will be moved towards the relevant documents and away from the non-relevant ones. Local analysis techniques are interesting because they take advantage of the local context provided with the query. In this regard, they seem more appropriate than global analysis techniques. In a local strategy, the documents retrieved for a given query q are examined at query time to determine terms for query expansion. This is similar to a relevance feedback cycle but might be done without assistance from the user.
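The relevance feedback cycle described above can be sketched with a Rocchio-style update, a common textbook formulation that moves the query towards documents marked relevant and away from the others. The abstract does not state which formula the authors use, so the update rule, the weights alpha, beta, and gamma, and the sample term vectors below are all assumptions made for illustration.

```python
from collections import defaultdict

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight query terms from user-marked documents (Rocchio-style sketch).

    query and each document are dicts mapping term -> weight.
    alpha, beta, gamma are assumed tuning weights, not values from the paper.
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)
    # Keep only terms whose weight is still positive after the update.
    return {term: weight for term, weight in new_query.items() if weight > 0}

# Hypothetical feedback round: one document marked relevant, one not.
query = {"fuzzy": 1.0}
relevant = [{"fuzzy": 1.0, "retrieval": 1.0}]
non_relevant = [{"logic": 1.0}]
updated = rocchio(query, relevant, non_relevant)
print(updated)
```

In a local strategy without user assistance, the same update could be applied by treating the top-ranked documents as if they had been marked relevant.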

A Study on the Effects of the Selection of Relevant Documents over Retrieval Documents (검색문헌의 적합문헌 선정에 있어 영향을 미치는 요인에 관한 연구)

  • 이상렬;최성진
    • Proceedings of the Korean Society for Information Management Conference / 1996.08a / pp.11-14 / 1996
  • The purpose of this study is to verify the hypothesis that end-users' criteria for selecting among retrieved documents affect which documents they judge relevant after searching online bibliographic databases. To this end, online questionnaires were distributed via e-mail to end-users of online bibliographic databases.


Retrieval Effectiveness of Subject Descriptor and Citation Searching in the Water Resources Literature (수자원문헌의 주제탐색과 인용탐색의 검색효율 비교 연구)

  • Lee Myeong-Hee
    • Journal of the Korean Society for Library and Information Science / v.26 / pp.213-233 / 1994
  • This study measured whether subject descriptor searching and citation searching retrieve different documents for conceptual queries and methodological queries in natural science, engineering, and social science. The retrieval effectiveness of the two search methods was measured using the following criteria: total number of documents retrieved, total number of relevant documents, overlapping and unique documents, and precision ratio. The search subject was water resources, and the databases used were Selected Water Resources Abstracts (SWRA) and SCISEARCH. Data were collected for 21 doctoral students working on their dissertations in the three fields of water resources. Principal findings included: 1) subject searching and citation searching each retrieved a substantially equal number of documents; 2) the total number of relevant documents for conceptual queries was larger than that for methodological queries, while there was a large variation among the three fields; 3) the average overlap was quite small, while citation searching yielded more unique documents than subject searching; 4) for conceptual queries, citation searching yielded a higher precision ratio than subject searching, while subject searching obtained a slightly higher precision ratio than citation searching for methodological queries; and 5) citation searching was effective for both specific queries and broad queries if seed articles were well chosen, while subject searching only worked well for broad queries. It was further found that: 1) citation searching is not a subsidiary but a substantial retrieval method in water resources; 2) SWRA is effective for engineering queries and SCISEARCH is appropriate for natural science queries, while neither SWRA nor SCISEARCH works well for social science queries; and 3) the characteristics of queries affect retrieval results more than the characteristics of documents or the coverage of databases.


The study on the retrieval effectiveness of meta-search engine on the internet (인터넷상의 메타탐색엔진의 검색효율성 비교연구)

  • 김성희
    • Journal of Korean Library and Information Science Society / v.27 / pp.457-483 / 1997
  • This study was intended to compare the effectiveness of Savvy search and Metacrawler in terms of the total number of relevant documents retrieved, precision, recall, and the number of dead links. In addition, this study measured whether the meta-search engines and general web search engines retrieved different web documents. Savvy search produced higher precision and recall than Metacrawler, while Metacrawler had a lower dead-link ratio than Savvy search. Also, the meta-search engines were more effective than the general web search engines. The results show that the hybrid methodology of integrating a variety of web search engines can help solve retrieval effectiveness problems on the Internet.
