• Title/Abstract/Keywords: Retrieved Documents

98 results (processing time 0.017 s)

Shannon's Information Theory and Document Indexing

  • 정영미
    • 한국문헌정보학회지 / Vol. 6 / pp.87-103 / 1979
  • Information storage and retrieval is part of the general communication process. In Shannon's information theory, the information contained in a message is a measure of uncertainty about the information source, and the amount of information is measured by entropy. Indexing is a process of reducing the entropy of the information source, since the document collection is divided into many smaller groups according to the subjects the documents deal with. Significant concepts contained in every document are mapped into the set of all sets of index terms. Thus the index itself is formed by paired sets of index terms and documents. Without indexing, the entropy of a document collection consisting of N documents is $\log_2 N$, whereas the average entropy of the smaller groups $(W_1, W_2, \ldots, W_m)$ is as small as $\left(\sum_{i=1}^{m} H(W_i)\right)/m$. Retrieval efficiency is a measure of an information system's performance, which is largely affected by the quality of the index. If all and only the documents judged relevant to the user's query can be retrieved, the information system is said to be $100\%$ efficient. The document file W may be potentially classified into two sets: documents relevant and documents non-relevant to a specific query. After retrieval, the document file W' is reclassified into four sets: relevant-retrieved, relevant-not retrieved, non-relevant-retrieved, and non-relevant-not retrieved. The paper shows that the difference between the entropies of document file W and document file W' is a proper measure of retrieval efficiency.
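The entropy comparison in this abstract can be illustrated with a minimal sketch. It assumes every document within a collection or group is equally likely to be the one sought, so that a group of n documents has entropy $\log_2 n$; the function names and the example sizes are hypothetical and are not taken from the paper.

```python
import math

def uniform_entropy(n_documents: int) -> float:
    """Entropy in bits of a uniform choice among n_documents (assumed model)."""
    if n_documents <= 0:
        raise ValueError("a collection must contain at least one document")
    return math.log2(n_documents)

def average_group_entropy(group_sizes: list[int]) -> float:
    """Average of H(W_i) over the m groups produced by indexing."""
    return sum(uniform_entropy(n) for n in group_sizes) / len(group_sizes)

# Hypothetical collection: 1024 documents split by subject into 4 equal groups.
collection_size = 1024
group_sizes = [256, 256, 256, 256]

entropy_without_index = uniform_entropy(collection_size)   # log2(1024)
entropy_with_index = average_group_entropy(group_sizes)    # mean of log2(256)
reduction = entropy_without_index - entropy_with_index

print(entropy_without_index, entropy_with_index, reduction)
```

Under these assumptions, splitting the collection into four equal groups removes two bits of uncertainty, which matches the intuition that choosing one of four subject groups carries $\log_2 4$ bits.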

A Study on the Effect of the Searcher's Subject Background on the Result of Online Database Searches

  • 이근봉
    • 한국비블리아학회지 / Vol. 7, No. 1 / pp.293-317 / 1994
  • The purpose of this study is to verify the effect of the searcher's subject background on the result of online database searches. To achieve this purpose, an experimental method was adopted. 180 students performed online searches in the three different libraries chosen for this study. The subjects were classified into two groups according to the scores of the test. Data concerning the processes, behavior, and results of the searches performed by the subjects in real situations were gathered. Immediately following the searches, the extent of their subject background was assessed through interviews. The search effect consists of four elements: search efficiency (the number of terms used per unit time), the number of relevant documents, the number of relevant documents per unit time, and the precision ratio. The major findings of this study are summarized as follows. 1. Searchers with a strong subject background search with significantly higher efficiency. Group A (those with a strong subject background) uses more search terms per unit time than Group B (those with a weak subject background). 2. In searches made by those with a strong subject background, more relevant documents are retrieved. 3. In searches made by those with a strong subject background, more relevant documents per unit time are retrieved. 4. Searchers with a strong subject background achieve a significantly higher precision ratio; that is, a larger share of the documents they retrieve are relevant.

A Study on the Use of Description and Keywords Meta Tags for the Content of WWW Resources

  • 최재황;조현양
    • 한국도서관정보학회지 / Vol. 32, No. 2 / pp.307-322 / 2001
  • The purpose of this study is to investigate how meta tags are used, which meta tags are used most frequently, and what relationships exist between the retrieval of WWW documents and meta tags. For the study, 1,000 WWW documents were selected from OCLC NetFirst and examined. A total of 92 meta tags were found, and the "description" and "keywords" meta tags were analyzed intensively. In addition, analysis of the WWW documents showed no significant relationship in meta tag usage between documents retrieved at the beginning and documents retrieved at the end. A comparative study between general internet search engines and commercial databases such as NetFirst is suggested as further work.

A Study of an Exact Match and a Partial Match as an Information Retrieval Technique

  • 김영귀
    • 정보관리학회지 / Vol. 7, No. 1 / pp.79-95 / 1990
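  • This study classifies the retrieval techniques researched and developed to date into exact match retrieval and partial match retrieval, according to the characteristics of the retrieved document set and the representations used. Boolean logic is the representative exact match technique and is the one used by most current information retrieval systems. Partial match has been studied extensively as an alternative for overcoming the problems and limitations of Boolean logic, but because in essence it improves retrieval within the structure of Boolean logic, it is inevitably limited. Taking probabilistic retrieval, the vector space model, and fuzzy sets as representative examples, the study compares the two approaches and suggests directions for future retrieval techniques.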
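The contrast between the two families can be sketched as follows. This is an illustrative example only: the toy documents, the function names, and the use of raw term counts with cosine similarity (one simple form of the vector space model) are assumptions, not the paper's own procedure.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "boolean retrieval logic query",
    "d2": "vector space model ranks documents by query similarity",
    "d3": "fuzzy set retrieval model",
}

def boolean_and(query_terms, docs):
    """Exact match: keep only documents that contain every query term."""
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.split() for term in query_terms)]

def cosine(query_terms, text):
    """Cosine similarity between raw term-count vectors."""
    q, d = Counter(query_terms), Counter(text.split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def partial_match(query_terms, docs):
    """Partial match: rank every document that shares at least one term."""
    scored = [(cosine(query_terms, text), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]

query = ["retrieval", "model", "query"]
exact = boolean_and(query, docs)    # no document contains all three terms
ranked = partial_match(query, docs) # every document matches two of three terms
print(exact, ranked)
```

Here the Boolean AND query returns nothing because no document contains all three terms, while the partial match still returns a ranked list; this is the kind of limitation the partial match techniques were proposed to address.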

An Exploratory Study of Performances between a Subject Directory and Keyword Search Engine in the Network Databases

  • 이명희
    • 한국문헌정보학회지 / Vol. 31, No. 2 / pp.177-197 / 1997
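  • This exploratory study examined how well Yahoo, a subject directory, and Alta Vista, a keyword search engine, retrieve relevant documents for search questions posed by university library users. Search results were evaluated by the number of documents retrieved, the number of relevant documents retrieved, recall, and precision. In particular, Alta Vista proved suited to searches on specific, specialized terms, whereas Yahoo proved suited to searches on general, abstract terms.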
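Recall and precision, two of the evaluation criteria named in this abstract, follow their standard definitions. The sketch below uses hypothetical document identifiers and a hypothetical function name; it is not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments for a single search question.
retrieved_docs = ["a", "b", "c", "d"]
relevant_docs = ["b", "d", "e"]

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(precision, recall)
```

In this example two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3).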

Department of Computer Science, Chosun University

  • Kim, Young-cheon; Moon, You-Mi; Lee, Sung-joo
    • 한국지능시스템학회논문지 / Vol. 11, No. 7 / pp.659-665 / 2001
  • Relevance feedback is the most popular query reformulation strategy. In a relevance feedback cycle, the user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant. In practice, only the top 10 (or 20) ranked documents need to be examined. The main idea consists of selecting important terms, or expressions, attached to the documents that have been identified as relevant by the user, and of enhancing the importance of these terms in a new query formulation. The expected effect is that the new query will be moved towards the relevant documents and away from the non-relevant ones. Local analysis techniques are interesting because they take advantage of the local context provided with the query. In this regard, they seem more appropriate than global analysis techniques. In a local strategy, the documents retrieved for a given query q are examined at query time to determine terms for query expansion. This is similar to a relevance feedback cycle but might be done without assistance from the user.
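One common way to move a query towards relevant documents and away from non-relevant ones is the Rocchio formula. The abstract does not say which formula the authors use, so the sketch below is a stand-in under that assumption; the weights `alpha`, `beta`, `gamma` and the toy term vectors are illustrative choices.

```python
from collections import Counter

def rocchio(query, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight query terms from user-marked documents (Rocchio-style).

    Each query or document is a dict mapping term -> weight.
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant_docs)
    # Terms pushed to zero or below are dropped from the new query.
    return {term: w for term, w in new_query.items() if w > 0}

# Hypothetical feedback: the user marked two documents relevant, one not.
original_query = {"jaguar": 1.0}
marked_relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
marked_nonrelevant = [{"jaguar": 1.0, "car": 1.0}]

new_query = rocchio(original_query, marked_relevant, marked_nonrelevant)
print(new_query)
```

Terms that occur only in relevant documents ("cat", "habitat") enter the new query with positive weight, while a term that occurs only in the non-relevant document ("car") ends up negative and is dropped.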

A Study on Improving the Effectiveness of Information Retrieval Through P-norm, RF, LCAF

  • Kim, Young-cheon;Lee, Sung-joo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 2, No. 1 / pp.9-14 / 2002
  • Boolean retrieval is simple and elegant. However, since there is no provision for term weighting, no ranking of the answer set is generated. As a result, the size of the output might be too large or too small. Relevance feedback is the most popular query reformulation strategy. In a relevance feedback cycle, the user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant. In practice, only the top 10 (or 20) ranked documents need to be examined. The main idea consists of selecting important terms, or expressions, attached to the documents that have been identified as relevant by the user, and of enhancing the importance of these terms in a new query formulation. The expected effect is that the new query will be moved towards the relevant documents and away from the non-relevant ones. Local analysis techniques are interesting because they take advantage of the local context provided with the query. In this regard, they seem more appropriate than global analysis techniques. In a local strategy, the documents retrieved for a given query q are examined at query time to determine terms for query expansion. This is similar to a relevance feedback cycle but might be done without assistance from the user.
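The local strategy described at the end of this abstract, expanding the query from the top-ranked documents without asking the user, can be sketched in a simplified form. The example below picks expansion terms by raw frequency in the top `top_k` documents; this is a deliberate simplification and not the authors' LCAF method, and all names and data are hypothetical.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_k=2, n_new_terms=1):
    """Add the most frequent non-query terms from the top_k retrieved documents."""
    counts = Counter()
    for text in ranked_docs[:top_k]:
        counts.update(term for term in text.lower().split()
                      if term not in query_terms)
    new_terms = [term for term, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

# Hypothetical ranked output for the query "jaguar"; only the top 2 are examined.
ranked_docs = [
    "jaguar habitat rainforest",
    "jaguar rainforest prey",
    "jaguar car dealer",
]

expanded = expand_query(["jaguar"], ranked_docs, top_k=2, n_new_terms=1)
print(expanded)
```

Because the top-ranked documents are simply assumed to be relevant, the quality of the expansion depends on the quality of the initial ranking.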

A Study on the Effects of the Selection of Relevant Documents over Retrieval Documents

  • 이상렬;최성진
    • Korean Society for Information Management: Conference Proceedings / Proceedings of the 3rd Conference of the Korean Society for Information Management, 1996 / pp.11-14 / 1996
  • The purpose of this study is to verify the hypothesis that the criteria end-users apply when selecting among retrieved documents affect which documents they select as relevant after searching online bibliographic databases. To this end, online questionnaires were distributed via e-mail to end-users of online bibliographic databases.

Retrieval Effectiveness of Subject Descriptor and Citation Searching in the Water Resources Literature

  • 이명희
    • 한국문헌정보학회지 / Vol. 26 / pp.213-233 / 1994
  • This study measured whether subject descriptor searching and citation searching retrieve different documents for conceptual queries and methodological queries in natural science, engineering and social science. The retrieval effectiveness of the two search methods was measured using as criteria the total number of documents retrieved, the total number of relevant documents, overlapping and unique documents, and the precision ratio. The search subject was water resources and the databases used were Selected Water Resources Abstracts (SWRA) and SCISEARCH. Data were collected for 21 doctoral students working on their dissertations in the three fields of water resources. Principal findings included: 1) subject searching and citation searching each retrieved a substantially equal number of documents; 2) the total number of relevant documents for conceptual queries was larger than that for methodological queries, while there was a large variation among the three fields; 3) the average overlap was quite small, while citation searching yielded more unique documents than subject searching; 4) for conceptual queries, citation searching yielded a higher precision ratio than subject searching, while subject searching obtained a slightly higher precision ratio than citation searching for methodological queries; and 5) citation searching was effective for both specific queries and broad queries if seed articles are well chosen, while subject searching only worked well for broad queries. It was further found that: 1) citation searching is not a subsidiary but a substantial retrieval method in water resources; 2) SWRA is effective for engineering queries and SCISEARCH is appropriate for natural science queries, while neither SWRA nor SCISEARCH works well for social science queries; and 3) the characteristics of queries affect retrieval results more than the characteristics of documents or the coverage of databases.
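The overlap and unique-document criteria used to compare the two search methods reduce to set operations. The sketch below uses hypothetical document identifiers and a hypothetical function name; it is not the study's data or code.

```python
def compare_result_sets(subject_hits, citation_hits):
    """Split two result sets into overlapping and method-unique documents."""
    subject_hits, citation_hits = set(subject_hits), set(citation_hits)
    return {
        "overlap": subject_hits & citation_hits,
        "unique_subject": subject_hits - citation_hits,
        "unique_citation": citation_hits - subject_hits,
    }

# Hypothetical results for one query from the two search methods.
subject_results = ["w1", "w2", "w3", "w4"]
citation_results = ["w4", "w5", "w6", "w7", "w8"]

comparison = compare_result_sets(subject_results, citation_results)
print({name: sorted(ids) for name, ids in comparison.items()})
```

A small overlap with many unique documents on each side, as in this toy case, is the pattern that would suggest the two methods complement rather than duplicate each other.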

The study on the retrieval effectiveness of meta-search engine on the internet

  • 김성희
    • 한국도서관정보학회지 / Vol. 27 / pp.457-483 / 1997
  • This study was intended to compare the effectiveness of SavvySearch and MetaCrawler in terms of the total number of relevant documents retrieved, precision, recall, and the number of dead links. In addition, this study measured whether the meta-search engines and general web search engines retrieved different web documents. As a result, SavvySearch produced higher precision and recall than MetaCrawler, while MetaCrawler had a lower dead-link ratio than SavvySearch. Also, the meta-search engines were more effective than the general web search engines. The results show that the hybrid methodology of integrating a variety of web search engines can help solve retrieval effectiveness problems on the Internet.
