Shannon's Information Theory and Document Indexing

Chung Young Mee

Journal of the Korean Society for Library and Information Science, Volume 6, pages 87-103, 1979

ISSN 1225-598X (print)

Korean Society for Library and Information Science

Korean title (translated): Shannon's Information Theory and Documentary Information

Chung Young Mee (Department of Library Science, Yonsei University)

Published: 1979.12.01


Abstract

Information storage and retrieval is one part of the general communication process. In Shannon's information theory, the information contained in a message is a measure of uncertainty about the information source, and the amount of information is measured by entropy. Indexing reduces the entropy of the information source, because it divides the document collection into many smaller groups according to the subjects the documents deal with. The significant concepts contained in each document are mapped into the set of all possible sets of index terms, so the index itself is formed by paired sets of index terms and documents. Without indexing, the entropy of a document collection of $N$ documents is $\log_2 N$, whereas the average entropy of the smaller groups $W_1, W_2, \ldots, W_m$ is as small as $\frac{1}{m}\sum_{i=1}^{m} H(W_i)$. Retrieval efficiency is a measure of an information system's performance and is largely determined by the quality of the index. If all and only the documents judged relevant to a user's query are retrieved, the information system is said to be $100\%$ efficient. With respect to a specific query, the document file $W$ can be partitioned into two sets: relevant documents and non-relevant documents. After retrieval, the document file $W'$ is reclassified into four sets: relevant-retrieved, relevant-not-retrieved, non-relevant-retrieved and non-relevant-not-retrieved. The paper shows that the difference between the entropies of document file $W$ and document file $W'$ is a proper measure of retrieval efficiency.
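The two entropy comparisons described in the abstract can be illustrated with a short Python sketch. It assumes that every document in a group is equally likely, so that $H(W_i) = \log_2 |W_i|$, and that the entropy of a classified document file is the Shannon entropy of its class proportions. The abstract does not give the paper's exact formula for the efficiency measure, so `entropy_difference`, which computes $H(W') - H(W)$, is only one hypothetical reading; all function names here are illustrative.

```python
import math
from collections import Counter


def entropy_uniform(n_docs):
    """Entropy in bits when each of n_docs documents is equally likely: log2 N."""
    return math.log2(n_docs)


def average_group_entropy(group_sizes):
    """(1/m) * sum_i H(W_i), assuming H(W_i) = log2 |W_i| (uniform within each group)."""
    return sum(entropy_uniform(n) for n in group_sizes) / len(group_sizes)


def partition_entropy(labels):
    """Shannon entropy in bits of the class proportions in a labelled document file."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_difference(relevant, retrieved, docs):
    """Hypothetical reading of the measure: H(W') - H(W).

    W  : two-way partition by relevance only.
    W' : four-way partition by (relevant?, retrieved?).
    """
    w = [d in relevant for d in docs]
    w_prime = [(d in relevant, d in retrieved) for d in docs]
    return partition_entropy(w_prime) - partition_entropy(w)


if __name__ == "__main__":
    # Indexing: 1,024 documents vs. four equal subject groups of 256.
    print(entropy_uniform(1024))                    # 10.0
    print(average_group_entropy([256, 256, 256, 256]))  # 8.0

    docs = range(8)
    relevant = {0, 1}
    # Retrieved set equals the relevant set: W' has the same class sizes as W.
    print(entropy_difference(relevant, {0, 1}, docs))   # 0.0
    # One relevant document missed, one non-relevant document retrieved.
    print(entropy_difference(relevant, {0, 5}, docs))   # ~0.7375
```

Under these assumptions, splitting 1,024 documents into four equal subject groups lowers the entropy from 10 bits to an average of 8 bits, and the entropy difference is zero when the retrieved set coincides exactly with the relevant set, because the four-way partition then has the same nonempty class sizes as the two-way one.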
