A Study on the Pivoted Inverse Document Frequency Weighting Method

Lee, Jae-Yun

doi:10.3743/KOSIM.2003.20.4.233

정보관리학회지 (Journal of the Korean Society for Information Management)

Vol. 20, No. 4 (Serial No. 50) / pp. 233-248 / 2003 / 1013-0799 (pISSN) / 2586-2073 (eISSN)

한국정보관리학회 (Korean Society for Information Management)


Original title (Korean): 피벗 역문헌빈도 가중치 기법에 대한 연구

Lee, Jae-Yun (Department of Library and Information Science, Yonsei University)

Published: 2003.12.30

https://doi.org/10.3743/KOSIM.2003.20.4.233


Abstract

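The inverse document frequency (IDF) weighting method rests on the assumption that the less frequently an index term occurs in a document collection, the more important that term is. This assumption, however, is inconsistent with other theories that treat medium-frequency terms as the most important. Based on the assumption that medium-frequency terms are more important than low-frequency terms, this study proposes the pivoted inverse document frequency weighting method, which modifies the IDF weighting formula. To validate the proposed method, retrieval experiments were carried out on three test collections. The results show that pivoted IDF weighting improves performance over IDF weighting, particularly at the top of the ranked results.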

The Inverse Document Frequency (IDF) weighting method is based on the hypothesis that the lower a term's frequency in the document collection, the more important the term is as a subject word. This well-known hypothesis is, however, somewhat questionable, because some low-frequency terms turn out to be poor subject words. This study proposes the pivoted IDF weighting method for better retrieval effectiveness, on the assumption that medium-frequency terms are more important than low-frequency terms. The method was evaluated on three test collections and showed performance improvements, especially at high ranks.
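To make the contrast in the abstract concrete, the following minimal Python sketch compares the classic IDF weight, log(N / df), with a weight that peaks at a medium document frequency. The abstract does not give the paper's pivoted IDF formula, so `peaked_idf` and its `pivot_df` parameter are hypothetical names and a hypothetical formula chosen only to illustrate the idea; they should not be read as the author's method.

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Classic inverse document frequency: log(N / df)."""
    if not 0 < df <= n_docs:
        raise ValueError("df must be between 1 and n_docs")
    return math.log(n_docs / df)


def peaked_idf(df: int, n_docs: int, pivot_df: int) -> float:
    """HYPOTHETICAL illustration, not the formula from the paper.

    Terms at least as frequent as pivot_df keep their classic IDF weight.
    Terms rarer than pivot_df have their weight mirrored around the
    pivot's IDF, so the weight peaks at pivot_df and falls off on both
    sides. The result is clamped at zero.
    """
    base = idf(df, n_docs)
    if df >= pivot_df:
        return base
    peak = idf(pivot_df, n_docs)
    return max(0.0, 2 * peak - base)


if __name__ == "__main__":
    n_docs = 1000
    for df in (1, 10, 100):
        classic = idf(df, n_docs)
        peaked = peaked_idf(df, n_docs, pivot_df=10)
        print(f"df={df:4d}  idf={classic:.3f}  peaked={peaked:.3f}")
```

Under classic IDF the term with df = 1 receives the largest weight, whereas in the illustrative variant the weight is largest at the pivot and a very rare term is weighted like a much more common one. Where the pivot should sit, and how the paper actually reshapes the curve, is not stated in this abstract.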



