Design of Keyword Extraction System Using TFIDF

  • Published : 2002.03.01

Abstract

In this paper, we first used TFIDF to test whether the words in anchor text are appropriate as keywords. The test showed that some words had a high weight and were therefore suitable as keywords, while others did not appear in the document at all and were therefore unsuitable. To resolve this problem, we propose a new keyword extraction method. The proposed method removes the inappropriate keywords to form a new keyword set and uses the TFIDF value of each keyword as its weight, which makes ranking possible. The keywords extracted in this way were shown to be more accurate than those obtained with previous methods.
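
The filtering and ranking steps described above can be illustrated with a minimal sketch. The abstract does not state the exact TFIDF formula, the tokenization, or the corpus used, so everything here is an assumption made for illustration: raw term frequency multiplied by log(N / df), whitespace tokenization, and the hypothetical function names `tokenize`, `tfidf`, and `rank_anchor_keywords`. It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(term, doc_tokens, corpus_tokens):
    """Assumed variant: raw term frequency times log(N / document frequency)."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus_tokens) / df)


def rank_anchor_keywords(anchor_text, target_doc, corpus):
    """Drop anchor words absent from the target document, then rank by TF-IDF."""
    corpus_tokens = [tokenize(doc) for doc in corpus]
    doc_tokens = tokenize(target_doc)
    doc_vocab = set(doc_tokens)
    candidates = set(tokenize(anchor_text))
    # Words that never occur in the target document are removed as unsuitable.
    scored = {
        word: tfidf(word, doc_tokens, corpus_tokens)
        for word in candidates
        if word in doc_vocab
    }
    # Highest weight first; ties are broken alphabetically for stable output.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


# Toy corpus, invented for this example only.
corpus = [
    "tfidf weights each keyword and tfidf ranks the page",
    "anchor text links point to a page",
    "a search engine ranks each page",
]
ranked = rank_anchor_keywords("tfidf keyword homepage", corpus[0], corpus)
for word, weight in ranked:
    print(f"{word}\t{weight:.3f}")
```

In this toy run, "homepage" is discarded because it never occurs in the target document, and the remaining anchor words are ordered by their weight. A real system would need proper tokenization and stop-word handling, which the abstract does not describe.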
