A Study on Keyword Extraction From a Single Document Using Term Clustering

Korean title: 용어 클러스터링을 이용한 단일문서 키워드 추출에 관한 연구

  • Seung-Hee Han (한승희) (Department of Library and Information Science, College of Social Sciences, Seoul Women's University)
  • Received : 2010.07.19
  • Accepted : 2010.08.11
  • Published : 2010.08.30


In this study, a new keyword extraction algorithm based on term clustering is applied to a single document. The document is divided into multiple passages, and two ways of calculating the similarity between two terms are investigated: first-order similarity and second-order distributional similarity. In the first experiment, the best clustering performance was achieved with 50-term passages and second-order distributional similarity. Based on this result, second-order distributional similarity was then applied to various keyword extraction methods that use statistical information about terms. In the second experiment, pf (paragraph frequency) and $tf \times ipf$ (term frequency by inverse paragraph frequency) were found to improve the overall performance of keyword extraction. These results indicate that the algorithm fulfills the necessary conditions that good keywords should satisfy.
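
The two similarity measures and the two term statistics named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the sketch assumes that first-order similarity is the cosine between the passage-occurrence vectors of two terms, that second-order similarity is the cosine between their co-occurrence profiles, and that ipf is defined as log(N / pf) by analogy with idf. All function names are hypothetical, and fixed-size passages stand in for paragraphs when computing pf.

```python
import math
from collections import Counter


def split_passages(tokens, size=50):
    """Split a token list into consecutive passages of `size` terms."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def first_order_vectors(passages):
    """Map each term to {passage index: count}.

    Assumption: two terms are first-order similar when they occur
    in the same passages.
    """
    vecs = {}
    for i, passage in enumerate(passages):
        for term, count in Counter(passage).items():
            vecs.setdefault(term, {})[i] = float(count)
    return vecs


def second_order_vectors(passages):
    """Map each term to {other term: number of shared passages}.

    Assumption: two terms are second-order similar when they co-occur
    with the same terms, even if they never co-occur with each other.
    """
    vecs = {}
    for passage in passages:
        terms = set(passage)
        for t in terms:
            row = vecs.setdefault(t, {})
            for u in terms:
                if u != t:
                    row[u] = row.get(u, 0.0) + 1.0
    return vecs


def pf_and_tf_ipf(passages):
    """Return (pf, tf x ipf) per term, with ipf assumed to be log(N / pf)."""
    tf = Counter(t for p in passages for t in p)
    pf = Counter(t for p in passages for t in set(p))
    n = len(passages)
    tf_ipf = {t: tf[t] * math.log(n / pf[t]) for t in tf}
    return pf, tf_ipf


# Toy document: "car" and "automobile" never share a passage,
# but both co-occur with "engine" and "road".
tokens = "car engine road automobile engine road keyword cluster term".split()
passages = split_passages(tokens, size=3)
first = first_order_vectors(passages)
second = second_order_vectors(passages)
print(cosine(first["car"], first["automobile"]))
print(cosine(second["car"], second["automobile"]))
```

In this toy document the first-order similarity of "car" and "automobile" is zero because they never appear in the same passage, while their second-order similarity is close to one because their co-occurrence profiles match. This is the kind of relation that a second-order measure can capture in a short single document.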


Keywords: Term Clustering; Keyword Extraction; Single Document; Second-order Similarity; Text Mining


Supported by: Social Science Research Institute, Seoul Women's University (서울여자대학교 사회과학연구소)


