A Korean Text Summarization System Using Aggregate Similarity

도합유사도를 이용한 한국어 문서요약 시스템

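  • Jae-Hoon Kim (김재훈; Department of Computer Engineering and Advanced Information Technology Research Center, Korea Maritime University)
  • Jun-Hong Kim (김준홍; Technology Research Institute / Communications Research Team, (주)휴니트 테크놀로지스)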
  • Published : 2001.06.01

Abstract

In this paper, a document is represented as a weighted graph called a text relationship map. In the graph, a node represents the vector of nouns in a sentence, an edge connects each node to every other node (the graph is complete), and the weight on an edge is the similarity between the two nodes it joins. The similarity is based on the word overlap between the corresponding nodes. The importance of a node, called its aggregate similarity in this paper, is defined as the sum of the weights on the links connecting it to the other nodes on the map. We present a Korean text summarization system that uses the aggregate similarity. To evaluate the system, we used two test collections: one (PAPER-InCon) consists of 100 papers in the field of computer science; the other (NEWS) is composed of 105 newspaper articles and was built by KORDIC. At a compression rate of 20%, we achieved a recall of 46.6% (PAPER-InCon) and 30.5% (NEWS) and a precision of 76.9% (PAPER-InCon) and 42.3% (NEWS).
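
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract says only that similarity is "based on the word overlap", so the sketch assumes the edge weight is the number of shared nouns. It also assumes noun extraction has already been done, so each sentence arrives as a list of nouns. The function names (`overlap_similarity`, `aggregate_similarities`, `summarize`) and the tie-breaking rule are hypothetical choices made for illustration.

```python
from collections import Counter


def overlap_similarity(nouns_a, nouns_b):
    """Edge weight between two sentences.

    Assumption: the weight is the number of nouns the two sentences share
    (multiset intersection). The paper's exact formula is not given in the
    abstract.
    """
    shared = Counter(nouns_a) & Counter(nouns_b)
    return sum(shared.values())


def aggregate_similarities(sentences):
    """Sum, for each node, the weights of its edges to all other nodes."""
    n = len(sentences)
    scores = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            weight = overlap_similarity(sentences[i], sentences[j])
            scores[i] += weight
            scores[j] += weight
    return scores


def summarize(sentences, compression_rate=0.2):
    """Return indices of the selected sentences in document order.

    Assumption: the summary keeps the top-scoring fraction of sentences
    (at least one); ties are broken in favor of the earlier sentence.
    """
    k = max(1, round(len(sentences) * compression_rate))
    scores = aggregate_similarities(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Toy document: each sentence is already reduced to its nouns.
doc = [
    ["graph", "node", "edge"],
    ["node", "weight"],
    ["edge", "weight", "similarity"],
    ["summary"],
    ["graph", "similarity", "node"],
]
print(aggregate_similarities(doc))  # [4, 3, 3, 0, 4]
print(summarize(doc, 0.2))          # [0]
```

In the toy document, the fourth sentence shares no nouns with the others, so its aggregate similarity is zero and it is never selected. Sentences that overlap with many others rise to the top, which is the intuition behind using the sum of edge weights as a measure of importance.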

Keywords

Noun extraction; Aggregate similarity; Text summarization