An Experimental Study on Automatic Summarization of Multiple News Articles

복수의 신문기사 자동요약에 관한 실험적 연구

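  • Kim, Yong-Kwang (김용광; Graduate School, Department of Library and Information Science, Yonsei University) ;
  • Chung, Young-Mee (정영미; Department of Library and Information Science, Yonsei University)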
  • Published : 2006.03.01

Abstract

This study proposes a template-based method for automatically summarizing multiple news articles using the semantic categories of sentences. First, the semantic categories of the core information to be included in a summary are identified from a training set of documents and their summaries. Then, cue words are selected for each slot of the template so that news sentences can later be classified into the relevant slots. When a news article is input, its event/accident category is identified, and key sentences are extracted from the article and placed in the relevant slots. The template, filled with simple sentences rather than the original long sentences, is then used to generate a summary of the event/accident. In a user evaluation of the generated summaries, essential information extraction achieved a recall of 54.1% and a precision of 58.1%, with a redundancy ratio of 11.6%.
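The slot-filling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the slot names, the cue words, the naive period-based sentence splitter, and the function names `fill_template` and `generate_summary` are all hypothetical, and the sketch skips the event/accident categorization and the splitting of long sentences into simple ones.

```python
from collections import defaultdict

# Hypothetical template for a "fire" event: slot name -> cue words.
# Slots and cue words are illustrative assumptions, not taken from the paper.
TEMPLATE_CUES = {
    "cause": ["caused by", "due to", "sparked"],
    "casualties": ["killed", "injured", "dead"],
    "damage": ["destroyed", "damage", "burned"],
}


def split_sentences(text):
    """Naive splitter on periods; adequate for this sketch only."""
    return [s.strip() for s in text.split(".") if s.strip()]


def fill_template(articles, cues=TEMPLATE_CUES):
    """Place each sentence in every slot whose cue word it contains."""
    slots = defaultdict(list)
    for article in articles:
        for sentence in split_sentences(article):
            lowered = sentence.lower()
            for slot, words in cues.items():
                if not any(word in lowered for word in words):
                    continue
                # Skip verbatim repeats coming from other articles.
                if sentence not in slots[slot]:
                    slots[slot].append(sentence)
    return dict(slots)


def generate_summary(slots, order=("cause", "casualties", "damage")):
    """Emit the first sentence of each filled slot, in a fixed slot order."""
    return " ".join(slots[name][0] + "." for name in order if name in slots)


articles = [
    "The fire was sparked by a gas leak. Three people were injured.",
    "Officials said three people were injured. The warehouse was destroyed.",
]
slots = fill_template(articles)
summary = generate_summary(slots)
```

Substring matching against a fixed cue list is the simplest possible classifier; a fuller system would select cue words from training data, as the abstract describes.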

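This study presents a template-based summarization technique that uses the semantic categories of sentences to automatically summarize multiple news articles. In the training phase, the semantic categories of the core information to be included in summaries of event/accident news articles are identified first, and cue words are then selected for each slot of the template. In the automatic summarization phase, the input news articles are categorized by event/accident, and key sentences are extracted from each article to fill the template slots. Finally, the sentences are split into simple sentences to revise the template contents, and the summary is written from the revised template. In the evaluation of the automatically generated summaries, summary precision and summary recall were 0.541 and 0.581, respectively, and the summary sentence redundancy ratio was 0.116.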
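The abstract reports precision, recall, and a redundancy ratio without defining them. The sketch below shows one common way such figures could be computed, assuming that evaluators mark a set of essential information units and flag redundant summary sentences. The definitions, the function names, and the sample values are assumptions made for illustration and are not the paper's data.

```python
def precision_recall(selected, essential):
    """Precision and recall of the selected units against the essential ones."""
    selected, essential = set(selected), set(essential)
    hits = len(selected & essential)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(essential) if essential else 0.0
    return precision, recall


def redundancy_ratio(num_redundant, num_sentences):
    """Share of summary sentences judged to repeat information (assumed definition)."""
    return num_redundant / num_sentences if num_sentences else 0.0


# Hypothetical judgments: units u1..u4 appear in the summary,
# while evaluators consider u1, u2, and u5 essential.
precision, recall = precision_recall({"u1", "u2", "u3", "u4"}, {"u1", "u2", "u5"})
redundancy = redundancy_ratio(1, 8)
```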
