Document Summarization using Topic Phrase Extraction and Query-based Summarization

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 31 Issue 4 / Pages 488-497 / 2004 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

주제어구 추출과 질의어 기반 요약을 이용한 문서 요약

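한광록 (Kwang-Rok Han; Division of Computer Engineering, Hoseo University) ;
오삼권 (Computer Engineering, Hoseo University) ;
임기욱 (Department of Knowledge and Information Industrial Engineering, Sunmoon University)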

Published : 2004.04.01

Abstract

This paper describes a hybrid document summarization method that combines indicative summarization with query-based summarization. Learning models for topic phrase extraction are built from training documents, using Naive Bayes, decision tree, and Support Vector Machine classifiers as the machine learning algorithms. Based on these models, the system automatically extracts a list of topic phrases from a new document. It then produces the summary by query-based summarization: the extracted topic phrases are treated as queries, and the locality-based similarity of each topic phrase is calculated to select the summary sentences. We examine how the topic phrases affect summarization and how many extracted phrases are appropriate for it. We evaluate the extracted summaries by comparing them with manually produced summaries and, for an objective performance comparison, with the output of the document summarization feature included in MS-Word.
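The query-based step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the triangular decay kernel, the `spread` parameter, the naive sentence splitter, and the function names `split_sentences`, `tokenize`, and `summarize` are all assumptions made for illustration. Each occurrence of a topic phrase spreads influence over nearby word positions, and a sentence is scored by the total influence falling on its words.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased word tokens; no morphological analysis is attempted here.
    return re.findall(r"\w+", text.lower())


def summarize(text, topic_phrases, num_sentences=2, spread=10):
    """Pick the sentences that receive the most locality-based influence.

    Hypothetical scoring: every topic phrase occurrence contributes 1.0 at
    its own positions and a linearly decaying amount up to `spread` words away.
    """
    sentences = split_sentences(text)
    tokens, owner = [], []  # owner[i] = index of the sentence holding token i
    for idx, sentence in enumerate(sentences):
        for tok in tokenize(sentence):
            tokens.append(tok)
            owner.append(idx)

    influence = [0.0] * len(tokens)
    for phrase in topic_phrases:
        words = tokenize(phrase)
        n = len(words)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] != words:
                continue
            lo = max(0, start - spread)
            hi = min(len(tokens), start + n + spread)
            for pos in range(lo, hi):
                if pos < start:
                    dist = start - pos
                elif pos >= start + n:
                    dist = pos - (start + n - 1)
                else:
                    dist = 0
                influence[pos] += max(0.0, 1.0 - dist / spread)

    scores = [0.0] * len(sentences)
    for pos, value in enumerate(influence):
        scores[owner[pos]] += value

    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]


text = (
    "The system extracts topic phrases from each document. "
    "The weather was pleasant that day. "
    "Each topic phrase is then used as a query. "
    "Nobody remembered the lunch menu."
)
summary = summarize(text, ["topic", "query"], num_sentences=2, spread=3)
```

In this sketch the influence of a phrase can spill into neighbouring sentences, which is the point of a locality-based measure, and the raw sum favours longer sentences; dividing by sentence length would be one possible adjustment. In the paper the topic phrases come from the learned extraction models, whereas here they are passed in by hand.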
