Document Summarization Based on Sentence Clustering Using Graph Division

Lee Il-Joo; Kim Min-Koo

doi:10.3745/KIPSTB.2006.13B.2.149

The KIPS Transactions: Part B

Vol. 13B, No. 2 / pp. 149-154 / 2006 / pISSN 1598-284X

Korea Information Processing Society



Lee Il-Joo (Department of Mobile Contents, Tongwon College);
Kim Min-Koo (Division of Information and Computer Engineering, Ajou University)

Published: 2006.04.01

https://doi.org/10.3745/KIPSTB.2006.13B.2.149


Abstract

Document summarization aims to reduce the complexity of a document composed of several sub-themes while producing a summary that covers all of them. This paper proposes a summarization system that uses graph division to extract salient sentences for each sub-theme. Representative terms are selected through term-association analysis based on sentence-level co-occurrence information, and the document is represented as a graph using these terms. The graph is then divided, according to its connectivity, into subgraphs that each correspond to a sub-theme; each subgraph is a cluster of closely related sentences. Extracting salient sentences from each subgraph yields a summary composed only of the core content of each sub-theme, which improves summarization performance.
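
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the term-selection criterion, the edge definition, or the sentence-scoring rule, so the choices below are assumptions made only for illustration. Representative terms are taken to be terms that occur in at least two sentences, two sentences are linked when they share such a term, graph division is approximated by connected components, and the salient sentence of each subgraph is the one with the most links. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def representative_terms(sentences, min_sentences=2):
    """Assumed criterion: keep terms that occur in at least `min_sentences` sentences."""
    counts = Counter(term for sent in sentences for term in set(sent))
    return {term for term, n in counts.items() if n >= min_sentences}


def build_graph(sentences, terms):
    """Assumed edge rule: link two sentences that share a representative term."""
    graph = defaultdict(set)
    for i, j in combinations(range(len(sentences)), 2):
        if set(sentences[i]) & set(sentences[j]) & terms:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def divide(graph, n):
    """Split the n sentence nodes into connected components (one per assumed sub-theme)."""
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def summarize(sentences):
    """Return, in document order, the index of the best-connected sentence per component."""
    terms = representative_terms(sentences)
    graph = build_graph(sentences, terms)
    components = divide(graph, len(sentences))
    # Ties on link count are broken in favor of the earlier sentence.
    picks = [max(c, key=lambda i: (len(graph[i]), -i)) for c in components]
    return sorted(picks)
```

Each input sentence is a list of already-tokenized terms, and `summarize` returns sentence indices rather than text, so the caller can map them back to the original sentences. Connected components are the simplest form of division; a document whose sentences all share one frequent term would collapse into a single subgraph, so a practical system would need a stricter edge rule or a real graph-partitioning step.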


