Search | Korea Science

Document Summarization via Convex-Concave Programming

Kim, Minyoung
- International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.293-298 / 2016
Document summarization is an important task in various areas, where the goal is to select a few of the most descriptive sentences from a given document as a succinct summary. Even without training data of human-labeled summaries, several existing approaches in the literature yield reasonable performance. In this paper, within the same unsupervised learning setup, we propose a more principled learning framework for the document summarization task. Specifically, we formulate an optimization problem that expresses the requirements of both faithful preservation of the document contents and the summary length constraint. We circumvent the difficult integer programming that originates from binary sentence selection via continuous relaxation and low-entropy penalization. We also suggest an efficient convex-concave optimization solver that is guaranteed to improve the original objective at every iteration. On several document datasets, we demonstrate that the proposed learning algorithm significantly outperforms the existing approaches.
https://doi.org/10.5391/IJFIS.2016.16.4.293

Text Extraction and Summarization from Web News (웹 뉴스의 기사 추출과 요약)

Han, Kwang-Rok;Sun, Bok-Keun;Yoo, Hyoung-Sun
- Journal of the Korea Society of Computer and Information / v.12 no.5 / pp.1-10 / 2007
Many types of information provided through the web, including news content, contain unnecessary clutter. This clutter makes it difficult to build automated information processing systems for tasks such as document summarization, extraction, and retrieval. We propose a system that extracts and summarizes news content from the web. The extraction system receives news content in HTML as input, builds an element tree similar to a DOM tree, and extracts text from the element tree while removing clutter identified by the hyperlink attribute in the HTML tags. The extracted text is passed to the summarization system, which extracts key sentences from it. We implement the summarization system using a co-occurrence relation graph. The summarized sentences are expected to be transmissible to PDAs or cellular phones through message services such as SMS.

Document Summarization Method using Complete Graph (완전그래프를 이용한 문서요약 연구)

Lyu, Jun-Hyun;Park, Soon-Cheol
- Journal of Korea Society of Industrial Information Systems / v.10 no.2 / pp.26-31 / 2005
In this paper, we present a document summarizer that is simpler and produces more condensed summaries than the existing ones generally used in web search engines. The method is a statistics-based summarization method that uses the concept of the complete graph. We treat each sentence as a vertex and the similarity between two sentences as an edge of the graph. We compare this summarizer with clustering-based and MMR-based summarizers, which are well known as good summarization methods. For the comparison, we compute the F-score against summaries generated by human subjects. Our experimental results verify the accuracy of this method, which is about 30% better than the others.
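The sentence-graph idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the scoring rule (sum of edge weights at each vertex), and the names `tokenize`, `cosine`, and `summarize` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, k=1):
    """Return the k sentences with the largest total similarity to all others."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for i, vi in enumerate(vectors):
        # Complete graph: every other sentence is a neighbor of vertex i,
        # so the score is the sum of all edge weights incident to i.
        total = sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i)
        scores.append(total)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original document order


# Hypothetical example document, one sentence per list item.
sentences = [
    "The cat sat on the mat.",
    "The cat sat on the sofa.",
    "Stock prices fell sharply today.",
]
print(summarize(sentences, k=2))
```

In this toy document the two sentences about the cat share most of their words, so they score highest, while the unrelated third sentence has no similar neighbors and is left out.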

Activity-based key-frame detection and video summarization in a wide-area surveillance system (광범위한 지역 감시시스템에서의 행동기반 키프레임 검출 및 비디오 요약)

Kwon, Hye-Young;Lee, Kyoung-Mi
- Journal of Internet Computing and Services / v.9 no.3 / pp.169-178 / 2008
In this paper, we propose a video summarization system based on activity in video acquired by multiple non-overlapping cameras for wide-area surveillance. The proposed system separates persons by time-independent background removal and detects the activities of the segmented persons from their motions. We extract eleven activities based on the direction in which the persons move and regard a key-frame as a frame that contains a meaningful activity. The proposed system summarizes video using activity-based key-frames and controls the amount of summarization according to the amount of activity. Thus the system can summarize videos by camera, time, and activity.

Multi-document Summarization Based on Cluster using Term Co-occurrence (단어의 공기정보를 이용한 클러스터 기반 다중문서 요약)

Lee, Il-Joo;Kim, Min-Koo
- Journal of KIISE:Software and Applications / v.33 no.2 / pp.243-251 / 2006
In multi-document summarization by means of salient sentence extraction, it is important to remove redundant information. The removal process considers the similarities and differences between sentences. In this paper, we propose a method for multi-document summarization that extracts salient sentences without redundancy by means of a cohesive term clustering method that utilizes co-occurrence information. In the cohesive term clustering method, we assume that terms do not exist independently but are related to one another in meaning. To find the relations between terms, we cluster sentences according to topic and use the co-occurrence information of terms in the same topic. We conduct experiments with the DUC (Document Understanding Conferences) data. In the experiments, our method shows better summarization performance than other methods that use term co-occurrence information based on term cohesion at the document or sentence level, and than methods that use simple statistical information.

Empirical Study for Automatic Evaluation of Abstractive Summarization by Error-Types (오류 유형에 따른 생성요약 모델의 본문-요약문 간 요약 성능평가 비교)

Seungsoo Lee;Sangwoo Kang
- Korean Journal of Cognitive Science / v.34 no.3 / pp.197-226 / 2023
Generative text summarization is a natural language processing task that produces a short summary while preserving the content of a long text. ROUGE is a widely used lexical-overlap-based metric for text summarization models in generative summarization benchmarks. Although models score very highly on it, studies report that 30% of generated summaries are still inconsistent with the source text. This paper proposes a methodology for evaluating the performance of a summarization model without using a reference summary. AggreFACT is a human-annotated dataset that classifies the types of errors made by neural text summarization models. Among all the test candidates, two cases, the generated summary and the case in which errors occurred throughout the summary, showed the highest correlation. We observed that the proposed evaluation score showed a high correlation for models fine-tuned from BART and PEGASUS, which are pretrained with large-scale Transformer architectures.
https://doi.org/10.19066/cogsci.2023.34.3.003
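To make the lexical-overlap idea behind ROUGE concrete, here is a simplified sketch of ROUGE-1 (unigram overlap). It is an illustration only, not the official ROUGE toolkit and not the evaluation score proposed in the paper: stemming and stopword handling are omitted, and the function names `tokenize` and `rouge_1` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate, reference):
    """Return (precision, recall, F1) of unigram overlap between two texts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical candidate and reference summaries.
precision, recall, f1 = rouge_1("the cat sat", "the cat sat on the mat")
print(precision, recall, f1)
```

Because the score only counts shared words, a summary can overlap heavily with a reference and still misstate the source, which is the gap that reference-free consistency evaluation aims to address.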

Music Structure Analysis and Application (악곡구조 분석과 활용)

Seo, Jung-Bum;Bae, Jae-Hak
- The KIPS Transactions:PartB / v.14B no.1 s.111 / pp.33-42 / 2007
This paper presents a new methodology for music structure analysis that facilitates rhetoric-based music summarization. Similarity analysis of musical constituents suggests the structure of a musical piece, and from that structure we can recognize its musical form. Musical forms have rhetorical characteristics of their own, which we have used to locate musical motifs. Motif extraction is to music summarization what topic sentence extraction is to text summarization. We have evaluated the effectiveness of this methodology through a popular music case study.
https://doi.org/10.3745/KIPSTB.2007.14-B.1.033

Feature-Based Summarization Method for a Large Opinion Documents Collection (대용량 오피니언 문서에 대한 특성 기반 요약 기법)

Chang, Jae-Young
- The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.1 / pp.33-42 / 2016
Recently, the environment in which public opinions on various topics are expressed has been expanding around SNSs and internet portals, and collections of opinion documents are therefore growing rapidly. Under these circumstances, automatic summarization techniques are essential for understanding the overall content of large opinion document collections. However, it is hard to summarize such documents efficiently with traditional text summarization technologies, since the documents include subjective expressions as well as features of the target objects. The method proposed in this paper defines features of opinion documents and is designed to retrieve representative sentences that express opinions on those features. In addition, we demonstrate the usefulness of the proposed method through experiments.
https://doi.org/10.7236/JIIBC.2016.16.1.33

Extraction of Informative Features for Automatic Indexation of Human Sensibility Ergonomic Documents (감성공학 문서 데이터의 지표 자동화를 위한 코퍼스 분석 기반 특성정보 추출)

배희숙;곽현민;채균식;이상태
- Science of Emotion and Sensibility / v.7 no.2 / pp.133-140 / 2004
A large number of indices are produced from human sensibility ergonomic data, which are accumulated by the project "Study on the Development of Web-Based Database System of Human Sensibility and its Support". Since research in this field is expected to increase rapidly, it is necessary to automate the index processing of human sensibility ergonomic data. Based on the similarity between indexation and summarization, we propose automating this process. In this paper, we study the extraction of keywords, information types, and expression features, which are considered basic elements of the following techniques for automatic summarization: classification of documents, extraction of information types, and extraction of linguistic features. This study can be applied to automatic summarization systems and knowledge management systems in the domain of human sensibility ergonomics.

Topic-based Multi-document Summarization Using Non-negative Matrix Factorization and K-means (비음수 행렬 분해와 K-means를 이용한 주제기반의 다중문서요약)

Park, Sun;Lee, Ju-Hong
- Journal of KIISE:Software and Applications / v.35 no.4 / pp.255-264 / 2008
This paper proposes a novel method using K-means and non-negative matrix factorization (NMF) for topic-based multi-document summarization. NMF decomposes a weighted term-by-sentence matrix into two sparse non-negative matrices: a semantic feature matrix and a semantic variable matrix. The obtained semantic features are intuitively comprehensible. Weighted similarity between the topic and the semantic features prevents meaningless sentences that are similar to the topic from being selected. K-means clustering removes noise from sentences so that biased semantics of the documents are not reflected in the summaries. In addition, the coherence of document summaries can be enhanced by arranging the selected sentences in order of rank. The experimental results show that the proposed method achieves better performance than other methods.
