• Title/Summary/Keyword: Sentence Extraction

Search Result 98, Processing Time 0.019 seconds

Event Sentence Extraction for Online Trend Analysis (온라인 동향 분석을 위한 이벤트 문장 추출 방안)

  • Yun, Bo-Hyun
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.9
    • /
    • pp.9-15
    • /
    • 2012
  • A conventional event sentence extraction research doesn't learn the 3W features in the learning step and applies the rule on whether the 3W feature exists in the extraction step. This paper presents a sentence weight based event sentence extraction method that calculates the weight of the 3W features in the learning step and applies the weight of the 3W features in the extraction step. In the experimental result, we show that top 30% features by the $TF{\times}IDF$ weighting method is good in the feature filtering. In the real estate domain of the public issue, the performance of sentence weight based event sentence extraction method is improved by who and when of 3W features. Moreover, In the real estate domain of the public issue, the sentence weight based event sentence extraction method is better than the other machine learning based extraction method.

A Study on Extracting News Contents from News Web Pages (뉴스 웹 페이지에서 기사 본문 추출에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean Society for information Management
    • /
    • v.26 no.1
    • /
    • pp.305-320
    • /
    • 2009
  • The news pages provided through the web contain unnecessary information. This causes low performance and inefficiency of the news processing system. In this study, news content extraction methods, which are based on sentence identification and block-level tags news web pages, was suggested. To obtain optimal performance, combinations of these methods were applied. The results showed good performance when using an extraction method which applied the sentence identification and eliminated hyperlink text from web pages. Moreover, this method showed better results when combined with the extraction method which used block-level. Extraction methods, which used sentence identification, were effective for raising the extraction recall ratio.

Improving the effectiveness of document extraction summary based on the amount of sentence information (문장 정보량 기반 문서 추출 요약의 효과성 제고)

  • Kim, Eun Hee;Lim, Myung Jin;Shin, Ju Hyun
    • Smart Media Journal
    • /
    • v.11 no.3
    • /
    • pp.31-38
    • /
    • 2022
  • In the document extraction summary study, various methods for selecting important sentences based on the relationship between sentences were proposed. In the Korean document summary using the summation similarity of sentences, the summation similarity of the sentences was regarded as the amount of sentence information, and the summary sentences were extracted by selecting important sentences based on this. However, the problem is that it does not take into account the various importance that each sentence contributes to the entire document. Therefore, in this study, we propose a document extraction summary method that provides a summary by selecting important sentences based on the amount of quantitative and semantic information in the sentence. As a result, the extracted sentence agreement was 58.56% and the ROUGE-L score was 34, which was superior to the method using only the combined similarity. Compared to the deep learning-based method, the extraction method is lighter, but the performance is similar. Through this, it was confirmed that the method of compressing information based on semantic similarity between sentences is an important approach in document extraction summary. In addition, based on the quickly extracted summary, the document generation summary step can be effectively performed.

Joint Hierarchical Semantic Clipping and Sentence Extraction for Document Summarization

  • Yan, Wanying;Guo, Junjun
    • Journal of Information Processing Systems
    • /
    • v.16 no.4
    • /
    • pp.820-831
    • /
    • 2020
  • Extractive document summarization aims to select a few sentences while preserving its main information on a given document, but the current extractive methods do not consider the sentence-information repeat problem especially for news document summarization. In view of the importance and redundancy of news text information, in this paper, we propose a neural extractive summarization approach with joint sentence semantic clipping and selection, which can effectively solve the problem of news text summary sentence repetition. Specifically, a hierarchical selective encoding network is constructed for both sentence-level and document-level document representations, and data containing important information is extracted on news text; a sentence extractor strategy is then adopted for joint scoring and redundant information clipping. This way, our model strikes a balance between important information extraction and redundant information filtering. Experimental results on both CNN/Daily Mail dataset and Court Public Opinion News dataset we built are presented to show the effectiveness of our proposed approach in terms of ROUGE metrics, especially for redundant information filtering.

Opinion Extraction based on Syntactic Pieces

  • Aoki, Suguru;Yamamoto, Kazuhide
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.76-85
    • /
    • 2007
  • This paper addresses a task of opinion extraction from given documents and its positive/negative classification. We propose a sentence classification method using a notion of syntactic piece. Syntactic piece is a minimum unit of structure, and is used as an alternative processing unit of n-gram and whole tree structure. We compute its semantic orientation, and classify opinion sentences into positive or negative. We have conducted an experiment on more than 5000 opinion sentences of multiple domains, and have proven that our approach attains high performance at 91% precision.

  • PDF

Speech Recognition for twenty questions game (스무고개 게임을 위한 음성인식)

  • 노용완;윤재선;홍광석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.203-206
    • /
    • 2002
  • In this paper, we present a sentence speech recognizer for twenty questions game. The proposed approaches for speaker-independent sentence speech recognition can be divided into two steps. One is extraction of the number of syllables in eojeol for candidate reduction, and the other is knowledge based language model for sentence recognition. For twenty questions game, we implemented speech recognizer using 956 sentences and 1095 eojeols. The results obtained in our experiments were 87% sentence recognition rate and 90.15% eojeol recognition rate.

  • PDF

Information Extraction and Sentence Classification applied to Clinical Trial MEDLINE Abstracts

  • Hara, Kazuo;Matsumoto, Yuji
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.85-90
    • /
    • 2005
  • In this paper, firstly we report experimental results on applying information extraction (IE) methodology to the task of summarizing clinical trial design information in focus on ‘Compared Treatment’, ‘Endpoint’ and ‘Patient Population’ from clinical trial MEDLINE abstracts. From these results, we have come to see this problem as one that can be decomposed into a sentence classification subtask and an IE subtask. By classifying sentences from clinical trial abstracts and only performing IE on sentences that are most likely to contain relevant information, we hypothesize that the accuracy of information extracted from the abstracts can be increased. As preparation for testing this theory in the next stage, we conducted an experiment applying state-of-the-art sentence classification techniques to the clinical trial abstracts and evaluated its potential in the original task of the summarization of clinical trial design information.

  • PDF

Enhancing Snippet Extraction Method using Fuzzy and Semantic Features (퍼지와 의미특징을 이용한 스니핏 추출 향상 방법)

  • Park, Sun;Lee, Yeonwoo;Cho, Kwangmoon;Yang, Huyeol;Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.11
    • /
    • pp.2374-2381
    • /
    • 2012
  • This paper proposes a new enhancing snippet extraction method using fuzzy and semantic features. The proposed method creates a delegate of sentence by using semantic features. It extracts snippet using fuzzy association between a delegate sentence and sentence set which well represents query. In addition, the method uses pseudo relevance feedback to expand query which extracts snippet to be well reflected semantic user's intention. The experimental results demonstrate the proposed method can achieve better snippet extraction performance than the previous methods.

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.3
    • /
    • pp.1639-1658
    • /
    • 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have been employed to identify named-entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they are unable to recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationship between certain words that simultaneously appear around specific words in the abstracts of academic journals. The method is executed as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.

An Experimental Study on Multi-Document Summarization for Question Answering (질의응답을 위한 복수문서 요약에 관한 실험적 연구)

  • Choi, Sang-Hee;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.21 no.3
    • /
    • pp.289-303
    • /
    • 2004
  • This experimental study proposes a multi-document summarization method that produces optimal summaries in which users can find answers to their queries. In order to identify the most effective method for this purpose, the performance of the three summarization methods were compared. The investigated methods are sentence clustering, passage extraction through spreading activation, and clustering-passage extraction hybrid methods. The effectiveness of each summarizing method was evaluated by two criteria used to measure the accuracy and the redundancy of a summary. The passage extraction method using the sequential bnb search algorithm proved to be most effective in summarizing multiple documents with regard to summarization precision. This study proposes the passage extraction method as the optimal multi-document summarization method.