Search | Korea Science

Cha, Jun Seok;Kim, Jeong In;Kim, Jung Min
- Smart Media Journal / v.6 no.1 / pp.22-29 / 2017
With the recent rapid advancement and spread of smart devices, the volume of document data on the Internet is increasing sharply. The growth of information on the Web, including a massive number of documents, makes it increasingly difficult for users to understand the data. Various studies on automated summarization are under way to summarize documents efficiently. This study uses the TextRank algorithm for that purpose. TextRank represents sentences or keywords as a graph and estimates the importance of sentences from its vertices and edges, which capture the semantic relations between words and sentences. It extracts high-ranking keywords and, based on those keywords, extracts important sentences. To extract important sentences, the algorithm first groups vocabulary using a specific weighting scale. The program selects sentences with higher scores on that scale and, from the selected sentences, extracts the important sentences that summarize the document. The results showed improved performance over the summarization methods of previous studies, indicating that the algorithm can summarize documents more efficiently.
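The graph-based ranking described in the abstract above can be illustrated with a minimal sketch of generic TextRank sentence ranking. This is not the authors' implementation: the keyword grouping and weighting scale from the paper are not reproduced, and the function names (`split_sentences`, `similarity`, `textrank`, `summarize`), the naive sentence splitter, and the parameter defaults are assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    # Shared-word overlap normalised by the log of each sentence's
    # distinct-word count, in the spirit of the TextRank similarity.
    set_a, set_b = set(words_a), set(words_b)
    if not set_a or not set_b:
        return 0.0
    denom = math.log(len(set_a)) + math.log(len(set_b))
    if denom <= 0:
        return 0.0  # both sentences have a single distinct word
    return len(set_a & set_b) / denom


def textrank(sentences, damping=0.85, iterations=50):
    # Vertices are sentences; edge weights are pairwise similarities.
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank-style update over the sentence graph.
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    # Keep the k highest-scoring sentences, in their original order.
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the others receives only the baseline score of `1 - damping`, so it is the first to be dropped from the summary.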

Kwon, Youngdae;Kim, Noo-ri;Lee, Jee-Hyong
- Journal of KIISE / v.44 no.2 / pp.179-185 / 2017
Document summarization aims to generate a summary that is consistent and contains the highly related sentences of a document. In this study, we implemented a document summarization method that extracts highly related sentences from a whole document by considering both similarity and entailment relations between sentences. Accordingly, we propose a new algorithm, TextRank-NLI, which combines a Recurrent Neural Network based Natural Language Inference model with a graph-based ranking algorithm used in single-document extractive summarization. To evaluate the new algorithm, we conducted experiments on the same datasets used for the TextRank algorithm. The results indicate that TextRank-NLI improves performance by 2.3% compared with TextRank.
https://doi.org/10.5626/JOK.2017.44.2.179

Lee, Sang-Kon;Lee, Wan-Kwon
- The KIPS Transactions:PartB / v.10B no.3 / pp.347-358 / 2003
A field-associated term is a single or compound word that occurs in documents and makes it possible to recognize the field of a text using common human knowledge. For example, a person recognizes the field of a document on encountering a word such as 'pitcher' or 'election'. We propose an efficient method for constructing field-associated terms (FTs) for specialized fields in order to determine the field of a text. A document classification scheme can be fixed from a well-classified document database or corpus. Considering the focus field, we discuss the levels and stability ranks of field-associated terms. To construct a balanced FT collection, we construct single FTs. From the collections, the levels and stability ranks of FTs can be constructed automatically. We propose a new FT extraction algorithm for document classification that uses the concentration rate and occurrence frequencies of FTs.
https://doi.org/10.3745/KIPSTB.2003.10B.3.347
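The abstract above names concentration rate and occurrence frequency as the selection criteria but does not define them. The sketch below is one possible reading, not the authors' algorithm: it assumes that the concentration rate of a term in a field is the share of that term's occurrences falling in the field, and that a term is kept when both its frequency and its concentration pass a threshold. The function name `field_associated_terms`, the thresholds, and the field labels are hypothetical.

```python
from collections import Counter, defaultdict


def field_associated_terms(labeled_docs, min_freq=2, min_concentration=0.8):
    # labeled_docs: iterable of (field_label, text) pairs taken from a
    # pre-classified corpus.
    per_field = defaultdict(Counter)
    total = Counter()
    for field, text in labeled_docs:
        for word in text.lower().split():
            per_field[field][word] += 1
            total[word] += 1

    result = defaultdict(list)
    for field, counts in per_field.items():
        for word, freq in counts.items():
            # Assumed definition: share of the word's occurrences in this field.
            concentration = freq / total[word]
            if freq >= min_freq and concentration >= min_concentration:
                result[field].append((word, freq, concentration))
    return dict(result)
```

Under these assumptions, a common word such as "the" is spread across fields and is filtered out by its low concentration, while a word confined to one field is retained once it occurs often enough.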

Jang, Kye-Hun;Lee, Kyung-Soon
- The KIPS Transactions:PartB / v.19B no.1 / pp.37-42 / 2012
Pseudo relevance feedback assumes that words occurring frequently in the top-ranked documents are related to the initial query. However, the main drawback of the term frequency method is that it relies on feature independence and disregards any dependencies that may exist between words in the text. In this paper, we propose query expansion based on a word graph that uses term proximity, which supplements the term frequency method. On the TREC WT10g test collection, experimental results in MAP (Mean Average Precision) show that the proposed method achieves a 6.4% improvement over the language model.
https://doi.org/10.3745/KIPSTB.2012.19B.1.037
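The idea of expanding a query from a proximity-weighted word graph can be sketched as follows. The abstract does not give the graph construction or the scoring formula, so everything here is an assumption for illustration: edges link words that co-occur within a fixed window in the top-ranked (pseudo-relevant) documents, each co-occurrence contributes the inverse of the word distance, and candidate terms are ranked by their total edge weight to the query terms. The names `build_word_graph` and `expand_query` are hypothetical.

```python
from collections import defaultdict


def build_word_graph(docs, window=3):
    # Undirected word graph: nearer co-occurrences add more weight
    # (assumed weighting: 1 / distance in words).
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = doc.lower().split()
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + window + 1, len(words))):
                v = words[j]
                if v == w:
                    continue
                weight = 1.0 / (j - i)
                graph[w][v] += weight
                graph[v][w] += weight
    return graph


def expand_query(query, feedback_docs, k=2, window=3):
    # Score each non-query word by its summed proximity weight to the
    # query terms, then append the k best to the query.
    graph = build_word_graph(feedback_docs, window)
    query_terms = query.lower().split()
    scores = defaultdict(float)
    for q in query_terms:
        for neighbor, weight in graph.get(q, {}).items():
            if neighbor not in query_terms:
                scores[neighbor] += weight
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return query_terms + ranked[:k]
```

Unlike a pure term-frequency ranking, this scoring favors words that appear close to the query terms, which is one way to take word dependencies into account.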