Search | Korea Science

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

Al-Sabahi, Kamal;Zuping, Zhang;Kang, Yang
- KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.1 / pp.254-276 / 2019
Because the amount of information on the internet is growing rapidly, it is not easy for users to find information relevant to their queries. To tackle this issue, researchers are paying much attention to document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce that kind of representation. Word embeddings have shown good performance by allowing words to match on a semantic level. However, naively concatenating word embeddings makes common words dominant, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes used to calculate the Latent Semantic Analysis input matrix. Two embedding-based weighting schemes are proposed and then combined to calculate the values of this matrix. They are modified versions of the augment weight and the entropy frequency that combine the strengths of traditional weighting schemes and word embeddings. The proposed approach is evaluated on three English datasets: DUC 2002, DUC 2004, and Multilingual 2015 Single-document Summarization. Experimental results on the three datasets show that the proposed model achieves competitive performance compared to the state of the art, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
https://doi.org/10.3837/tiis.2019.01.015
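
For orientation, the two classical weighting schemes that the abstract above says are modified can be sketched as follows. This is a minimal sketch of the traditional augmented term-frequency weight and entropy weight only, not the embedding-based variants the paper proposes. The function names, the whitespace tokenizer, and the choice to multiply the local and global weights are assumptions made for illustration.

```python
import math
from collections import Counter

def term_sentence_counts(sentences):
    """Return (vocab, counts) where counts[i][j] is the frequency of term i in sentence j."""
    tokenized = [s.lower().split() for s in sentences]  # assumed tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    per_sentence = [Counter(toks) for toks in tokenized]
    counts = [[c[w] for c in per_sentence] for w in vocab]
    return vocab, counts

def augmented_weight(counts):
    """Local weight: 0.5 + 0.5 * tf / (max tf in the sentence); 0 if the term is absent."""
    n_sent = len(counts[0])
    max_tf = [max(row[j] for row in counts) for j in range(n_sent)]
    return [[0.5 + 0.5 * row[j] / max_tf[j] if row[j] else 0.0
             for j in range(n_sent)] for row in counts]

def entropy_weight(counts):
    """Global weight: 1 + sum_j p_ij * log(p_ij) / log(n), with p_ij = tf_ij / gf_i."""
    n_sent = len(counts[0])
    weights = []
    for row in counts:
        gf = sum(row)  # global frequency of the term, always > 0 here
        ent = sum((tf / gf) * math.log(tf / gf) for tf in row if tf)
        weights.append(1.0 + ent / math.log(n_sent) if n_sent > 1 else 1.0)
    return weights

def lsa_input_matrix(sentences):
    """Term-by-sentence matrix whose cells are local weight times global weight."""
    vocab, counts = term_sentence_counts(sentences)
    local = augmented_weight(counts)
    glob = entropy_weight(counts)
    matrix = [[cell * g for cell in row] for row, g in zip(local, glob)]
    return vocab, matrix
```

With these classical weights, a term spread evenly over every sentence receives an entropy weight of 0, while a term confined to one sentence receives a weight of 1. The resulting matrix would then be passed to a singular value decomposition, which is omitted here.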

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1778-1799 / 2022
Automatic text summarization is a procedure that condenses a large body of content into a shorter text that retains the significant information. Malayalam is one of the most difficult languages, spoken in certain areas of India, most commonly in Kerala and Lakshadweep. Natural language processing work on Malayalam is relatively scarce because of the complexity of the language and the scarcity of available resources. In this paper, an approach to text summarization of Malayalam documents is proposed that trains a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account in training so that the system can output the most important content of the input text. The classifier assigns sentences to separate classes (most important, important, average, and least significant), and based on this the system creates a summary of the input document. The user can select a compression ratio so that the system outputs the corresponding fraction of the document as the summary. Model performance is measured using Malayalam documents of different genres as well as documents from the same domain. The model is evaluated using the content evaluation measures precision, recall, F-score, and relative utility. The precision and recall values obtained show that the model is reliable and produces more relevant summaries than the other summarizers.
https://doi.org/10.3837/tiis.2022.06.001
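
The compression-ratio step described in the abstract above might look roughly like the sketch below. It assumes the sentences have already been assigned to one of the four classes by some classifier; the label strings, the function name `summarize`, and the tie-breaking rule (earlier sentences first) are hypothetical and not taken from the paper.

```python
import math

# Hypothetical class labels, ordered from most to least significant.
CLASS_RANK = {
    "most_important": 0,
    "important": 1,
    "average": 2,
    "least_significant": 3,
}

def summarize(labeled_sentences, compression_ratio):
    """Keep roughly compression_ratio of the sentences, preferring higher classes.

    labeled_sentences: list of (sentence, class_label) pairs in document order.
    compression_ratio: fraction of sentences to keep, in (0, 1].
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(compression_ratio * len(labeled_sentences)))
    # Rank by class first, then by position so earlier sentences win ties.
    ranked = sorted(
        range(len(labeled_sentences)),
        key=lambda i: (CLASS_RANK[labeled_sentences[i][1]], i),
    )
    kept = sorted(ranked[:n_keep])  # restore document order
    return [labeled_sentences[i][0] for i in kept]
```

Returning the kept sentences in document order, rather than in rank order, is a design choice that keeps the summary readable as running text.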

Automatic Summarization of Basketball Video Using the Score Information (스코어 정보를 이용한 농구 비디오의 자동요약)

Jung, Cheol-Kon;Kim, Eui-Jin;Lee, Gwang-Gook;Kim, Whoi-Yul
- The Journal of Korean Institute of Communications and Information Sciences / v.32 no.9C / pp.881-887 / 2007
In this paper, we propose a method for content-based automatic summarization of basketball game videos. To produce a meaningful summary, we use the score information in basketball videos. The score information is obtained by recognizing the digits in the score caption and analyzing the variation of the score. Generally, the important events in basketball are 3-point shots, one-sided runs, lead changes, and so on. We detect these events using the score information and generate summaries and highlights of basketball game videos.
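
As a rough illustration of how such events could be derived from score variation, the sketch below scans a sequence of already-recognized scores. It is not the authors' implementation: digit recognition is assumed to have been done, and the function name `detect_events`, the `run_length` threshold, and the event labels are hypothetical.

```python
def detect_events(scores, run_length=8):
    """Detect 3-point shots, lead changes, and one-sided runs.

    scores: list of (home, away) totals read from the score caption over time.
    run_length: assumed number of unanswered points that counts as a run.
    Returns a list of (index, event_label) tuples.
    """
    events = []
    if not scores:
        return events
    h0, a0 = scores[0]
    last_leader = (h0 > a0) - (h0 < a0)  # +1 home leads, -1 away leads, 0 tied
    streak_team, streak_points, run_reported = None, 0, False
    for i in range(1, len(scores)):
        (ph, pa), (h, a) = scores[i - 1], scores[i]
        dh, da = h - ph, a - pa
        if dh < 0 or da < 0 or (dh == 0 and da == 0):
            continue  # no change, or a decrease that suggests a misread digit
        if dh == 3 or da == 3:
            events.append((i, "3-point shot"))
        leader = (h > a) - (h < a)
        if leader != 0:
            if last_leader != 0 and leader != last_leader:
                events.append((i, "lead change"))
            last_leader = leader
        if dh > 0 and da > 0:
            # Both totals moved between samples, so no unanswered streak.
            streak_team, streak_points, run_reported = None, 0, False
            continue
        team = "home" if dh > 0 else "away"
        if team != streak_team:
            streak_team, streak_points, run_reported = team, 0, False
        streak_points += dh + da  # exactly one of the two is non-zero here
        if streak_points >= run_length and not run_reported:
            events.append((i, f"one-sided run ({team})"))
            run_reported = True
    return events
```

One caveat of this sketch: a jump of three points between two samples is labeled a 3-point shot even though it could also be a basket plus a free throw if the caption is sampled coarsely.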

Automatic Music Summarization Using Vector Quantization and Segment Similarity

Kim, Sang-Ho;Kim, Sung-Tak;Kim, Hoi-Rin
- The Journal of the Acoustical Society of Korea / v.27 no.2E / pp.51-56 / 2008
In this paper, we propose an effective method for music summarization that automatically extracts a representative part of the music using signal processing technology. The proposed method uses a vector quantization technique to extract several segments that can be regarded as the most important content in the music. In general, music contains repetitive patterns, and humans usually recognize the most important or catchiest tune from the repetitive pattern. Thus, the repetition, which is extracted using segment similarity, is considered to express a music summary. The extracted segments are then combined to generate a complete music summary. Experiments show that the proposed method captures the main theme of the music more effectively than conventional methods. The experimental results also show that the proposed method could be used for real-time applications, since generating a music summary takes much less processing time than other methods.

Implementation of Text Summarize Automation Using Document Length Normalization (문서 길이 정규화를 이용한 문서 요약 자동화 시스템 구현)

이재훈;김영천;이성주
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.51-55 / 2001
With the rapid growth of the World Wide Web and electronic information services, information is becoming available online at an incredible rate. One result is the oft-decried information overload. No one has time to read everything, yet we often have to make critical decisions based on what we are able to assimilate. The technology of automatic text summarization is becoming indispensable for dealing with this problem. Text summarization is the process of distilling the most important information from a source to produce an abridged version for a particular user or task. Information retrieval (IR) is the task of searching a set of documents for query-relevant documents. Text summarization, on the other hand, can be considered the task of searching a document, that is, a set of sentences, for topic-relevant sentences. In this paper, we show that document information that is more reliable and better suited to the query can be obtained using document length normalization, a technique taken from information retrieval. Experimental results on newspaper articles show that the document length normalization method is superior to other methods that use the query itself.

Document Summarization using Term Weighting (용어 가중치에 의한 문서요약)

Park, Sun;Kim, Chul Won
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.704-706 / 2012
In this paper, we propose a document summarization method using term weighting. The proposed method minimizes user intervention by using pseudo relevance feedback. It can also improve the quality of document summaries because the inherent semantics of the sentence set are well reflected by term weights derived from semantic features.

Automatic Summarization of Basketball Video Using the Score Information (스코어 정보를 이용한 농구 비디오의 자동요약)

Jung, Cheol-Kon;Kim, Eui-Jin;Lee, Gwang-Gook;Kim, Whoi-Yul
- The Journal of Korean Institute of Communications and Information Sciences / v.32 no.8C / pp.738-744 / 2007
In this paper, we propose a method for content-based automatic summarization of basketball game videos. To produce a meaningful summary, we use the score information in basketball videos. The score information is obtained by recognizing the digits in the score caption and analyzing the variation of the score. Generally, the important events in basketball are 3-point shots, one-sided runs, lead changes, and so on. We detect these events using the score information and generate summaries and highlights of basketball game videos.

A Automatic Document Summarization Method based on Principal Component Analysis

Kim, Min-Soo;Lee, Chang-Beom;Baek, Jang-Sun;Lee, Guee-Sang;Park, Hyuk-Ro
- Communications for Statistical Applications and Methods / v.9 no.2 / pp.491-503 / 2002
In this paper, we propose an automatic document summarization method based on Principal Component Analysis (PCA), which is one of the multivariate statistical methods. After extracting thematic words using PCA, we select the statements containing the respective extracted thematic words and build the document summary from them. Experimental results using newspaper articles show that the proposed method is superior to methods using either word frequency or an information retrieval thesaurus.
https://doi.org/10.5351/CKSS.2002.9.2.491
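
A minimal sketch of the idea in the abstract above, under several assumptions: statements are treated as whitespace-tokenized sentences, only the first principal component is used, it is found by power iteration on the term covariance matrix, and the thematic words are taken to be the terms with the largest absolute loadings. The paper's actual selection criteria are not given in the abstract, and the names `first_principal_component` and `thematic_summary` are hypothetical.

```python
import math
from collections import Counter

def first_principal_component(rows, iterations=200):
    """Approximate the top eigenvector of the covariance matrix of rows.

    rows: observations x variables (here: sentences x term counts).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / max(n - 1, 1)
            for b in range(d)] for a in range(d)]
    # Non-uniform deterministic start vector for the power iteration.
    v = [1.0 + k / d for k in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current vector
        v = [x / norm for x in w]
    return v

def thematic_summary(sentences, n_words=2):
    """Pick thematic words by loading size and keep sentences containing them."""
    tokenized = [s.lower().split() for s in sentences]  # assumed tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    bags = [Counter(toks) for toks in tokenized]
    rows = [[bag[w] for w in vocab] for bag in bags]
    pc = first_principal_component(rows)
    ranked = sorted(range(len(vocab)), key=lambda k: -abs(pc[k]))
    thematic = [vocab[k] for k in ranked[:n_words]]
    summary = [s for s, toks in zip(sentences, tokenized)
               if set(toks) & set(thematic)]
    return thematic, summary
```

A word that occurs once in every sentence has zero variance and therefore zero loading, so it is never selected as thematic in this sketch, whatever its raw frequency.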

Document Summarization using Weighting based on Cloud (클라우드 기반의 가중치에 의한 문서요약)

Park, Sun;Kim, Chul Won
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.10a / pp.305-306 / 2013
In this paper, we propose a document summarization method using cloud-based weighting. The proposed method minimizes user intervention by using relevance feedback. It can also improve the quality of document summaries because the inherent semantics of the sentence set are well reflected by term weights derived from semantic features obtained using cloud-based nonnegative matrix factorization.

Document Summarization using Weighting based on Cloud (클라우드 기반의 가중치에 의한 문서요약)

Park, Sun;Kim, Chul Won
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.10a / pp.968-969 / 2013
In this paper, we propose a document summarization method using cloud-based weighting. The proposed method minimizes user intervention by using relevance feedback. It can also improve the quality of document summaries because the inherent semantics of the sentence set are well reflected by term weights derived from semantic features obtained using cloud-based nonnegative matrix factorization.
