• Title/Summary/Keyword: Feature Based Summarization

Video Summarization Using Importance-based Fuzzy One-Class Support Vector Machine (중요도 기반 퍼지 원 클래스 서포트 벡터 머신을 이용한 비디오 요약 기술)

  • Kim, Ki-Joo;Choi, Young-Sik
    • Journal of Internet Computing and Services / v.12 no.5 / pp.87-100 / 2011
  • In this paper, we address video summarization as the task of generating video segments that are both visually salient and semantically important. To find salient data points, one can use the OC-SVM (One-class Support Vector Machine), which is well known for novelty detection problems. It is, however, hard to incorporate the importance measure of data points, which is crucial for video summarization, into the OC-SVM process. To integrate the importance of each point into the OC-SVM process, we propose a fuzzy version of OC-SVM. The Importance-based Fuzzy OC-SVM weights data points according to the importance measure of the video segments and then estimates the support of the distribution of the weighted feature vectors. The estimated support vectors form the descriptive segments that best delineate the underlying video content in terms of the importance and salience of video segments. We evaluate our algorithm on several synthesized data sets and different types of videos to show its efficacy. Experimental results showed that our approach outperformed the well-known traditional method.

Moving Average Filter for Automatic Music Segmentation & Summarization (이동 평균 필터를 적용한 음악 세그멘테이션 및 요약)

  • Kim Kil-Youn;Oh Yung-Hwan
    • Proceedings of the KSPS conference / 2006.05a / pp.143-146 / 2006
  • Music is now digitally produced and distributed via the Internet, and we face a huge amount of music every day. Music summarization technology has been studied to help people concentrate on the most impressive section of a song, so that one can skim a song by listening only to the climax (chorus, refrain). Recent studies try to find the climax section using various methods, such as finding diagonal line segments or kernel-based segmentation. All these methods fail to capture the inherent structure of music because of its polyphonic and noisy nature. In this paper, by applying a moving average filter to the MFCC/chroma features in the time domain, we achieved remarkable results in capturing the music structure.

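
The smoothing step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `moving_average`, the `half_width` parameter, and the toy one-dimensional input are all assumptions, and real input would be a sequence of MFCC or chroma vectors produced by an audio front end.

```python
from typing import List


def moving_average(frames: List[List[float]], half_width: int) -> List[List[float]]:
    """Smooth each feature dimension over time with a centered moving average.

    frames[t][d] is feature dimension d at time frame t. The window covers
    frames t - half_width .. t + half_width and is truncated at the edges.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    smoothed = []
    for t in range(len(frames)):
        # Slice the neighbouring frames; Python slicing clips at the end.
        window = frames[max(0, t - half_width): t + half_width + 1]
        dims = len(frames[t])
        smoothed.append(
            [sum(f[d] for f in window) / len(window) for d in range(dims)]
        )
    return smoothed


# Toy one-dimensional "feature" that alternates sharply between frames
# (hypothetical data, standing in for a noisy MFCC/chroma track).
frames = [[0.0], [10.0], [0.0], [10.0], [0.0]]
smoothed = moving_average(frames, half_width=1)
print(smoothed)
```

A wider window suppresses more frame-level noise but also blurs section boundaries, so the window length is a tuning choice that the abstract does not specify.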

Development and Evaluation of a Document Summarization System using Features and a Text Component Identification Method (텍스트 구성요소 판별 기법과 자질을 이용한 문서 요약 시스템의 개발 및 평가)

  • Jang, Dong-Hyun;Myaeng, Sung-Hyon
    • Journal of KIISE:Software and Applications / v.27 no.6 / pp.678-689 / 2000
  • This paper describes an automatic summarization approach that constructs a summary by extracting sentences that are likely to represent the main theme of a document. As a way of selecting summary sentences, the system uses a model that takes into account lexical and statistical information obtained from a document corpus. As such, the system consists of two parts: the training part and the summarization part. The former processes sentences that have been manually tagged for summary sentences and extracts necessary statistical information of various kinds, and the latter uses the information to calculate the likelihood that a given sentence is to be included in the summary. There are at least three unique aspects of this research. First of all, the system uses a text component identification model to categorize sentences into one of the text components. This allows us to eliminate parts of text that are not likely to contain summary sentences. Second, although our statistically-based model stems from an existing one developed for English texts, it applies the framework to individual features separately and computes the final score for each sentence by combining the pieces of evidence using the Dempster-Shafer combination rule. Third, not only were new features introduced but also all the features were tested for their effectiveness in the summarization framework.

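
The abstract above says that evidence from individual features is combined with the Dempster-Shafer combination rule. The sketch below shows that rule for two mass functions over a two-hypothesis frame. The feature names ("position", "cue word"), the hypothesis labels, and the mass values are hypothetical, chosen only for illustration; the paper's actual features and masses are not given in this listing.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, pool them on set intersections,
    and renormalise by the non-conflicting mass."""
    pooled: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            pooled[inter] = pooled.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in pooled.items()}


SUMMARY = frozenset({"summary"})
OTHER = frozenset({"other"})
UNKNOWN = SUMMARY | OTHER  # the whole frame: "cannot tell"

# Hypothetical evidence from two features for one sentence.
m_position = {SUMMARY: 0.6, UNKNOWN: 0.4}
m_cue_word = {SUMMARY: 0.5, OTHER: 0.2, UNKNOWN: 0.3}

belief = combine(m_position, m_cue_word)
print(belief[SUMMARY])
```

Because the rule is associative and commutative, further features can be folded in one at a time by calling `combine` on the running result.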

Automatic Product Feature Extraction for Efficient Analysis of Product Reviews Using Term Statistics (효율적인 상품평 분석을 위한 어휘 통계 정보 기반 평가 항목 추출 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.16B no.6 / pp.497-502 / 2009
  • In this paper, we introduce an automatic product feature extraction system that improves the efficiency of product review analysis. Our system consists of two parts: a review collection and correction part and a product feature extraction part. The former collects reviews from Internet shopping malls and corrects spoken-style or ungrammatical sentences. In the latter, product features, that is, items that can serve as evaluation criteria, such as 'size' and 'style' for a skirt, are automatically extracted using term statistics from reviews and web documents. We choose nouns in reviews as candidate product features and calculate the degree of association between candidate nouns and products by combining an inner association degree and an outer association degree. The inner association degree is calculated from noun frequency in reviews, and the outer association degree is calculated from the co-occurrence frequency of a candidate noun and a product name in web documents. In our evaluation, the extraction method showed an average recall of 90%, which is better than the results of previous approaches.
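
The abstract does not give the formulas for the two association degrees or for how they are combined, so the following is only a sketch under stated assumptions: the inner degree is taken as review frequency normalised by the most frequent noun, the outer degree as the share of a noun's web hits that also mention the product, and the two are mixed linearly with a hypothetical weight `alpha`. All function names and counts are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def inner_association(review_nouns: List[str]) -> Dict[str, float]:
    """ASSUMED form: noun frequency in reviews, scaled to [0, 1]."""
    counts = Counter(review_nouns)
    if not counts:
        return {}
    top = max(counts.values())
    return {noun: c / top for noun, c in counts.items()}


def outer_association(cooccur_hits: Dict[str, int],
                      noun_hits: Dict[str, int]) -> Dict[str, float]:
    """ASSUMED form: fraction of a noun's web hits that also name the product."""
    return {
        noun: (cooccur_hits.get(noun, 0) / hits) if hits else 0.0
        for noun, hits in noun_hits.items()
    }


def rank_features(review_nouns: List[str],
                  cooccur_hits: Dict[str, int],
                  noun_hits: Dict[str, int],
                  alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Linear mix of the two degrees (the mixing rule is an assumption)."""
    inner = inner_association(review_nouns)
    outer = outer_association(cooccur_hits, noun_hits)
    scores = {n: alpha * s + (1 - alpha) * outer.get(n, 0.0)
              for n, s in inner.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical counts for the product "skirt".
nouns = ["size", "size", "style", "delivery", "size", "style"]
cooccur = {"size": 40, "style": 30, "delivery": 5}
web_hits = {"size": 100, "style": 60, "delivery": 500}
ranked = rank_features(nouns, cooccur, web_hits)
print([name for name, _ in ranked])
```

In this toy data "delivery" appears in reviews but rarely co-occurs with the product name on the web, so the outer degree pushes it down the ranking, which is the kind of filtering the two-degree design is meant to achieve.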

Product Evaluation Summarization Through Linguistic Analysis of Product Reviews (상품평의 언어적 분석을 통한 상품 평가 요약 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.17B no.1 / pp.93-98 / 2010
  • In this paper, we introduce a system that summarizes product evaluations through linguistic analysis, in order to make effective use of the rapidly growing number of product reviews. Our system analyzes the polarity of product reviews for each product feature, that is, each criterion by which customers evaluate a product, such as 'design' and 'material' for the skirt category. As a review summary, the system shows customers a graph of the percentages of positive and negative reviews. We build an opinion word dictionary for each product feature through context-based automatic expansion from a small set of seed words, and judge the polarity of reviews for each product feature with the extracted dictionary. In experiments using product reviews from online shopping malls, our system shows an average accuracy of 69.8% in extracting the judgemental word dictionary and 81.8% in polarity resolution for each sentence.
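
The final summarization step, turning per-feature polarity judgements into positive and negative percentages, might look like the sketch below. It assumes sentences have already been assigned to a product feature and tokenized, and it uses a tiny hand-written dictionary in place of the automatically expanded one; the words, scores, and the rule of summing word scores per sentence are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-feature opinion dictionaries: +1 positive, -1 negative.
OPINION_DICT: Dict[str, Dict[str, int]] = {
    "design": {"pretty": 1, "plain": -1},
    "material": {"soft": 1, "thin": -1},
}


def summarize(sentences: List[Tuple[str, List[str]]],
              opinion_dict: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Return the percentage of positive and negative sentences per feature."""
    tally = defaultdict(lambda: {"pos": 0, "neg": 0})
    for feature, words in sentences:
        lexicon = opinion_dict.get(feature, {})
        score = sum(lexicon.get(w, 0) for w in words)
        if score > 0:
            tally[feature]["pos"] += 1
        elif score < 0:
            tally[feature]["neg"] += 1
        # sentences with a zero score are treated as neutral and skipped
    summary = {}
    for feature, t in tally.items():
        total = t["pos"] + t["neg"]
        summary[feature] = {
            "positive_pct": 100.0 * t["pos"] / total,
            "negative_pct": 100.0 * t["neg"] / total,
        }
    return summary


reviews = [
    ("design", ["pretty", "skirt"]),
    ("design", ["plain"]),
    ("design", ["pretty"]),
    ("material", ["thin", "soft", "thin"]),
]
print(summarize(reviews, OPINION_DICT))
```

Keeping one dictionary per feature lets the same word carry different polarity for different features, which is one plausible reason for the per-feature design described in the abstract.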

A Dependency Graph-Based Keyphrase Extraction Method Using Anti-patterns

  • Batsuren, Khuyagbaatar;Batbaatar, Erdenebileg;Munkhdalai, Tsendsuren;Li, Meijing;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.14 no.5 / pp.1254-1271 / 2018
  • Keyphrase extraction is one of the fundamental natural language processing (NLP) tools for improving many text-mining applications, such as document summarization and clustering. In this paper, we propose two novel techniques on top of state-of-the-art keyphrase extraction methods. The first is anti-patterns, which aim to recognize non-keyphrase candidates. State-of-the-art methods often use rich feature sets to identify keyphrases, but those feature sets cover only some keyphrases, because keyphrases share very few patterns and stylistic features, whereas non-keyphrase candidates often share many. The second is to use the dependency graph instead of the word co-occurrence graph: the co-occurrence graph cannot connect two words that are syntactically related but placed far from each other in a sentence, while the dependency graph can. In experiments, we compared performance under different graph settings (co-occurrence and dependency) and against the results of existing methods. We found that the combination of the dependency graph and anti-patterns outperforms the state of the art.
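
The difference between the two graph types can be shown on a single sentence. The sketch below is illustrative only: the example sentence, the hand-annotated head indices, and the function names are assumptions, and a real system would obtain the heads from a dependency parser rather than by hand.

```python
from typing import FrozenSet, List, Set

Edge = FrozenSet[str]


def cooccurrence_edges(tokens: List[str], max_distance: int) -> Set[Edge]:
    """Connect every pair of distinct words at most max_distance positions apart."""
    edges: Set[Edge] = set()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + max_distance + 1, len(tokens))):
            if tokens[j] != word:
                edges.add(frozenset((word, tokens[j])))
    return edges


def dependency_edges(tokens: List[str], heads: List[int]) -> Set[Edge]:
    """Connect each word to its syntactic head; heads[i] == -1 marks the root."""
    return {frozenset((tokens[i], tokens[h]))
            for i, h in enumerate(heads) if h >= 0}


# "(the) method that uses graphs extracts keyphrases", function words dropped.
tokens = ["method", "that", "uses", "graphs", "extracts", "keyphrases"]
# Hand-annotated heads (an assumption): "method" is the subject of "extracts",
# even though a relative clause separates the two words.
heads = [4, 2, 0, 2, -1, 4]

cooc = cooccurrence_edges(tokens, max_distance=2)
dep = dependency_edges(tokens, heads)
pair = frozenset(("method", "extracts"))
print(pair in cooc, pair in dep)  # False True
```

Here the subject and its verb are four positions apart, so a window of two misses the link, while the dependency graph connects them directly.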

Applying Lexical Semantics to Automatic Extraction of Temporal Expressions in Uyghur

  • Murat, Alim;Yusup, Azharjan;Iskandar, Zulkar;Yusup, Azragul;Abaydulla, Yusup
    • Journal of Information Processing Systems / v.14 no.4 / pp.824-836 / 2018
  • The automatic extraction of temporal information from written texts is a key component of question answering and summarization systems, and the efficacy of those systems depends heavily on whether temporal expressions (TEs) are successfully extracted. In this paper, three different approaches to TE extraction in Uyghur are developed and analyzed. A novel approach that uses lexical semantics as additional information is also presented to extend classical approaches, which are mainly based on morphology and syntax. We used a manually annotated news dataset labeled with TIMEX3 tags and generated three models with different feature combinations. The experimental results show that the best run achieved 0.87 for Precision, 0.89 for Recall, and 0.88 for F1-Measure in Uyghur TE extraction. From the analysis of the results, we concluded that the application of semantic knowledge resolves ambiguity problems at shallower levels of language analysis and significantly aids the development of a more efficient Uyghur TE extraction system.
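
As a quick consistency check on the reported scores, the F1-Measure is the harmonic mean of Precision and Recall. The helper name `f1_score` below is just for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: Precision 0.87, Recall 0.89.
f1 = f1_score(0.87, 0.89)
print(round(f1, 2))  # 0.88
```

The computed value, about 0.8799, rounds to the 0.88 given in the abstract.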

An Automatic Summarization System Based On a Probabilistic Model Using Document Structure Information (문서 구조 정보를 이용한 확률 모델 기반 자동요약 시스템)

  • Jang, Dong-Hyun;Myaeng, Sung-Hyon
    • Annual Conference on Human and Language Technology / 1997.10a / pp.15-22 / 1997
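  • With the development of the Internet and information service technology, the amount of information available to the general public is growing exponentially, yet it is becoming harder for users to obtain the information they want, and even when the needed information is found, its sheer volume means that much time is spent grasping the overall content. To address this problem, we developed a system that extracts sentences from a document using a statistical model, composes a summary, and presents it to the user. The method used to build the document summarization system extracts important sentences from a document collection, learns from them the features and important words that can appear in a summary, and then uses what it has learned to generate summaries. The documents used for system development and evaluation are a collection of papers in the field of information science; they were divided into training data and test data, the necessary information was obtained from the training data, and the system was evaluated on the test data.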


Query-Based Summarization using Non-negative Matrix Factorization (비음수 행렬 인수분해를 이용한 질의 기반의 문서 요약)

  • Park Sun;Lee Ju-Hong;Ahn Chan-Min;Park Tae-Su;Kim Deok-Hwan
    • Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.394-396 / 2006
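  • Existing query-based document summarization either improves summary quality through prior learning between queries and documents, or converts documents into graphs to reflect their inherent structure and thereby improve summary accuracy. This paper proposes a new method for query-based document summarization using non-negative matrix factorization (NMF). The proposed method requires no prior learning between queries and documents. In addition, without the complex processing of transforming documents into graphs, it can improve summary accuracy by reflecting the inherent structure of documents through the semantic features and semantic variables obtained by NMF. Finally, it can summarize sentences easily with a simple method.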

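
The abstract does not spell out how sentences are selected from the factorization, so the following is a sketch under assumptions rather than the authors' procedure. It factorizes a small term-by-sentence matrix A into W (columns taken as semantic features) and H (rows taken as semantic variables) using the standard multiplicative updates, picks the semantic feature most similar to the query by cosine similarity, and returns the sentence with the largest weight for that feature. The toy matrix, the selection rule, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(a: Matrix, rank: int, iters: int = 300, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factorize a (m x n) into w (m x rank) and h (rank x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


def pick_sentence(a: Matrix, query: List[float], rank: int = 2) -> int:
    """ASSUMED selection rule: best-matching feature, then its strongest sentence."""
    w, h = nmf(a, rank)
    features = transpose(w)  # one row per semantic feature
    best = max(range(rank), key=lambda k: cosine(features[k], query))
    return max(range(len(h[best])), key=lambda j: h[best][j])


# Rows: terms (cat, dog, stock, market); columns: four toy sentences.
A = [[1.0, 2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
print(pick_sentence(A, [0.0, 0.0, 1.0, 1.0]))  # index of a stock/market sentence
```

No training data or graph construction is involved here, which matches the two advantages the abstract claims for the NMF-based approach.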

Development and Implementation of Multi-source Remote Sensing Imagery Fusion Based on PCI Geomatica

  • Yu, ZENG;Jixian, ZHANG;Qin, YAN;Pinglin, QIAO
    • Proceedings of the KSRS Conference / 2003.11a / pp.1334-1336 / 2003
  • Based on a comprehensive analysis and summary of the image fusion algorithms provided by PCI Geomatica software, this paper identifies deficiencies in the software's image fusion processing functions. These limitations can be addressed by further developing PCI Geomatica on the user side, and five effective algorithms can be added to it. The paper also describes in detail how to customize and further develop PCI Geomatica using Microsoft Visual C++ 6.0, the PCI SDK Kit, and the GDB technique. In this way, the remote sensing imagery fusion functions of PCI Geomatica software can be extended.
