• Title/Abstract/Keywords: Social Summarization

16 search results (processing time: 0.022 s)

A Survey on Automatic Twitter Event Summarization

  • Rudrapal, Dwijen;Das, Amitava;Bhattacharya, Baby
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 1
    • /
    • pp.79-100
    • /
    • 2018
  • Twitter is one of the most popular social platforms for online users to share trending information and views on any event. Twitter reports an event faster than any other medium and contains an enormous amount of information and views about it. Consequently, Twitter topic summarization is one of the most convenient ways to get an instant gist of any event. However, the information shared on Twitter is often full of nonstandard abbreviations, acronyms, out-of-vocabulary (OOV) words, and grammatical mistakes, which makes it difficult to find reliable and useful information about an event. Twitter event summarization is therefore a challenging task on which traditional text summarization methods do not work well. In the last decade, various research works have introduced different approaches to automatic Twitter topic summarization. The main aim of this survey is to give a broad overview of promising approaches to summarizing a Twitter topic. We also address the automatic evaluation of summarization techniques by surveying recent evaluation methodologies. At the end of the survey, we highlight both current and future research challenges in this domain through an in-depth analysis of the most recent summarization approaches.

Building a Korean Text Summarization Dataset Using News Articles of Social Media

  • 이경호;박요한;이공주
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 9, No. 8
    • /
    • pp.251-258
    • /
    • 2020
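  • Training data for document summarization consists of documents and their summaries. Existing summarization datasets were difficult to collect in large quantities because the summaries were written manually. For this reason, online news articles, which are easy to collect and of high quality, have been widely used in summarization research. In this study, we propose building a Korean document summarization dataset by using the descriptions that news outlets post on social media, together with article titles and subtitles, as summaries of the article body. We were able to build a dataset of about 425,000 news articles and their summaries. To demonstrate the usefulness of the dataset, we implemented an extractive summarization system and compared the performance of a supervised model trained on our dataset with that of an unsupervised model. In the experiments, the model trained on the proposed dataset achieved higher ROUGE scores than the unsupervised algorithm.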
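
The abstract above reports ROUGE scores without stating which variant was used. As a rough illustration only, the following sketch computes ROUGE-1 (unigram overlap) between a candidate and a reference summary. The function name `rouge_1` and the whitespace tokenization are assumptions made here; Korean text would normally need a morphological tokenizer, and the paper's own evaluation setup is not described in this listing.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 precision, recall, and F1 on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat", "the cat sat on the mat")
print(scores)
```

Here the candidate's three tokens all occur in the six-token reference, so precision is 1.0 and recall is 0.5.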

A Video Summarization Study On Selecting-Out Topic-Irrelevant Shots Using N400 ERP Components in the Real-Time Video Watching

  • 김용호;김현희
    • 한국멀티미디어학회논문지
    • /
    • Vol. 20, No. 8
    • /
    • pp.1258-1270
    • /
    • 2017
  • The 'semantic gap' is a long-standing problem in automatic video summarization; it refers to the gap between the semantics implied in video summarization algorithms and what people actually infer from watching videos. Using external EEG bio-feedback obtained from video watchers as a solution to this problem raises several further issues: first, how to define and measure noise against ERP waveforms as signals; second, whether individual differences among subjects in noise and SNR found in conventional ERP studies using still images captured from videos are the same as those conceptualized and measured differently from videos; third, whether individual differences among subjects in noise and SNR levels help to detect topic-irrelevant shots as signals that do not match the subject's own semantic topical expectations (mismatch negativity at around 400 ms after stimulus onset). A repeated-measures ANOVA clearly shows a 2-way interaction effect between topic relevance and noise level, implying that subjects with a low noise level during the video-watching session are sensitive to topic-irrelevant visual shots. It also shows a 3-way interaction among topic relevance, noise, and SNR levels, implying that subjects with a high noise level are sensitive to topic-irrelevant visual shots only if they have a low SNR level.

Automatic Extraction Techniques of Topic-relevant Visual Shots Using Realtime Brainwave Responses

  • 김용호;김현희
    • 한국멀티미디어학회논문지
    • /
    • Vol. 19, No. 8
    • /
    • pp.1260-1274
    • /
    • 2016
  • To obtain good summarization algorithms, we first need to understand how people summarize videos. The 'semantic gap' refers to the gap between the semantics implied in video summarization algorithms and what people actually infer from watching videos. We hypothesized that ERP responses to real-time videos would show either N400 effects for topic-irrelevant shots in the 300 to 500 ms time range after stimulus onset or P600 effects for topic-relevant shots in the 500 to 700 ms time range. We recruited 32 participants for the EEG experiment, asking them to focus on the topic of short videos and to memorize the shots relevant to that topic. After analysing the real-time videos based on the participants' rating information, we obtained t-test results showing N400 effects at the PF1, F7, F3, C3, Cz, T7, and FT7 positions on the left and central hemisphere, and P600 effects at PF1, C3, Cz, and FCz on the left and central hemisphere and at C4, FC4, P8, and TP8 on the right. A further 3-way MANOVA with repeated measures on topic relevance, hemisphere, and electrode position showed significant interaction effects, implying that the left hemisphere at central, frontal, and pre-frontal positions was sensitive in detecting topic-relevant shots while watching real-time videos.

Analysis of Research status based on Citation Context

  • Kim, Byungkyu;Choi, Seon-heui;Kang, Muyeong;Kang, Ji-Hoon
    • International Journal of Contents
    • /
    • Vol. 11, No. 2
    • /
    • pp.63-68
    • /
    • 2015
  • A citation analysis utilizes the relations among citations and is the most popular bibliometric method. This analysis is based on 1) the evaluation by paper, journal and researcher of the research output, 2) the identification of emerging research topics, 3) the production of a map of the intellectual structure of the research domain and 4) various services for academic information. However, this approach has a limitation in that a citation is treated in a very simple manner, even though the purpose of citation can vary greatly. To address this problem, new approaches have been studied that take into account the citation context. This research separates the citations according to the citation functions and tries to conduct an analysis according to the newly classified citations. Furthermore, research on the citation summarization and visualization based on both the citation context and the citation function of the citations was also attempted. However, since there are very few studies related to citation context in South Korea, more research and development is needed in this area. This study analyzes the status of the research in terms of the citation context. For this, we utilized social network analysis methods.

The Influence of Topic Exploration and Topic Relevance On Amplitudes of Endogenous ERP Components in Real-Time Video Watching

  • 김용호;김현희
    • 한국멀티미디어학회논문지
    • /
    • Vol. 22, No. 8
    • /
    • pp.874-886
    • /
    • 2019
  • To delve into the semantic gap problem of automatic video summarization, we focused on endogenous ERP responses at around 400 ms and 600 ms after the onset of audio-visual stimuli. Our experiment included two factors: topic exploration (Topic Given vs. Topic Exploring) as a between-subject factor and topic relevance of the shots (Topic-Relevant vs. Topic-Irrelevant) as a within-subject factor. In the Topic Given condition, 22 subjects were shown 6 short historical documentaries together with their video titles and written summaries, while in the Topic Exploring condition, 25 subjects were asked instead to explore the topics of the same videos with no given information. EEG data were gathered while they watched the videos in real time. We hypothesized that the cognitive activity of exploring video topics while watching individual shots increases the amplitude of the endogenous ERP at around 600 ms after the onset of topic-relevant shots. The amplitude of the endogenous ERP at around 400 ms after the onset of topic-irrelevant shots was hypothesized to be lower in the Topic Given condition than in the Topic Exploring condition. A repeated-measures MANOVA supported both hypotheses.

Investigating an Automatic Method in Summarizing a Video Speech Using User-Assigned Tags

  • 김현희
    • 한국문헌정보학회지
    • /
    • Vol. 46, No. 1
    • /
    • pp.163-181
    • /
    • 2012
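  • To construct a speech summarization algorithm, this study analyzed the effectiveness of user tagging, sentence position, and sentence redundancy removal, techniques that can be applied without complex analysis of lengthy speech transcripts. Based on the results of this analysis, the study aims to construct and evaluate a speech summarization method and thereby propose an efficient approach to speech summarization. The proposed method was built on a modified Maximum Marginal Relevance model that uses tag and title keyword information and can apply sentence-position weights while minimizing redundancy. Its performance was compared with that of the Extractor system, which performs relatively complex lexical processing using word frequency and word position information from the speech text. In the comparison, the proposed method achieved a higher average precision than the Extractor system, with a statistically significant difference, and a higher average recall, although that difference was not statistically significant.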
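
The abstract above names a modified Maximum Marginal Relevance (MMR) model but does not spell out the modification. The sketch below is therefore only a generic MMR selector with an added position bonus, not the authors' model. The Jaccard similarity, the parameters `lam` and `position_weight`, and the linear position bonus are all assumptions chosen for illustration; `query_terms` stands in for the tag and title keywords.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def mmr_summarize(sentences, query_terms, k=2, lam=0.7, position_weight=0.1):
    """Greedy MMR: trade off relevance to the query against redundancy."""
    tokens = [set(s.lower().split()) for s in sentences]
    query = {t.lower() for t in query_terms}
    n = len(sentences)

    def relevance(i):
        # Hypothetical position bonus: earlier sentences score slightly higher.
        return jaccard(tokens[i], query) + position_weight * (1 - i / n)

    selected, candidates = [], list(range(n))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already chosen sentence.
            redundancy = max(
                (jaccard(tokens[i], tokens[j]) for j in selected), default=0.0
            )
            return lam * relevance(i) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


speech = [
    "budget cuts hit schools",
    "budget cuts hit schools hard",
    "teachers plan a strike",
]
print(mmr_summarize(speech, ["budget", "strike"], k=2))
```

In this toy input the second sentence nearly duplicates the first, so the redundancy term pushes the selector toward the third sentence instead.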

Subject-Balanced Intelligent Text Summarization Scheme

  • 윤여일;고은정;김남규
    • 지능정보연구
    • /
    • Vol. 25, No. 2
    • /
    • pp.141-166
    • /
    • 2019
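  • Research on document summarization has recently been active as a way to efficiently manage and use the vast amount of text data generated through various media. In particular, a variety of automatic summarization techniques that use machine learning and artificial intelligence to derive summaries objectively and efficiently have been devised. However, most automatic text summarization techniques proposed so far compose the summary according to the distribution of content in the source text, so content on low-weight subjects, that is, subjects mentioned infrequently in the source, is unlikely to be included in the summary. To overcome this limitation, this paper proposes an automatic document summarization technique that minimizes the omission of low-frequency subjects. Specifically, we (i) identify the various subjects contained in the source text, select representative terms for each subject, and build a per-subject term dictionary through word embedding; (ii) determine the degree to which each sentence of the source corresponds to the various subjects; (iii) partition the sentences by subject and compute the similarity among the sentences belonging to each subject; and (iv) present an automatic document summarization technique that covers as much of the source's diverse content as possible while minimizing redundancy within the summary. To evaluate the proposed methodology, we built a term dictionary from 50,000 TripAdvisor reviews, ran summarization experiments on 23,087 reviews, and compared the subject distribution of the resulting summaries with that of summaries produced by an existing simple frequency-based method. The experiments confirmed that the proposed methodology can produce summaries that maintain a balance among the subjects in the source text.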
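
To make the idea of subject balance concrete, here is a deliberately simplified sketch of steps (ii) to (iv): each sentence is assigned to the subject whose dictionary terms it matches most, and the summary then takes sentences from every subject, including rare ones. It is not the authors' method. The hand-written `subject_terms` dictionary replaces the word-embedding dictionary of step (i), term overlap replaces the similarity computation of step (iii), and the function name and `per_subject` parameter are hypothetical.

```python
from collections import defaultdict


def subject_balanced_summary(sentences, subject_terms, per_subject=1):
    """Pick up to `per_subject` sentences for every subject, however rare."""
    if not subject_terms:
        return []
    buckets = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        words = set(sentence.lower().split())
        # Score each subject by how many of its dictionary terms the sentence contains.
        scores = {s: len(words & terms) for s, terms in subject_terms.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            buckets[best].append((scores[best], idx))
    chosen = set()
    for subject in subject_terms:
        # Highest-scoring sentences first; ties go to the earlier sentence.
        ranked = sorted(buckets[subject], key=lambda pair: (-pair[0], pair[1]))
        chosen.update(idx for _, idx in ranked[:per_subject])
    return [sentences[i] for i in sorted(chosen)]


review = [
    "the room was clean",
    "the room was spacious",
    "the room was quiet",
    "breakfast was cold",
]
terms = {
    "room": {"room", "clean", "spacious", "quiet"},
    "food": {"breakfast", "dinner"},
}
print(subject_balanced_summary(review, terms))
```

A purely frequency-based selector would favor the three room sentences; here the single food sentence is kept because each subject gets its own quota.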

Online-Based Local Government Image Typology: A Case Study on Jakarta Provincial Government Official YouTube Videos

  • Pratama, Arif Budy
    • Journal of Contemporary Eastern Asia
    • /
    • Vol. 16, No. 1
    • /
    • pp.1-21
    • /
    • 2017
  • The Jakarta Provincial Government uses its YouTube channel to interact with citizens and enhance transparency. The purpose of this study is to explore how online audiences perceive the local government's image through the YouTube platform. The concepts of organizational image and credibility in the political image are adapted to analyze online public perceptions of the Jakarta Provincial Government's image. Using the video summarization approach on 346 official YouTube videos uploaded from 1 March 2016 to 31 May 2016, and content analysis of 8,237 comments, this study shows that both political and bureaucratic images emerge concurrently in the Jakarta Provincial Government case. A typology model is proposed to describe and explain the four image variations that occurred in the case study. Practical recommendations are offered for managing a YouTube channel as one of the social media used in the local government context.

Toward a Structural and Semantic Metadata Framework for Efficient Browsing and Searching of Web Videos

  • 김현희
    • 한국문헌정보학회지
    • /
    • Vol. 51, No. 1
    • /
    • pp.227-243
    • /
    • 2017
  • This study proposes a structural and semantic framework for the characterization of events and segments in Web videos that permits content-based searches and dynamic video summarization. Although MPEG-7 supports multimedia structural and semantic descriptions, it is not currently suitable for describing multimedia content on the Web. The proposed metadata framework, designed with Web environments in mind, therefore provides a thorough yet simple way to describe Web video content. Specifically, the metadata framework was constructed on the basis of Chatman's narrative theory, three multimedia metadata formats (PBCore, MPEG-7, and TV-Anytime), and social metadata. It consists of event information, eventGroup information, segment information, and video (program) information. This study also discusses how to automatically extract metadata elements, including structural and semantic metadata elements, from Web videos.