• Title/Abstract/Keyword: video summarization


Investigating an Automatic Method for Summarizing and Presenting a Video Speech Using Acoustic Features (음향학적 자질을 활용한 비디오 스피치 요약의 자동 추출과 표현에 관한 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Information Management / v.29 no.4 / pp.191-208 / 2012
  • Two fundamental aspects of speech summary generation are the extraction of key speech content and the style of presentation of the extracted speech synopses. We first investigated whether acoustic features (speaking rate, pitch pattern, and intensity) are equally important and, if not, which one can be effectively modeled to compute the significance of segments for lecture summarization. We found that intensity (the difference between max dB and min dB) is the most effective factor for speech summarization. We evaluated this intensity-based method against a keyword-based method, in terms of which method produces better speech summaries and of how similar the weight values the two methods assign to segments are. We then investigated how to present speech summaries to viewers. In sum, we suggested how to extract key segments from a speech video efficiently using acoustic features, and how to present the extracted segments to viewers.
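
The intensity criterion above is easy to prototype. The following is a minimal sketch, not the authors' implementation: it assumes a mono PCM signal already loaded as a NumPy array and scores fixed-length segments by their dB dynamic range (max dB minus min dB); the segment length and top-k selection are arbitrary illustrative choices.

```python
import numpy as np

def segment_intensity_scores(samples, sr, seg_len_s=10.0, eps=1e-10):
    """Score fixed-length segments by their dB dynamic range (max dB - min dB).

    Assumes `samples` is a mono PCM signal as a 1-D NumPy array, `sr` its sample rate.
    """
    seg_len = int(seg_len_s * sr)
    scores = []
    for start in range(0, len(samples) - seg_len + 1, seg_len):
        seg = samples[start:start + seg_len]
        # Convert amplitude to dB, guarding against log(0).
        db = 20.0 * np.log10(np.abs(seg) + eps)
        scores.append(db.max() - db.min())
    return np.asarray(scores)

def top_k_segments(scores, k):
    """Return indices of the k highest-scoring segments, in playback order."""
    return sorted(np.argsort(scores)[-k:])
```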

Automatic Extraction Techniques of Topic-relevant Visual Shots Using Realtime Brainwave Responses (실시간 뇌파반응을 이용한 주제관련 영상물 쇼트 자동추출기법 개발연구)

  • Kim, Yong Ho;Kim, Hyun Hee
    • Journal of Korea Multimedia Society / v.19 no.8 / pp.1260-1274 / 2016
  • To obtain good summarization algorithms, we first need to understand how people summarize videos. The 'semantic gap' refers to the gap between the semantics implied by video summarization algorithms and what people actually infer from watching videos. We hypothesized that ERP responses to real-time videos would show either N400 effects for topic-irrelevant shots in the 300~500 ms range after stimulus onset, or P600 effects for topic-relevant shots in the 500~700 ms range. We recruited 32 participants for the EEG experiment, asking them to focus on the topic of short videos and to memorize shots relevant to each video's topic. After analyzing the real-time videos based on the participants' rating information, t-tests showed N400 effects at the PF1, F7, F3, C3, Cz, T7, and FT7 positions on the left and central hemisphere, and P600 effects at PF1, C3, Cz, and FCz on the left and central hemisphere and at C4, FC4, P8, and TP8 on the right. A further three-way repeated-measures MANOVA over topic relevance, hemisphere, and electrode position showed significant interaction effects, implying that the left hemisphere at central, frontal, and pre-frontal positions was sensitive in detecting topic-relevant shots while watching real-time videos.
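
For readers who want to replicate the windowed ERP comparison, here is a minimal sketch, not the authors' analysis pipeline: it assumes epoched single-electrode data and runs a paired t-test on mean amplitudes in the 300~500 ms N400 window; the P600 test would reuse the same code over 500~700 ms.

```python
import numpy as np
from scipy import stats

def window_mean(epochs, times, t_min, t_max):
    """Mean amplitude per trial within [t_min, t_max] seconds.

    epochs: (n_trials, n_samples) voltages for one electrode (assumed given).
    times:  (n_samples,) time axis in seconds, 0 = stimulus onset.
    """
    mask = (times >= t_min) & (times <= t_max)
    return epochs[:, mask].mean(axis=1)

def n400_ttest(relevant, irrelevant, times):
    """Paired t-test on 300-500 ms mean amplitudes (N400 window).

    relevant, irrelevant: (n_trials, n_samples) epochs per condition.
    """
    rel = window_mean(relevant, times, 0.3, 0.5)
    irr = window_mean(irrelevant, times, 0.3, 0.5)
    return stats.ttest_rel(rel, irr)
```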

An Efficient Face Region Detection for Content-based Video Summarization (내용기반 비디오 요약을 위한 효율적인 얼굴 객체 검출)

  • Kim Jong-Sung;Lee Sun-Ta;Baek Joong-Hwan
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.7C / pp.675-686 / 2005
  • In this paper, we propose an efficient face region detection technique for content-based video summarization. To segment a video, shot changes are detected from the video sequence and key frames are selected from the shots; within each shot, we select the frame with the least difference from its neighboring frames. The proposed face detection algorithm then detects face regions in the selected key frames, and we provide the user with summarized frames containing face regions, which carry important meaning in dramas and movies. Using the Bayes classification rule and the statistical characteristics of skin pixels, face regions are detected in the frames. After skin detection, we apply a projection method to segment each image (frame) into face and non-face regions. The segmented regions are candidate face objects and include many falsely detected regions, so we design a CART-based classifier to minimize false detections. From SGLD matrices, we extract textural features such as Inertia, Inverse Difference, and Correlation. Our experiments show that the proposed face detection algorithm performs well on key frames with complex and varying backgrounds, and our system provides the user with key frames containing face regions as summarized video information.
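
The likelihood-ratio form of the Bayes skin rule is straightforward to sketch. The code below is an illustrative reconstruction, not the paper's implementation: it assumes pre-trained 256x256 CbCr histograms for the skin and non-skin classes and shows a crude row-projection step; the CART and SGLD texture stages are omitted.

```python
import numpy as np

def skin_mask(cb, cr, skin_hist, nonskin_hist, prior_ratio=1.0):
    """Classify pixels as skin via the Bayes (likelihood-ratio) rule.

    cb, cr:       chrominance channels of the frame, integer arrays in [0, 255].
    skin_hist:    256x256 normalized histogram P(cb, cr | skin), assumed pre-trained.
    nonskin_hist: 256x256 normalized histogram P(cb, cr | non-skin).
    """
    p_skin = skin_hist[cb, cr]
    p_nonskin = nonskin_hist[cb, cr]
    # Decide skin where the likelihood ratio exceeds the prior ratio.
    return p_skin > prior_ratio * p_nonskin

def face_band_by_projection(mask):
    """Locate the densest horizontal band of skin pixels by row projection."""
    rows = mask.sum(axis=1)   # vertical projection profile
    return int(np.argmax(rows))  # crude peak; a real system would smooth first
```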

Video Summarization Using Hidden Markov Model (은닉 마르코브 모델을 이용한 비디오 요약 시스템)

  • 박호식;배철수
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.6 / pp.1175-1181 / 2004
  • This paper proposes a system that analyzes and summarizes the video shots of baseball game TV programs into fifteen categories. Our system consists of three modules: feature extraction, Hidden Markov Model (HMM) training, and video shot categorization. Video shots belonging to the same class are not necessarily similar, so the training set must be large enough to include shots with all possible variations in order to create robust HMMs. In our experiments, the system recognized the 15 shot classes with a success ratio of 84.72%.
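
A per-class HMM classifier of this kind can be sketched with hmmlearn. This is a generic reconstruction under assumed inputs (per-shot feature sequences), not the paper's system; the number of hidden states and the covariance type are arbitrary choices.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def train_class_hmms(sequences_by_class, n_states=4):
    """Fit one Gaussian HMM per shot class.

    sequences_by_class: dict mapping class label -> list of (T_i, n_features)
    arrays of per-frame features (assumed extracted elsewhere).
    """
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify_shot(models, features):
    """Assign the class whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(features))
```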

Video Summarization Service based on MPEG-21 DIP (MPEG-21 DIP 기반 비디오 요약 서비스)

  • 류지웅;김문철
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.199-201 / 2003
  • In the MPEG-21 environment, a DI (Digital Item) is a structured digital object that follows the standardized representation formats, identification schemes, and description formats of the MPEG-21 framework, and it is the minimal unit of distribution and processing. It is therefore an important task to specify how a DI should be processed once it is delivered to a user terminal in the MPEG-21 multimedia framework environment; the technology addressing this is DIP (Digital Item Processing). In this paper, as an application example of DIP, we present the results of a study on a Video Summary application that describes multimedia content hierarchically.
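
As a rough illustration of what a hierarchical summary description could look like as a Digital Item, the sketch below assembles a DIDL-style XML tree in Python. The element names (DIDL, Item, Descriptor, Statement, Component, Resource) follow MPEG-21 DIDL, but namespaces are omitted and the summary-specific layout is an assumption, not the paper's actual DI.

```python
import xml.etree.ElementTree as ET

def summary_item(title, segments):
    """Build a DIDL-style Item whose sub-Items point at summary segments.

    segments: list of (label, media_url) pairs; the URLs here are placeholders.
    """
    didl = ET.Element("DIDL")
    item = ET.SubElement(didl, "Item")
    desc = ET.SubElement(ET.SubElement(item, "Descriptor"), "Statement")
    desc.set("mimeType", "text/plain")
    desc.text = title
    for label, url in segments:
        sub = ET.SubElement(item, "Item")  # one sub-Item per summary segment
        stmt = ET.SubElement(ET.SubElement(sub, "Descriptor"), "Statement")
        stmt.set("mimeType", "text/plain")
        stmt.text = label
        comp = ET.SubElement(sub, "Component")
        ET.SubElement(comp, "Resource", ref=url, mimeType="video/mpeg")
    return ET.tostring(didl, encoding="unicode")
```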


Investigating an Automatic Method in Summarizing a Video Speech Using User-Assigned Tags (이용자 태그를 활용한 비디오 스피치 요약의 자동 생성 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Library and Information Science / v.46 no.1 / pp.163-181 / 2012
  • We investigated how useful video tags are in summarizing video speech and how valuable positional information is for speech summarization. Furthermore, we examined the similarity among sentences selected for a speech summary in order to reduce its redundancy. Based on these analyses, we designed and evaluated a method for automatically summarizing speech transcripts using a modified Maximal Marginal Relevance model, which not only reduces redundancy but also makes use of social tags, title words, and sentence position information. Finally, we compared the proposed method to the Extractor system, which chooses key sentences of a video speech using the frequency and location of speech content words. The precision and recall rates of the proposed method were higher than those of the Extractor system, although the difference in recall was not statistically significant.
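
A greedy MMR selector with tag/title relevance and positional weighting can be sketched as follows. This is an illustrative reconstruction under assumed inputs (term vectors for the sentences and for a tag/title query), not the authors' exact model; the positional weighting scheme and the λ value are placeholders.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_summary(sent_vecs, query_vec, pos_weights, k, lam=0.7):
    """Greedily select k sentences by Maximal Marginal Relevance.

    sent_vecs:   (n, d) sentence term vectors (assumed given).
    query_vec:   (d,) vector built from tags and title words.
    pos_weights: (n,) positional scores (e.g., higher near the start).
    """
    selected, remaining = [], list(range(len(sent_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(sent_vecs[i], query_vec) * pos_weights[i]
            redundancy = max((cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                             default=0.0)
            # Trade off relevance against similarity to already-chosen sentences.
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)  # present in original order
```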

Design and Evaluation of Video Summarization Algorithm based on EEG Information (뇌파정보를 활용한 영상물 요약 알고리즘 설계와 평가)

  • Kim, Hyun-Hee;Kim, Yong-Ho
    • Journal of the Korean Society for Library and Information Science / v.52 no.4 / pp.91-110 / 2018
  • We proposed a video summarization algorithm based on an ERP (Event-Related Potentials) topic relevance model, MMR (Maximal Marginal Relevance), and discriminant analysis to generate a semantically meaningful video skim. We then conducted implicit and explicit evaluations of the proposed ERP/MMR-based method. In both evaluations, the average scores of the ERP/MMR methods were statistically higher than the average score of the SBD (Shot Boundary Detection) method used as a competitive baseline. However, in both assessments there was no statistically significant difference between the average scores of the ERP/MMR (λ = 0.6) and ERP/MMR (λ = 1.0) methods.
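
The discriminant-analysis component can be illustrated with scikit-learn's LDA. The sketch below assumes labeled training trials and per-shot ERP window features; it is a generic stand-in for the paper's discriminant analysis, and the feature definition is an assumption.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def rank_shots_by_erp(train_feats, train_labels, shot_feats):
    """Rank shots by probability of topic relevance under an LDA model.

    train_feats:  (n_trials, n_features) ERP window features, e.g. mean amplitudes
                  per electrode in the N400/P600 windows (assumed precomputed).
    train_labels: (n_trials,) 1 = topic-relevant, 0 = irrelevant.
    shot_feats:   (n_shots, n_features) features for the shots to rank.
    """
    lda = LinearDiscriminantAnalysis()
    lda.fit(train_feats, train_labels)
    relevance = lda.predict_proba(shot_feats)[:, 1]
    return np.argsort(relevance)[::-1]  # most relevant shots first
```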

The Abstraction Retrieval System of Cultural Videos using Scene Change Detection (장면전환검출을 이용한 교양비디오 개요 검색 시스템)

  • Kang Oh-Hyung;Lee Ji-Hyun;Rhee Yang-Won
    • The KIPS Transactions: Part B / v.12B no.7 s.103 / pp.761-766 / 2005
  • This paper proposes a video model for the implementation of a cultural video database system. We utilize an efficient scene change detection method that segments cultural video into semantic units for efficient indexing and retrieval. Because video is voluminous and takes a long time to play, viewing an entire video is difficult. To solve this problem, a cultural video abstraction is built to save time and widen the choice of videos: the video abstract is a summarization of scenes, including important events, produced by applying an abstraction rule.
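
A standard histogram-difference scene change detector, the kind of building block this work relies on, can be sketched with OpenCV. This is a generic baseline under an arbitrary Bhattacharyya-distance threshold, not the paper's specific method.

```python
import cv2
import numpy as np

def detect_scene_changes(video_path, threshold=0.5):
    """Flag frames whose color-histogram distance from the previous frame is large.

    threshold is a tunable cut-off on the Bhattacharyya distance in [0, 1].
    """
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # 2-D hue/saturation histogram, normalized for comparison.
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```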

Automatic Poster Generation System Using Protagonist Face Analysis

  • Yeonhwi You;Sungjung Yong;Hyogyeong Park;Seoyoung Lee;Il-Young Moon
    • Journal of Information and Communication Convergence Engineering / v.21 no.4 / pp.287-293 / 2023
  • With the rapid development of domestic and international over-the-top markets, a large amount of video content is being created. As the volume of video content increases, consumers increasingly check information about a video before watching it. To address this demand, video summaries are provided to consumers in the form of plot descriptions, thumbnails, posters, and other formats. This study proposes an approach that automatically generates posters to convey video content effectively while reducing the cost of video summarization. For automatic poster generation, face recognition and clustering are used to gather and classify character data, and keyframes are extracted from the video to learn its overall atmosphere. Using the characters' facial data and the keyframes as training data, we employed technologies such as DreamBooth, a text-to-image generation model, to automatically generate video posters. This process significantly reduces the time and cost of video poster production.
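
The face-clustering step can be sketched with scikit-learn, assuming face embeddings have already been extracted by some face-recognition model. The DBSCAN parameters and the "largest cluster = protagonist" heuristic are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def protagonist_faces(embeddings, frame_ids, eps=0.5):
    """Cluster face embeddings and return frames of the most frequent identity.

    embeddings: (n_faces, d) vectors from any face-embedding model (assumed given).
    frame_ids:  (n_faces,) frame index each face was detected in.
    """
    labels = DBSCAN(eps=eps, min_samples=3, metric="euclidean").fit_predict(embeddings)
    valid = labels[labels >= 0]          # drop noise points labeled -1
    if valid.size == 0:
        return []
    protagonist = np.bincount(valid).argmax()  # largest cluster = main character
    return [int(f) for f, l in zip(frame_ids, labels) if l == protagonist]
```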

Activity-based video summarization in a wide-area surveillance system (광범위한 지역 감시시스템에서의 행동기반 비디오 요약)

  • Kwon, Hye-Young;Lee, Youn-Mi;Lee, Kyoung-Mi
    • Proceedings of the HCI Society of Korea Conference / 2007.02a / pp.719-724 / 2007
  • In this paper, we propose an activity-based video summarization system for video acquired from multiple cameras installed to monitor a wide area. The proposed system divides video captured by several CCTV cameras with non-overlapping fields of view into 30-minute units and builds a video database, so that video can be retrieved by time and by camera. Key frames are extracted from the video so that it can be summarized by camera, by person, and by activity, and the degree of summarization can be adjusted by controlling the key-frame retrieval threshold. Eleven activities related to in, out, stay, left, right, forward, and backward are extracted, and the summarized information shows in which region and in which direction a person is currently moving, enabling more detailed activity tracking. Finally, activity statistics over the key frames from the three cameras give a quick overview of activity-based events in the surveillance area.
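
The threshold-controlled, activity-based summary can be sketched as a simple filter-and-tally step. The record layout and the confidence score below are assumptions for illustration, not the system's actual data model.

```python
from collections import Counter

def summarize_by_activity(keyframes, threshold=0.5):
    """Filter key frames by confidence and tally activities per camera.

    keyframes: list of dicts like
        {"camera": 1, "frame": 1200, "activity": "left", "score": 0.83}
    where "score" is an assumed confidence from the activity detector.
    """
    kept = [k for k in keyframes if k["score"] >= threshold]
    stats = Counter((k["camera"], k["activity"]) for k in kept)
    return kept, stats

# Raising the threshold keeps fewer key frames, producing a shorter summary.
```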
