• Title/Summary/Keyword: Image caption

Web Image Caption Extraction using Positional Relation and Lexical Similarity (위치적 연관성과 어휘적 유사성을 이용한 웹 이미지 캡션 추출)

  • Lee, Hyoung-Gyu;Kim, Min-Jeong;Hong, Gum-Won;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications / v.36 no.4 / pp.335-345 / 2009
  • In this paper, we propose a new web image caption extraction method that considers the positional relation between a caption and an image and the lexical similarity between a caption and the main text containing the caption. The positional relation describes where the caption lies relative to its image in terms of distance and direction. The lexical similarity indicates how likely the main text is to generate the caption of the image. Compared with previous image caption extraction approaches, which use only independent features of images and captions, the proposed approach improves caption extraction recall and precision, and improves F-measure by 28%, by adding the positional relation and lexical similarity features.
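
As a rough illustration of the two feature families named in this abstract, the sketch below computes a positional feature (distance and direction from an image to a candidate caption) and a lexical score (how likely the main text is to generate the caption). The bounding-box layout, the function names, and the add-one-smoothed unigram model are all assumptions made for illustration; the abstract does not specify the authors' actual formulation.

```python
import math
from collections import Counter


def positional_relation(image_box, caption_box):
    """Return (distance, direction) between the centres of two boxes.

    Boxes are hypothetical (left, top, right, bottom) page coordinates.
    """
    ix = (image_box[0] + image_box[2]) / 2
    iy = (image_box[1] + image_box[3]) / 2
    cx = (caption_box[0] + caption_box[2]) / 2
    cy = (caption_box[1] + caption_box[3]) / 2
    dx, dy = cx - ix, cy - iy
    distance = math.hypot(dx, dy)
    # y grows downward on a page, so dy > 0 means the caption is below.
    if abs(dy) >= abs(dx):
        direction = "below" if dy > 0 else "above"
    else:
        direction = "right" if dx > 0 else "left"
    return distance, direction


def lexical_similarity(caption, main_text):
    """Average add-one-smoothed unigram log-likelihood of the caption
    words under the word distribution of the main text (assumed model)."""
    text_tokens = main_text.lower().split()
    caption_tokens = caption.lower().split()
    counts = Counter(text_tokens)
    vocab = len(set(text_tokens) | set(caption_tokens))
    total = len(text_tokens)
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab)) for w in caption_tokens
    )
    return log_prob / max(len(caption_tokens), 1)
```

In a setup like this, both values would be fed to a classifier alongside the independent image and caption features; a caption whose words recur in the main text scores higher than one whose words do not.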

An Effective Method for Replacing Caption in Video Images (비디오 자막 문자의 효과적인 교환 방법)

  • Chun Byung-Tae;Kim Sook-Yeon
    • Journal of the Korea Society of Computer and Information / v.10 no.2 s.34 / pp.97-104 / 2005
  • Caption texts are frequently inserted into produced video images to help the TV audience's understanding. In movies, caption texts can be replaced without any loss of the original image, because the caption texts have their own track in the film. Earlier methods for replacing caption texts fill the caption area of the video image with a certain color to remove the existing caption and then insert the new text into that area. However, these methods lose the original image in the caption area, which is a problem for the TV audience. In this paper, we propose a new method that replaces the caption text after recovering the original image in the caption area. In the experiments, results on complex images show some distortion after the original image is recovered, but most results show a good caption text over the recovered image. The new method is thus shown to replace caption texts in video images effectively.

Image Caption Generation using Recurrent Neural Network (Recurrent Neural Network를 이용한 이미지 캡션 생성)

  • Lee, Changki
    • Journal of KIISE / v.43 no.8 / pp.878-882 / 2016
  • Automatic generation of captions for an image is a very difficult task because it requires both computer vision and natural language processing technologies. However, this task has many important applications, such as early childhood education, image retrieval, and navigation for the blind. In this paper, we describe a Recurrent Neural Network (RNN) model for generating image captions, which takes image features extracted from a Convolutional Neural Network (CNN). We demonstrate that our models produce state-of-the-art results in image caption generation experiments on the Flickr 8K, Flickr 30K, and MS COCO datasets.

Implement of Realtime Character Recognition System for Numeric Region of Sportscast (스포츠 중계 화면 내 숫자영역에 대한 실시간 문자인식 시스템 구현)

  • 성시훈;전우성
    • Proceedings of the IEEK Conference / 2001.06d / pp.5-8 / 2001
  • We propose a real-time numeric caption recognition algorithm that automatically recognizes a numeric caption generated by computer graphics (CG) and displays a modified caption based on the recognized value, only when a valid numeric caption appears in a specific target region of a live sportscast scene produced by another broadcasting station. After acquiring the sports broadcast scenes in real time with a frame grabber, we extract mesh features from the enhanced binary image as a feature vector and then recover the value from the numeric image by recognizing the characters with a neural network. Finally, the result is verified by a knowledge-based rule set designed for more stable and reliable output and is displayed on screen as the converted CG caption. At present, our real-time automatic mile-to-kilometer caption conversion system, built on this algorithm, is in service for the regular Major League Baseball (MLB) program broadcast live throughout Korea over our nationwide network. The system converts captions given in miles, the unit commonly used in the United States, into kilometers, which are familiar to most Koreans, automatically and in real time, and it has drawn a favorable response from the TV audience.
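
The pipeline in this abstract (mesh features, recognition, rule-based verification, unit conversion) can be sketched loosely as follows. This is a minimal illustration under stated assumptions: the grid-density definition of a mesh feature is a common one but is not confirmed by the abstract, the caption is assumed to be a speed in miles per hour, and the plausibility range in `convert_speed_caption` is a hypothetical stand-in for the knowledge-based rule set. The neural network recognizer itself is omitted.

```python
def mesh_features(binary_image, rows, cols):
    """Split a binary image (list of lists of 0/1) into rows x cols cells
    and return the foreground-pixel ratio of each cell, row by row."""
    height, width = len(binary_image), len(binary_image[0])
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [
                binary_image[y][x]
                for y in range(y0, y1)
                for x in range(x0, x1)
            ]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


MILES_TO_KM = 1.609344  # exact, by definition of the international mile


def convert_speed_caption(recognized_digits, low=40, high=110):
    """Convert a recognized mph reading to km/h.

    Returns None when the reading fails a hypothetical plausibility rule
    (non-numeric text, or a speed outside [low, high] mph).
    """
    if not recognized_digits.isdigit():
        return None
    mph = int(recognized_digits)
    if not low <= mph <= high:
        return None
    return round(mph * MILES_TO_KM)
```

Rejecting implausible readings rather than displaying them reflects the abstract's emphasis on stable and reliable output: for a live broadcast, showing no converted caption is presumably safer than showing a wrong one.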

Knowledge-Based Numeric Open Caption Recognition for Live Sportscast

  • Sung, Si-Hun
    • Proceedings of the IEEK Conference / 2003.07e / pp.1871-1874 / 2003
  • Knowledge-based numeric open caption recognition is proposed that recognizes numeric captions generated by a character generator (CG) and automatically superimposes a modified caption using the recognized text, only when a valid numeric caption appears in a specific target region of a live sportscast scene produced by another broadcasting station. In the proposed method, mesh features are extracted from an enhanced binary image as feature vectors, and the numeric information is then recovered from the numeric image by recognizing the characters with a multilayer perceptron (MLP) network. The result is verified using a knowledge-based rule set designed for more stable and reliable output, and the modified information is then displayed on screen by the CG. MLB Eye Caption, based on the proposed algorithm, has already been used for regular Major League Baseball (MLB) programs broadcast live over a Korean nationwide TV network and has produced a favorable response from Korean viewers.

Methods for Video Caption Extraction and Extracted Caption Image Enhancement (영화 비디오 자막 추출 및 추출된 자막 이미지 향상 방법)

  • Kim, So-Myung;Kwak, Sang-Shin;Choi, Yeong-Woo;Chung, Kyu-Sik
    • Journal of KIISE:Software and Applications / v.29 no.4 / pp.235-247 / 2002
  • For efficient indexing and retrieval of digital video data, research on video caption extraction and recognition is required. This paper proposes methods for extracting artificial captions from video data and enhancing their image quality for accurate Hangul and English character recognition. In the proposed methods, we first find the beginning and ending frames that share the same caption contents and combine the frames in each group by a logical operation to remove background noise. During this process, an evaluation is performed to detect integrated results that mix different caption images. After the multiple video frames are integrated, four image enhancement techniques are applied to the image: resolution enhancement, contrast enhancement, stroke-based binarization, and morphological smoothing operations. By applying these operations to the video frames, we can improve the image quality even of phonemes with complex strokes. Finding the beginning and ending frames with the same caption contents can also be used effectively for digital video indexing and browsing. We tested the proposed methods on video caption images from cinema containing both Hangul and English characters and obtained improved character recognition results.
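
The frame-grouping and multi-frame integration steps can be sketched as below. The abstract says only that frames are combined "by logical operation"; a pixel-wise AND over binarized frames is assumed here, on the reasoning that caption pixels stay fixed while the background changes. The per-frame "signature" used to decide whether two frames carry the same caption is a hypothetical placeholder for whatever comparison the authors use.

```python
def caption_runs(signatures):
    """Group consecutive frame indices whose caption signatures are equal.

    Returns a list of (start, end) index pairs, both inclusive.
    """
    runs = []
    start = 0
    for i in range(1, len(signatures) + 1):
        if i == len(signatures) or signatures[i] != signatures[start]:
            runs.append((start, i - 1))
            start = i
    return runs


def integrate_frames(frames):
    """Pixel-wise logical AND over binarized frames (lists of lists of 0/1)
    showing the same caption; only pixels set in every frame survive."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [int(all(frame[y][x] for frame in frames)) for x in range(width)]
        for y in range(height)
    ]
```

The `(start, end)` pairs returned by `caption_runs` double as the caption boundaries that the abstract suggests reusing for video indexing and browsing.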

Motion-Compensated Interpolation for Non-moving Caption Region (정지자막 영역의 움직임 보상 보간 기법)

  • Lee, Jeong-Hun;Han, Dong-Il
    • Proceedings of the IEEK Conference / 2007.07a / pp.363-364 / 2007
  • In this paper, we present a novel motion-compensated interpolation technique for non-moving caption regions that prevents the block artifacts caused when a conventional block-based motion estimation algorithm fails on blocks that consist of both a non-moving caption and a moving object. Experimental results indicate good performance of the proposed scheme, with significantly reduced block artifacts on image sequences that include non-moving captions. The proposed method is also simple and suitable for hardware implementation.

The Effect of Layout Framing on SNS Shopping Information: A-D Perspective (SNS 쇼핑정보의 레이아웃 프레이밍 연구: A-D 관점에서)

  • Yanjinlkham Khurelchuluun;Zainab Shabir;Dong-Seok Lee;Gwi-Gon Kim
    • Journal of Industrial Convergence / v.21 no.11 / pp.1-12 / 2023
  • With the recent explosive popularity of SNS, it is increasingly important to utilize SNS marketing, and in this process, the importance of image and caption order in SNS layout is also growing. This research aims to analyze the impact of SNS layouts (Image First vs. Caption First) on the user's attitude toward SNS shopping. A survey was conducted of 350 members of the general public and college (graduate) students living in Daegu City and Gyeongbuk Province. The data were analyzed using PROCESS, regression analysis, and t-tests in SPSS 21.0. The results confirmed that the Image First layout was more accessible than the Caption First layout, while the Caption First layout was more diagnostic than the Image First layout. Moreover, of three specific mediation paths, only two were confirmed: one through diagnosticity and usefulness, and one through accessibility, diagnosticity, and usefulness. The path through diagnosticity and usefulness was stronger than the other. Additionally, the impact of accessibility on diagnosticity was higher when involvement was high than when it was low.

A Method for Caption Segmentation using Minimum Spanning Tree

  • Chun, Byung-Tae;Kim, Kyuheon;Lee, Jae-Yeon
    • Proceedings of the IEEK Conference / 2000.07b / pp.906-909 / 2000
  • Conventional caption extraction methods use the difference between frames or color segmentation of the whole image. Because these methods depend heavily on heuristics, they require a priori knowledge of the captions to be extracted, and they are also difficult to implement. In this paper, we propose a method that uses few heuristics and a simplified algorithm. We use topographical features of characters to extract character points and use KMST (Kruskal minimum spanning tree) to extract candidate regions for captions. Character regions are determined by testing several conditions on those candidate regions to verify them. Experimental results show a candidate region extraction rate of 100% and a character region extraction rate of 98.2%, and they show that caption areas in complex images are extracted well.
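
One plausible way to turn a Kruskal minimum spanning tree over character points into candidate regions is sketched below. It assumes that points are joined only by edges up to a length threshold, which yields the same components as building the full MST and then cutting its long edges. The threshold `max_edge`, the Euclidean edge weight, and the function name are illustrative assumptions; the abstract does not describe how the authors derive regions from the tree.

```python
import math


def kruskal_clusters(points, max_edge):
    """Run Kruskal's algorithm over the complete graph on `points`, but
    stop once edges exceed `max_edge`. The components of the resulting
    forest serve as candidate regions. Returns sorted lists of indices."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    for length, i, j in edges:
        if length > max_edge:
            break  # edges are sorted, so every remaining edge is too long
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

For example, three points spaced one unit apart on a line and two more points far away fall into two clusters when `max_edge` is 2; each cluster's bounding box would then be a candidate caption region to verify.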

Parallel Injection Method for Improving Descriptive Performance of Bi-GRU Image Captions (Bi-GRU 이미지 캡션의 서술 성능 향상을 위한 Parallel Injection 기법 연구)

  • Lee, Jun Hee;Lee, Soo Hwan;Tae, Soo Ho;Seo, Dong Hoan
    • Journal of Korea Multimedia Society / v.22 no.11 / pp.1223-1232 / 2019
  • Injection is the method by which the image feature vector is passed from the encoder to the decoder. Since the image feature vector contains object details such as color and texture, it is essential for generating image captions. However, a bidirectional decoder model using the existing injection method inputs the image feature vector only in the first step, so the image features vanish in the backward sequence. This problem makes it difficult to describe the context in detail. Therefore, in this paper, we propose the parallel injection method to improve the descriptive performance of image captions. The proposed injection method fuses all embeddings and image vectors to preserve the context. We also optimize our image caption model with a Bidirectional Gated Recurrent Unit (Bi-GRU) to reduce the amount of computation in the decoder. To validate the proposed model, experiments were conducted with a certified image caption dataset, and the model compared favorably with the latest models on BLEU and METEOR scores. The proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points compared with the existing caption model.
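
The difference between the existing scheme and parallel injection can be illustrated at the level of decoder inputs. In this sketch the image vector is fused with each word embedding by concatenation; that choice of fusion operator, the function names, and the plain-list representation are assumptions, since the abstract says only that all embeddings and image vectors are fused.

```python
def init_injection(image_vector, word_embeddings):
    """Existing scheme as described in the abstract: the image vector
    enters the decoder only as the first input step."""
    return [list(image_vector)] + [list(e) for e in word_embeddings]


def parallel_injection(image_vector, word_embeddings):
    """Assumed fusion: concatenate the image vector onto every word
    embedding, so each time step carries the image information."""
    return [list(e) + list(image_vector) for e in word_embeddings]
```

With the first scheme, a backward pass that reads the sequence in reverse meets the image vector only at its final step, whereas with the second every step in either direction includes the image features, at the cost of a wider input per step.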