• Title/Summary/Keyword: Video Frames


Methods for Video Caption Extraction and Extracted Caption Image Enhancement (영화 비디오 자막 추출 및 추출된 자막 이미지 향상 방법)

  • Kim, So-Myung; Kwak, Sang-Shin; Choi, Yeong-Woo; Chung, Kyu-Sik
    • Journal of KIISE: Software and Applications, v.29 no.4, pp.235-247, 2002
  • Efficient indexing and retrieval of digital video data require research on video caption extraction and recognition. This paper proposes methods for extracting artificial captions from video data and enhancing their image quality for accurate Hangul and English character recognition. The proposed methods first locate the beginning and ending frames of each caption, then combine the multiple frames in each group with a logical operation to remove background noise; during this process, an evaluation step detects integrations that mix different caption images. After the multiple video frames are integrated, four image enhancement techniques are applied: resolution enhancement, contrast enhancement, stroke-based binarization, and morphological smoothing. These operations improve the image quality even for characters with complex strokes. Locating the beginning and ending frames of each caption can also be used for digital video indexing and browsing. The proposed methods were tested on video caption images containing both Hangul and English characters from films and improved the character recognition results.
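A minimal sketch of the frame-integration step described above, assuming bright captions on a darker, moving background; the logical-AND combination and the fixed threshold are illustrative assumptions, not the authors' exact procedure:

```python
# Sketch: AND-combine binarized frames that show the same caption, so that
# moving background pixels cancel out while the static caption survives.
import cv2
import numpy as np

def integrate_caption_frames(frames, thresh=180):
    """frames: list of BGR frames spanning one caption's on-screen interval."""
    combined = None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
        combined = binary if combined is None else cv2.bitwise_and(combined, binary)
    return combined  # cleaned caption image, ready for the enhancement steps
```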

MF sampler: Sampling method for improving the performance of a video based fashion retrieval model (MF sampler: 동영상 기반 패션 검색 모델의 성능 향상을 위한 샘플링 방법)

  • Baek, Sanghun; Park, Jonghyuk
    • Journal of Intelligence and Information Systems, v.28 no.4, pp.329-346, 2022
  • As the market for short-form videos on social media (Instagram, TikTok, YouTube) has grown, research using them has become active in the artificial intelligence field. A representative area is Video-to-Shop, which detects fashion products in videos and retrieves matching product images. Such video-based models extract product features with convolution operations, but limited computational resources make it impractical to extract features from every frame in a video. Existing studies therefore improve performance by sampling only part of the frames, either randomly or at even intervals, or by developing a sampling method tailored to the subject's characteristics. This even or random sampling, however, degrades the fashion product search model because it also picks up noise frames in which no product appears. This paper proposes MF (Missing Fashion items on frame) sampler, a sampling method that removes noise frames and improves the performance of the search model. MF sampler mitigates the resource limitation with a keyframe mechanism and removes noise frames with a noise detection model. Experiments confirm that the proposed method improves the model's performance and makes training more effective.
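A hedged sketch of the overall sampling idea: draw candidate frames at even intervals, score each with a noise-detection model, and keep only frames predicted to contain a product. The function names and the detector callable are assumptions, not the authors' code:

```python
import numpy as np

def mf_sample(frames, contains_fashion_item, num_candidates=32, num_keep=8):
    """contains_fashion_item: hypothetical model returning a product score."""
    idx = np.linspace(0, len(frames) - 1, num_candidates, dtype=int)
    scores = np.array([contains_fashion_item(frames[i]) for i in idx])
    keep = idx[np.argsort(scores)[::-1][:num_keep]]  # drop likely noise frames
    return sorted(keep.tolist())
```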

Luminance Projection Model for Efficient Video Similarity Measure (효율적인 비디오 유사도 측정을 위한 휘도 투영모델)

  • Kim, Sang-Hyun
    • Journal of the Institute of Convergence Signal Processing, v.10 no.2, pp.132-135, 2009
  • Video similarity measurement is an important factor in indexing and retrieving video data. This paper proposes a luminance projection model to measure video similarity efficiently. Whereas most video indexing algorithms use histograms, edges, or motion features, the proposed algorithm employs an efficient measure based on luminance projection. To index video sequences effectively and reduce computational complexity, video similarity is calculated over key frames extracted by a cumulative measure, and the sets of key frames are compared using the modified Hausdorff distance. Experimental results show that the proposed luminance projection model yields remarkably better accuracy and performance than conventional algorithms.
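A small sketch of the two ingredients named above, under simplifying assumptions: a luminance projection signature built from row and column means, and the modified Hausdorff distance between two key-frame signature sets (each a `(num_key_frames, signature_length)` array):

```python
import numpy as np

def luminance_projection(gray):
    """Concatenate row-wise and column-wise mean luminance of a gray frame."""
    return np.concatenate([gray.mean(axis=1), gray.mean(axis=0)])

def modified_hausdorff(set_a, set_b):
    """Symmetrized mean nearest-neighbor distance between signature sets."""
    d = np.linalg.norm(set_a[:, None, :] - set_b[None, :, :], axis=2)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())
```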


Gradient Fusion Method for Night Video Enhancement

  • Rao, Yunbo; Zhang, Yuhong; Gou, Jianping
    • ETRI Journal, v.35 no.5, pp.923-926, 2013
  • To resolve video enhancement problems, a novel gradient domain fusion method is proposed in which gradient-domain frames of the background from daytime video are fused with nighttime video frames. To verify the superiority of the proposed method, it is compared with conventional techniques; the implemented output offers enhanced visual quality.
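The abstract gives few details, so the following is only a rough illustration of gradient-domain fusion: blend the daytime-background and nighttime gradient fields, then reconstruct an image by solving the Poisson equation with an FFT. The blending weight, periodic-boundary solver, and brightness restoration are all assumptions, not the paper's algorithm:

```python
import numpy as np

def poisson_reconstruct(gx, gy):
    """Recover an image whose gradients best match (gx, gy), via FFT Poisson solve."""
    h, w = gx.shape
    # Divergence of the target gradient field (backward differences, periodic).
    div = gx - np.roll(gx, 1, axis=1) + gy - np.roll(gy, 1, axis=0)
    u = np.fft.fftfreq(w)[None, :]
    v = np.fft.fftfreq(h)[:, None]
    denom = 2 * np.cos(2 * np.pi * u) + 2 * np.cos(2 * np.pi * v) - 4
    denom[0, 0] = 1.0                  # avoid dividing by zero at DC
    f_hat = np.fft.fft2(div) / denom
    f_hat[0, 0] = 0.0                  # the mean is a free constant
    return np.real(np.fft.ifft2(f_hat))

def fuse_night_with_day_gradients(day_bg, night, alpha=0.7):
    """Blend daytime-background and nighttime gradients, then reconstruct."""
    gx = alpha * np.gradient(day_bg, axis=1) + (1 - alpha) * np.gradient(night, axis=1)
    gy = alpha * np.gradient(day_bg, axis=0) + (1 - alpha) * np.gradient(night, axis=0)
    return poisson_reconstruct(gx, gy) + night.mean()  # restore brightness level
```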

Video Segmentation and Key frame Extraction using Multi-resolution Analysis and Statistical Characteristic

  • Cho, Wan-Hyun; Park, Soon-Young; Park, Jong-Hyun
    • Communications for Statistical Applications and Methods, v.10 no.2, pp.457-469, 2003
  • This paper proposes an efficient algorithm that detects video scene changes using various statistical characteristics obtained by applying the wavelet transform to each frame. The method first extracts histogram features from the low-frequency subband of the wavelet-transformed image and uses them to detect abrupt scene changes. Second, it extracts edge information by applying the mesh method to the high-frequency subband, quantifies this edge information as per-pixel variance values, and uses these values to detect gradual scene changes. An algorithm for extracting a proper key frame from each segmented scene is also proposed. Experimental results show that the proposed method segments video frames efficiently and extracts appropriate key frames.
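A compact sketch of the abrupt-cut test described above, assuming a single-level 2D Haar DWT (PyWavelets) and a normalized histogram difference on the low-frequency subband; the bin count and threshold are illustrative assumptions:

```python
import numpy as np
import pywt

def is_abrupt_cut(prev_gray, curr_gray, threshold=0.4):
    """Compare low-frequency-subband histograms of two successive frames."""
    hists = []
    for frame in (prev_gray, curr_gray):
        cA, _ = pywt.dwt2(frame.astype(float), "haar")  # low-frequency subband
        h, _ = np.histogram(cA, bins=64, range=(0, 512))
        hists.append(h / h.sum())
    return 0.5 * np.abs(hists[0] - hists[1]).sum() > threshold
```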

Digital Video Watermarking Using Frame Division And 3D Wavelet Transform (프레임 분할과 3D 웨이블릿 변환을 이용한 비디오 워터마킹)

  • Kim, Kwang-Il; Cui, Jizhe; Kim, Jong-Weon; Choi, Jong-Uk
    • Journal of the Korea Institute of Information Security & Cryptology, v.18 no.3, pp.155-162, 2008
  • This paper proposes a video watermarking algorithm based on the three-dimensional discrete wavelet transform (3D DWT) and direct spread spectrum (DSS). In the proposed method, a synchronization watermark is embedded into the first frame, after which the information watermark is embedded into the following frames. Input frames are divided into sub-frames of odd rows and even rows; the sub-frames are arranged as 3D frames and transformed into the 3D wavelet domain, where the watermark is embedded using DSS. Existing video watermarking using the 3D DWT is non-blind, whereas the proposed algorithm is blind. Experimental results show that the proposed algorithm is robust against frame cropping, noise addition, compression, and similar attacks, achieving a bit error rate of 10% or below while sustaining an average level of 40 dB or above.
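A hedged sketch of direct-spread-spectrum embedding in a 3D wavelet domain (PyWavelets n-dimensional DWT). The subband choice, embedding strength, and key are assumptions, and the odd/even-row sub-frame split is omitted for brevity:

```python
import numpy as np
import pywt

def embed_dss(frames, bits, strength=2.0, key=42):
    """frames: (time, height, width) float array; bits: iterable of 0/1."""
    coeffs = pywt.dwtn(frames.astype(float), "haar")   # 3D DWT over (t, y, x)
    band = coeffs["aaa"]                               # low-frequency subband
    rng = np.random.default_rng(key)
    for bit in bits:
        chip = rng.choice([-1.0, 1.0], size=band.shape)   # pseudo-random chips
        band += strength * (1.0 if bit else -1.0) * chip  # spread the bit
    return pywt.idwtn(coeffs, "haar")                  # watermarked frames
```

A blind detector would regenerate the same chip sequences from the key and correlate them against the received coefficients, with no need for the original video.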

A Method for Generating Inbetween Frames in Sign Language Animation (수화 애니메이션을 위한 중간 프레임 생성 방법)

  • O, Jeong-Geun; Kim, Sang-Cheol
    • The Transactions of the Korea Information Processing Society, v.7 no.5, pp.1317-1329, 2000
  • Advances in video processing and computer graphics have made it possible to build sign language education systems that display the motion for an arbitrary sentence using captured video clips of sign language words. This paper suggests a method that generates the frames between the last frame of a word and the first frame of the following word in order to animate the hand motion between them. The method determines the hand locations and angles required for inbetween frame generation, then captures and stores hand images at those locations and angles, so that generating the inbetween frames reduces to finding a sequence of hand angles and locations. The method is computationally simple and requires relatively little disk space, yet experiments show that inbetween frames supporting presentation at about 15 fps (frames per second) are achieved, enabling smooth animation of the hand motion. It improves on previous works, whose computation cost is relatively high or which generate unnecessary images.
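A simple sketch of the pose-interpolation core, assuming a hand pose is reduced to (x, y, angle); the linear interpolation and the pose representation are illustrative assumptions:

```python
import numpy as np

def inbetween_poses(end_pose, start_pose, n_frames):
    """Poses between a word's last frame and the next word's first frame."""
    t = np.linspace(0.0, 1.0, n_frames + 2)[1:-1]      # interior steps only
    return [(1 - a) * np.asarray(end_pose, float) +
            a * np.asarray(start_pose, float) for a in t]
```

Each generated pose would then index into the table of captured hand images, so rendering an inbetween frame is a lookup rather than a synthesis step.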


Real-time Stabilization Method for Video acquired by Unmanned Aerial Vehicle (무인 항공기 촬영 동영상을 위한 실시간 안정화 기법)

  • Cho, Hyun-Tae; Bae, Hyo-Chul; Kim, Min-Uk; Yoon, Kyoungro
    • Journal of the Semiconductor & Display Technology, v.13 no.1, pp.27-33, 2014
  • Video from an unmanned aerial vehicle (UAV) is affected by the natural environment, particularly wind, because the UAV is lightweight; the UAV's shaking therefore makes the video shake. The objective of this paper is to produce stabilized video by removing this shakiness. The stabilizer estimates camera motion by calculating optical flow between two successive frames; the estimated motion contains intended movements as well as unintended shaking, and the unintended movements are eliminated by a smoothing process. Experimental results show that the proposed method performs almost as well as other offline stabilizers. However, estimating camera motion, i.e., calculating the optical flow, is the bottleneck for real-time stabilization; to solve this problem, the stabilizer is parallelized, producing stabilized video at an average of 30 frames per second. The proposed method can be used for video acquired by UAVs as well as shaky video from non-professional users, and in any other field that requires object tracking or accurate image analysis and representation.
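A compact sketch of the per-frame motion estimate and trajectory smoothing, assuming pyramidal Lucas-Kanade feature tracking (OpenCV) and a moving-average filter; all parameters are illustrative:

```python
import cv2
import numpy as np

def estimate_shift(prev_gray, curr_gray):
    """Median feature displacement (dx, dy) between successive frames."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    return np.median(nxt[good] - pts[good], axis=0).ravel()

def smooth_trajectory(traj, radius=15):
    """Moving-average smoothing of the accumulated (T, 2) camera path."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.column_stack([np.convolve(traj[:, i], kernel, mode="same")
                            for i in range(traj.shape[1])])
```

Subtracting the smoothed path from the raw path leaves the unintended shake, which is then compensated by warping each frame.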

A PSNR Estimation Method Exploiting the Visual Rhythm for Reconstructed Video Frames at IPTV Set-top Box (비쥬얼리듬을 이용한 IPTV Set-top Box 재생영상에 대한 PSNR 추정 기법)

  • Kwon, Jae-Cheol; Suh, Chang-Ryul
    • Journal of Broadcast Engineering, v.14 no.2, pp.114-126, 2009
  • This paper proposes a method for estimating the PSNR (peak signal-to-noise ratio) of the video frames reconstructed at a customer's set-top box (STB), exploiting visual rhythm information. The key idea is that the PSNR can be estimated from visual rhythm (VR) information: although a VR consists only of the pixels along a vertical line of a 2D (two-dimensional) video frame, it is approximately a 1D projection of that frame. Simulation results show that the PSNR estimated from VR information is closely related to the PSNR computed from the full 2D video frames. The proposed scheme can monitor video quality efficiently while minimizing the computational load on the STB, and it can report the location, duration, and occurrence count of severe picture degradation.
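A minimal sketch of the visual-rhythm construction and the PSNR computed on it; sampling the center column of each frame is an illustrative assumption, and the VR-based PSNR acts as a low-cost proxy for the full-frame PSNR:

```python
import numpy as np

def visual_rhythm(frames, column=None):
    """Stack one vertical pixel line per frame into a (height, time) image."""
    column = frames[0].shape[1] // 2 if column is None else column
    return np.stack([f[:, column] for f in frames], axis=1)

def psnr(ref, rec, peak=255.0):
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# estimated_psnr = psnr(visual_rhythm(ref_frames), visual_rhythm(rec_frames))
```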

Distributed Coding Scheme for Multi-view Video through Efficient Side Information Generation

  • Yoo, Jihwan; Ko, Min Soo; Kwon, Soon Chul; Seo, Young-Ho; Kim, Dong-Wook; Yoo, Jisang
    • Journal of Electrical Engineering and Technology, v.9 no.5, pp.1762-1773, 2014
  • This paper proposes a distributed image coding scheme for multi-view video based on efficient generation of side information. In distributed video coding, the decoder generates side information and corrects its errors, relative to the original image, by using channel coding; the more accurate the generated side information, the better the coding performance. The proposed technique applies distributed video coding to multi-view video. It generates side information by selectively and efficiently using both 3-dimensional warping, based on the depth map with spatially adjacent frames, and motion-compensated temporal interpolation with temporally adjacent frames. The difference between adjacent frames, the sizes of the motion vectors of adjacent blocks, and edge information serve as the selection criteria. In experiments, the side information generated by the proposed technique improved the average peak signal-to-noise ratio by 0.97 dB over side information generated by motion-compensated temporal interpolation or 3-dimensional warping alone. Analysis of the rate-distortion curves showed that the proposed scheme reduces the bit-rate by 8.01% on average at the same peak signal-to-noise ratio, compared to previous work.
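A high-level sketch of the per-block selection idea only; the thresholds below are stand-ins for the paper's actual criteria, and both side-information generators are assumed inputs rather than implementations:

```python
import numpy as np

def select_side_info(temporal_si, warped_si, prev_blk, next_blk, motion_mag,
                     diff_thresh=12.0, motion_thresh=8.0):
    """Pick depth-based 3D warping when temporal interpolation looks unreliable."""
    temporal_diff = np.abs(prev_blk.astype(float) - next_blk.astype(float)).mean()
    if temporal_diff > diff_thresh or motion_mag > motion_thresh:
        return warped_si   # inter-view 3D warping with spatially adjacent frames
    return temporal_si     # motion-compensated temporal interpolation
```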