• Title/Summary/Keyword: 2D Video


MPEG-4 to H.264 Transcoding (MPEG-4에서 H.264로 트랜스코딩)

  • 이성선;이영렬
    • Journal of the Institute of Electronics Engineers of Korea SP, v.41 no.5, pp.275-282, 2004
  • In this paper, a transcoding method is proposed that transforms an MPEG-4 video bitstream coded at a 30 Hz frame rate into an H.264 video bitstream at a 15 Hz frame rate. The block modes and motion vectors of the MPEG-4 bitstream are reused in H.264 through block mode conversion and motion vector (MV) interpolation methods. The three proposed MV interpolation methods can be used without performing full motion estimation in H.264. The proposed transcoder reduces the computation required for full motion estimation in H.264 and provides good H.264 video quality at low bitrates. In experimental results, the proposed methods achieve a 3.2-4 times improvement in computational complexity compared with cascaded pixel-domain transcoding, while the PSNR (peak signal-to-noise ratio) is degraded by 0.2-0.9 dB depending on the video size.
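
The abstract does not spell out the three MV interpolation variants, so the following is only a minimal sketch of the general idea of reusing MPEG-4 motion vectors across a dropped frame, assuming 16x16 macroblocks and a nearest-block approximation when chaining vectors; the function name and the composition rule are illustrative, not the paper's actual algorithm.

```python
import numpy as np

def compose_motion_vectors(mv_curr, mv_prev):
    """Chain per-macroblock motion vectors when every other frame is dropped.

    mv_curr: (H, W, 2) MVs of the kept frame, pointing into the dropped frame.
    mv_prev: (H, W, 2) MVs of the dropped frame, pointing into the previous kept frame.
    Returns (H, W, 2) candidate MVs for the kept frame that point directly into
    the previous kept frame, usable as H.264 search centers instead of a full search.
    """
    h, w, _ = mv_curr.shape
    composed = np.zeros_like(mv_curr)
    for i in range(h):
        for j in range(w):
            # Follow mv_curr to the macroblock of the dropped frame it lands on
            # (nearest-block approximation, 16-pixel macroblocks), then add that
            # block's MV to reach the previous kept frame.
            di = int(np.clip(i + round(float(mv_curr[i, j, 0]) / 16), 0, h - 1))
            dj = int(np.clip(j + round(float(mv_curr[i, j, 1]) / 16), 0, w - 1))
            composed[i, j] = mv_curr[i, j] + mv_prev[di, dj]
    return composed
```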

Intra Prediction Information Skip using Analysis of Adjacent Pixels for H.264/AVC (인접 화소 성분 분석을 이용한 H.264/AVC에서의 Intra 예측 정보 생략)

  • Kim, Dae-Yeon;Kim, Dong-Kyun;Lee, Yung-Lyul
    • Journal of Broadcast Engineering, v.14 no.3, pp.271-279, 2009
  • The Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) have developed a new standard that promises to outperform the earlier MPEG-4 and H.263 standards. The new standard is called H.264/AVC (Advanced Video Coding) and is published jointly as MPEG-4 Part 10 and ITU-T Recommendation H.264. In particular, H.264/AVC intra prediction coding provides nine directional prediction modes for every 4×4 block in order to reduce spatial redundancy. In this paper, an ABS (Adaptive Bit Skip) mode is proposed. To improve coding efficiency, the proposed method removes the bits that represent the prediction mode by exploiting the similarity of adjacent pixels. Experimental results show that the proposed method achieves a PSNR gain of about 0.2 dB on the R-D curve and reduces the bit rate by about 3.6% compared with H.264/AVC.
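
As a rough illustration of skipping mode bits when adjacent pixels are similar, the sketch below checks whether the already-reconstructed top and left neighbours of a 4x4 block are nearly flat, in which case both encoder and decoder could infer the same mode without signalling it; the flatness test and the threshold are assumptions, not the ABS decision rule defined in the paper.

```python
import numpy as np

def can_skip_mode_bits(recon, x, y, threshold=4):
    """Decide whether the intra-4x4 prediction mode bits for the block at (x, y)
    can be omitted, based on the similarity of the reconstructed neighbours.

    recon: 2D array of reconstructed luma samples.
    Returns True when the top row and left column neighbours are nearly flat,
    so any directional prediction gives almost the same result and the mode
    choice carries little information.
    """
    top = recon[y - 1, x:x + 4] if y > 0 else None
    left = recon[y:y + 4, x - 1] if x > 0 else None
    neighbours = [p for p in (top, left) if p is not None]
    if not neighbours:
        return False
    samples = np.concatenate(neighbours).astype(np.int32)
    # Nearly flat neighbourhood: all samples within a small range.
    return int(samples.max() - samples.min()) <= threshold
```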

Design and Implementation of Multi-View 3D Video Player (다시점 3차원 비디오 재생 시스템 설계 및 구현)

  • Heo, Young-Su;Park, Gwang-Hoon
    • Journal of Broadcast Engineering, v.16 no.2, pp.258-273, 2011
  • This paper designs and implements a multi-view 3D video player system that runs faster than existing video player systems. A structure that parallelizes the component modules to obtain near-optimal speed in a multi-processor environment is proposed in order to process large volumes of multi-view image data at high speed. To exploit concurrency at the bottleneck stages, the image decoding, synthesis, and rendering modules are designed as a pipeline. For load balancing, the decoder module is divided by viewpoint, and the image synthesis module is divided geometrically over the synthesized images. In the experiments, multi-view images were correctly synthesized, and a 3D sense was perceived when watching them on a multi-view autostereoscopic display. The proposed processing structure can be used to process large volumes of multi-view image data at high speed, using the multiple processors to their maximum capacity.
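
A minimal sketch of the pipelined structure described above, using Python threads and bounded queues to overlap decoding, view synthesis and rendering; the stage functions are placeholders, not the paper's actual modules.

```python
import queue
import threading

def pipeline_stage(work_fn, in_q, out_q):
    """Generic pipeline stage: pull an item, process it, push the result.

    Decoding, view synthesis and rendering each run in their own thread so the
    three stages overlap, mirroring the pipelined structure in the paper."""
    while True:
        item = in_q.get()
        if item is None:                 # poison pill: shut the pipeline down
            if out_q is not None:
                out_q.put(None)
            break
        result = work_fn(item)
        if out_q is not None:
            out_q.put(result)

# Placeholder stage functions; real ones would wrap the multi-view decoder,
# the view synthesizer and the autostereoscopic renderer.
decode = lambda access_unit: ("views", access_unit)
synthesize = lambda views: ("interleaved", views)
render = lambda image: None

q1, q2, q3 = queue.Queue(4), queue.Queue(4), queue.Queue(4)
threads = [
    threading.Thread(target=pipeline_stage, args=(decode, q1, q2)),
    threading.Thread(target=pipeline_stage, args=(synthesize, q2, q3)),
    threading.Thread(target=pipeline_stage, args=(render, q3, None)),
]
for t in threads:
    t.start()

# Feed encoded access units in, then shut down.
for access_unit in ["AU0", "AU1", "AU2"]:
    q1.put(access_unit)
q1.put(None)
for t in threads:
    t.join()
```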

A Method for Improvement of Coding Efficiency in Scalability Extension of H.264/AVC (H.264/AVC Scalability Extension의 부호화 효율 향상 기법)

  • Kang, Chang-Soo
    • Journal of the Institute of Electronics Engineers of Korea IE, v.47 no.2, pp.21-26, 2010
  • This paper proposes an efficient algorithm to reduce the amount of computation in the Scalability Extension, which accounts for a great deal of the processing time in H.264/AVC. The algorithm decides a search range according to the direction of the predicted motion vector and then performs an adaptive spiral search over the candidates with the JM (Joint Model) FME (Fast Motion Estimation), which employs rate-distortion optimization (RDO). Experimental results on various video sequences show that the processing time was reduced by up to 80% compared with the previous prediction methods, while the video quality degraded by only 0.05 dB to 0.19 dB and the compression ratio decreased by as little as 0.58% on average. Therefore, the proposed method is an efficient method for fast inter prediction.
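
The sketch below illustrates a spiral motion search centred on the predicted motion vector; the rule that adapts the search range (here, from the predicted MV magnitude) and the plain SAD cost are assumptions standing in for the paper's direction-based range rule and the JM FME with RDO.

```python
import numpy as np

def spiral_offsets(search_range):
    """Yield (dy, dx) offsets around (0, 0) in order of increasing ring radius."""
    yield (0, 0)
    for r in range(1, search_range + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if max(abs(dy), abs(dx)) == r:
                    yield (dy, dx)

def adaptive_spiral_search(cur_block, ref, cx, cy, pred_mv):
    """SAD-based spiral search centred on the predicted MV.

    The search range is adapted to the predicted MV magnitude (an assumed rule),
    so small, stable motion gets a small search window."""
    search_range = int(np.clip(np.abs(np.asarray(pred_mv)).max(), 2, 16))
    h, w = cur_block.shape
    best_cost, best_mv = np.inf, (int(pred_mv[0]), int(pred_mv[1]))
    for dy, dx in spiral_offsets(search_range):
        y = cy + int(pred_mv[0]) + dy
        x = cx + int(pred_mv[1]) + dx
        if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
            continue
        cost = np.abs(ref[y:y + h, x:x + w].astype(np.int32)
                      - cur_block.astype(np.int32)).sum()
        if cost < best_cost:
            best_cost = cost
            best_mv = (int(pred_mv[0]) + dy, int(pred_mv[1]) + dx)
    return best_mv, best_cost
```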

Generation of Stereoscopic Image from 2D Image based on Saliency and Edge Modeling (관심맵과 에지 모델링을 이용한 2D 영상의 3D 변환)

  • Kim, Manbae
    • Journal of Broadcast Engineering, v.20 no.3, pp.368-378, 2015
  • 3D conversion technology has been studied over the past decades and integrated into commercial 3D displays and 3DTVs. 3D conversion plays an important role in augmenting the functionality of three-dimensional television (3DTV) because it can easily provide 3D content. Generally, depth cues extracted from a static image are used to generate a depth map, followed by DIBR (Depth Image Based Rendering) to produce a stereoscopic image. However, except for some particular images, such depth cues are rare, so consistent depth-map quality cannot be guaranteed. It is therefore imperative to devise a 3D conversion method that produces satisfactory and consistent 3D for diverse video content. From this viewpoint, this paper proposes a novel method applicable to general types of images, utilizing saliency as well as edges. To generate a depth map, geometric perspective, an affinity model, and a binomial filter are used. In the experiments, the proposed method was applied to 24 video clips with a variety of content. A subjective test of 3D perception and visual fatigue validated satisfactory and comfortable viewing of the 3D content.
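
As a toy illustration of the depth-map-plus-DIBR flow, the sketch below blends a vertical geometric-perspective prior with a saliency map and then warps pixels by a depth-scaled disparity; the blend weight, the disparity scale and the lack of hole filling are simplifications, and the affinity model and binomial filtering steps of the paper are omitted.

```python
import numpy as np

def depth_from_saliency(saliency, alpha=0.5):
    """Combine a geometric-perspective prior with a saliency map into a depth map.

    saliency: (H, W) array in [0, 1]; higher means more visually important.
    A simple vertical gradient (bottom of the frame near, top far) stands in for
    the paper's geometric-perspective cue; the blend weight alpha is assumed.
    """
    h, w = saliency.shape
    perspective = np.tile(np.linspace(0.0, 1.0, h)[:, None], (1, w))
    return alpha * perspective + (1.0 - alpha) * saliency

def dibr_right_view(image, depth, max_disparity=16):
    """Very small DIBR sketch: shift each pixel left by a depth-scaled disparity
    to synthesize the right-eye view (holes are left as zeros, no inpainting)."""
    h, w = depth.shape
    right = np.zeros_like(image)
    disparity = (depth * max_disparity).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y, x]
            if 0 <= xr < w:
                right[y, xr] = image[y, x]
    return right
```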

The Development of Terrestrial DMB System for Video Associated Data Services (비디오 부가데이터 서비스를 위한 지상파 DMB 시스템 개발)

  • Kim, Hyun-Soon;Kyung, Il-Soo;Kim, Sang-Hun;Kim, Man-Sik
    • Journal of Broadcast Engineering, v.11 no.4 s.33, pp.541-553, 2006
  • Since terrestrial DMB went on air, not only high-quality audio and video services but also various data service models have been required. This paper describes a system for one of these services, the video-associated data service. A terrestrial DMB system that authors video-associated data service content and transmits it over a DMB channel is proposed. The system conforms to the terrestrial DMB standard for video-associated data services: the MPEG-4 BIFS (BInary Format for Scenes) Core2D scene description profile and graphics profile. It is designed to support two major broadcasting workflows, real-time authoring with non-automatic transmission and non-real-time authoring with automatic transmission, and focuses on authoring high-quality content efficiently and delivering it reliably to the video encoder. The system proved its performance through conformance tests with various receivers, so it can be used for future on-air services.

Error Resilient Scheme in Video Data Transmission using Information Hiding (정보은닉을 이용한 동영상 데이터의 전송 오류 보정)

  • Bae, Chang-Seok;Choe, Yoon-Sik
    • The KIPS Transactions: Part B, v.10B no.2, pp.189-196, 2003
  • This paper describes an error-resilient video transmission method that uses information hiding. To localize transmission errors at the receiver, the video encoder embeds one bit per macroblock during encoding. The embedded information is detected during decoding at the receiver, and transmission errors can be localized by comparing it with the original embedded data. The localized transmission errors can then be easily corrected, so degradation in the reconstructed image is alleviated. Furthermore, the embedded information can be used to protect the intellectual property rights of the video data. Experimental results for three QCIF-sized video sequences of 150 frames each show that, while the degradation in the video streams carrying the embedded information is negligible, the average PSNR of the reconstructed images can be improved by about 5 dB in a noisy channel by using the embedded information. Intellectual property rights information can also be effectively recovered from the reconstructed images.
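
A minimal sketch of the error-localization idea, assuming the one bit per macroblock is a key-derived bit written into the LSB of a fixed quantized coefficient; the embedding position and the key-based bit sequence are assumptions, not the paper's embedding rule.

```python
import numpy as np

def embed_mb_bits(quant_coeffs, key=12345):
    """Embed one key-derived bit per macroblock into the LSB of its first
    quantized AC coefficient (an assumed embedding position).

    quant_coeffs: (num_mbs, 16) int array of quantized coefficients per MB.
    """
    rng = np.random.default_rng(key)
    bits = rng.integers(0, 2, size=quant_coeffs.shape[0])
    marked = quant_coeffs.copy()
    marked[:, 1] = (marked[:, 1] & ~1) | bits   # overwrite the LSB with the key bit
    return marked

def localize_errors(received_coeffs, key=12345):
    """Regenerate the same bit sequence at the decoder and flag every macroblock
    whose extracted bit disagrees: those MBs are reported as hit by channel
    errors so the decoder can conceal them (e.g. copy from the previous frame)."""
    rng = np.random.default_rng(key)
    expected = rng.integers(0, 2, size=received_coeffs.shape[0])
    extracted = received_coeffs[:, 1] & 1
    return np.flatnonzero(extracted != expected)   # indices of damaged MBs
```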

ASCII data hiding method based on blind video watermarking using minimum modification of motion vectors (움직임벡터의 변경 최소화 기법을 이용한 블라인드 비디오 워터마킹 기반의 문자 정보 은닉 기법)

  • Kang, Kyung-Won;Ryu, Tae-Kyung;Jeong, Tae-Il;Park, Tae-Hee;Kim, Jong-Nam;Moon, Kwang-Seok
    • The Journal of Korean Institute of Communications and Information Sciences, v.32 no.1C, pp.78-85, 2007
  • With the advancement of digital broadcasting and the popularity of the Internet, many studies have recently been made on digital watermarking for the copyright protection of digital data. This paper proposes a minimum-modification method for motion vectors that minimizes the degradation of video quality while hiding multilingual subtitles, OST (original sound track) information, character profiles, and so on, in addition to providing copyright protection. The proposed algorithm extracts a feature vector by comparing the motion vector data with the watermark data and minimizes the modification of motion vectors by deciding whether to invert the bits. Thus the degradation of video quality is minimized compared with conventional algorithms. The algorithm can also check data integrity and retrieve the embedded hidden data simply and blindly. Moreover, the proposed scheme can be applied to the conventional MPEG-1 and MPEG-2 standards without any increase in bit rate in the compressed video domain. The experimental results show that the proposed scheme obtains better video quality than previous algorithms by about 0.5-1.5 dB.
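
The bit-inversion idea can be sketched as follows, assuming the watermark is carried in motion-vector LSBs and that a one-bit inversion flag is signalled per group of vectors; both assumptions are illustrative and not taken from the paper.

```python
import numpy as np

def embed_with_min_modification(mv_lsbs, watermark_bits):
    """Embed watermark bits into motion-vector LSBs while minimizing changes.

    mv_lsbs:        current LSBs of the selected MV components (0/1 int array).
    watermark_bits: payload bits for this group of MVs (same-length 0/1 array).
    If embedding the bits directly would change more than half of the MVs, the
    payload is inverted and an inversion flag is set instead, so at most half
    of the motion vectors are ever modified (plus the one flag bit).
    """
    diff = int(np.count_nonzero(mv_lsbs != watermark_bits))
    invert = diff > len(watermark_bits) // 2
    embedded = (watermark_bits ^ 1) if invert else watermark_bits
    return embedded, int(invert)

def extract(mv_lsbs, invert_flag):
    """Blind extraction: read the LSBs back and undo the inversion flag."""
    return (mv_lsbs ^ 1) if invert_flag else mv_lsbs
```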

Heterogeneous Resolution Stereo Video Coding System (이종 해상도 스테레오 비디오 코딩 시스템)

  • Park, Sea-Nae;Sim, Dong-Gyu
    • Journal of Broadcast Engineering, v.13 no.1, pp.162-173, 2008
  • In this paper, we propose an effective stereo-view video coding method that considers the characteristics of stereo views and displays. Many current stereo video displays are designed not only for stereo display but also for conventional single-view display. In these systems, the resolution of each of the two input videos in stereo mode is half that of the single view, for compatibility with conventional single-view video services. We propose a stereo video codec that handles both single-view and stereo-view services by encoding the whole left image and a down-sampled right image. However, direct disparity estimation between the two views is not possible because the resolution of the left image differs from that of the corresponding right image. We therefore propose a disparity estimation method that makes use of the full information of the left reference image without down-sampling it. In experimental results, we achieved a 0.5-0.8 dB coding gain compared with several conventional algorithms.
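
A minimal sketch of disparity estimation between a full-resolution left view and a half-horizontal-resolution right view that avoids down-sampling the left image; the SAD cost, the column-subsampling rule and the integer disparity search are simplifications of the paper's method.

```python
import numpy as np

def mixed_resolution_disparity(right_block, left_full, rx, ry, max_disp=32):
    """Estimate disparity for a block of the half-width right view directly
    against the full-resolution left view.

    right_block: (N, M) block from the half-horizontal-resolution right view.
    left_full:   full-resolution left view.
    (rx, ry):    block position in right-view coordinates.
    A right-view column x is assumed to map to left-view column 2x, so the left
    view is sampled at every other column to match the block size while keeping
    its full-pixel positions available for the search.
    """
    n, m = right_block.shape
    best_cost, best_disp = np.inf, 0
    for d in range(max_disp):
        lx = 2 * rx + d                       # candidate position in the left view
        cols = lx + 2 * np.arange(m)          # every other left-view column
        if cols[-1] >= left_full.shape[1]:
            break
        cand = left_full[ry:ry + n, cols].astype(np.int32)
        cost = np.abs(cand - right_block.astype(np.int32)).sum()
        if cost < best_cost:
            best_cost, best_disp = cost, d
    return best_disp, best_cost
```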

Development of Emotion Recognition Model Using Audio-video Feature Extraction Multimodal Model (음성-영상 특징 추출 멀티모달 모델을 이용한 감정 인식 모델 개발)

  • Jong-Gu Kim;Jang-Woo Kwon
    • Journal of the Institute of Convergence Signal Processing, v.24 no.4, pp.221-228, 2023
  • Physical and mental changes caused by emotions can affect various behaviors, such as driving or learning. Recognizing these emotions is therefore a very important task, because it can be used in various industries, for example to recognize and control dangerous emotions while driving. In this paper, we address the emotion recognition task by implementing a multimodal model that recognizes emotions using both audio and video data from different domains. After the audio is extracted from the RAVDESS video data, audio features are extracted with a 2D-CNN model, and video features are extracted with a SlowFast feature extractor. The information contained in the audio and video data, which come from different domains, is then combined into a single feature that contains all the information, and emotion recognition is performed on the combined feature. Lastly, we compare the conventional approaches, which combine or vote on the outputs of separate per-modality models, with the proposed method of unifying the domains through feature extraction, combining the features, and performing classification with a single classifier.
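
A minimal sketch of the feature-level fusion step, assuming the audio and video branches have already produced fixed-length feature vectors and that a single linear softmax head performs the classification; the actual model uses trained 2D-CNN and SlowFast extractors and a learned classifier.

```python
import numpy as np

def fuse_and_classify(audio_feat, video_feat, W, b):
    """Feature-level (early) fusion of the two modalities.

    audio_feat: feature vector from the 2D-CNN audio branch.
    video_feat: feature vector from the SlowFast video branch.
    W, b:       weights of a linear classification head, with W of shape
                (num_emotions, len(audio_feat) + len(video_feat)).
    Instead of voting on two separate per-modality predictions (late fusion),
    the two feature vectors are concatenated and classified jointly.
    """
    fused = np.concatenate([audio_feat, video_feat])
    logits = W @ fused + b
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```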