• Title/Summary/Keyword: HEVC decoding

Search Result 37, Processing Time 0.152 seconds

An Analysis of Memory Access Complexity for HEVC Decoder (HEVC 복호화기의 메모리 접근 복잡도 분석)

  • Jo, Song Hyun;Kim, Youngnam;Song, Yong Ho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.5
    • /
    • pp.114-124
    • /
    • 2014
  • HEVC is a state-of-the-art video coding standard developed by JCT-VC. HEVC provides about 2 times higher subjective coding efficiency than H.264/AVC. One of the main goal of HEVC development is to efficiently coding UHD resolution video so that HEVC is expected to be widely used for coding UHD resolution video. Decoding such high resolution video generates a large number of memory accesses, so a decoding system needs high-bandwidth for memory system and/or internal communication architecture. In order to determine such requirements, this paper presents an analysis of the memory access complexity for HEVC decoder. we first estimate the amount of memory access performed by software HEVC decoder on an embedded system and a desktop computer. Then, we present the memory bandwidth models for HEVC decoder by analyzing the data flow of HEVC decoding tools. Experimental results show the software decoder produce 6.9-40.5 GB/s of DRAM accesses. also, the analysis reveals the hardware decoder requires 2.4 GB/s of DRAM bandwidth.

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.1
    • /
    • pp.1-9
    • /
    • 2014
  • This paper proposes a low-latency Sample Adaptive Offset filter (SAO) architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), SAO operation is performed only at the picture level. Most realtime decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce the latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly less memory requirements, and 2) low-latency property enabling efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture with significantly less memory usage, produces a similar decoding time as a picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

SRP Based Programmable FHD HEVC Decoder (SRP 기반 FHD HEVC Decoder)

  • Song, Joon Ho;Lee, Sang-jo;Lee, Won Chang;Kim, Doo Hyun;Kim, Jae Hyun;Lee, Shihwa
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2014.06a
    • /
    • pp.160-162
    • /
    • 2014
  • A programmable video decoding system with multi-core DSP and co-processors is presented. This system is adopted by Digital TV SoC (System on Chip) and is used for FHD HEVC (High Efficiency Video Coding) decoder. Using the DSP based programmable solution, we can reduce commercialization period by one year because we can parallelize algorithm development, software optimization and hardware design. In addition to the HEVC decoding, the proposed system can be used for other application such as other video decoding standard for multi-format decoder or video quality enhancement.

  • PDF

Tile, Slice, and Deblocking Filter Parallelization Method in HEVC (HEVC 복호기에서의 타일, 슬라이스, 디블록킹 필터 병렬화 방법)

  • Son, Sohee;Baek, Aram;Choi, Haechul
    • Journal of Broadcast Engineering
    • /
    • v.22 no.4
    • /
    • pp.484-495
    • /
    • 2017
  • The development of display devices and the increase of network transmission bandwidth bring demands for over 2K high resolution video such as panorama video, 4K ultra-high definition commercial broadcasting, and ultra-wide viewing video. To compress these image sequences with significant amount of data, High Efficiency Video Coding (HEVC) standard with the highest coding efficiency is a promising solution. HEVC, the latest video coding standard, provides high encoding efficiency using various advanced encoding tools, but it also requires significant amounts of computation complexity compared to previous coding standards. In particular, the complexity of HEVC decoding process is a imposing challenges on real-time playback of ultra-high resolution video. To accelerate the HEVC decoding process for ultra high resolution video, this paper introduces a data-level parallel video decoding method using slice and/or tile supported by HEVC. Moreover, deblocking filter process is further parallelized. The proposed method distributes independent decoding operations of each tile and/or each slice to multiple threads as well as deblocking filter operations. The experimental results show that the proposed method facilitates executions up to 2.0 times faster than the HEVC reference software for 4K videos.

A Design of Pipelined-parallel CABAC Decoder Adaptive to HEVC Syntax Elements (HEVC 구문요소에 적응적인 파이프라인-병렬 CABAC 복호화기 설계)

  • Bae, Bong-Hee;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.5
    • /
    • pp.155-164
    • /
    • 2015
  • This paper describes a design and implementation of CABAC decoder, which would handle HEVC syntax elements in adaptively pipelined-parallel computation manner. Even though CABAC offers the high compression rate, it is limited in decoding performance due to context-based sequential computation, and strong data dependency between context models, as well as decoding procedure bin by bin. In order to enhance the decoding computation of HEVC CABAC, the flag-type syntax elements are adaptively pipelined by precomputing consecutive flag-type ones; and multi-bin syntax elements are decoded by processing bins in parallel up to three. Further, in order to accelerate Binary Arithmetic Decoder by reducing the critical path delay, the update and renormalization of context modeling are precomputed parallel for the cases of LPS as well as MPS, and then the context modeling renewal is selected by the precedent decoding result. It is simulated that the new HEVC CABAC architecture could achieve the max. performance of 1.01 bins/cycle, which is two times faster with respect to the conventional approach. In ASIC design with 65nm library, the CABAC architecture would handle 224 Mbins/sec, which could decode QFHD HEVC video data in real time.

An Efficient Interpolation Hardware Architecture for HEVC Inter-Prediction Decoding

  • Jin, Xianzhe;Ryoo, Kwangki
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.2
    • /
    • pp.118-123
    • /
    • 2013
  • This paper proposes an efficient hardware architecture for high efficiency video coding (HEVC), which is the next generation video compression standard. It adopts several new coding techniques to reduce the bit rate by about 50% compared with the previous one. Unlike the previous H.264/AVC 6-tap interpolation filter, in HEVC, a one-dimensional seven-tap and eight-tap filter is adopted for luma interpolation, but it also increases the complexity and gate area in hardware implementation. In this paper, we propose a parallel architecture to boost the interpolation performance, achieving a luma $4{\times}4$ block interpolation in 2-4 cycles. The proposed architecture contains shared operations reducing the gate count increased due to the parallel architecture. This makes the area efficiency better than the previous design, in the best case, with the performance improved by about 75.15%. It is synthesized with the MagnaChip $0.18{\mu}m$ library and can reach the maximum frequency of 200 MHz.

Fast Thumbnail Extraction Algorithm with Partial Decoding for HEVC (HEVC에서 부분복호화를 통한 썸네일 추출 알고리듬)

  • Lee, Wonjin;Jeong, Jechang
    • Journal of Broadcast Engineering
    • /
    • v.23 no.3
    • /
    • pp.431-436
    • /
    • 2018
  • In this paper, a simple but effective algorithm to reduce the computational complexity of thumbnail generation and to improve image quality without aliasing artifacts is proposed. For the high speed decoding, the proposed algorithm performs partial decoding per $4{\times}4$ boundary in TU(Transform Unit), and preforms TU boundary in PU(Prediction Unit). The proposed method defines the weights based on intra prediction directions and estimates the thumbnail pixel by using that weights. this method remains thumbnail extraction time and improves thumbnail image quality compared with conventional algorithms.

Multi-Sever based Distributed Coding based on HEVC/H.265 for Studio Quality Video Editing

  • Kim, Jongho;Lim, Sung-Chang;Jeong, Se-Yoon;Kim, Hui-Yong
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.201-208
    • /
    • 2018
  • High Efficiency Video Coding range extensions (HEVC RExt) is a kind of extension model of HEVC. HEVC RExt was specially designed for dealing the high quality images. HEVC RExt is very essential for studio editing which handle the very high quality and various type of images. There are some problems to dealing these massive data in studio editing. One of the most important procedure is re-encoding and decoding procedure during the editing. Various codecs are widely used for studio data editing. But most of the codecs have common problems to dealing the massive data in studio editing. First, the re-encoding and decoding processes are frequently occurred during the studio data editing and it brings enormous time-consuming and video quality loss. This paper, we suggest new video coding structure for the efficient studio video editing. The coding structure which is called "ultra-low delay (ULD)". It has the very simple and low-delayed referencing structure. To simplify the referencing structure, we can minimize the number of the frames which need decoding and re-encoding process. It also prevents the quality degradation caused by the frequent re-encoding. Various fast coding algorithms are also proposed for efficient editing such as tool-level optimization, multi-serve based distributed coding and SIMD (Single instruction, multiple data) based parallel processing. It can reduce the enormous computational complexity during the editing procedure. The proposed method shows 9500 times faster coding speed with negligible loss of quality. The proposed method also shows better coding gain compare to "intra only" structure. We can confirm that the proposed method can solve the existing problems of the studio video editing efficiently.

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

  • Ryu, Yeongil;Roh, Hyun-Joon;Ryu, Eun-Seok
    • Journal of KIISE
    • /
    • v.43 no.9
    • /
    • pp.1060-1065
    • /
    • 2016
  • Recently, there is an emerging need for parallel UHD video processing, and the usage of computing systems that have an asymmetric processor such as ARM big.LITTLE is actively increasing. Thus, a new parallel UHD video processing method that is optimized for the asymmetric multicore systems is needed. This paper proposes a novel HEVC tile partitioning method for parallel processing by analyzing the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of asymmetric multicores and (2) the regression model of computational complexity per video resolution. Finally, the model (3) determines the optimal HEVC tile resolution for each core and partitions/allocates the tiles to suitable cores. The proposed method minimizes the gap in the decoding time between the fastest CPU core and the slowest CPU core. Experimental results with the 4K UHD official test sequences show average 20% improvement in the decoding speedup on the ARM asymmetric multicore system.

SIMD Instruction-based Fast HEVC RExt Decoder (SIMD 명령어 기반 HEVC RExt 복호화기 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.20 no.2
    • /
    • pp.224-237
    • /
    • 2015
  • In this paper, we introduce the fast decoding method with the SIMD (Single Instruction Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several tools of HEVC RExt such as intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules can be classified as the proper modules for applying the SIMD instructions. In consideration of bit-depth increasement of RExt, intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules are accelerated by SSE (Streaming SIMD Extension) instructions. In addition, we propose effective implementations for interpolation filter, inverse-quantization, and clipping modules by utilizing a set of AVX2 (Advanced Vector eXtension 2) instructions that can use 256 bits register. The evaluation of the proposed methods were performed on the private HEVC RExt decoder developed based on HM 16.0. The experimental results show that the developed RExt decoder reduces 12% average decoding time, compared with the conventional sequential method.