• Title/Summary/Keyword: CAVLD

Search Result 6, Processing Time 0.022 seconds

Design of Hardwired Variable Length Decoder for H.264/AVC (하드웨어 구조의 H.264/AVC 가변길이 복호기 설계)

  • Yu, Yong-Hoon;Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.11
    • /
    • pp.71-76
    • /
    • 2008
  • H.264(or MPEG-4/AVC pt.10) is a high performance video coding standard, and is widely used. Variable length code (VLC) of the H.264 standard compresses data using the statistical distribution of values. A decoder parses the compressed bit stream and searches decoded values in lookup tables, and the decoding process is not easy to implement by hardware. We propose an architecture of variable length decoder(VLD) for the H.264 baseline profile(BP) L4. The CAVLD decodes syntax elements using the combination of arithmetic units and lookup tables for the optimized hardware architecture. A barral shifter and a first 1's detector parse NAL bit stream, and are shared by Exp-Golomb decoder and CAVLD. A FIFO memory between CAVLD and the reorder unit and a buffer at the output of the reorder unit eliminate the bottleneck of data stream. The proposed VLD is designed using Verilog-HDL and is implemented using an FPGA. The synthesis result using a 0.18um standard CMOS technology shows that the gate count is 22,604 and the decoder can process HD($1920{\times}1080$) video at 120MHz.

A design of synchronous nonlinear and parallel for pipeline stage on IP-based H.264 decoder implementation (IP기반 H.264 디코더 설계를 위한 동기식 비선형 및 병렬화 파이프라인 설계)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.409-410
    • /
    • 2008
  • This paper presents nonlinear and parallel design for synchronous pipelining in IP-based H.264 decoder implementation. Since H.264 decoder includes the dataflow of feedback loop, the data dependency requires one NOP stage per pipelining latency to drop the throughput into 1/2. Further, it is found that, in execution time, the stage scheduled for MC is more occupied than that for CAVLD/ITQ/DF. The less efficient stage would be improved by nonlinear scheduling, while the fully-utilized stage could be accelerated by parallel scheduling of IP. The optimization yields 3 nonlinear {CAVLD&ITQ}|3 parallel (MC/IP&Rec.)| 3 nonlinear {DF} pipelined architecture for IP-based H.264 decoder. In experiments, the nonlinear and parallel pipelined H.264 decoder, including existing IPs, could deal with full HD video at 41.86MHz, in real time processing.

  • PDF

High Throughput Parallel Decoding Method for H.264/AVC CAVLC

  • Yeo, Dong-Hoon;Shin, Hyun-Chul
    • ETRI Journal
    • /
    • v.31 no.5
    • /
    • pp.510-517
    • /
    • 2009
  • A high throughput parallel decoding method is developed for context-based adaptive variable length codes. In this paper, several new design ideas are devised and implemented for scalable parallel processing, a reduction in area, and a reduction in power requirements. First, simplified logical operations instead of memory lookups are used for parallel processing. Second, the codes are grouped based on their lengths for efficient logical operation. Third, up to M bits of the input stream can be analyzed simultaneously. For comparison, we designed a logical-operation-based parallel decoder for M=8 and a conventional parallel decoder. High-speed parallel decoding becomes possible with our method. In addition, for similar decoding rates (1.57 codes/cycle for M=8), our new approach uses 46% less chip area than the conventional method.

An optimization of synchronous pipeline design for IP-based H.264 decoder design (IP기반 H.264 디코더 설계를 위한 동기화 파이프라인 최적화)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.407-408
    • /
    • 2008
  • This paper presents a synchronous pipeline design for IP-based H.264 decoding system. The first optimization for pipelining aims at efficiently resolving the data dependency due to motion compensation/intra prediction feedback data flow in H.264 decoder. The second one would enhance the efficiency of execution per each pipelining stage to explore the optimized latency and stage number. Thus, the 3 stage pipeline of CAVLD&ITQ|MC/IP&Rec.|DF is obtained to yield the best throughput and implementation. In experiments, it is found that the synchronous pipelined H.264 decoding system, based on existing IPs, could deal with Full HD video at 125.34MHz, in real time.

  • PDF

A New H.264/AVC CAVLC Parallel Decoding Circuit (새로운 H.264/AVC CAVLC 고속 병렬 복호화 회로)

  • Yeo, Dong-Hoon;Shin, Hyun-Chul
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.11
    • /
    • pp.35-43
    • /
    • 2008
  • A new effective parallel decoding method has been developed for context-based adaptive variable length codes. In this paper, several new design ideas have been devised for scalable parallel processing, less area, and less power. First, simplified logical operations instead of memory look-ups are used for fast low power operations. Second the codes are grouped based on their lengths for efficient logical operation. Third, up to M bits of input are simultaneously analyzed. For comparison, we have designed the logical operation based parallel decoder for M=8 and a typical conventional method based decoder. High speed parallel decoding is possible with our method. For similar decoding rates (1.57codes/cycle for M=8), our new approach uses 46% less area than the typical conventional method.

Efficient CAVLC Decoder VLSI Design for HD Images (HD급 영상을 효율적으로 복호하기 위한 CAVLC 복호화기 VLSI 설계)

  • Oh, Myung-Seok;Lee, Won-Jae;Kim, Jae-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.4 s.316
    • /
    • pp.51-59
    • /
    • 2007
  • In this paper, we propose an efficient hardware architecture for H.264/AVC CAVLC (Context-based Adaptive Variable Length Coding) decoding which used for baseline profile and extended profile. Previous CAVLC architectures are consisted of five step block and each block gets effective bits from Controller block and Accumulator. If large number of non-zero coefficients exist, process for getting effective bits has to iterates many times. In order to reduce this unnecessary process, we propose two techniques, which combine five steps into four steps and reduce process to get efficiency bit by skipping addition step. By adopting these two techniques, the required processing time was reduced about 26% compared with previous architectures. It was designed in a hardware description language and total logic gate count was 16.83k using 0.18um standard cell library.