• Title/Summary/Keyword: DCT/IDCT Architecture

Implementation of IQ/IDCT in H.264/AVC Decoder Using GP-GPU (GP-GPU를 이용한 H.264/AVC 디코더의 IQ/IDCT구현)

  • Jeong, Jun-Mo;Lee, Kwang-Yeob
    • Journal of IKEEE, v.14 no.2, pp.76-81, 2010
  • The need for dedicated hardware continues to decrease as mobile CPU performance increases, but there is a limit to a mobile CPU's performance. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents an implementation of the Inverse Quantization, Inverse DCT, and Color Space Conversion modules of an H.264/AVC decoder using GP-GPU for mobile environments. The proposed architecture improves performance by approximately 40% when all of its features are used.
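As a rough illustration of the inverse DCT stage named in the abstract above, the following sketch implements the 4x4 inverse integer transform used for H.264/AVC residual blocks, using only additions and shifts. It is not the authors' GP-GPU code: the function names are hypothetical, inverse quantization is omitted, and the input is assumed to be already dequantized.

```python
def idct4_1d(d):
    """One 1-D pass of the 4x4 inverse integer transform (adds and shifts only)."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def idct4x4(coeffs):
    """Apply the 1-D pass to rows, then columns, then round and scale by 1/64."""
    rows = [idct4_1d(r) for r in coeffs]
    # cols[j][i] holds the value for row i, column j.
    cols = [idct4_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


# Hypothetical dequantized block containing only a DC coefficient.
dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
residual = idct4x4(dc_only)
```

Each 4x4 block is processed independently of the others, which is the property that makes this stage a natural candidate for data-parallel execution.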

Two-dimensional systolic arrays for DCT/DST/DHT hardware implementation (DCT/DST/DHT 하드웨어 구현을 위한 2차원 시스톨릭 어레이)

  • 판성범;박래홍
    • Journal of the Korean Institute of Telematics and Electronics B, v.31B no.10, pp.11-20, 1994
  • We propose two architectures that use two-dimensional systolic arrays for the DCT/DST/DHT. The first decomposes the N-point DCT/DST/DHT into even- and odd-numbered frequency samples and computes the two sets independently and simultaneously; it can also be used for the IDCT/IDST/IDHT. The second is a modified version for the DHT/IDHT. Both architectures generate outputs sequentially using real multiplications and additions. Compared with conventional methods, the proposed systolic arrays offer advantages in the simplicity of the processing element (PE), latency, and throughput. Simulation results using VHDL, an international standard hardware description language, show the effectiveness of the proposed architectures.

Motion Estimation and Compensation based on Advanced DCT (변환 영역에서 개선된 DCT를 기반으로 한 움직임 예측 및 보상)

  • Jang, Young;Cho, Hyo-Moon;Cho, Sang-Bock
    • Proceedings of the KIEE Conference, 2007.04a, pp.38-40, 2007
  • In this paper, we propose a novel DCT (Discrete Cosine Transform)-based architecture for ME (Motion Estimation) and MC (Motion Compensation). Traditional DCT-based ME and MC algorithms did not take advantage of the coarseness of the 2-dimensional DCT (2-D DCT) coefficients to reduce the operation time. We therefore derive a recursion equation for transform-domain ME and MC and design the structure using highly regular, parallel, and pipelined processing elements. The main difference from other approaches is that the IDCT block is removed by working in the transform domain. As a result, our algorithm is more efficient in practical image processing systems such as a DVR (Digital Video Recorder). We present simulation results that are compared with spatial-domain methods in terms of calculation cost, compression ratio, and peak signal-to-noise ratio (PSNR); they show a reduction in calculation cost.

A Design of high throughput IDCT processor in Distributed Arithmetic Method (처리율을 개선시킨 분산연산 방식의 IDCT 프로세서 설계)

  • 김병민;배현덕;조태원
    • Journal of the Institute of Electronics Engineers of Korea SC, v.40 no.6, pp.48-57, 2003
  • In this paper, an $8{\times}1$ 1-D IDCT processor using adder-based distributed arithmetic (DA) and a bit-serial method is presented. The bit-serial and DA methods are used to reduce hardware cost and to improve operating speed. Transforming the coefficient equation reduces hardware cost and gives the implementation a regular structure. The sign extension computation method reduces the number of operation clock cycles. After logic synthesis, the gate count of the designed $8{\times}1$ 1-D IDCT is 17,504. The sign extension processing block has a gate count of 3,620, about 20% of the total $8{\times}1$ 1-D IDCT architecture, but it more than doubles the throughput. The designed IDCT processes 50 Mpixels per second at a clock frequency of 100 MHz.
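The following sketch shows one common formulation of adder-based distributed arithmetic, in which the bits of the fixed coefficients select which inputs are added, so the inner product needs no multiplier. It is an illustration under stated assumptions rather than the processor described above: the function name, the coefficient values, and the 4-bit width are hypothetical.

```python
def da_inner_product(coeffs, xs, width):
    """Inner product of fixed two's-complement coefficients with inputs,
    computed with additions and shifts only."""
    acc = 0
    for j in range(width):
        # Add up the inputs whose coefficient has bit j set.
        partial = sum(x for c, x in zip(coeffs, xs) if (c >> j) & 1)
        if j == width - 1:
            acc -= partial << j  # the sign bit carries weight -2**(width - 1)
        else:
            acc += partial << j
    return acc


# Hypothetical values; every coefficient must fit in `width` bits.
COEFFS = [3, -2, 5, -7]
XS = [10, -4, 7, 1]
result = da_inner_product(COEFFS, XS, width=4)
```

Because the coefficients are constants, the set of inputs added at each bit position is fixed at design time, which is what allows the adder network to be hardwired.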

Discrete Cosine Transformer with Variable-Length Basis Vector for MPEG-4 Video Codec

  • Kuroda, Ryo;Fujita, Gen;Onoye, Takao;Shirakawa, Isao
    • Proceedings of the IEEK Conference, 2000.07b, pp.811-814, 2000
  • In this paper, a VLSI architecture for the Shape-Adaptive Discrete Cosine Transform (SA-DCT) is described, dedicated to MPEG-4 video codecs. By adopting a fast DCT algorithm, the number of multipliers is halved in comparison with a conventional algorithm. With a small amount of additional hardware, the SA-DCT core can also perform the SA-Inverse DCT (SA-IDCT) by sharing multipliers and a transportation memory. The proposed SA-DCT core is integrated with 40,000 gates in a 0.35 $\mu$m triple-metal CMOS technology and operates at 20 MHz, which enables real-time coding and decoding of CIF ($352{\times}288$ pixels) pictures.

A High Throughput Multiple Transform Architecture for H.264/AVC Fidelity Range Extensions

  • Ma, Yao;Song, Yang;Ikenaga, Takeshi;Goto, Satoshi
    • JSTS:Journal of Semiconductor Technology and Science, v.7 no.4, pp.247-253, 2007
  • In this paper, a high-throughput multiple transform architecture for H.264 Fidelity Range Extensions (FRExt) is proposed. New techniques are adopted which (1) regularize the $8{\times}8$ integer forward and inverse DCT transform matrices; (2) divide them into four $4{\times}4$ sub-matrices so that a simple fast butterfly algorithm can be used; and (3) exploit the similarity of the sub-matrices through mixed butterflies, so that all the sub-matrices of the $8{\times}8$ transforms and the matrices of the $4{\times}4$ forward DCT (FDCT), inverse DCT (IDCT), and Hadamard transform can be merged. Based on these techniques, a hardware architecture is realized that achieves a throughput of 1.488 Gpixel/s when processing either $4{\times}4$ or $8{\times}8$ transforms. With this throughput, the design satisfies the critical requirement of real-time multi-transform processing for High Definition (HD) applications such as High Definition DVD (HD-DVD) ($1920{\times}1080$ at 60 Hz) in H.264/AVC FRExt. The design has been synthesized using a Rohm 0.18 $\mu$m library; it operates at 93 MHz at a cost of 56,440 gates.
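To make the idea of splitting an 8-point transform into 4-point pieces concrete, the sketch below applies a standard even/odd butterfly to an $8{\times}8$ integer transform matrix. This is not necessarily the partition used in the paper above. The matrix `T8` is the form commonly quoted for the H.264 FRExt forward transform and is reproduced here as an assumption that should be checked against the standard; the function names are hypothetical.

```python
# Assumed 8x8 integer transform matrix (verify against the standard before reuse).
T8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]


def direct(x):
    """Full 8x8 matrix-vector product: 64 multiply-accumulates."""
    return [sum(T8[r][j] * x[j] for j in range(8)) for r in range(8)]


def butterfly(x):
    """Even rows act on sums, odd rows on differences: two 4x4 products."""
    s = [x[j] + x[7 - j] for j in range(4)]  # feeds the symmetric (even) rows
    d = [x[j] - x[7 - j] for j in range(4)]  # feeds the antisymmetric (odd) rows
    y = []
    for r in range(8):
        half = s if r % 2 == 0 else d
        y.append(sum(T8[r][j] * half[j] for j in range(4)))
    return y


x = [1, 2, 3, 4, 5, 6, 7, 8]
y_fast = butterfly(x)
y_ref = direct(x)
```

The butterfly works because the even-numbered rows of `T8` are symmetric and the odd-numbered rows are antisymmetric about the centre, so only the left half of each row is needed.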

Design of Vector Register Architecture in DSP Processor for Efficient Multimedia Processing

  • Wu, Chou-Pin;Wu, Jen-Ming
    • JSTS:Journal of Semiconductor Technology and Science, v.7 no.4, pp.229-234, 2007
  • In this paper, we present an efficient instruction set architecture that uses vector register file hardware to accelerate general matrix-vector operations in a DSP microprocessor. The technique enables in-situ row access as well as column access to the register files, which can significantly reduce the number of memory accesses. It is especially useful for block-based video signal processing kernels such as FFT/IFFT, DCT/IDCT, and two-dimensional filtering. We have applied the new instruction set architecture to in-loop deblocking filter processing in an H.264 decoder. Performance comparisons show that the load/store operations required for the in-loop deblocking filter can be reduced by about 42%. The architecture would substantially improve processing speed and code density in DSP microprocessors, especially for video signal processing.

Hardwired Distributed Arithmetic for Multiple Constant Multiplications and Its Applications for Transformation (다중 상수 곱셈을 위한 하드 와이어드 분산 연산)

  • Kim, Dae-Won;Choi, Jun-Rim
    • Proceedings of the IEEK Conference, 2005.11a, pp.949-952, 2005
  • We propose hardwired distributed arithmetic, which is applied to multiple constant multiplications and to the fixed data path in the inner product of fixed coefficients, as a result of variable radix-2 multi-bit coding. Variable radix-2 multi-bit coding reduces the partial products in constant multiplication and minimizes the number of additions and shifts. As a result, the procedure reduces the number of partial products, so the required multiplication time is shortened and the area is reduced relative to the DA architecture. The architecture also shows the best performance for DCT/IDCT and DWT architectures in terms of area, with a reduction of up to 20% obtained by reducing the partial products by up to 40%.

Implementation of IQ/IDCT in H.264/AVC Decoder Using Mobile Multi-Core GPGPU (모바일 멀티 코어 GP-GPU를 이용한 H.264/AVC 디코더 구현)

  • Kim, Dong-Han;Lee, Kwang-Yeob;Jeong, Jun-Mo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2010.10a, pp.321-324, 2010
  • There has been much research on multi-core processors, with performance enhanced through parallelization, and multi-core architectures have emerged in the mobile environment. However, there is a limit to a mobile CPU's performance. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents an implementation of the Inverse Quantization, Inverse DCT, and Color Space Conversion modules of an H.264/AVC decoder using a multi-core GP-GPU for mobile environments. The proposed architecture improves performance by approximately 50% when all of its features are used.

High Throughput Parallel Design of 2-D $8{\times}8$ Integer Transforms for H.264/AVC (H.264/AVC 를 위한 높은 처리량의 2-D $8{\times}8$ integer transforms 병렬 구조 설계)

  • Sharma, Meeturani;Tiwari, Honey;Cho, Yong-Beom
    • Journal of the Institute of Electronics Engineers of Korea SD, v.49 no.8, pp.27-34, 2012
  • In this paper, an implementation of a high-throughput two-dimensional (2-D) $8{\times}8$ forward and inverse integer DCT for H.264 is presented. The forward and inverse transforms are expressed using simple shift and addition operations. Matrix decomposition and matrix operations such as the Kronecker product and direct sum are used to reduce the computational complexity. The proposed design uses integer computations and does not use transpose memory, so resource consumption is also reduced. The maximum operating frequency of the proposed pipelined architecture is 1.184 GHz, which gives a throughput of 25.27 Gpixels/sec at a hardware cost of 44,864 gates. High throughput and low hardware cost make the proposed design useful for real-time H.264/AVC high-definition processing.
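The role of the Kronecker product in a transpose-free 2-D transform can be sketched with the identity vec(C X Cᵀ) = (C ⊗ C) vec(X), using row-major vectorization. The example below is a minimal sketch, not the paper's decomposition: for brevity it uses the well-known $4{\times}4$ H.264 forward core matrix instead of the $8{\times}8$ one, and all function names are hypothetical.

```python
# 4x4 H.264 forward core transform matrix (used here only to keep the example small).
C4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][k] * b[j][l] for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]


def transform_separable(c, x):
    """Row-column method: Y = (C X) C^T, which needs an intermediate transpose step."""
    n = len(c)
    cx = [[sum(c[i][k] * x[k][l] for k in range(n)) for l in range(n)] for i in range(n)]
    return [[sum(cx[i][l] * c[j][l] for l in range(n)) for j in range(n)] for i in range(n)]


def transform_kron(c, x):
    """Single matrix-vector product with (C kron C) on the row-major flattened block."""
    n = len(c)
    vec = [x[k][l] for k in range(n) for l in range(n)]
    out = [sum(row[m] * vec[m] for m in range(n * n)) for row in kron(c, c)]
    return [out[i * n:(i + 1) * n] for i in range(n)]


X = [[r * 4 + col for col in range(4)] for r in range(4)]  # hypothetical input block
y_sep = transform_separable(C4, X)
y_kron = transform_kron(C4, X)
```

Written as one product, the 2-D transform no longer has a separate row pass and column pass, which is why a decomposition of the Kronecker form can avoid a transpose buffer.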