• Title/Summary/Keyword: distributed arithmetic(DA)

Search Result 29, Processing Time 0.025 seconds

High Speed 2D Discrete Cosine Transform Processor

  • Kim, Ji-Eun;Hae Kyung SEONG;Kang Hyeon RHEE
    • Proceedings of the IEEK Conference
    • /
    • 2002.07c
    • /
    • pp.1823-1826
    • /
    • 2002
  • On modern computer culture, the high quality data is required in multimedia systems. So, the technology of data compression fur data transmission is necessary now. This paper presents the pipeline architecture for the low and column address generator of 2D DCT/IDCT (Discrete Cosine Transform/Inverse Discrete Cosine Transform. In the proposed architecture, the area of hardware is reduced by using the DA (distributed arithmetic) method and applies the concepts of pipeline to the parallel architecture. As a result the designed pipeline of the low and column address generator for 2D DCT/IDCT architecture is implemented with an efficiency and high speed compared with the non-pipeline architecture.

  • PDF

A Study on the Implementation of Low Power DCT Architecture for MPEG-4 AVC (저전력 DCT를 이용한 MPEG-4 AVC 압축에 관한 연구)

  • Kim, Dong-Hoon;Seo, Sang-Jin;Park, Sang-Bong;Jin, Hyun-Joon;Park, Nho-Kyung
    • Proceedings of the KIEE Conference
    • /
    • 2007.10a
    • /
    • pp.371-372
    • /
    • 2007
  • In this paper we present performance and implementation comparisons of high performance two dimensional forward and inverse Discrete Cosine Transform (2D-DCT/IDCT) algorithm and low power algorithm for $8{\times}8$ 20 DCT and quantization based on partial sum and its corresponding hardware architecture for FPGA in MPEG-4. The architecture used in both low power 20 DCT and 2D IDCT is based on the conventional row-column decomposition method. The use of Fast algorithm and distributed arithmetic(DA) technique to implement the DCT/IDCT reduces the hardware complexity. The design was made using Mentor Graphics Tools for design entry and implementation. Mentor Graphics ModelSim SE6.1f was used for Verilog HDL entry, behavioral Simulation and Synthesis. The 2D DCT/IDCT consumes only 50% of the Operating Power.

  • PDF

High-Speed Radix-8 Butterfly Structure (고속 Radix-8 나비연산기구조)

  • Hur, Eun-Sung;Park, Jin-Su;Han, Kyu-Hoon;Jang, Young-Beom
    • Proceedings of the IEEK Conference
    • /
    • 2007.07a
    • /
    • pp.85-86
    • /
    • 2007
  • In this paper, a Radix-8 structure for high-speed FFT is proposed. Even throughput of the Radix-8 FFT is twice than that of the Radix-4 FFT, implementation area of the Radix-8 is larger than that of Radix-4 FFT. But, implementation area of the proposed Radix-8 FFT was reduced by using DA(Distributed Arithmetic) for multiplication. The Verilog-HDL coding results for the proposed FFT structure show 49.2% cell area increment comparison with those of the conventional Radix-4 FFT structure. Namely, to speed up twice, 49.2% of area cost is required. In case of same throughput, power consumption of the proposed structure is reduced by 25.4%.

  • PDF

Optimization Design Method for Inner Product Using CSHM Algorithm and its Application to 1-D DCT Processor (연산공유 승산 알고리즘을 이용한 내적의 최적화 및 이를 이용한 1차원 DCT 프로세서 설계)

  • 이태욱;조상복
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.2
    • /
    • pp.86-93
    • /
    • 2004
  • The DCT algorithm needs an efficient hardware architecture to compute inner product. The conventional design method, like ROM-based DA(Distributed Arithmetic), has large hardware complexity. Because of this reason, a CSHM(Computation Sharing Multiplication) was proposed for implementing inner product by Park. However, the Park's CSHM has inefficient hardware architecture in the precomputer and select units. Therefore it degrades the performance of the multiplier. In this paper, we presents the optimization design method for inner product using CSHM algorithm and applied it to implementation of 1-D DCT processor. The experimental results show that the proposed multiplier is more efficient than Park's when hardware architectures and logic synthesis results were compared. The designed 1-D DCT processor by using proposed design method is more high performance than typical methods.

Design on Pipeline Architecture for the Low and Column Address Generator of 2D DCT/IDCT (2D DCT/IDCT의 행, 열 주소생성기를 위한 파이프라인 구조 설계)

  • 노진수;박종태;문규성;성해경;이강현
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2003.05b
    • /
    • pp.14-18
    • /
    • 2003
  • This paper presents the pipeline architecture for the low and column address generator of 2D DCT/IDCT(Discrete Cosine Transform/Inverse Discrete Cosine Transform). For the real time process of image data, it is required that high speed operation and small size hardware In the proposed architecture, the area of hardware is reduced by using the DA(distributed arithmetic) method and applying the concepts of pipeline on the parallel architecture. As a results, the designed pipeline of the low and column address generator for 2D DCT/IDCT architecture is implemented with an efficiency and high speed compared as the non-pipeline architecture. And the operation speed is improved about 50% up. The design for the proposed pipeline architecture of DCT/IDCT is coded using VHDL.

  • PDF

Variable Radix-Two Multibit Coding and Its VLSI Implementation of DCT/IDCT (가변길이 다중비트 코딩을 이용한 DCT/IDCT의 설계)

  • 김대원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.12
    • /
    • pp.1062-1070
    • /
    • 2002
  • In this paper, variable radix-two multibit coding algorithm is presented and applied in the implementation of discrete cosine transform(DCT) and inverse discrete cosine transform(IDCT). Variable radix-two multibit coding means the 2k SD (signed digit) representation of overlapped multibit scanning with variable shift method. SD represented by 2k generates partial products, which can be easily implemented with shifters and adders. This algorithm is most powerful for the hardware implementation of DCT/IDCT with constant coefficient matrix multiplication. This paper introduces the suggested algorithm, it's proof and the implementation of DCT/IDCT The implemented IDCT chip with 8 PEs(Processing Elements) and one transpose memory runs at a tate of 400 Mpixels/sec at 54MHz frequency for high speed parallel signal processing, and it's verified in HDTV and MPEG decoder.

Design and Implementation of Low-Power DWT Processor for JPEG2000 Compression of Medical Images (의료영상의 JPEG2000 압축을 위한 저전력 DWT 프로세서의 설계 및 구현)

  • Jang Young-Beom;Lee Won-Sang;Yoo Sun-Kook
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.2
    • /
    • pp.124-130
    • /
    • 2005
  • In this paper, low-power design and implementation techniques for DWT(Discrete Wavelet Transform) of the JPEG2000 compression are proposed. In DWT block of the JPEG2000, linear phase 9 tap and 7 tap filters are used. For low-power implementation of those filters, processor technique for DA(Distributed Arithmetic) filter and minimization technique for number of addition in CSD(Canonic Signed Digit) filter are utilized. Proposed filter structure consists of 3 blocks. In the first CSD coefficient block, every possible 4 bit CSD coefficients are calculated and stored. In second processor block, multiplication is done by MUX and addition processor in terms of the binary values of filter coefficient. Finally, in third block, multiplied values are output and stored in flip-flop train. For comparison of the implementation area and power dissipation, proposed and conventional structures are implemented by using Verilog-HDL coding. In simulation, it is shown that 53.1% of the implementation area can be reduced comparison with those of the conventional structure.

Design of DCT/IDCT Core Processor using Module Generator Technique (모듈생성 기법을 이용한 DCT/IDCT 코어 프로세서의 설계)

  • 황준하;한택돈
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.10
    • /
    • pp.1433-1443
    • /
    • 1993
  • DCT(Discrete Cosine Transform) / IDCT(Inverse DCT) is widely used in various image compression and decompression systems as well as in DSP(Digital Signal Processing) applications. Since DCT/ IDCT is one of the most complicated part of the compression system, the performance of the system can be greatly enchanced by improving the speed of DCT/IDCT operation. In this thesis, we designed a DCT/IDCT core processor using module generator technique. By utilizing the partial sum and DA(Distributed Arithmetic) techniques, the DCT/ IDCT core processor is designed within small area. It is also designed to perform the IDCT(Inverse DCT) operation with little additional circuitry. The pipeline structure of the core processor enables the high performance, and the high accuracy of the DCT/IDCT operation is obtained by having fewer rounding stages. The proposed design is independent of design rules, and the number of the input bits and the accuracy of the internal calculation coa be easily adjusted due to the module generator technique. The accuracy of the processor satisfies the specifications in CCITT recommendation H, 261.

  • PDF

A VLSI Implementation of Real-time 8$\times$8 2-D DCT Processor for the Subprimary Rate Video Codec (저 전송률 비디오 코덱용 실시간 8$\times$8 이차원 DCT 처리기의 VLSI 구현)

  • 권용무;김형곤
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.15 no.1
    • /
    • pp.58-70
    • /
    • 1990
  • This paper describes a VLSI implementation of real-time two dimensional DCT processor for the subprimary rate video codec system. The proposed architecture exploits the parallelism and concurrency of the distributes architecture for vector inner product operation of DCT and meets the CCITT performance requirements of video codec for full CSIF 30 frames/sec. It is also shown that this architecture satisfies all the CCITT IDCT accuracy specification by simulating the suggested architecture in bit level. The efficient VLSI disign methodology to design suggested architecture is considered and the module generator oriented design environments are constructed based on SUN 3/150C workstation. Using the constructed design environments. the suggensted architecture have been designed by double metal 2micron CMOS technology. The chip area fo designed 8x8 2-D DA-DCT (Distributed Arithmetic DCT) processor is about 3.9mmx4.8mm.

  • PDF