• Title/Summary/Keyword: adder sharing


An Area Optimization Method for Digital Filter Design

  • Yoon, Sang-Hun;Chong, Jong-Wha;Lin, Chi-Ho
    • ETRI Journal / v.26 no.6 / pp.545-554 / 2004
  • In this paper, we propose an efficient design method for area optimization in a digital filter. Conventional methods for reducing the number of adders in a filter suffer from a long critical path delay, because adder sharing deepens the logic depth of the filter. They also rely on the transposed direct form (TDF) filter, which needs more registers than the direct form (DF) filter. We present a hybrid structure of a TDF and DF, based on the flattened coefficients method, that reduces the number of flip-flops and full-adders without additional critical path delay. We also propose a resource sharing method and a sharing-pattern searching algorithm to reduce the number of adders without deepening the logic depth. Simulation results show that the proposed structure reduces the number of adders and registers by 22% and 26%, respectively, compared to the best previously reported structure.

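The DF and TDF structures contrasted in the abstract above compute the same FIR output and differ only in where the registers sit. The following is a minimal behavioural sketch of that difference, not the paper's hybrid structure; the function names and the integer test data are assumptions made for illustration. In the DF the registers hold input samples, while in the TDF they hold partial sums, which are typically wider words.

```python
def fir_direct_form(coeffs, samples):
    """Direct form: a delay line on the input feeds one multi-operand sum."""
    delay = [0] * len(coeffs)              # registers hold past input samples
    out = []
    for x in samples:
        delay = [x] + delay[:-1]           # shift the new sample into the delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_transposed_form(coeffs, samples):
    """Transposed direct form: registers sit between the adders."""
    n = len(coeffs)
    state = [0] * (n - 1)                  # registers hold partial sums
    out = []
    for x in samples:
        products = [c * x for c in coeffs]
        out.append(products[0] + (state[0] if state else 0))
        # Each register takes its tap product plus the register after it.
        state = [
            products[i + 1] + (state[i + 1] if i + 1 < n - 1 else 0)
            for i in range(n - 1)
        ]
    return out


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 2, 3, 4]
    print(fir_direct_form(h, x))
    print(fir_transposed_form(h, x))
```

Both functions return the same sequence for the same input, so any area difference between the two forms comes from register width and adder arrangement rather than from the arithmetic itself.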

Low-power/high-speed DCT structure using common sub-expression sharing (Common sub-expression sharing을 이용한 고속/저전력 DCT 구조)

  • Jang, Young-Beom;Yang, Se-Jung
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.1C / pp.119-128 / 2004
  • In this paper, a low-power 8-point DCT structure is proposed using add and shift operations. The proposed structure takes 4 cycles to complete an 8-point DCT in order to minimize hardware size and enable high-speed processing. Hardware for the first cycle can be shared in the next 3 cycles, since all columns in the DCT coefficient matrix are the same except for sign. Conventional DCT structures implemented with only add and shift operations use CSD (Canonic Signed Digit) form coefficients to reduce the number of adders. To reduce the number of adders further, we propose a new structure using common sub-expression sharing techniques. With these techniques, the proposed 8-point DCT structure achieves a 19.5% adder reduction compared to the conventional structure using only the CSD coefficient form.
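
The CSD baseline mentioned in the abstract above can be sketched as follows. This covers only the conversion of a constant to canonic signed digit form and the resulting adder count for a shift-and-add multiplier; it does not reproduce the paper's common sub-expression sharing. The function names and the example constant 119 are assumptions chosen for illustration.

```python
def to_csd(value):
    """CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)    # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def adders_binary(value):
    """Adders for a shift-and-add multiply using plain binary digits."""
    return max(bin(value).count("1") - 1, 0)


def adders_csd(value):
    """Adders/subtractors for a shift-and-add multiply using CSD digits."""
    return max(sum(1 for d in to_csd(value) if d != 0) - 1, 0)


if __name__ == "__main__":
    c = 119                             # 1110111 in binary, 128 - 8 - 1 in CSD
    print(to_csd(c), adders_binary(c), adders_csd(c))
```

For the constant 119 the binary form needs five adders and the CSD form needs two, which is the kind of saving that sub-expression sharing then tries to improve on across several coefficients.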

The Optimal Extraction Method of Adder Sharing Component for Inner Product and its Application to DCT Design (내적연산을 위한 가산기 공유항의 최적 추출기법 제안 및 이를 이용한 DCT 설계)

  • Im, Guk-Chan;Jang, Yeong-Jin;Lee, Hyeon-Su
    • Journal of the Institute of Electronics Engineers of Korea SD / v.38 no.7 / pp.503-512 / 2001
  • General DSP algorithms, such as orthogonal transforms and filtering, need an efficient hardware architecture to compute inner products. The typical MAC architecture has a high silicon cost. For this reason, multiplier-free distributed arithmetic is widely used to implement inner products. This paper presents an optimization that reduces the hardware required in distributed arithmetic by extracting adder sharing components. The optimization process uses a Boltzmann machine, which is a type of neural network. The proposed method addresses the problem of complexity growing with the depth of the inner product, and composes an optimal summation network with the minimum number of FAs and FFs in a short time. A DCT designed with the proposed method is more efficient than one based on ROM-based distributed arithmetic.


A Single-Chip Design of Two-Dimensional Digital Filter with CSD Coefficients (CSD 계수에 의한 이차원 디지탈필터의 단일칩설계)

  • 문종억;송낙운;김창민
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.1 / pp.241-250 / 1996
  • In this work, an improved architecture for a two-dimensional digital filter (2D DF) is suggested; the filter is simulated in C and VHDL, and the related layouts are designed with Berkeley CAD tools. The 2D DF consists of one-dimensional digital filters and delay lines. For the one-dimensional digital filter (1D DF), once the filter coefficients are represented in canonical signed digit format, multiplications are executed by hardwired shifting. The associated bit widths are handled so as to prevent picture quality degradation, and pipelined adder architectures are adopted in each tap and in the output stage to speed up the filter. For the delay lines, line-sharing DRAM is adopted to improve power dissipation and speed. The filter layout is designed by semi/full custom methods with regularity and speed in mind, and normal operation is confirmed by simulation.


A Clipping-free Multi-bit $\Sigma\Delta$ Modulator with Digital-controlled Analog Integrators (디지털 제어 적분형의 차단 현상이 없는 A/D 다중 비트 $\Sigma\Delta$ 변조기)

  • 이동연;김원찬
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.4 / pp.26-35 / 1997
  • This paper proposes a multi-bit $\Sigma\Delta$ modulator architecture which eliminates the signal clipping problem. To avoid signal clipping, the output values of the integrators are monitored and modified by a reference value. This operation is recorded as a digital code so that the actual signal value can be restored. Thanks to the digital code, the subtraction of the feedback value from the multi-bit quantizer can be calculated by a digital adder; this simplifies DAC operation and makes the accurate DAC of the conventional multi-bit $\Sigma\Delta$ modulator scheme unnecessary. These features allow an N-th order modulator to be implemented by sharing one integrator among N stages, decreasing the required chip area. As an experimental example, a 4th order $\Sigma\Delta$ modulator with an oversampling ratio of 64 was simulated and showed over 130 dB SNR for a rail-to-rail sinusoidal input signal.


DCT/IDCT Processor Design using Adder-based Distributed Arithmetic (가산기-기반 분산 연산을 이용한 DCT/IDCT 프로세서 설계)

  • 임국찬;장영진;이현수
    • Proceedings of the Korean Information Science Society Conference / 2000.04a / pp.30-32 / 2000
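  • Using distributed arithmetic (DA) to compute inner products reduces power consumption and area more effectively than using multipliers, and makes high-speed circuits easy to implement, so it is widely used in the design of signal processing systems. DA comes in ROM-based and adder-based variants. Adder-based DA can be designed to make full use of the sharing property and the sparse non-zero bit property of the coefficients, which allows implementations that are efficient in area and operating speed. In this paper, we make full use of these characteristics of adder-based DA to design a DCT/IDCT processor suitable for multimedia signal processing. Comparison with other architectures and with ROM-based DA showed efficient results in terms of area and speed.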

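The adder-based DA idea in the abstract above can be illustrated with a small behavioural sketch. It assumes non-negative integer coefficients and distributes the coefficient bits: for each bit position, the inputs whose coefficient has that bit set are summed, and the sums are combined by shifts. The cache stands in for the sharing property (bit positions that select the same inputs reuse one sum). The function name, the `bits` parameter and the example values are assumptions for illustration, not the processor described in the paper.

```python
def inner_product_adder_da(coeffs, inputs, bits=8):
    """Inner product of constant coefficients and inputs using adds and shifts only.

    coeffs: non-negative integers that fit in `bits` bits.
    inputs: integers (may be negative).
    """
    shared_sums = {}                       # one adder network per distinct bit pattern
    total = 0
    for j in range(bits):
        # Which inputs contribute at bit position j of the coefficients.
        pattern = tuple((c >> j) & 1 for c in coeffs)
        if pattern not in shared_sums:
            shared_sums[pattern] = sum(x for bit, x in zip(pattern, inputs) if bit)
        total += shared_sums[pattern] << j  # weight the shared sum by 2**j
    return total


if __name__ == "__main__":
    c = [3, 5, 6]
    x = [2, -1, 4]
    print(inner_product_adder_da(c, x))     # same value as sum(ci * xi)
```

Coefficients with few non-zero bits produce many empty bit positions, which is where the sparse non-zero bit property reduces the number of additions.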

Full-Search Block-Matching Motion Estimation Circuit with Hybrid Architecture for MPEG-4 Encoder (하이브리드 구조를 갖는 MPEG-4 인코더용 전역 탐색 블록 정합 움직임 추정 회로)

  • Shim, Jae-Oh;Lee, Seon-Young;Cho, Kyeong-Soon
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.2 / pp.85-92 / 2009
  • This paper proposes a full-search block-matching motion estimation circuit with hybrid architecture combining systolic arrays and adder trees for an MPEG-4 encoder. The proposed circuit uses systolic arrays for motion estimation with a small number of clock cycles and adder trees to reduce required circuit resources. The interpolation circuit for 1/2 pixel motion estimation consists of six adders, four subtracters and ten registers. We improved the circuit performance by resource sharing and efficient scheduling techniques. We described the motion estimation circuit for integer and 1/2 pixels at RTL in Verilog HDL. The logic-level circuit synthesized by using 130nm standard cell library contains 218,257 gates and can process 94 D1($720{\times}480$) image frames per second.
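
As a software reference for what a full-search block-matching circuit computes, the sketch below evaluates the sum of absolute differences (SAD) at every integer displacement in a search window and keeps the minimum. It is a plain behavioural model under assumed names and parameters (`full_search`, `search_range`, list-of-lists frames); it does not model the systolic arrays, adder trees or 1/2 pixel interpolation of the proposed circuit.

```python
def full_search(cur, ref, top, left, size, search_range):
    """Return (best_sad, dy, dx) for the block of `cur` at (top, left).

    cur, ref: frames as lists of rows of integer pixel values.
    Candidates that fall outside `ref` are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            # Sum of absolute differences over the whole block.
            cost = sum(
                abs(cur[top + i][left + j] - ref[y + i][x + j])
                for i in range(size)
                for j in range(size)
            )
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def frame_with_patch(row, col):
    """Hypothetical 8x8 test frame: zeros with a 2x2 patch at (row, col)."""
    frame = [[0] * 8 for _ in range(8)]
    for i, line in enumerate([[9, 5], [2, 7]]):
        for j, value in enumerate(line):
            frame[row + i][col + j] = value
    return frame


if __name__ == "__main__":
    ref = frame_with_patch(3, 4)
    cur = frame_with_patch(2, 2)
    print(full_search(cur, ref, top=2, left=2, size=2, search_range=2))
```

The inner sum over the block is the part an adder tree would evaluate in hardware, and the two outer loops are the candidate positions that a full search must visit.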

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

  • Park, Chester Sungchung;Park, Sungkyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.3 / pp.699-707 / 2016
  • This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a $0.18{\mu}m$ digital CMOS technology with 1.8V supply, resulting in a gate count of merely 7700 at a 50MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.