• Title/Summary/Keyword: Distributed Arithmetic

A VLSI Architecture of Systolic Array for FFT Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.9 / pp.1115-1124 / 1988
  • A two-dimensional systolic array for the fast Fourier transform (FFT) with a regular, recursive VLSI architecture is presented. The array is built from identical processing elements (PEs) arranged in a mesh and, owing to its modularity, can be expanded to an arbitrary size. Each processing element consists of two data routing units, a butterfly arithmetic unit, and a simple control unit. The array computes the FFT through three procedures: I/O pipelining, data shuffling, and butterfly arithmetic. By exploiting parallelism, pipelining, and local communication geometry during data movement, the two-dimensional systolic array eliminates the global and irregular communication problems that have been a limiting factor in VLSI implementations of FFT processors. The systolic array executes a half butterfly arithmetic based on distributed arithmetic, which can carry out multiplication with only adders. The array also provides 100% PE activity, i.e., none of the PEs is idle at any time. A chip for the half butterfly arithmetic, consisting of two BLC adders and registers, has been fabricated in a 3-µm single-metal P-well CMOS technology. With a half butterfly execution time of about 500 ns, obtained by critical path delay simulation, the total FFT execution time for 1024 points is estimated at about 16.6 µs at a clock frequency of 20 MHz. A one-PE chip expandable to any array size is being fabricated in a 2-µm double-metal P-well CMOS process. The chip was laid out using a standard cell library and a BLC adder macrocell with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on a 3.4 × 2.8 mm² area. A built-in self-test circuit, BILBO (Built-In Logic Block Observer), was employed at the expense of 3% hardware overhead.
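The butterfly arithmetic named in this abstract can be illustrated with a minimal software sketch. It assumes a textbook radix-2 decimation-in-time FFT with a power-of-two input length. It does not reproduce the paper's systolic data flow, its "half butterfly" partitioning, or its distributed-arithmetic multiplier, and the names `butterfly`, `fft`, and `butterfly_count` are hypothetical.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly: combine two values using twiddle factor w."""
    t = w * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


def butterfly_count(n):
    """Number of radix-2 butterflies in an n-point FFT: (n / 2) * log2(n)."""
    return (n // 2) * (n.bit_length() - 1)
```

Under these assumptions a 1024-point transform needs `butterfly_count(1024)` = 5120 butterflies, which is the workload a PE array has to spread across its elements.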

Multiplierless Digital PID Controller Using FPGA

  • Chivapreecha, Sorawat;Ronnarongrit, Narison;Yimman, Surapan;Pradabpet, Chusit;Dejhan, Kobchai
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2004.08a / pp.758-761 / 2004
  • This paper proposes the design and implementation of a multiplierless digital PID (Proportional-Integral-Derivative) controller on an FPGA (Field Programmable Gate Array) for controlling the speed of a DC motor in a digital system. The multiplierless PID structure is based on Distributed Arithmetic (DA). DA is an efficient way to compute an inner product from partial products, each of which can be obtained from a look-up table. The PID controller is designed in MATLAB, which generates a set of coefficients for the desired controller characteristics. The controller coefficients are then included in the VHDL (Very high speed integrated circuit Hardware Description Language) description that implements the PID controller on the FPGA. MATLAB is also used to activate the PID controller and to calculate and plot the time response of the control system. The hardware is described in VHDL and synthesized with an Altera FLEX10K FPGA as the target technology, using MAX+plus II for overall development. The design results report the speed performance and the FPGA area used. Finally, the experimental results are compared with the MATLAB simulation results.
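The look-up-table idea described in this abstract can be sketched in software. The following is a minimal bit-serial illustration, assuming integer coefficients and signed two's-complement inputs of a fixed width. The function names, the 8-bit default, and the example values are hypothetical and are not taken from the paper.

```python
def build_lut(coeffs):
    """Entry at address a holds the sum of coeffs[k] over every set bit k of a."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, xs, bits=8):
    """Compute sum(coeffs[k] * xs[k]) with table look-ups, shifts, and adds only.

    Each x must fit in `bits`-bit two's complement (assumed, not checked).
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]  # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        # Bit b of every input forms one LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        partial = lut[addr] << b
        # The sign bit carries weight -2^(bits-1), so its partial is subtracted.
        acc += -partial if b == bits - 1 else partial
    return acc


print(da_inner_product([3, -5, 7], [10, -4, 2]))  # 64
```

The table has 2^K entries for K coefficients, so hardware designs typically keep K small or split the table. That trade-off is a general property of DA and is not a detail reported in this abstract.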

High-throughput and low-area implementation of orthogonal matching pursuit algorithm for compressive sensing reconstruction

  • Nguyen, Vu Quan;Son, Woo Hyun;Parfieniuk, Marek;Trung, Luong Tran Nhat;Park, Sang Yoon
    • ETRI Journal / v.42 no.3 / pp.376-387 / 2020
  • The heavy computational load of reconstruction algorithms for compressive sensing (CS) has been a major concern for real-time applications. In this paper, we propose a novel high-speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the algorithm most frequently used to reconstruct compressively sensed signals. The proposed design offers very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least-squares problem solving, which requires a huge amount of computation in OMP, is implemented using systolic arrays with four new processing elements. In addition, a distributed-arithmetic-based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi-stage pipelining. Logic synthesis results show that the proposed design reconstructs signals nearly 19 times faster while occupying an area only 1.06 times larger than existing designs for N = 256, M = 64, and m = 16, where N is the number of original samples, M is the length of the measurement vector, and m is the sparsity level of the signal.

Hardware Implementation of Discrete-Time Cellular Neural Networks Using Distributed Arithmetic (분산연산 방식을 이용한 이산시간 Cellular 신경회로망의 하드웨어 구현)

  • Park, Sung-Jun;Lim, Joon-Ho;Chae, Soo-Ik
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.1 / pp.153-160 / 1996
  • In this paper, we propose an efficient digital architecture for discrete-time cellular neural networks (DTCNNs). DTCNNs have locality and translation invariance in the templates, which determine the patterns of the connections between cells. Using distributed arithmetic (DA) and these characteristics of DTCNNs, we propose a simple DTCNN implementation. The bus width of the cell-to-cell interconnection is reduced to one bit because DA operates bitwise. We implemented the reconfigurable DTCNN architecture on an FPGA.

Statistical Inference for an Arithmetic Process

  • Francis, Leung Kit-Nam
    • Industrial Engineering and Management Systems / v.1 no.1 / pp.87-92 / 2002
  • A stochastic process $\{A_n, n = 1, 2, \ldots\}$ is an arithmetic process (AP) if there exists some real number $d$ such that $\{A_n + (n-1)d, n = 1, 2, \ldots\}$ is a renewal process (RP). An AP is a stochastically monotonic process and can be used for modeling a point process, i.e. point events occurring in a haphazard way in time (or space), especially with a trend. For example, the events may be failures arising from a deteriorating machine, and such a series of failures is distributed haphazardly along a time continuum. In this paper, we discuss estimation procedures for an AP, similar to those for a geometric process (GP) proposed by Lam (1992). Two statistics are suggested for testing whether a given process is an AP. If it is, we can estimate the parameters $d$, $\mu_{A_1}$ and $\sigma^2_{A_1}$ of the AP using simple linear regression, where $\mu_{A_1}$ and $\sigma^2_{A_1}$ are the mean and variance of the first random variable $A_1$, respectively. The procedures are, for the most part, discussed in reliability terminology. The methods are nevertheless valid in any area of application, in which case they should be interpreted accordingly.
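The regression idea in this abstract follows from the definition: if $A_n + (n-1)d$ is a renewal process with mean $\mu_{A_1}$, then $E[A_n] = \mu_{A_1} - (n-1)d$, so a straight-line fit of $A_n$ against $n-1$ has slope $-d$ and intercept $\mu_{A_1}$. The sketch below is an ordinary least-squares illustration of that relation, not necessarily the paper's exact estimators. The function names and the Gaussian noise model are assumptions (a Gaussian draw can be negative, so the mean should be large relative to the spread).

```python
import random


def simulate_ap(n, d, mu, sigma, seed=0):
    """Draw A_1..A_n with A_k = Y_k - (k - 1) * d, Y_k i.i.d. (Gaussian assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) - k * d for k in range(n)]


def estimate_ap(a):
    """Least-squares fit of A_n on (n - 1); returns (d_hat, mu_hat)."""
    n = len(a)
    xs = range(n)  # regressor values n - 1 = 0, 1, ..., n - 1
    x_bar = (n - 1) / 2
    a_bar = sum(a) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - a_bar) for x, y in zip(xs, a))
    slope = sxy / sxx
    d_hat = -slope                    # slope of E[A_n] is -d
    mu_hat = a_bar - slope * x_bar    # intercept is the mean of A_1
    return d_hat, mu_hat


if __name__ == "__main__":
    sample = simulate_ap(200, d=0.05, mu=20.0, sigma=1.0)
    print(estimate_ap(sample))
```

A positive estimate of $d$ corresponds to intervals that shrink over time, which matches the deteriorating-machine reading given in the abstract.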

An FPGA Implementation of the Synthesis Filter for MPEG-1 Audio Layer III by a Distributed Arithmetic Lookup Table (분산산술연산방식을 이용한 MPEG-1 오디오 계층 3 합성필터의 FPGA 군현)

  • Koh Sung-Shik;Choi Hyun-Yong;Kim Jong-Bin;Ku Dae-Sung
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.554-561 / 2004
  • As semiconductor and multimedia communication technologies have improved, high-quality video and multi-channel audio have attracted attention. MPEG Audio Layer 3 decoders have been implemented as processors following the standard. Since the synthesis filter of the MPEG-1 Audio Layer 3 decoder accounts for the largest share of the computation in the entire decoder, a synthesis filter that reduces the amount of computation is needed to design a high-speed processor. Therefore, in this paper, the synthesis filter, the most important part of MPEG Audio, is implemented in an FPGA using the DALUT (distributed arithmetic look-up table) method. To obtain a high-speed synthesis filter, the DALUT method is used instead of a multiplier, together with a pipeline structure. A performance improvement of 30% is obtained by additionally tabulating the products of the data and the cosine function. All hardware designs in this paper are described in VHDL (VHSIC Hardware Description Language). Active-HDL 6.1 from ALDEC is used for VHDL simulation, and Synplify Pro 7.2V is used for Model-sim and synthesis. The corresponding library is implemented with the XC4013E, XC4020EX, and XC4052XL from XILINX, and XACT M1.4 is used as the P&R tool. The implemented processor operates from 20 MHz to 70 MHz.

The Real-Time Implementation of Two-Dimensional FIR Digital Filter using Pipeline Method (파이프라인 방법을 이용한 이차원 FIR 디지털 필터의 실시간 구현)

  • 윤형태;이근영
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.5 / pp.27-33 / 1993
  • This paper describes the hardware implementation of a 2-D FIR digital filter for real-time image processing. Generally, the most time-consuming operation in signal processing is multiplication. To avoid it in digital filters, Peled and Liu proposed the distributed arithmetic method for the one-dimensional case. The implementation method proposed in this paper extends Peled's method to two-dimensional FIR filters using a simple ROM look-up table and pipelines the two main operations, memory access and arithmetic. As a result, the proposed hardware implementation is twice as fast as conventional methods and approaches real-time speed.

Adder-based Distributed Arithmetic DWT Processor Design (가산기-기반 분산연산 DWT 프로세서 설계)

  • 김영진;장영진;이현수
    • Proceedings of the Korean Information Science Society Conference / 2001.04a / pp.16-18 / 2001
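  • In computing the DWT (Discrete Wavelet Transform), the most computation-intensive part is the inner product of the coefficient values and the input values. Systolic, pipelined, and parallel architectures have been studied to carry out the inner product more efficiently, but these existing methods could not reduce the number of multiplications in the inner product. In this work, we eliminate the multiplications by using adder-based distributed arithmetic and minimize the number of adders by sharing identical computations. In addition, scheduling is used to reuse a single one-level decomposition module. As a result, the gate count is reduced by more than 50% compared with existing architectures, and the speed is also improved.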

Impact of Different Green-Ampt Model Parameters on the Distributed Rainfall-Runoff Model FLO-2D owing to Scale Heterogeneity (분포형 강우-유출 모형에서 토양도 격자크기 효과가 Green-Ampt 모형의 매개변수와 모의된 강우손실에 미치는 영향)

  • Hwang, Ji-hyeong;Lee, Khil-Ha
    • Journal of Environmental Science International / v.29 no.1 / pp.15-23 / 2020
  • The determination of soil characteristics is important in the simulation of rainfall runoff using a distributed FLO-2D model in catchment analysis. Digital maps acquired using remote sensing techniques have been widely used in modern hydrology. However, the determination of a representative parameter with spatial scaling mismatch is difficult. In this investigation, the FLO-2D rainfall-runoff model is utilized in the Yongdam catchment to test sensitivity based on three different methods (mosaic, arithmetic, and predominant) that describe soil surface characteristics in real systems. The results show that the mosaic method is costly, but provides a reasonably realistic description and exhibits superior performance compared to other methods in terms of both the amount and time to peak flow.

Direct Methods for Linear System on Distributed Memory Parallel Computers

  • Nishimura, S.;Shigehara, T.;Mizoguchi, H.;Mishima, T.;Kobayashi, H.
    • Proceedings of the IEEK Conference / 2000.07a / pp.333-336 / 2000
  • We discuss direct methods (Gauss-Jordan and Gaussian elimination) for solving linear systems on distributed memory parallel computers. It is shown that the so-called row-cyclic storage gives the best performance among the three standard data storage schemes (row-cyclic, column-cyclic and cyclic-cyclic). We also show that Gauss-Jordan elimination, rather than Gaussian elimination, is highly efficient for the direct solution of linear systems in parallel processing, even though Gauss-Jordan elimination requires a larger number of arithmetic operations than Gaussian elimination. Numerical experiments are performed on a HITACHI SR12201 with the standard libraries MPI and BLAS.
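Row-cyclic storage and the operation-count difference mentioned in this abstract can be illustrated with a small counting sketch. It assumes that row i of the matrix is stored on processor i mod p, and it counts only row updates per elimination step (no pivoting, no back substitution, no communication cost). The function names are hypothetical, and the abstract does not state that this is the paper's performance model.

```python
def row_owner(i, p):
    """Row-cyclic storage: row i lives on processor i mod p."""
    return i % p


def row_updates(n, p, method):
    """work[k][q] = rows that processor q updates at elimination step k.

    method "gauss":  only rows below the pivot row are updated.
    method "jordan": every row except the pivot row is updated.
    """
    work = [[0] * p for _ in range(n)]
    for k in range(n):
        if method == "gauss":
            rows = range(k + 1, n)
        else:
            rows = (i for i in range(n) if i != k)
        for i in rows:
            work[k][row_owner(i, p)] += 1
    return work


if __name__ == "__main__":
    for method in ("gauss", "jordan"):
        work = row_updates(8, 4, method)
        total = sum(map(sum, work))
        print(method, "total row updates:", total, "late step:", work[6])
```

In this toy count Gauss-Jordan performs more row updates in total, consistent with the abstract, while every step touches n - 1 rows spread over all processors. In Gaussian elimination the number of active rows shrinks as the steps proceed.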
