• Title/Summary/Keyword: systolic array

Search Result 144, Processing Time 0.024 seconds

A pipeline synthesis for a trace-back systolic array viterbi decoder (역추적 시스토릭 어레이 구조 비터비 복호기의 파이프라인 합성)

  • 정희도;김종태
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.3
    • /
    • pp.24-31
    • /
    • 1998
  • This paper presents a pipeline high-level synthesis tool for designing trace-back systolic array viterbi decoder. It consists of a dta flow graph(DFG) generator and a pipeline data path synthesis tool. First, the DFG of the vitrebi decoder is generated in the from of VHDL netlist. The inputs to the DFG generator are parameters of the convolution encoder. Next, the pipeline scheduling and allocationare performed. The synthesis tool explores the design space efficiently, synthesizes various designs which meet the given constraints, and choose the best one.

  • PDF

Fast Modular Exponentiation on a Systolic Array (시스톨릭 어레이상에서 고속 모듈러 지수 연산)

  • 이건직
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.8 no.1
    • /
    • pp.39-52
    • /
    • 1998
  • 본 논문에서는 모듈러 지수승시에 요구되는 모듈러 곱셈의 반복 횟수를 줄이기 위해 SM(m)기법을 제안하며 지수를 SM(m)표현과 시스톨릭 SM(m) 표현으로 변환한다.그리고 변환된 스스톨릭 SM(m) 표현으로부터 모듈러 지수연산을 위한 선형시스톨릭 어레이를 제시한다. 제안된 기법은 기존의 방법보다 소프트웨어로 구현시에 선 계산기에 필요한 기업 장소의 크기를 줄였으며, 선형 시스톨릭 어레이로 구현시에 기존의 방법들보다 처리기의 개수를 감소시키며, 처리기내에 필요한 기억 장소의 크기를 줄였다. 수정된 부호화 디지트 기법과 비교하면 처리기의 개수를 24%정도 줄일 수 있다.

Design and Implementation Systolic Array FFT Processor Based on Shared Memory (공유 메모리 기반 시스토릭 어레이 FFT 프로세서 설계 및 구현)

  • Jeong, Dongmin;Roh, yunseok;Son, Hanna;Jung, Yongchul;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.797-802
    • /
    • 2020
  • In this paper, we presents the design and implementation results of the FFT processor, which supports 4096 points of operation with less memory by sharing several memory used in the base-4 systolic array FFT processor into one memory. Sharing memory provides the advantage of reducing the area, and also simplifies the flow of data as I/O of the data progresses in one memory. The presented FFT processor was implemented and verified on the FPGA device. The implementation resulted in 51,855 CLB LUTs, 29,712 CLB registers, 8 block RAM tiles and 450 DSPs, and confirmed that the memory area could be reduced by 65% compared to the existing base-4 systolic array structure.

Efficient Architecture of an n-bit Radix-4 Modular Multiplier in Systolic Array Structure (시스톨릭 어레이 구조를 갖는 효율적인 n-비트 Radix-4 모듈러 곱셈기 구조)

  • Park, Tae-geun;Cho, Kwang-won
    • The KIPS Transactions:PartA
    • /
    • v.10A no.4
    • /
    • pp.279-284
    • /
    • 2003
  • In this paper, we propose an efficient architecture for radix-4 modular multiplication in systolic array structure based on the Montgomery's algorithm. We propose a radix-4 modular multiplication algorithm to reduce the number of iterations, so that it takes (3/2)n+2 clock cycles to complete an n-bit modular multiplication. Since we can interleave two consecutive modular multiplications for 100% hardware utilization and can start the next multiplication at the earliest possible moment, it takes about only n/2 clock cycles to complete one modular multiplication in the average. The proposed architecture is quite regular and scalable due to the systolic array structure so that it fits in a VLSI implementation. Compared to conventional approaches, the proposed architecture shows shorter period to complete a modular multiplication while requiring relatively less hardware resources.

Design and Implementation of Hi-speed/Low-power Extended QRD-RLS Equalizer using Systolic Array and CORDIC (시스톨릭 어레이 구조와 CORDIC을 사용한 고속/저전력 Extended QRD-RLS 등화기 설계 및 구현)

  • Moon, Dae-Won;Jang, Young-Beom;Cho, Yong-Hoon
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.47 no.6
    • /
    • pp.1-9
    • /
    • 2010
  • In this paper, we propose a hi-speed/low-power Extended QRD-RLS(QR-Decomposition Recursive Least Squares) equalizer with systolic array structure. In the conventional systolic array structure, vector mode CORDIC on the boundary cell calculates angle of input vector, and the rotation mode CORDIC on the internal cell rotates vector. But, in the proposed structure, it is shown that implementation complexity can be reduced using the rotation direction of vector mode CORDIC and rotation mode CORDIC. Furthermore, calculation time can be reduced by 1/2 since vector mode and rotation mode CORDIC operate at the same time. Through HDL coding and chip implementation, it is shown that implementation area is reduced by 23.8% compared with one of conventional structure.

Performance Analysis of Extended QRD-RLS Equalizer (Extended QRD-RLS 등화기의 성능 분석)

  • Jang, Jin-Kyu;Jang, Young-Beom
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.48 no.8
    • /
    • pp.27-35
    • /
    • 2011
  • In this paper, performances of the extended QRD-RLS equalizer is analyzed. Since the extended QRD-RLS equalizer is efficiently implemented by systolic array architecture, we analyze performances of this structure with signals of different lengths. By multiplying the frequency responses of the unknown channel and proposed equalizer, we observed the flatness of the overall system function. Through the simulation, it is shown that the performance of the extended QRD-RLS equalizer is remarkably increased with input signals of length 16.

A New N-time Systolic Array Architecture for the Vector Median Filter (N-time 시스톨릭 어레이 구조를 가지는 벡터 미디언 필터의 하드웨어 아키텍쳐)

  • Yang, Yeong-Yil
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.8 no.4
    • /
    • pp.293-296
    • /
    • 2007
  • In this paper, we propose the systolic array architecture for the vector median filter. In the color image processing, the vector signal (i.e. the color) consists of three elements, red, green and blue. The vector median filter is very effective to utilize the correlation among red, green and blue elements. The computational complexity of the proposed architecture for computing the vector median of N vector signals is (N+2) clock periods compared to the (3N+1) clock periods in the previous method. In addition to, the input vector signals can be loaded in serial in the proposed architecture. In the previous method, N input vector signals should be loaded to the vector median filter in parallel at the first clock. The proposed architecture is implemented with FPGA.

  • PDF

Conservative Approximation-Based Full-Search Block Matching Algorithm Architecture for QCIF Digital Video Employing Systolic Array Architecture

  • Ganapathi, Hegde;Amritha, Krishna R.S.;Pukhraj, Vaya
    • ETRI Journal
    • /
    • v.37 no.4
    • /
    • pp.772-779
    • /
    • 2015
  • This paper presents a power-efficient hardware realization for a motion estimation technique that is based on the full-search block matching algorithm (FSBMA). The considered input is the quarter common intermediate format of digital video. The mean of absolute difference (MAD) is the distortion criteria employed for the block matching process. The conventional architecture considered for the hardware realization of FSBMA is that of the shift register-based 2-D systolic array. For this architecture, a conservative approximation technique is adapted to eliminate unnecessary MAD computations involved in the block matching process. Upon introducing the technique to the conventional architecture, the power and complexity of its implantation is reduced, while the accuracy of the motion vector extracted from the block matching process is preserved. The proposed architecture is verified for its functional specifications. A performance evaluation of the proposed architecture is carried out using parameters such as power, area, operating frequency, and efficiency.

Design of a systolic array for forward-backward propagation of back-propagation algorithm (역전파 알고리즘의 전방향, 역방향 동시 수행을 위한 스스톨릭 배열의 설계)

  • 장명숙;유기영
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.9
    • /
    • pp.49-61
    • /
    • 1996
  • Back-propagation(BP) algorithm needs a lot of time to train the artificial neural network (ANN) to get high accuracy level in classification tasks. So there have been extensive researches to process back-propagation algorithm on parallel processors. This paper prsents a linear systolic array which calculates forward-backward propagation of BP algorithm at the same time using effective space-time transformation and PE structure. First, we analyze data flow of forwared and backward propagations and then, represent the BP algorithm into data dapendency graph (DG) which shows parallelism inherent in the BP algorithm. Next, apply space-time transformation on the DG of ANN is turn with orthogonal direction projection. By doing so, we can get a snakelike systolic array. Also we calculate the interval of input for parallel processing, calculate the indices to make the right datas be used at the right PE when forward and bvackward propagations are processed in the same PE. And then verify the correctness of output when forward and backward propagations are executed at the same time. By doing so, the proposed system maximizes parallelism of BP algorithm, minimizes th enumber of PEs. And it reduces the execution time by 2 times through making idle PEs participate in forward-backward propagation at the same time.

  • PDF