• Title/Summary/Keyword: systolic array structure

Search Result 33, Processing Time 0.02 seconds

Design of an Efficient VLSI Architecture of SADCT Based on Systolic Array (시스톨릭 어레이에 기반한 SADCT의 효율적 VLSl 구조설계)

  • Gang, Tae-Jun;Jeong, Ui-Yun;Gwon, Sun-Gyu;Ha, Yeong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.3
    • /
    • pp.282-291
    • /
    • 2001
  • In this paper, an efficient VLSI architecture of Shape Adaptive Discrete Cosine Transform(SADCT) based on systolic array is proposed. Since transform size in SADCT is varied according to the shape of object in each block, it are dropped that both usability of processing elements(PE´s) and throughput rate in time-recursive SADCT structure. To overcome these disadvantages, it is proposed that the architecture based on a systolic way structure which doesn´t need memory. In the proposed architecture, throughput rate is improved by consecutive processing of one-dimensional SADCT without memory and PE´s in the first column are connected to that in the last one for improvement of usability of PE. And input data are put into each column of PE in parallel according to the maximum data number in each rearranged block. The proposed architecture is described by VHDL. Also, its function is evaluated by MentorTM. Even though the hardware complexity is somewhat increased, the throughput rate is improved about twofold.

  • PDF

An Optimal Circuit Structure for Implementing SEED Cipher Algorithm with Verilog HDL (SEED 암호알고리즘의 Verilog HDL 구현을 위한 최적화 회로구조)

  • Lee, Haeng Woo
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.1
    • /
    • pp.107-115
    • /
    • 2012
  • This paper proposes on the structure for reducing the circuit area and increasing the computation speed in implementing to hardware using the SEED algorithm of a 128-bit block cipher. SEED cipher can be implemented with S/W or H/W method. It should be important that we have minimize the area and computation time in H/W implementation. To increase the computation speed, we used the structure of the pipelined systolic array, and this structure is a simple thing without including any buffer at the input and output circuit. This circuit can record the encryption rate of 320 Mbps at 10 MHz clock. We have designed the circuit with the Verilog HDL coding showing the circuit performances in the figures and the table.

Design of Adaptive Filter for Muscle Response Suppression and FPGA Implementation (근 반응제거를 위한 적응필터 설계와 FPGA 구현)

  • 염호준;박영철;윤형로
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.52 no.12
    • /
    • pp.708-716
    • /
    • 2003
  • The surface EMG signal detected from voluntarily activated muscles can be used as a control signal for functional electrical stimulation. To use the voluntary EMG signal, it is necessary to eliminate the muscle response evoked by the electrical stimulation and enable to process the algorithm in real time. In this paper, we propose the Gram-Schmidt(GS) algorithm and implement it in FPGA(field programmable gate array). GS algorithm is efficient to eliminate periodic signals like muscle response, and is more stable and suitable to FPGA implementations than the conventional least-square approach, due to the systolic array structure.

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

  • Lee, M.Y.;Chung, J.;Lee, J.H.;Han, J.H.;Kwon, Y.S.
    • Electronics and Telecommunications Trends
    • /
    • v.35 no.3
    • /
    • pp.66-75
    • /
    • 2020
  • As the increasing expectations of a practical AI (Artificial Intelligence) service makes AI algorithms more complicated, an efficient processor to process AI algorithms is required. To meet this requirement, processors optimized for parallel processing, such as GPUs (Graphics Processing Units), have been widely employed. However, the GPU has a generalized structure for various applications, so it is not optimized for the AI algorithm. Therefore, research on the development of AI processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces an AI processor especially for inference acceleration, developed by the Electronics and Telecommunications Research Institute, South Korea., and other global vendors for mobile and server platforms. However, the GPU has a generalized structure for various applications, so it is not optimized for the AI algorithm. Therefore, research on the development of AI processors optimized for AI algorithm processing has been actively conducted.

A Decoder Design for High-Speed RS code (RS 코드를 이용한 복호기 설계)

  • 박화세;김은원
    • Journal of the Korean Institute of Telematics and Electronics T
    • /
    • v.35T no.1
    • /
    • pp.59-66
    • /
    • 1998
  • In this paper, the high-speed decoder for RS(Reed-Solomon) code, one of the most popular error correcting code, is implemented using VHDL. This RS decoder is designed in transform domain instead of most time domain. Because of the simplicity in structure, transform decoder can be easily realized VLSI chip. Additionally the pipeline architecture, which is similar to a systolic array is applied for all design. Therefore, This transform RS decoder is suitable for high-rate data transfer. After synthesis with FPGA technology, the decoding rate is more 43 Mbytes/s and the area is 1853 LCs(Logic Cells). To compare with other product with pipeline architecture, this result is admirable. Error correcting ability and pipeline performance is certified by computer simulation.

  • PDF

A Systolic Array Structured Decision Feedback Equalizer based on Extended QR-RLS Algorithm (확장 QR-RLS 알고리즘을 이용한 시스토릭 어레이 구조의 결정 궤환 등화기)

  • Lee Won Cheol
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11C
    • /
    • pp.1518-1526
    • /
    • 2004
  • In this paper, an algorithm using wavelet transform for detecting a cut that is a radical scene transition point, and fade and dissolve that are gradual scene transition points is proposed. The conventional methods using wavelet transform for this purpose is using features in both spatial and frequency domain. But in the proposed algorithm, the color space of an input image is converted to YUV and then luminance component Y is transformed in frequency domain using 2-level lifting. Then, the histogram of only low frequency subband that may contain some spatial domain features is compared with the previous one. Edges obtained from other higher bands can be divided into global, semi-global and local regions and the histogram of each edge region is compared. The experimental results show the performance improvement of about 17% in recall and 18% in precision and also show a good performance in fade and dissolve detection.

Design of Systolic Multipliers in GF(2$^{m}$ ) Using an Irreducible All One Polynomial (기약 All One Polynomial을 이용한 유한체 GF(2$^{m}$ )상의 시스톨릭 곱셈기 설계)

  • Gwon, Sun Hak;Kim, Chang Hun;Hong, Chun Pyo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8C
    • /
    • pp.1047-1054
    • /
    • 2004
  • In this paper, we present two systolic arrays for computing multiplications in CF(2$\^$m/) generated by an irreducible all one polynomial (AOP). The proposed two systolic mays have parallel-in parallel-out structure. The first systolic multiplier has area complexity of O(㎡) and time complexity of O(1). In other words, the multiplier consists of m(m+1)/2 identical cells and produces multiplication results at a rate of one every 1 clock cycle, after an initial delay of m/2+1 cycles. Compared with the previously proposed related multiplier using AOP, our design has 12 percent reduced hardware complexity and 50 percent reduced computation delay time. The other systolic multiplier, designed for cryptographic applications, has area complexity of O(m) and time complexity of O(m), i.e., it is composed of m+1 identical cells and produces multiplication results at a rate of one every m/2+1 clock cycles. Compared with other linear systolic multipliers, we find that our design has at least 43 percent reduced hardware complexity, 83 percent reduced computation delay time, and has twice higher throughput rate Furthermore, since the proposed two architectures have a high regularity and modularity, they are well suited to VLSI implementations. Therefore, when the proposed architectures are used for GF(2$\^$m/) applications, one can achieve maximum throughput performance with least hardware requirements.

Characteristic analysis of Modular Multipliers and Squarers for GF($2^m$) (유한 필드 GF($2^m$)상의 모듈러 곱셈기 및 제곱기 특성 분석)

  • 한상덕;김창훈;홍춘표
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.7 no.5
    • /
    • pp.167-174
    • /
    • 2002
  • This paper analyzes the characteristics of three multipliers and squarers in finite fields GF(2/sup m/) from the point of view of processing time and area complexity. First, we analyze structures of three multipliers and squarers: 1) Systolic array structure, 2), LFSR structure, and 3) CA structure. To make performance analysis, each multiplier and squarer was modeled in VHDL and was synthesized for FPGA implementation. The simulation results show that CA structure is the best from the point view of processing time, and LFSR structure is the best from the point of view of area complexity.

  • PDF

Implementation of a LSB-First Digit-Serial Multiplier for Finite Fields GF(2m) (유한 필드 GF(2m)상에서의 LSB 우선 디지트 시리얼 곱셈기 구현)

  • Kim, Chang-Hun;Hong, Chun-Pyo;U, Jong-Jeong
    • The KIPS Transactions:PartA
    • /
    • v.9A no.3
    • /
    • pp.281-286
    • /
    • 2002
  • In this paper we, implement LSB-first digit-serial systolic multiplier for computing modular multiplication $A({\times})B$mod G ({\times})in finite fields GF $(2^m)$. If input data come in continuously, the implemented multiplier can produce multiplication results at a rate of one every [m/L] clock cycles, where L is the selected digit size. The analysis results show that the proposed architecture leads to a reduction of computational delay time and it has more simple structure than existing digit-serial systolic multiplier. Furthermore, since the propose architecture has the features of regularity, modularity, and unidirectional data flow, it shows good extension characteristics with respect to m and L.

Design of MSB-First Digit-Serial Multiplier for Finite Fields GF(2″) (유한 필드 $GF(2^m)$상에서의 MSB 우선 디지트 시리얼 곱셈기 설계)

  • 김창훈;한상덕;홍춘표
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.6C
    • /
    • pp.625-631
    • /
    • 2002
  • This paper presents a MSB-first digit-serial systolic array for computing modular multiplication of A(x)B(x) mod G(x) in finite fields $GF(2^m)$. From the MSB-first multiplication algorithm in $GF(2^m)$, we obtain a new data dependence graph and design an efficient digit-serial systolic multiplier. For circuit synthesis, we obtain VHDL code for multiplier, If input data come in continuously, the implemented multiplier can produce multiplication results at a rate of one every [m/L] clock cycles, where L is the selected digit size. The analysis results show that the proposed architecture leads to a reduction of computational delay time and it has much more simple structure than existing digit-serial systolic multiplier. Furthermore, since the propose architecture has the features of unidirectional data flow and regularity, it shows good extension characteristics with respect to m and L.