• Title/Summary/Keyword: Systolic Parallel Processing

Search Result 34, Processing Time 0.019 seconds

A Design of Cellular Array Parallel Multiplier on Finite Fields GF(2m) (유한체 GF(2m)상의 셀 배열 병렬 승산기의 설계)

  • Seong, Hyeon-Kyeong
    • The KIPS Transactions:PartA
    • /
    • v.11A no.1
    • /
    • pp.1-10
    • /
    • 2004
  • A cellular array parallel multiplier with parallel-inputs and parallel-outputs for performing the multiplication of two polynomials in the finite fields GF$(2^m)$ is presented in this paper. The presented cellular way parallel multiplier consists of three operation parts: the multiplicative operation part (MULOP), the irreducible polynomial operation part (IPOP), and the modular operation part (MODOP). The MULOP and the MODOP are composed if the basic cells which are designed with AND Bates and XOR Bates. The IPOP is constructed by XOR gates and D flip-flops. This multiplier is simulated by clock period l${\mu}\textrm{s}$ using PSpice. The proposed multiplier is designed by 24 AND gates, 32 XOR gates and 4 D flip-flops when degree m is 4. In case of using AOP irreducible polynomial, this multiplier requires 24 AND gates and XOR fates respectively. and not use D flip-flop. The operating time of MULOP in the presented multiplier requires one unit time(clock time), and the operating time of MODOP using IPOP requires m unit times(clock times). Therefore total operating time is m+1 unit times(clock times). The cellular array parallel multiplier is simple and regular for the wire routing and have the properties of concurrency and modularity. Also, it is expansible for the multiplication of two polynomials in the finite fields with very large m.

VLSI Implementation of Low-Power Motion Estimation Using Reduced Memory Accesses and Computations (메모리 호출과 연산횟수 감소기법을 이용한 저전력 움직임추정 VLSI 구현)

  • Moon, Ji-Kyung;Kim, Nam-Sub;Kim, Jin-Sang;Cho, Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.5A
    • /
    • pp.503-509
    • /
    • 2007
  • Low-power motion estimation is required for video coding in portable information devices. In this paper, we propose a low-power motion estimation algorithm and 1-D systolic may VLSI architecture using full search block matching algorithm (FSBMA). Main power dissipation sources of FSBMA are complex computations and frequent memory accesses for data in the search area. In the proposed algorithm, memory accesses and computations are reduced by using 1D PE (processing array) array architecture performing motion estimation of two neighboring blocks in parallel and by skipping unnecessary computations during motion estimation. The VLSI implementation results of the algorithm show that the proposed VLSI architecture can save 9.3% power dissipation and can operate two times faster than an existing low-power motion estimator.

A Study on Motion Estimation Encoder Supporting Variable Block Size for H.264/AVC (H.264/AVC용 가변 블록 크기를 지원하는 움직임 추정 부호기의 연구)

  • Kim, Won-Sam;Sohn, Seung-Il
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.10
    • /
    • pp.1845-1852
    • /
    • 2008
  • The key elements of inter prediction are motion estimation(ME) and motion compensation(MC). Motion estimation is to find the optimum motion vectors, not only by using a distance criteria like the SAD, but also by taking into account the resulting number of 비트s in the 비트 stream. Motion compensation is compensate for movement of blocks of current frame. Inter-prediction Encoding is always the main bottleneck in high-quality streaming applications. Therefore, in real-time streaming applications, dedicated hardware for executing Inter-prediction is required. In this paper, we studied a motion estimator(ME) for H.264/AVC. The designed motion estimator is based on 2-D systolic array and it connects processing elements for fast SAD(Sum of Absolute Difference) calculation in parallel. By providing different path for the upper and lower lesion of each reference data and adjusting the input sequence, consecutive calculation for motion estimation is executed without pipeline stall. With data reuse technique, it reduces memory access, and there is no extra delay for finding optimal partitions and motion vectors. The motion estimator supports variable-block size and takes 328 cycles for macro-block calculation. The proposed architecture is local memory-free different from paper [6] using local memory. This motion estimation encoder can be applicable to real-time video processing.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.