Search | Korea Science

Bit-level 1-dimensional systolic modular multiplication (비트 레벨 일차원 시스톨릭 모듈러 승산)

최성욱;우종호
- Journal of the Korean Institute of Telematics and Electronics B / v.33B no.9 / pp.62-69 / 1996
This paper designs a bit-level 1-dimensional systolic array for modular multiplication. First, a parallel algorithm and its data dependence graph are derived from Walter's method, which is based on the Montgomery algorithm and is well suited to array design for modular multiplication. Following a systematic procedure for systolic array design, four 1-dimensional systolic arrays are obtained and evaluated against various criteria. The optimal 1-dimensional systolic array is then obtained by modifying the array derived from the [0,1] projection direction: control logic is added and the communication paths of data A are serialized. The resulting array has a constant number of I/O channels, which allows modular expansion, and its unidirectional paths make fault tolerance easy to provide. It is suitable for the RSA cryptosystem, which deals with large operand sizes and many consecutive message blocks.
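For orientation, the bit-serial recurrence that such arrays pipeline can be sketched in software. The following is a minimal sketch of textbook radix-2 Montgomery multiplication, not the paper's array or Walter's exact variant; the function names `montgomery_bitwise` and `mod_mul` and the parameter `k` (the operand bit length) are assumptions made for illustration.

```python
def montgomery_bitwise(a, b, n, k):
    """Return a * b * 2^(-k) mod n, for odd n with 0 <= a, b < n < 2^k.

    One loop iteration consumes one bit of a, which is the step a
    bit-level processing element would perform (illustrative only).
    """
    if n % 2 == 0 or not (0 <= a < n and 0 <= b < n and n < (1 << k)):
        raise ValueError("need odd n and 0 <= a, b < n < 2^k")
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1      # current bit of a, least significant first
        s += a_i * b            # conditional add of b
        q = s & 1               # quotient bit: makes s + q*n even
        s = (s + q * n) >> 1    # exact division by 2
    # The loop keeps s < 2n, so one conditional subtraction suffices.
    return s - n if s >= n else s


def mod_mul(a, b, n, k):
    """Plain a * b mod n built from two Montgomery products."""
    r2 = pow(2, 2 * k, n)                     # R^2 mod n, with R = 2^k
    a_bar = montgomery_bitwise(a, r2, n, k)   # a * R mod n
    return montgomery_bitwise(a_bar, b, n, k) # (a*R) * b * R^-1 = a*b mod n
```

The quotient bit `q` depends only on the least significant bit of the running sum, which is what makes the recurrence local enough to map onto a linear array of identical cells.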

A Systolic Array for High-Speed Computing of Full Search Block Matching Algorithm

Jung, Soon-Ho;Woo, Chong-Ho
- Journal of Korea Multimedia Society / v.14 no.10 / pp.1275-1286 / 2011
This paper proposes a high-speed systolic array architecture for the full search block matching algorithm (FBMA). The pixels of the search area for a reference block are input only once to find the matched candidate block, and are reused to compute the sum of absolute differences (SAD) for the adjacent candidate blocks. Each row of the designed 2-dimensional systolic array compares the reference block with the adjacent blocks of the same row in the search area. The lower rows of the array receive pixels from the upper row and compute the SAD by reusing the overlapped pixels of the candidate blocks within the same column of the search area. The designed array has no data broadcasting and no global paths. A comparison with existing architectures shows that this array is superior in terms of throughput, though it requires a little more hardware.
https://doi.org/10.9717/kmms.2011.14.10.1275
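As a reference point for what the array computes, full search block matching can be written as a plain sequential loop. This is a minimal software sketch, not the proposed architecture; the names `sad` and `full_search` and the list-of-lists image layout are assumptions made for illustration.

```python
def sad(ref, area, top, left):
    """Sum of absolute differences between ref and the candidate block
    of the search area whose upper-left corner is (top, left)."""
    return sum(
        abs(ref[r][c] - area[top + r][left + c])
        for r in range(len(ref))
        for c in range(len(ref[0]))
    )


def full_search(ref, area):
    """Evaluate every candidate position; return (best_sad, top, left)."""
    rows = len(area) - len(ref) + 1
    cols = len(area[0]) - len(ref[0]) + 1
    best = None
    for top in range(rows):
        for left in range(cols):
            cost = sad(ref, area, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best
```

Adjacent candidate blocks overlap in all but one row or column of pixels, so this loop reads each search-area pixel many times. That redundancy is what the abstract's pixel reuse is aimed at.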

Systolic Arrays for Constructing Static and Dynamic Voronoi Diagrams (두 형의 Voronoi Diagram 구축을 위한 Systolic Arrays)

O, Seong-Jun
- ETRI Journal / v.10 no.3 / pp.125-140 / 1988
Computational geometry has wide applications in pattern recognition, image processing, VLSI design, and computer graphics. Voronoi diagrams possess many important properties that are related to other geometric structures of a set of points. This paper considers the design of systolic algorithms for static and dynamic Voronoi diagrams. The major motivation for developing the systolic architecture is VLSI implementation. A new systematic transform technique for designing systolic arrays, in particular for problems in computational geometry, is proposed. Following this procedure, a type T systolic array architecture and associated systolic algorithms are designed for constructing Voronoi diagrams. The functions of the cells in the array are also specified. The resulting systolic array achieves the maximal throughput with O(n) computational complexity.

Systolic Arrays for Lattice-Reduction-Aided MIMO Detection

Wang, Ni-Chun;Biglieri, Ezio;Yao, Kung
- Journal of Communications and Networks / v.13 no.5 / pp.481-493 / 2011
Multiple-input multiple-output (MIMO) technology provides high data rates and enhanced quality of service for wireless communications. Since the benefits of MIMO come with a heavy computational load in detectors, the design of low-complexity suboptimum receivers is currently an active area of research. Lattice-reduction-aided detection (LRAD) has been shown to be an effective low-complexity method with near-maximum-likelihood performance. In this paper, we advocate the use of systolic array architectures for MIMO receivers, and in particular we exhibit one of them based on LRAD. The Lenstra-Lenstra-Lovász (LLL) lattice reduction algorithm and the ensuing linear detections or successive spatial-interference cancellations can be located in the same array, which is considerably hardware-efficient. Since the conventional form of the LLL algorithm is not immediately suitable for parallel processing, two modified LLL algorithms are considered here for the systolic array. The LLL algorithm with full-size reduction is one of the versions more suitable for parallel processing. Another variant is the all-swap lattice-reduction (ASLR) algorithm for complex-valued lattices, which processes all lattice basis vectors simultaneously within one iteration. Our novel systolic array can operate both algorithms with different external logic controls. In order to simplify the systolic array design, we replace the Lovász condition in the definition of an LLL-reduced lattice with the looser Siegel condition. Simulation results show that for LR-aided linear detections, the bit-error-rate performance is still maintained with this relaxation. Comparisons between the two algorithms in terms of bit-error-rate performance and average field-programmable gate array processing time in the systolic array show that ASLR is a better choice for a systolic architecture, especially for systems with a large number of antennas.

A linear systolic array based architecture for full-search block matching motion estimator (선형 시스토릭 어레이를 이용한 완전탐색 블럭정합 이동 예측기의 구조)

김기현;이기철
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.2 / pp.313-325 / 1996
This paper presents a new architecture for full-search block-matching motion estimation. The architecture is based on linear systolic arrays. High-speed operation is obtained by feeding reference data, search data, and control signals into the linear systolic array in a pipelined fashion. Input data are fed into the array at half the processor speed, which halves the required data bandwidth. The proposed architecture scales well with respect to the number of processors and the input bandwidth when the size of the reference block and the search range change.

Content-Addressable Systolic Array for Solving Tridiagonal Linear Equation Systems (삼중대각행렬 선형방정식의 해를 구하기 위한 내용-주소법 씨스톨릭 어레이)

이병홍;김정선;채수환
- The Journal of Korean Institute of Communications and Information Sciences / v.16 no.6 / pp.556-565 / 1991
Using the WDZ decomposition algorithm, a parallel algorithm is presented for solving the linear system Ax=b, where A is an n×n nonsingular tridiagonal matrix. To implement this algorithm, a CAM systolic array is proposed in which each processing element has its own CAM to store the nonzero elements of the tridiagonal matrix. To evaluate the array, the presented algorithm is compared to the LU decomposition algorithm. The execution time of the presented algorithm is found to be about 1/4 of that of the LU decomposition algorithm. If each computation step can be done in one time unit, the system of equations is solved in a systolic fashion, without central control, in 2n+1 time steps.
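For context, the sequential LU-style baseline for tridiagonal systems is usually the Thomas algorithm. The sketch below shows that baseline only, not the WDZ decomposition or the CAM array from the paper; the function name `solve_tridiagonal` and the three-list storage of the matrix are assumptions made for illustration, and no pivoting is performed.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs for tridiagonal A (Thomas algorithm, no pivoting).

    diag[i]  = A[i][i]       (length n)
    lower[i] = A[i+1][i]     (length n-1)
    upper[i] = A[i][i+1]     (length n-1)
    Assumes every pivot is nonzero, e.g. A is diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n   # scaled superdiagonal after elimination
    d = [0.0] * n   # scaled right-hand side after elimination
    if n > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: remove the subdiagonal one row at a time.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one row to the next, so this form is inherently sequential. That is the property a parallel decomposition aims to avoid.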

Design of Problem Size-Independent Systolic Array for Polyadic-Nonserial Dynamic Programming (Polyadic nonserial 동적 프로그래밍을 위한 문제크기에 독립적인 시스톨릭 어레이의 설계)

우창호;신동석;정신일;권대형
- Journal of the Korean Institute of Telematics and Electronics A / v.30A no.3 / pp.67-75 / 1993
In many practical applications of systolic arrays, the problem size (n) is larger than the array size (M). In this case, the problem has to be partitioned into blocks that fit the array before it is processed. This paper presents a problem partitioning method for dynamic programming and a 2-dimensional systolic array suited to it. The designed array has two types of array configuration for processing the partitioned problem. A queue is designed for storing and recirculating the intermediate results at the correct location and time. The numbers of processing elements and queues required are M(3M+1)/2 and 4M, respectively. The total processing time is 2(M+1)+(n+10M+3)(n/M-1)(n/M-1)/6.

Comparison of High Speed Modular Multiplication and Design of Expansible Systolic Array (고속 모듈러 승산의 비교와 확장 가능한 시스톨릭 어레이의 설계)

Chu, Bong-Jo;Choe, Seong-Uk
- The Transactions of the Korea Information Processing Society / v.6 no.5 / pp.1219-1224 / 1999
This paper derives parallel versions of Montgomery's algorithm for modular multiplication based on Walter's and Iwamura's methods, and compares the data dependence graphs of the resulting parallel algorithms. The comparison shows that Walter's parallel algorithm has a small computational index in its data dependence graph, so it is selected and used to compute spatial and temporal pipelining diagrams for each projection direction in order to design an expansible bit-level systolic array. The internal operation of the proposed expansible systolic array is also evaluated using the C++ language.

Design 3×3 Convolution Calculator with Systolic Array (Systolic Array를 이용한 3×3 Convolution 연산기 설계)

Kim, Hyeong-Sun;Lee, Jun-Hee;Seo, Young-Ho
- Proceedings of the Korean Society of Broadcast Engineers Conference / fall / pp.221-222 / 2021
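This study implements the convolution operator used in Convolution Neural Networks using a systolic array. Fixed-point kernel values and a continuous stream of inputs are fed into the operator, which is divided into two layers, and the output is checked for correctness. The operator was implemented in Verilog HDL, and the reference computation was carried out in Python.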
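A Python reference computation of the kind mentioned above might look like the following. This is a minimal sketch, not the authors' script; the 8-bit fractional format (`FRAC_BITS`), the helper names `to_fixed` and `conv3x3_fixed`, and the truncating right shift are all assumptions made for illustration.

```python
FRAC_BITS = 8  # assumed fixed-point format: 8 fractional bits


def to_fixed(x):
    """Quantize a real kernel coefficient to a fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))


def conv3x3_fixed(image, kernel_fixed):
    """Valid-mode 3x3 convolution with integer pixels and fixed-point kernel.

    As is usual in CNNs, the kernel is not flipped (cross-correlation).
    Each output is one 9-term multiply-accumulate, scaled back by a right
    shift that truncates toward negative infinity.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += image[r + i][c + j] * kernel_fixed[i][j]
            row.append(acc >> FRAC_BITS)
        out.append(row)
    return out
```

Comparing hardware output against such a model requires the model to use the same quantization and rounding as the hardware, which is why the shift is written out explicitly rather than using floating-point arithmetic.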

Design of High-Speed Correlator for a Binary CDMA (Binary CDMA를 위한 고속 코릴레이터 설계)

구군서;정우경;문장식;류승문;이용석
- Proceedings of the IEEK Conference / 2003.07b / pp.787-790 / 2003
This paper describes a high-speed correlator that can acquire synchronization quickly. The existing addition scheme is a binary adder tree architecture, which operates very slowly because of the many levels of logic required to compute the correlation[2][3]. This paper proposes several new architectures: a systolic array architecture, a simple pipeline architecture, and a block systolic array architecture[4][5]. The acquisition performance of the proposed architectures is analyzed and compared with that of the existing architecture. The comparison shows that the systolic array architecture and the block systolic array architecture reduce the timing delay by up to 73% and 31%, respectively, and that the simple pipeline architecture reduces the timing delay by up to 53%.
