• Title/Summary/Keyword: multiplication algorithm


The Montgomery Multiplier Using Scalable Carry Save Adder (분할형 CSA를 이용한 Montgomery 곱셈기)

  • 하재철;문상재
    • Journal of the Korea Institute of Information Security & Cryptology / v.10 no.3 / pp.77-83 / 2000
  • This paper presents a new modular multiplier for Montgomery multiplication using an iterative small carry-save adder (CSA). The proposed multiplier is flexible and well suited to long-operand multiplication because it scales with the available design area and the required computing time. We describe the word-based Montgomery algorithm and the design architecture of the multiplier. Our analysis and simulation show that the proposed multiplier provides area/time tradeoffs for area-constrained designs such as IC cards.
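
The word-based Montgomery algorithm mentioned in this abstract can be illustrated in software. The following is a minimal sketch, not the paper's CSA hardware architecture: the function names, the 8-bit default word size, and the conversion helper `mod_mul` are assumptions made for illustration only.

```python
def montgomery_multiply(a, b, n, word_bits=8):
    """Return a * b * R^-1 mod n, where R = 2^(word_bits * num_words).

    Word-serial form: one word of `a` is consumed per iteration.
    Assumes n is odd and 0 <= a, b < n.
    """
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    num_words = (n.bit_length() + word_bits - 1) // word_bits
    base = 1 << word_bits
    mask = base - 1
    # n_prime = -n^-1 mod 2^word_bits (requires Python 3.8+ for pow(n, -1, m))
    n_prime = (-pow(n, -1, base)) % base

    t = 0
    for i in range(num_words):
        a_i = (a >> (i * word_bits)) & mask   # i-th word of a
        t += a_i * b                          # partial product
        m = ((t & mask) * n_prime) & mask     # makes the low word of t + m*n zero
        t = (t + m * n) >> word_bits          # exact division by the word base
    if t >= n:                                # single final correction
        t -= n
    return t


def mod_mul(a, b, n, word_bits=8):
    """Hypothetical helper: plain a * b mod n via the Montgomery domain."""
    num_words = (n.bit_length() + word_bits - 1) // word_bits
    r = 1 << (word_bits * num_words)
    a_bar = (a * r) % n                       # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = montgomery_multiply(a_bar, b_bar, n, word_bits)
    return montgomery_multiply(prod_bar, 1, n, word_bits)  # convert back
```

Changing `word_bits` trades the number of loop iterations against the width of each step, which is the software analogue of the area/time scaling the abstract describes.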

Parallel Modular Multiplication Algorithm to Improve Time and Space Complexity in Residue Number System (RNS상에서 시간 및 공간 복잡도 향상을 위한 병렬 모듈러 곱셈 알고리즘)

  • 박희주;김현성
    • Journal of KIISE:Computer Systems and Theory / v.30 no.9 / pp.454-460 / 2003
  • In this paper, we present a novel method for parallelizing the modular multiplication algorithm to improve time and space complexity in the RNS (Residue Number System). The parallel algorithm performs modular reduction using a new table-lookup-based reduction method. The MRS (Mixed Radix number System) is used because algebraic comparison is difficult in the RNS, which is a non-weighted number representation. Conversion from the RNS to a suitable MRS is relatively fast on a residue computer, so magnitude comparison is easily performed in the MRS. Analysis of the algorithm shows that it requires only half the table size of the previous approach, and that it requires $O(l)$ arithmetic operations using $2l$ processors.
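
To make the RNS and MRS roles concrete, here is a small sketch of channel-wise RNS multiplication and of magnitude comparison through mixed-radix digits. It does not reproduce the paper's table-lookup reduction; the moduli `(7, 11, 13)` and all function names are assumptions chosen for illustration.

```python
MODULI = (7, 11, 13)  # hypothetical pairwise-coprime moduli, range 0..1000


def to_rns(x, moduli=MODULI):
    """Represent x by its residues; there are no positional weights."""
    return tuple(x % m for m in moduli)


def rns_mul(xs, ys, moduli=MODULI):
    """Multiply channel by channel; each channel is independent (parallel)."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, moduli))


def to_mixed_radix(residues, moduli=MODULI):
    """Return digits d with x = d[0] + d[1]*m0 + d[2]*m0*m1 + ..."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        d = r[i] % m_i
        digits.append(d)
        # Remove the digit and divide by m_i in every remaining channel.
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            r[j] = ((r[j] - d) * pow(m_i, -1, m_j)) % m_j
    return digits


def mixed_radix_value(digits, moduli=MODULI):
    """Evaluate the weighted mixed-radix digits back to an integer."""
    value, weight = 0, 1
    for d, m in zip(digits, moduli):
        value += d * weight
        weight *= m
    return value


def rns_less_than(xs, ys, moduli=MODULI):
    """Compare magnitudes digit by digit, most significant digit first."""
    dx = to_mixed_radix(xs, moduli)[::-1]
    dy = to_mixed_radix(ys, moduli)[::-1]
    return dx < dy
```

Because mixed-radix digits are weighted, comparing them from the most significant digit downward orders the numbers correctly, which residues alone cannot do.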

Probability distribution-based approximation matrix multiplication simplification algorithm (확률분포 생성을 통한 근사 행렬 곱셈 간소화 방법)

  • Kwon, Oh-Young;Seo, Kyoung-Taek
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.11 / pp.1623-1629 / 2022
  • Matrix multiplication is a fundamental operation widely used in science and engineering. Approximate matrix multiplication is one way to reduce the amount of computation it requires. It determines an appropriate probability distribution over the columns and rows of the matrices, and computes an approximate product from the columns and rows selected according to this distribution. Conventionally, the probability distribution is generated by considering both matrices A and B participating in the multiplication. In this paper, we propose a method that generates the probability distribution for selecting columns and rows from matrix A alone. Approximate matrix multiplication was performed on matrices from 1000×1000 to 5000×5000 using the existing and proposed methods. Compared with the conventional method, the proposed method produced results closer to the exact matrix product, by 0.02% to 2.34% on average.
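
The column-row sampling idea can be sketched as follows. The abstract does not state which distribution the authors derive from A, so this sketch assumes sampling probabilities proportional to the squared column norms of A; the function name and the nested-list matrix format are likewise illustrative assumptions.

```python
import random


def approx_matmul(A, B, num_samples, rng=None):
    """Approximate A @ B from randomly sampled column-row outer products.

    Assumption: index k is drawn with probability proportional to the
    squared norm of column k of A (only A is inspected, not B).
    """
    rng = rng or random.Random(0)
    n_rows, inner, n_cols = len(A), len(B), len(B[0])
    weights = [sum(A[i][k] ** 2 for i in range(n_rows)) for k in range(inner)]
    total = sum(weights)
    C = [[0.0] * n_cols for _ in range(n_rows)]
    if total == 0:
        return C  # A is all zeros, so the product is zero
    probs = [w / total for w in weights]

    for k in rng.choices(range(inner), weights=probs, k=num_samples):
        # Rescaling by 1 / (num_samples * p_k) keeps the estimate unbiased.
        scale = 1.0 / (num_samples * probs[k])
        for i in range(n_rows):
            a = A[i][k] * scale
            for j in range(n_cols):
                C[i][j] += a * B[k][j]
    return C
```

With `num_samples` much smaller than the inner dimension, the cost drops from one outer product per index to one per sample, at the price of a random error that shrinks as more samples are drawn.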

Analysis of Reduced-Width Truncated Mitchell Multiplication for Inferences Using CNNs

  • Kim, HyunJin
    • IEMEK Journal of Embedded Systems and Applications / v.15 no.5 / pp.235-242 / 2020
  • This paper analyzes the effect of reducing the output width of truncated logarithmic multiplication and its application to inference using convolutional neural networks (CNNs). To keep hardware overhead small, the output width of the truncated Mitchell multiplier is reduced, so that the fractional bits in the multiplication output are minimized in error-resilient applications. The analysis shows that when the output width of the truncated Mitchell multiplier is reduced, the average relative error can be kept small even though the worst-case relative error increases. With 8 fractional bits in the multiplication output, the evaluations show no significant performance degradation in the target CNNs compared to existing exact and original Mitchell multipliers.
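
For readers unfamiliar with Mitchell's logarithmic multiplication, the basic approximation can be sketched on unsigned integers as below. This is the plain, untruncated form; the operand truncation and reduced output width studied in the paper are not modeled, and the function name is an assumption.

```python
def mitchell_multiply(a, b):
    """Approximate a * b for non-negative integers with Mitchell's method.

    Write a = 2^k1 * (1 + x1) and b = 2^k2 * (1 + x2) with 0 <= x < 1,
    approximate log2(1 + x) by x, add the logarithms, and take the
    antilogarithm with the same linear approximation.
    """
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # characteristic (leading-one position)
    k2 = b.bit_length() - 1
    one = 1 << (k1 + k2)             # fixed-point value of 1.0 at scale 2^(k1+k2)
    f1 = (a - (1 << k1)) << k2       # x1 at the common scale
    f2 = (b - (1 << k2)) << k1       # x2 at the common scale
    s = f1 + f2                      # x1 + x2
    if s < one:
        return one + s               # 2^(k1+k2) * (1 + x1 + x2)
    return 2 * s                     # 2^(k1+k2+1) * (x1 + x2)
```

For example, `mitchell_multiply(5, 6)` gives 28 rather than 30. The result never exceeds the exact product, and the relative error of this untruncated form is at most 1/9 (about 11.1%), reached when both fractions equal 0.5, as in 3 × 3.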

Sound Field Effect Implementation Using Fast Algorithm (고속 알고리즘을 이용한 음장 효과 구현)

  • Son Sung Young;Seo Joung Il;Hahn Minsoo
    • MALSORI / no.47 / pp.85-96 / 2003
  • It is difficult to implement a sound field effect in real time using linear convolution in the time domain, because linear convolution requires many multiplication operations. This paper introduces three ways to reduce the number of multiplications. First, linear convolution in the time domain is replaced with circular convolution computed in the frequency domain, so that the convolution becomes a multiplication. Second, one frame is divided into several frames, which reduces the multiplications needed to transform from the time domain into the frequency domain. Finally, the QFT is used in place of the FFT. Together, these three techniques greatly reduce the number of multiplication operations, which makes real-time implementation possible.
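
The first of the three techniques, replacing time-domain linear convolution with a frequency-domain product, can be sketched as follows. This sketch uses a textbook radix-2 FFT rather than the QFT, and does not model the frame partitioning; the function names are assumptions for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is left unscaled; the caller divides by len(x).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fast_convolve(signal, kernel):
    """Linear convolution via zero-padded circular convolution."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:            # pad so circular wrap-around cannot occur
        size *= 2
    fa = fft(list(signal) + [0] * (size - len(signal)))
    fb = fft(list(kernel) + [0] * (size - len(kernel)))
    spectrum = [p * q for p, q in zip(fa, fb)]   # convolution becomes a product
    time = fft(spectrum, inverse=True)
    return [v.real / size for v in time[:out_len]]
```

Direct convolution of a length-N signal with a length-M impulse response costs about N·M multiplications, whereas the transform route costs on the order of (N+M)·log(N+M), which is the source of the savings the abstract refers to.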


Implementation of Modular Multiplication and Communication Adaptor for Public Key Cryptosystem (공개키 암호체계를 위한 Modular 곱셈개선과 통신회로 구현에 관한 연구)

  • 한선경;이선복;유영갑
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.7 / pp.651-662 / 1991
  • An improved modular multiplication algorithm for RSA-type public key cryptosystems and its application to a serial communication circuit are presented. A correction to a published fast modular multiplication algorithm is proposed and verified through simulation. A cryptosystem for the RS-232C communication protocol is designed and prototyped for low-speed data exchange between computers. The system adopts the corrected algorithm and operates successfully using a small key.


High-Speed Array Multipliers Based on On-the-Fly Conversion

  • Moh, Sang-Man;Yoon, Suk-Han
    • ETRI Journal / v.19 no.4 / pp.317-325 / 1997
  • A new on-the-fly conversion algorithm is proposed, and high-speed array multipliers with the on-the-fly conversion are presented. The new on-the-fly conversion logic is used to speed up carry-propagate addition at the last stage of multiplication, and provides constant delay independent of the number of input bits. In this paper, the multiplication architecture and the on-the-fly conversion algorithm are presented and discussed in detail. The proposed architecture has a multiplication time of $(n+1)t_{FA}$, where $n$ is the number of input bits and $t_{FA}$ is the delay of a full adder. According to our comparative performance evaluation, the proposed architecture has shorter delay and requires less area than the conventional array multiplier with on-the-fly conversion.


A Low-Complexity 128-Point Mixed-Radix FFT Processor for MB-OFDM UWB Systems

  • Cho, Sang-In;Kang, Kyu-Min
    • ETRI Journal / v.32 no.1 / pp.1-10 / 2010
  • In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultra-wideband systems. The proposed 128-point FFT processor employs both a modified radix-$2^4$ algorithm and a radix-$2^3$ algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 $\mu$m CMOS technology with a supply voltage of 1.8 V. The hardware-efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.

Bit-Level Systolic Array for Modular Multiplication (모듈러 곱셈연산을 위한 비트레벨 시스토릭 어레이)

  • 최성욱
    • Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 1995.11a / pp.163-172 / 1995
  • In this paper, a bit-level 1-dimensional systolic array for modular multiplication is designed. First, parallel algorithms and data dependence graphs are derived from Walter's and Iwamura's methods, both based on the Montgomery algorithm for modular multiplication, and compared. Since Walter's method has fewer computational index points in its data dependence graph than Iwamura's, it is selected as the base algorithm. Using a systematic procedure for systolic array design, four 1-dimensional systolic arrays are obtained and then evaluated against various criteria. By modifying the array derived from the [0,1] projection direction, adding control logic and serializing the communication paths of data A, an optimal 1-dimensional systolic array is designed. It has a constant number of I/O channels, which makes it modularly expandable, and its unidirectional paths make it well suited to fault tolerance. It is therefore suitable for the RSA cryptosystem, which handles large operands and many consecutive message blocks.


Bit-level 1-dimensional systolic modular multiplication (비트 레벨 일차원 시스톨릭 모듈러 승산)

  • 최성욱;우종호
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.9 / pp.62-69 / 1996
  • In this paper, a bit-level 1-dimensional systolic array for modular multiplication is designed. First, a parallel algorithm and data dependence graph suitable for array design are derived from Walter's method, which is based on the Montgomery algorithm for modular multiplication. Using a systematic procedure for systolic array design, four 1-dimensional systolic arrays are obtained and then evaluated against various criteria. By modifying the array derived from the [0,1] projection direction, adding control logic and serializing the communication paths of data A, an optimal 1-dimensional systolic array is designed. It has a constant number of I/O channels, which makes it modularly expandable, and its unidirectional paths make fault tolerance easy to achieve. It is suitable for the RSA cryptosystem, which handles large operands and many consecutive message blocks.
