Search | Korea Science

Montgomery Multiplier with Very Regular Behavior

Yoo-Jin Baek
- International Journal of Internet, Broadcasting and Communication
- /
- v.16 no.1
- /
- pp.17-28
- /
- 2024
As listed as one of the most important requirements for Post-Quantum Cryptography standardization process by National Institute of Standards and Technology, the resistance to various side-channel attacks is considered very critical in deploying cryptosystems in practice. In fact, cryptosystems can easily be broken by side-channel attacks, even though they are considered to be secure in the mathematical point of view. The timing attack(TA) and the simple power analysis attack(SPA) are such side-channel attack methods which can reveal sensitive information by analyzing the timing behavior or the power consumption pattern of cryptographic operations. Thus, appropriate measures against such attacks must carefully be considered in the early stage of cryptosystem's implementation process. The Montgomery multiplier is a commonly used and classical gadget in implementing big-number-based cryptosystems including RSA and ECC. And, as recently proposed as an alternative of building blocks for implementing post quantum cryptography such as lattice-based cryptography, the big-number multiplier including the Montgomery multiplier still plays a role in modern cryptography. However, in spite of its effectiveness and wide-adoption, the multiplier is known to be vulnerable to TA and SPA. And this paper proposes a new countermeasure for the Montgomery multiplier against TA and SPA. Briefly speaking, the new measure first represents a multiplication operand without 0 digits, so the resulting multiplication operation behaves in a very regular manner. Also, the new algorithm removes the extra final reduction (which is intrinsic to the modular multiplication) to make the resulting multiplier more timing-independent. Consequently, the resulting multiplier operates in constant time so that it totally removes any TA and SPA vulnerabilities. Since the proposed method can process multi bits at a time, implementers can also trade-off the performance with the resource usage to get desirable implementation characteristics.
https://doi.org/10.7236/IJIBC.2024.16.1.17 인용 PDF

A 32${\times}$32-b Multiplier Using a New Method to Reduce a Compression Level of Partial Products (부분곱 압축단을 줄인 32${\times}$32 비트 곱셈기)

홍상민;김병민;정인호;조태원
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.40 no.6
- /
- pp.447-458
- /
- 2003
A high speed multiplier is essential basic building block for digital signal processors today. Typically iterative algorithms in Signal processing applications are realized which need a large number of multiply, add and accumulate operations. This paper describes a macro block of a parallel structured multiplier which has adopted a 32$\times$32-b regularly structured tree (RST). To improve the speed of the tree part, modified partial product generation method has been devised at architecture level. This reduces the 4 levels of compression stage to 3 levels, and propagation delay in Wallace tree structure by utilizing 4-2 compressor as well. Furthermore, this enables tree part to be combined with four modular block to construct a CSA tree (carry save adder tree). Therefore, combined with four modular block to construct a CSA tree (carry save adder tree). Therefore, multiplier architecture can be regularly laid out with same modules composed of Booth selectors, compressors and Modified Partial Product Generators (MPPG). At the circuit level new Booth selector with less transistors and encoder are proposed. The reduction in the number of transistors in Booth selector has a greater impact on the total transistor count. The transistor count of designed selector is 9 using PTL(Pass Transistor Logic). This reduces the transistor count by 50% as compared with that of the conventional one. The designed multiplier in 0.25${\mu}{\textrm}{m}$ technology, 2.5V, 1-poly and 5-metal CMOS process is simulated by Hspice and Epic. Delay is 4.2㎱ and average power consumes 1.81㎽/MHz. This result is far better than conventional multiplier with equal or better than the best one published.
PDF KSCI

Error Detection Architecture for Modular Operations (Modular 연산에 대한 오류 탐지)

Kim, Chang Han;Chang, Nam Su
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.27 no.2
- /
- pp.193-199
- /
- 2017
In this paper, we proposed an architecture of error detection in $Z_N$ operations using $Z_{(2^r-1)N}$. The error detection can be simply constructed in hardware. The hardware overheads are only 50% and 1% with respectively space and time complexity. The architecture is very efficient because it is detection 99% for 1 bit fault. For 2 bit fault, it is detection 99% and 50% with respective r=2 and r=3.
https://doi.org/10.13089/JKIISC.2017.27.2.193 인용 PDF KSCI HTML

Design of a Parallel Multiplier for Irreducible Polynomials with All Non-zero Coefficients over GF($p^m$) (GF($p^m$)상에서 모든 항의 계수가 0이 아닌 기약다항식에 대한 병렬 승산기의 설계)

Park, Seung-Yong;Hwang, Jong-Hak;Kim, Heung-Soo
- Journal of the Institute of Electronics Engineers of Korea SC
- /
- v.39 no.4
- /
- pp.36-42
- /
- 2002
In this paper, we proposed a multiplicative algorithm for two polynomials with all non-zero coefficients over finite field GF($P^m$). Using the proposed multiplicative algorithm, we constructed the multiplier of modular architecture with parallel in-output. The proposed multiplier is composed of $(m+1)^2$ identical cells, each cell consists of one mod(p) additional gate and one mod(p) multiplicative gate. Proposed multiplier need one mod(p) multiplicative gate delay time and m mod(p) additional gate delay time not clock. Also, our architecture is regular and possesses the property of modularity, therefore well-suited for VLSI implementation.
PDF KSCI

Efficient Architecture of an n-bit Radix-4 Modular Multiplier in Systolic Array Structure (시스톨릭 어레이 구조를 갖는 효율적인 n-비트 Radix-4 모듈러 곱셈기 구조)

Park, Tae-geun;Cho, Kwang-won
- The KIPS Transactions:PartA
- /
- v.10A no.4
- /
- pp.279-284
- /
- 2003
In this paper, we propose an efficient architecture for radix-4 modular multiplication in systolic array structure based on the Montgomery's algorithm. We propose a radix-4 modular multiplication algorithm to reduce the number of iterations, so that it takes (3/2)n+2 clock cycles to complete an n-bit modular multiplication. Since we can interleave two consecutive modular multiplications for 100% hardware utilization and can start the next multiplication at the earliest possible moment, it takes about only n/2 clock cycles to complete one modular multiplication in the average. The proposed architecture is quite regular and scalable due to the systolic array structure so that it fits in a VLSI implementation. Compared to conventional approaches, the proposed architecture shows shorter period to complete a modular multiplication while requiring relatively less hardware resources.
https://doi.org/10.3745/KIPSTA.2003.10A.4.279 인용 PDF KSCI

Scalable RSA public-key cryptography processor based on CIOS Montgomery modular multiplication Algorithm (CIOS 몽고메리 모듈러 곱셈 알고리즘 기반 Scalable RSA 공개키 암호 프로세서)

Cho, Wook-Lae;Shin, Kyung-Wook
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.22 no.1
- /
- pp.100-108
- /
- 2018
This paper describes a design of scalable RSA public-key cryptography processor supporting four key lengths of 512/1,024/2,048/3,072 bits. The modular multiplier that is a core arithmetic block for RSA crypto-system was designed with 32-bit datapath, which is based on the CIOS (Coarsely Integrated Operand Scanning) Montgomery modular multiplication algorithm. The modular exponentiation was implemented by using L-R binary exponentiation algorithm. The scalable RSA crypto-processor was verified by FPGA implementation using Virtex-5 device, and it takes 456,051/3,496347/26,011,947/88,112,770 clock cycles for RSA computation for the key lengths of 512/1,024/2,048/3,072 bits. The RSA crypto-processor synthesized with a $0.18{\mu}m$ CMOS cell library occupies 10,672 gate equivalent (GE) and a memory bank of $6{\times}3,072$ bits. The estimated maximum clock frequency is 147 MHz, and the RSA decryption takes 3.1/23.8/177/599.4 msec for key lengths of 512/1,024/2,048/3,072 bits.
https://doi.org/10.6109/jkiice.2018.22.1.100 인용 PDF KSCI

Efficient Architectures for Modular Exponentiation Using Montgomery Multiplier (Montgomery 곱셈기를 이용한 효율적인 모듈라 멱승기 구조)

하재철;문상재
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.11 no.5
- /
- pp.63-74
- /
- 2001
Modular exponentiation is an essential operation required for implementations of most public key cryptosystems. This paper presents two architectures for modular exponentiation using the Montgomery modular multiplication algorithm combined with two binary exponentiation methods, L-R(Left to Left) algorithms. The proposed architectures make use of MUXes for efficient pre-computation and post-computation in Montgomery\`s algorithm. For an n-bit modulus, if mulitplication with m carry processing clocks can be done (n+m) clocks, the L-R type design requires (1.5n+5)(n+m) clocks on average for an exponentiation. The R-L type design takes (n+4)(n+m) clocks in the worst case.
https://doi.org/10.13089/JKIISC.2001.11.5.63 인용 PDF HTML

Efficient Bit-Parallel Multiplier for Binary Field Defind by Equally-Spaced Irreducible Polynomials (Equally Spaced 기약다항식 기반의 효율적인 이진체 비트-병렬 곱셈기)

Lee, Ok-Suk;Chang, Nam-Su;Kim, Chang-Han;Hong, Seok-Hie
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.18 no.2
- /
- pp.3-10
- /
- 2008
The choice of basis for representation of element in $GF(2^m)$ affects the efficiency of a multiplier. Among them, a multiplier using redundant representation efficiently supports trade-off between the area complexity and the time complexity since it can quickly carry out modular reduction. So time of a previous multiplier using redundant representation is faster than time of multiplier using others basis. But, the weakness of one has a upper space complexity compared to multiplier using others basis. In this paper, we propose a new efficient multiplier with consideration that polynomial exponentiation operations are frequently used in cryptographic hardwares. The proposed multiplier is suitable fer left-to-right exponentiation environment and provides efficiency between time and area complexity. And so, it has both time delay of $T_A+({\lceil}{\log}_2m{\rceil})T_x$ and area complexity of (2m-1)(m+s). As a result, the proposed multiplier reduces $2(ms+s^2)$ compared to the previous multiplier using equally-spaced polynomials in area complexity. In addition, it reduces $T_A+({\lceil}{\log}_2m+s{\rceil})T_x$ to $T_A+({\lceil}{\log}_2m{\rceil})T_x$ in the time complexity.($T_A$:Time delay of one AND gate, $T_x$:Time delay of one XOR gate, m:Degree of equally spaced irreducible polynomial, s:spacing factor)
https://doi.org/10.13089/JKIISC.2008.18.2.3 인용 PDF KSCI HTML

A Design of Cellular Array Parallel Multiplier on Finite Fields GF(2^m) (유한체 GF(2^m)상의 셀 배열 병렬 승산기의 설계)

Seong, Hyeon-Kyeong
- The KIPS Transactions:PartA
- /
- v.11A no.1
- /
- pp.1-10
- /
- 2004
A cellular array parallel multiplier with parallel-inputs and parallel-outputs for performing the multiplication of two polynomials in the finite fields GF$(2^m)$ is presented in this paper. The presented cellular way parallel multiplier consists of three operation parts: the multiplicative operation part (MULOP), the irreducible polynomial operation part (IPOP), and the modular operation part (MODOP). The MULOP and the MODOP are composed if the basic cells which are designed with AND Bates and XOR Bates. The IPOP is constructed by XOR gates and D flip-flops. This multiplier is simulated by clock period l${\mu}\textrm{s}$ using PSpice. The proposed multiplier is designed by 24 AND gates, 32 XOR gates and 4 D flip-flops when degree m is 4. In case of using AOP irreducible polynomial, this multiplier requires 24 AND gates and XOR fates respectively. and not use D flip-flop. The operating time of MULOP in the presented multiplier requires one unit time(clock time), and the operating time of MODOP using IPOP requires m unit times(clock times). Therefore total operating time is m+1 unit times(clock times). The cellular array parallel multiplier is simple and regular for the wire routing and have the properties of concurrency and modularity. Also, it is expansible for the multiplication of two polynomials in the finite fields with very large m.
https://doi.org/10.3745/KIPSTA.2004.11A.1.001 인용 PDF KSCI

A Study on the Hardware Architecture of Trinomial $GF(2^m)$ Multiplier (Trinomial $GF(2^m)$ 승산기의 하드웨어 구성에 관한 연구)

변기영;윤광섭
- Journal of the Institute of Electronics Engineers of Korea SC
- /
- v.41 no.5
- /
- pp.29-36
- /
- 2004
This study focuses on the arithmetical methodology and hardware implementation of low-system-complexity multiplier over GF(2$^{m}$ ) using the trinomial of degree a The proposed parallel-in parallel-out operator is composed of MR, PP, and MS modules, each can be established using the regular array structure of AND and XOR gates. The proposed multiplier is composed of $m^2$ 2-input AND gates and $m^2$-1 2-input XOR gates, and the propagation delay is $T_{A}$+(1+[lo $g_2$$^{m}$ ]) $T_{x}$ . Comparison result of the related multipliers of GF(2$^{m}$ ) are shown by table, it reveals that our operator involve more regular and generalized then the others, and therefore well-suited for VLSI implementation. Moreover, our multiplier is more suitable for any other GF(2$^{m}$ ) operational applications.s.
PDF KSCI

Search Result 88, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)