A 521-bit high-performance modular multiplier using 3-way Toom-Cook multiplication and fast reduction algorithm

  • Yang, Hyeon-Jun (Department of Electronic Engineering, Kumoh National Institute of Technology)
  • Shin, Kyung-Wook (School of Electronic Engineering, Kumoh National Institute of Technology)
  • Received : 2021.10.13
  • Accepted : 2021.11.10
  • Published : 2021.12.31

Abstract

This paper describes a high-performance hardware implementation of modular multiplication, a core operation in elliptic curve cryptography. A 521-bit high-performance modular multiplier for the NIST P-521 curve was designed using 3-way Toom-Cook integer multiplication and a fast reduction algorithm. Because the 3-way Toom-Cook algorithm yields the integer product multiplied by 1/3, the modular multiplication was carried out in a Toom-Cook domain in which the operands are multiplied by 3. The modular multiplier was implemented on an xczu7ev FPGA device to verify its hardware operation, using 69,958 LUTs, 4,991 flip-flops, and 101 DSP blocks. The maximum operating frequency on the Zynq7 FPGA device was estimated at 50 MHz, corresponding to about 4.16 million modular multiplications per second.
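The abstract combines three ideas: 3-way Toom-Cook splitting of the 521-bit operands, fast reduction modulo the P-521 prime, and a scaled "Toom-Cook domain". The Python sketch below models them in software only and is not the authors' design. It assumes 174-bit limbs (3 x 174 = 522 >= 521), the evaluation points 0, 1, -1, -2 and infinity with Bodrato's interpolation sequence, and the standard fold reduction for p = 2^521 - 1 (which works because 2^521 = 1 mod p). The paper's actual limb sizes, evaluation points and pipeline are not stated here. All function names, and the scale parameter `s` that stands in for the constant factor left in the product, are hypothetical.

```python
import random

P521 = (1 << 521) - 1  # NIST P-521 prime, the Mersenne prime 2^521 - 1
K = 174                # assumed limb width: 3 * 174 = 522 >= 521 bits
MASK = (1 << K) - 1


def split3(a):
    """Split a nonnegative integer below 2^(3K) into three K-bit limbs."""
    return a & MASK, (a >> K) & MASK, a >> (2 * K)


def toom3_mul(a, b):
    """3-way Toom-Cook product a*b, evaluating at 0, 1, -1, -2 and infinity."""
    a0, a1, a2 = split3(a)
    b0, b1, b2 = split3(b)
    # Evaluation: five limb-sized products instead of nine schoolbook ones.
    w0 = a0 * b0
    w1 = (a0 + a1 + a2) * (b0 + b1 + b2)
    wm1 = (a0 - a1 + a2) * (b0 - b1 + b2)
    wm2 = (a0 - 2 * a1 + 4 * a2) * (b0 - 2 * b1 + 4 * b2)
    winf = a2 * b2
    # Interpolation (Bodrato's sequence); every division below is exact.
    r0, r4 = w0, winf
    r3 = (wm2 - w1) // 3
    r1 = (w1 - wm1) // 2
    r2 = wm1 - w0
    r3 = (r2 - r3) // 2 + 2 * winf
    r2 = r2 + r1 - r4
    r1 = r1 - r3
    # Recomposition: evaluate the product polynomial at x = 2^K.
    return r0 + (r1 << K) + (r2 << (2 * K)) + (r3 << (3 * K)) + (r4 << (4 * K))


def reduce_p521(c):
    """Fast reduction mod 2^521 - 1 for 0 <= c < 2^1042: fold high bits onto low bits."""
    t = (c & P521) + (c >> 521)  # 2^521 = 1 (mod p), so t < 2^522
    t = (t & P521) + (t >> 521)  # second fold leaves t <= p + 1
    return t - P521 if t >= P521 else t


def modmul_p521(a, b):
    """a*b mod p for 0 <= a, b < p, built from the two steps above."""
    return reduce_p521(toom3_mul(a, b))


# Hypothetical model of a datapath whose raw output carries a constant factor s.
def to_domain(a, s):
    return a * pow(s, -1, P521) % P521  # a' = a / s (mod p)


def from_domain(a_dom, s):
    return a_dom * s % P521  # a = s * a' (mod p)


def scaled_modmul(a_dom, b_dom, s):
    # s * (a/s) * (b/s) = (a*b)/s, so the result stays in the same domain.
    return s * modmul_p521(a_dom, b_dom) % P521


if __name__ == "__main__":
    rng = random.Random(521)
    for _ in range(1000):
        a, b = rng.randrange(P521), rng.randrange(P521)
        assert toom3_mul(a, b) == a * b
        assert modmul_p521(a, b) == a * b % P521
    s = pow(3, -1, P521)  # factor 1/3 mod p, as in the abstract's description
    a, b = rng.randrange(P521), rng.randrange(P521)
    a_dom, b_dom = to_domain(a, s), to_domain(b, s)  # equals 3a, 3b mod p
    assert a_dom == 3 * a % P521
    assert from_domain(scaled_modmul(a_dom, b_dom, s), s) == a * b % P521
    print("ok")
```

Running the file checks the Toom-Cook product against Python's built-in multiplication and shows that, with `s` set to 1/3 mod p, operands scaled by 3 stay in the domain: the scaled product of 3a and 3b is 3ab mod p. Exact division by 3 in the interpolation is costly in hardware, which is presumably why a design would keep that factor inside the domain rather than divide it out on every multiplication. As a rough consistency check, 4.16 million operations per second at 50 MHz works out to about 12 clock cycles per modular multiplication.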

Acknowledgement

  • This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2020R1I1A3A04038083).
  • This paper was supported by Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0017011, HRD Program for Industrial Innovation).
  • The authors are thankful to IDEC for EDA tool support.
