Accelerating Symmetric and Asymmetric Cryptographic Algorithms with Register File Extension for Multi-words or Long-word Operation

다수 혹은 긴 워드 연산을 위한 레지스터 파일 확장을 통한 대칭 및 비대칭 암호화 알고리즘의 가속화

  • Lee Sang-Hoon (Department of Electronics and Computer Engineering, Korea University) ;
  • Choi Lynn (Department of Electronics Engineering, Korea University)
  • 이상훈 (고려대학교 전자컴퓨터공학과) ;
  • 최린 (고려대학교 전자공학과)
  • Published : 2006.03.01

Abstract

In this paper, we propose a new register file architecture called the Register File Extension for Multi-words or Long-word Operation (RFEMLO) to accelerate both symmetric and asymmetric cryptographic algorithms. Based on the observation that most cryptographic algorithms rely heavily on multi-word or long-word operations, RFEMLO allows multiple contiguous registers to be specified as a single operand. Thus, a single instruction can specify either a SIMD-style multi-word operation or a long-word operation. RFEMLO can be applied to general-purpose processors by adding an instruction set for multi-word or long-word operands, together with functional units that execute the added instructions. To evaluate the performance of RFEMLO, we use SimpleScalar/ARM 3.0 (with gcc 2.95.2) and run detailed simulations on various symmetric and asymmetric cryptographic algorithms. With RFEMLO, the total instruction count is reduced by up to 62% for symmetric and up to 70% for asymmetric cryptographic algorithms. On a processor with an in-order pipeline, RFEMLO yields a speedup of 1.4 to 2.6 for symmetric cryptographic algorithms and 2.5 to 3.3 for asymmetric cryptographic algorithms. We also found that RFEMLO improves the performance of these algorithms at much lower cost than the issue-width increase used in superscalar implementations. Moreover, RFEMLO can also be applied to a superscalar processor, giving an additional performance gain of up to 83% for symmetric and up to 138% for asymmetric cryptographic algorithms.

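In this study, we propose a new register file architecture, the Register File Extension for Multi-words or Long-word Operation (RFEMLO), to accelerate symmetric and asymmetric cryptographic algorithms. Building on the observation that cryptographic algorithms can be accelerated by instructions that operate on long-word operands, RFEMLO lets a single register name access several registers, so that the same operation can be performed on multiple operands or several registers can be used as a single datum. RFEMLO can be applied to a general-purpose processor by adding an instruction set for long-word operands and functional units that support it. To evaluate the efficiency of the proposed hardware architecture and instruction set, we applied it to a variety of symmetric and asymmetric cryptographic algorithms using SimpleScalar/ARM 3.0. The experiments show that, on a processor with an in-order pipeline, RFEMLO yields a performance improvement of 40% to 160% for symmetric cryptographic algorithms and 150% to 230% for asymmetric cryptographic algorithms. We confirmed that RFEMLO achieves this improvement at far lower hardware cost than a superscalar implementation that increases the issue width. Applying RFEMLO to a superscalar processor gives a further improvement of up to 83.6% for symmetric and up to 138.6% for asymmetric cryptographic algorithms.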
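
The operand model described in the abstract, where one operand names a group of contiguous registers, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's instruction set: the class `RegisterFile`, the operation names `mxor` and `ladd`, the 16 x 32-bit register file, and the convention that the lowest-numbered register holds the least-significant word are all hypothetical choices made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit registers, as on ARM


class RegisterFile:
    """Toy register file whose operands may name a group of contiguous registers."""

    def __init__(self, num_regs=16):
        self.regs = [0] * num_regs

    def _check(self, base, count):
        # A group operand must lie entirely inside the register file.
        if count < 1 or base < 0 or base + count > len(self.regs):
            raise IndexError("register group out of range")

    def read_group(self, base, count):
        self._check(base, count)
        return self.regs[base:base + count]

    def write_group(self, base, values):
        self._check(base, len(values))
        for i, v in enumerate(values):
            self.regs[base + i] = v & MASK32

    def mxor(self, dst, a, b, count):
        """Hypothetical multi-word op: XOR `count` independent 32-bit lanes."""
        xs, ys = self.read_group(a, count), self.read_group(b, count)
        self.write_group(dst, [x ^ y for x, y in zip(xs, ys)])

    def ladd(self, dst, a, b, count):
        """Hypothetical long-word op: add two (32 * count)-bit integers.

        The carry ripples from one word to the next; the final carry-out
        is returned to the caller.
        """
        xs, ys = self.read_group(a, count), self.read_group(b, count)
        carry, out = 0, []
        for x, y in zip(xs, ys):
            s = x + y + carry
            out.append(s & MASK32)
            carry = s >> 32
        self.write_group(dst, out)
        return carry


rf = RegisterFile()
rf.write_group(0, [0xFFFFFFFF, 0x00000001])  # r0..r1 hold 0x1_FFFFFFFF
rf.write_group(2, [0x00000001, 0x00000000])  # r2..r3 hold 1
carry = rf.ladd(4, 0, 2, 2)                  # r4..r5 = r0..r1 + r2..r3
rf.mxor(6, 0, 2, 2)                          # r6 = r0 ^ r2, r7 = r1 ^ r3
```

In this model a single call stands in for what would otherwise be a sequence of per-word instructions (for example, an add followed by add-with-carry for each further word), which is the kind of saving the reported instruction-count reductions refer to.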
