• Title/Summary/Keyword: ECC병렬 처리

Search Result 7, Processing Time 0.025 seconds

High Speed and Robust Processor based on Parallelized Error Correcting Code Module (병렬화된 에러 보정 코드 모듈 기반 프로세서 속도 및 신뢰도 향상)

  • Kang, Myeong-jin;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.9
    • /
    • pp.1180-1186
    • /
    • 2020
  • One of the Embedded systems Tiny Processing Unit (TPU) usually acts in harsh environments like external shock or insufficient power. In these cases, data could be polluted, and cause critical problems. As a solution to data pollution, many embedded systems are using Error Correcting Code (ECC) to protect and restore data. However, ECC processing in TPU increases the overall processing time by increasing the time of instruction fetch which is the bottleneck. In this paper, we propose an architecture of parallelized ECC block to the reduce bottleneck of TPU. The proposed architecture results in the reduction of time 10% compared to the original model, although memory usage increased slightly. The test is evaluated with a matrix product that has various instructions. TPU with proposed parallelized ECC block shows 7% faster than the original TPU with ECC and was able to perform the proposed test accurately.

A Scalable ECC Processor for Elliptic Curve based Public-Key Cryptosystem (타원곡선 기반 공개키 암호 시스템 구현을 위한 Scalable ECC 프로세서)

  • Choi, Jun-Baek;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.8
    • /
    • pp.1095-1102
    • /
    • 2021
  • A scalable ECC architecture with high scalability and flexibility between performance and hardware complexity is proposed. For architectural scalability, a modular arithmetic unit based on a one-dimensional array of processing element (PE) that performs finite field operations on 32-bit words in parallel was implemented, and the number of PEs used can be determined in the range of 1 to 8 for circuit synthesis. A scalable algorithms for word-based Montgomery multiplication and Montgomery inversion were adopted. As a result of implementing scalable ECC processor (sECCP) using 180-nm CMOS technology, it was implemented with 100 kGEs and 8.8 kbits of RAM when NPE=1, and with 203 kGEs and 12.8 kbits of RAM when NPE=8. The performance of sECCP with NPE=1 and NPE=8 was analyzed to be 110 PSMs/sec and 610 PSMs/sec, respectively, on P256R elliptic curve when operating at 100 MHz clock.

Scalable multiplier and inversion unit on normal basis for ECC operation (ECC 연산을 위한 가변 연산 구조를 갖는 정규기저 곱셈기와 역원기)

  • 이찬호;이종호
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.12
    • /
    • pp.80-86
    • /
    • 2003
  • Elliptic curve cryptosystem(ECC) offers the highest security per bit among the known publick key system. The benefit of smaller key size makes ECC particularly attractive for embedded applications since its implementation requires less memory and processing power. In this paper, we propose a new multiplier structure with configurable output sizes and operation cycles. The number of output bits can be freely chosen in the new architecture with the performance-area trade-off depending on the application. Using the architecture, a 193-bit normal basis multiplier and inversion unit are designed in GF(2$^{m}$ ). It is implemented using HDL and 0.35${\mu}{\textrm}{m}$ CMOS technology and the operation is verified by simulation.

Parallel BCH Encoding/decoding Method and VLSI Design for Nonvolatile Memory (비휘발성 메모리를 위한 병렬 BCH 인코딩/디코딩 방법 및 VLSI 설계)

  • Lee, Sang-Hyuk;Baek, Kwang-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.5
    • /
    • pp.41-47
    • /
    • 2010
  • This paper has proposed parallel BCH, one of error correction coding methods which has been used to NAND flash memory for SSD(solid state disk). To alter error correction capability, the proposed design improved reliability on data block has higher error rate as used frequency increasingly. Decoding parallel process bit width is as two times as encoding parallel process bit width, that could reduce decoding processing time, accordingly resulting in one half reduction over conventional ECC.

VLSI Design of an Improved Structure of a $GF(2^m)$ Divider (확장성에 유리한 병렬 알고리즘 방식에 기반한 $GF(2^m)$나눗셈기의 VLSI 설계)

  • Moon San-Gook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.3
    • /
    • pp.633-637
    • /
    • 2005
  • In this contribution, we developed and improved an existing GF (Galois field) dividing algorithm by suggesting a novel architecture for a finite field divider, which is frequently required for the error correction applications and the security-related applications such as the Reed-Solomon code, elliptic curve encryption/ decryption, is proposed. We utilized the VHDL language to verify the design methodology, and implemented the architecture on an FPGA chip. We suggested the n-bit lookup table method to obtain the throughput of 2m/n cycles, where m is the order of the division polynomial and n is the number of the most significant lookup-bits. By doing this, we extracted the advantages in achieving both high-throughput and less cost of the gate areaon the chip. A pilot FPGA chip was implemented with the case of m=4, n=2. We successfully utilized the Altera's EP20K30ETC144-1 to exhibit the maximum operating clock frequency of 77 MHz.

A Scalable Montgomery Modular Multiplier (확장 가능형 몽고메리 모듈러 곱셈기)

  • Choi, Jun-Baek;Shin, Kyung-Wook
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.625-633
    • /
    • 2021
  • This paper describes a scalable architecture for flexible hardware implementation of Montgomery modular multiplication. Our scalable modular multiplier architecture, which is based on a one-dimensional array of processing elements (PEs), performs word parallel operation and allows us to adjust computational performance and hardware complexity depending on the number of PEs used, NPE. Based on the proposed architecture, we designed a scalable Montgomery modular multiplier (sMM) core supporting eight field sizes defined in SEC2. Synthesized with 180-nm CMOS cell library, our sMM core was implemented with 38,317 gate equivalents (GEs) and 139,390 GEs for NPE=1 and NPE=8, respectively. When operating with a 100 MHz clock, it was evaluated that 256-bit modular multiplications of 0.57 million times/sec for NPE=1 and 3.5 million times/sec for NPE=8 can be computed. Our sMM core has the advantage of enabling an optimized implementation by determining the number of PEs to be used in consideration of computational performance and hardware resources required in application fields, and it can be used as an IP (intellectual property) in scalable hardware design of elliptic curve cryptography (ECC).

A Hardware Implementation of the Underlying Field Arithmetic Processor based on Optimized Unit Operation Components for Elliptic Curve Cryptosystems (타원곡선을 암호시스템에 사용되는 최적단위 연산항을 기반으로 한 기저체 연산기의 하드웨어 구현)

  • Jo, Seong-Je;Kwon, Yong-Jin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.88-95
    • /
    • 2002
  • In recent years, the security of hardware and software systems is one of the most essential factor of our safe network community. As elliptic Curve Cryptosystems proposed by N. Koblitz and V. Miller independently in 1985, require fewer bits for the same security as the existing cryptosystems, for example RSA, there is a net reduction in cost size, and time. In this thesis, we propose an efficient hardware architecture of underlying field arithmetic processor for Elliptic Curve Cryptosystems, and a very useful method for implementing the architecture, especially multiplicative inverse operator over GF$GF (2^m)$ onto FPGA and futhermore VLSI, where the method is based on optimized unit operation components. We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed and inversion speed has been improved 150 times, 480 times respectively compared with the thesis presented by Sarwono Sutikno et al. [7]. The designed underlying arithmetic processor can be also applied for implementing other crypto-processor and various finite field applications.