• Title/Summary/Keyword: standard multiplication algorithm

Search Result 50, Processing Time 0.025 seconds

Implementation of a pipelined Scalar Multiplier using Extended Euclid Algorithm for Elliptic Curve Cryptography(ECC) (확장 유클리드 알고리즘을 이용한 파이프라인 구조의 타원곡선 암호용 스칼라 곱셈기 구현)

  • 김종만;김영필;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.5
    • /
    • pp.17-30
    • /
    • 2001
  • In this paper, we implemented a scalar multiplier needed at an elliptic curve cryptosystem over standard basis in $GF(2^{163})$. The scalar multiplier consists of a radix-16 finite field serial multiplier and a finite field inverter with some control logics. The main contribution is to develop a new fast finite field inverter, which made it possible to avoid time consuming iterations of finite field multiplication. We used an algorithmic transformation technique to obtain a data-independent computational structure of the Extended Euclid GCD algorithm. The finite field multiplier and inverter shown in this paper have regular structure so that they can be easily extended to larger word size. Moreover they can achieve 100% throughput using the pipelining. Our new scalar multiplier is synthesized using Hyundai Electronics 0.6$\mu\textrm{m}$ CMOS library, and maximum operating frequency is estimated about 140MHz. The resulting data processing performance is 64Kbps, that is it takes 2.53ms to process a 163-bit data frame. We assure that this performance is enough to be used for digital signature, encryption & decryption and key exchange in real time embedded-processor environments.

A Study on Performance Improvement of Non-Profiling Based Power Analysis Attack against CRYSTALS-Dilithium (CRYSTALS-Dilithium 대상 비프로파일링 기반 전력 분석 공격 성능 개선 연구)

  • Sechang Jang;Minjong Lee;Hyoju Kang;Jaecheol Ha
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.33 no.1
    • /
    • pp.33-43
    • /
    • 2023
  • The National Institute of Standards and Technology (NIST), which is working on the Post-Quantum Cryptography (PQC) standardization project, announced four algorithms that have been finalized for standardization. In this paper, we demonstrate through experiments that private keys can be exposed by Correlation Power Analysis (CPA) and Differential Deep Learning Analysis (DDLA) attacks on polynomial coefficient-wise multiplication algorithms that operate in the process of generating signatures using CRYSTALS-Dilithium algorithm. As a result of the experiment on ARM-Cortex-M4, we succeeded in recovering the private key coefficient using CPA or DDLA attacks. In particular, when StandardScaler preprocessing and continuous wavelet transform applied power traces were used in the DDLA attack, the minimum number of power traces required for attacks is reduced and the Normalized Maximum Margines (NMM) value increased by about 3 times. Conseqently, the proposed methods significantly improves the attack performance.

A Study on the Parallel Multiplier over $GF(3^m)$ Using AOTP (AOTP를 적용한 $GF(3^m)$ 상의 병렬승산기 설계에 관한 연구)

  • Han, Sung-Il;Hwang, Jong-Hak
    • Journal of IKEEE
    • /
    • v.8 no.2 s.15
    • /
    • pp.172-180
    • /
    • 2004
  • In this paper, a parallel Input/Output modulo multiplier, which is applied to AOTP(All One or Two Polynomials) multiplicative algorithm over $GF(3^m)$, has been proposed using neuron-MOS Down-literal circuit on voltage mode. The three-valued input of the proposed multiplier is modulated by using neuron-MOS Down-literal circuit and the multiplication and Addition gates are implemented by the selecting of the three-valued input signals transformed by the module. The proposed circuits are simulated with the electrical parameter of a standard $0.35{\mu}m$CMOS N-well doubly-poly four-metal technology and a single +3V supply voltage. In the simulation result, the multiplier shows 4 uW power consumption and 3 MHzsampling rate and maintains output voltage level in ${\pm}0.1V$.

  • PDF

High-performance Pipeline Architecture for Modified Booth Multipliers (Modified Booth 곱셈기를 위한 고성능 파이프라인 구조)

  • Kim, Soo-Jin;Cho, Kyeong-Soon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.12
    • /
    • pp.36-42
    • /
    • 2009
  • This paper proposes the high-performance pipeline architecture for modified Booth multipliers. The proposed multiplier circuits are based on modified Booth algorithm and pipeline architecture which are the most widely used techniques to accelerate the multiplication speed. In order to implement the optimally pipelined multipliers, many kinds of experiments have been conducted. The experimental results show that the speed improvement gain exceeds the area penalty and this trend is manifested as the number of pipeline stages increases. It is also important to insert the pipeline registers at the proper positions. We described the proposed modified Booth multiplier circuits in Verilog HDL and synthesized the gate-level circuits using 0.13um standard cell library. The resultant multiplier circuits show better performance than others. Since they operate at GHz ranges, they can be used in the application systems requiring extremely high performance such as optical communication systems.

High Performance Elliptic Curve Cryptographic Processor for $GF(2^m)$ ($GF(2^m)$의 고속 타원곡선 암호 프로세서)

  • Kim, Chang-Hoon;Kim, Tae-Ho;Hong, Chun-Pyo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.3
    • /
    • pp.113-123
    • /
    • 2007
  • This paper presents a high-performance elliptic curve cryptographic processor over $GF(2^m)$. The proposed design adopts Lopez-Dahab Montgomery algorithm for elliptic curve point multiplication and uses Gaussian normal basis for $GF(2^m)$ field arithmetic operations. We select m=163 which is the smallest value among five recommended $GF(2^m)$ field sizes by NIST and it is Gaussian normal basis of type 4. The proposed elliptic curve cryptographic processor consists of host interface, data memory, instruction memory, and control. We implement the proposed design using Xilinx XCV2000E FPGA device. Based on the FPGA implementation results, we can see that our design is 2.6 times faster and requires significantly less hardware resources compared with the previously proposed best hardware implementation.

Design of high-speed RSA processor based on radix-4 Montgomery multiplier (래딕스-4 몽고메리 곱셈기 기반의 고속 RSA 연산기 설계)

  • Koo, Bon-Seok;Ryu, Gwon-Ho;Chang, Tae-Joo;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.17 no.6
    • /
    • pp.29-39
    • /
    • 2007
  • RSA is one of the most popular public-key crypto-system in various applications. This paper addresses a high-speed RSA crypto-processor with modified radix-4 modular multiplication algorithm and Chinese Remainder Theorem(CRT) using Carry Save Adder(CSA). Our design takes 0.84M clock cycles for a 1024-bit modular exponentiation and 0.25M cycles for a 512-bit exponentiations. With 0.18um standard cell library, the processor achieves 365Kbps for a 1024-bit exponentiation and 1,233Kbps for two 512-bit exponentiations at a 300MHz clock rate.

A Lightweight Hardware Accelerator for Public-Key Cryptography (공개키 암호 구현을 위한 경량 하드웨어 가속기)

  • Sung, Byung-Yoon;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1609-1617
    • /
    • 2019
  • Described in this paper is a design of hardware accelerator for implementing public-key cryptographic protocols (PKCPs) based on Elliptic Curve Cryptography (ECC) and RSA. It supports five elliptic curves (ECs) over GF(p) and three key lengths of RSA that are defined by NIST standard. It was designed to support four point operations over ECs and six modular arithmetic operations, making it suitable for hardware implementation of ECC- and RSA-based PKCPs. In order to achieve small-area implementation, a finite field arithmetic circuit was designed with 32-bit data-path, and it adopted word-based Montgomery multiplication algorithm, the Jacobian coordinate system for EC point operations, and the Fermat's little theorem for modular multiplicative inverse. The hardware operation was verified with FPGA device by implementing EC-DH key exchange protocol and RSA operations. It occupied 20,800 gate equivalents and 28 kbits of RAM at 50 MHz clock frequency with 180-nm CMOS cell library, and 1,503 slices and 2 BRAMs in Virtex-5 FPGA device.

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia pacific journal of information systems
    • /
    • v.21 no.2
    • /
    • pp.89-116
    • /
    • 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for their future use or sharing purposes. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of the folksonomy users Is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher ranking to more useful resources or experts. What resources are considered useful In a folksonomic system? Does a standard superior to frequency or freshness exist? The resource recommended by more users with mere expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are Page Rank by Google and HITS(Hypertext Induced Topic Selection) by Kleinberg. Both Page Rank and HITS assign a higher evaluation score to pages linked to more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these pages are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous(i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS based on the links to a folksonomy would be unreasonable, In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is an active or a passive voice. The current research stems from the Idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual Interactions between entitles, rather than the voting notion of PageRank or HITS. The concept of mutual interactions, proposed for ranking the Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to the evaluation method of humans, where different items are assigned specific weights, which are then summed up to determine the weighted average. We can check for missing properties more easily with this approach than with other predicate-oriented approaches. A user of a tagging system usually assigns more than one tags to the same resource, and there can be more than one tags with the same subjectivity and objectivity. In the case that many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from the studies in psychology wherein expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Considering the property of social media that ensures the popularity of a topic is temporary, recent data should have more weight than old data. We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with that of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to the time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground in the situation where the domain consists of more than two classes, or where other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Forth, there is a big difference in the calculation time and memory use between the two kinds of algorithms. While the matrix multiplication of two matrices, has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed with the ranking factors combined and our approach can work, even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.