Search | Korea Science

Optimized Implementation of Scalable Multi-Precision Multiplication Method on RISC-V Processor for High-Speed Computation of Post-Quantum Cryptography (차세대 공개키 암호 고속 연산을 위한 RISC-V 프로세서 상에서의 확장 가능한 최적 곱셈 구현 기법)

Seo, Hwa-jeong;Kwon, Hyeok-dong;Jang, Kyoung-bae;Kim, Hyunjun
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.31 no.3
- /
- pp.473-480
- /
- 2021
To achieve the high-speed implementation of post-quantum cryptography, primitive operations should be tailored to the architecture of the target processor. In this paper, we present the optimized implementation of multiplier operation on RISC-V processor for post-quantum cryptography. Particularly, the column-wise multiplication algorithm is optimized with the primitive instruction of RISC-V processor, which improved the performance of 256-bit and 512-bit multiplication by 19% and 8% than previous works, respectively. Lastly, we suggest the instruction extension for the high-speed multiplication on the RISC-V processor.
https://doi.org/10.13089/JKIISC.2021.31.3.473 인용 PDF KSCI HTML

Adder-based Distributed Arithmetic DWT Processor Design (가산기-기반 분산연산 DWT 프로세서 설계)

김영진;장영진;이현수
- Proceedings of the Korean Information Science Society Conference
- /
- 2001.04a
- /
- pp.16-18
- /
- 2001
DWT(Discrete Wavelet Transform) 연산을 하는데 있어서, 가장 많은 연산을 수행하는 부분은 계수(Coefficient)값과 입력값의 내적 연산을 하는 부분이다. 내적 연산을 효율적으로 줄이기 위해서 시스톨릭, 파이프라인, 병렬구조등이 연구되었으나, 이러한 기존의 방법들은 내적 연산에 들어가는 곱셈의 수는 줄이지 못했다. 본 연구에서 가산기 기반 분산연산을 이용하여 곱셈연산을 제거하고, 동일한 연산과정을 공유함으로써 가산기의 수를 최대한 줄일 수 있었다. 또한, 한 개의 1-레벨 분해 모듈을 재사용하기 위해서 스케줄링을 사용하였다. 그 결과 기존의 구조보다 게이트 수를 50%이상 줄일 수 있었으며, 속도의 향상을 얻을 수 있었다.

Low-area FFT Processor Structure using $Radix-4^2$ Algorithm ($Radix-4^2$알고리즘을 사용한 저면적 FFT 프로세서 구조)

Kim, Han-Jin;Jang, Young-Beom
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.49 no.3
- /
- pp.8-14
- /
- 2012
In this paper, a low-area FFT structure using $Radix-4^2$ algorithm is proposed. The large point FFT structure consists of cascade connection of the many stages. In implementation of large point FFT using $Radix-4^2$ algorithm, stages which number of different coefficients are only 3 appear in every 2 stages. For example, in the 4096-point FFT, the stages that number of different coefficients are 3 appear in stage 1, 3, and 5 among 6 stages. Multiplication block area of these 3 stages can be reduced using CSD(Canonic Signed Digit) and common sub-expression sharing techniques. Using the proposed structure, the 256-point FFT is implemented with the Verilog-HDL coding and synthesized by $1.971mm^2$ cell area in tsmc $0.18{\mu}m$CMOS library. This result shows 23% cell area reduction compared with the conventional structure.
PDF KSCI

An analysis of the connections of mathematical thinking for multiplicative structures by second, fourth, and sixth graders (곱셈적 구조에 대한 2, 4, 6학년 학생들의 수학적 사고의 연결성 분석)

Kim, YuKyung;Pang, JeongSuk
- The Mathematical Education
- /
- v.53 no.1
- /
- pp.57-73
- /
- 2014
This study investigated the connections of mathematical thinking of students at the second, fourth, and sixth grades with regard to multiplication, fraction, and proportion, all of which have multiplicative structures. A paper-and-pencil test and subsequent interviews were conducted. The results showed that mathematical thinking including vertical thinking and relational thinking was commonly involved in multiplication, fraction, and proportion. On one hand, the insufficient understanding of preceding concepts had negative impact on learning subsequent concepts. On the other hand, learning the succeeding concepts helped students solve the problems related to the preceding concepts. By analyzing the connections between the preceding concepts and the succeeding concepts, this study provides instructional implications of teaching multiplication, fraction, and proportion.
https://doi.org/10.7468/mathedu.2014.53.1.57 인용 PDF KSCI

A Scalar Multiplication Method and its Hardware with resistance to SPA(Simple Power Analysis) (SPA에 견디는 스칼라 곱셈 방법과 하드웨어)

윤중철;정석원;임종인
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.13 no.3
- /
- pp.65-70
- /
- 2003
In this paper, we propose a scalar multiplication method and its hardware architecture which is resistant to SPA while its computation speed is faster than Colon's. There were SPA-resistant scalar multiplication method which has performance problem. Due to this reason, the research about an efficient SPA-resistant scalar multiplication is one of important topics. The proposed architecture resists to SPA and is faster than Colon's method under the assumption that Colon's and the proposed method use same fmite field arithmetic units(multiplier and inverter). With n-bit scalar multiple, the computation cycle of the proposed is 2n·(Inversion cycle)+3(Aultiplication cycle).
https://doi.org/10.13089/JKIISC.2003.13.3.65 인용 PDF KSCI HTML

Full-Custom Design of a Compact 17x-17b Multiplier and its Efficient Test Methodology (풀커스텀(full-custom)방식의 17x-17b 곱셈기의 설계와 효율적인 테스트)

문상국;문병인;이용석
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.26 no.3B
- /
- pp.362-368
- /
- 2001
본 논문에서는 두 개의 17비트 오퍼랜드를 radix-4 Booths 알고리즘을 이용하여 곱셈 연산을 수행하는 곱셈기를 설계하고 효율적인 풀커스팀 디자인에 대한 테스트 방법을 제안하였다. 클럭 속도를 빠르게 하기 위하여 2단파이프라인 구조로 설계하고 규칙적인 레이아웃을 위해 4:2 CSA(Carry Save Adder)를 사용하였다. 회로는 LG 반도체의 0.6-um 3-Metal N-well CMOS 공정을 사용하여 칩으로 제작되었다. 새로운 개념의 모듈레벨 고착 고장 모델을 제안하였고 제안한 테스트 방법을 사용하여 관찰해야 하는 노드의 수를 약 88% 줄여 효율적인 고장 시뮬레이션을 수행하였다. 설계된 곱셈기는 9115개의 트랜지스터로 구성되며 코어 부분의 레이아웃 면적은 약 1135*1545 um2 이다. 제작된 칩은 전원접압 5V에서 약 24MHz의 클럭 주파수로 동작한다.
PDF

Study on Implementation of a High-Speed Montgomery Modular Exponentiator (고속의 몽고메리 모듈라 멱승기의 구현에 관한 연구)

Kim, In-Seop;Kim, Young-Chul
- Proceedings of the Korea Information Processing Society Conference
- /
- 2002.11b
- /
- pp.901-904
- /
- 2002
정보의 암호화와 인증, 디지털 서명등에 효율적인 공개키 암호 시스템의 주 연산은 모듈라 멱승 연산이며 이는 모듈라 곱셈의 연속적인 반복 수행으로 표현될 수 있다. 본 논문에서는 Montgomery 모듈라 곱셈 알고리즘을 사용하여 모듈라 곱셈을 효율적으로 수행하기 위한 모듈라 멱승 연산기를 구현하였으며 Montgomery 모듈라 곱셈시 발생하는 케리 진파 문제를 해결하기 위하여 CPA을 대신하는 CSA를 사용함으로써 멱승 연산시 발생하는 지연시간을 최소화시키는 결과가 얻어짐을 보였다. 본 논문에서는 Montgomery 모듈라 멱승 연산기 구현을 위하여 VHDL 구조적 모델링을 통하여 Synopsys사의 VSS와 Design analyzer를 이용한 논리 합성을 하였고 Mentor Graphics사 Model sim 및 Xilinx사 Design manager의 FPGA 시뮬레이션을 수행하여 성능을 검증 하였다.
PDF

Design of fast 16-bit multiplier with $0.35\mu m $ CMOS technology (fullcustom $0.35\mu m $ CMOS 공정을 이용한 16*16 bit 고속 승산기의 설계)

박현규;신현철;김종진
- Proceedings of the Korea Institute of Convergence Signal Processing
- /
- 2000.12a
- /
- pp.229-232
- /
- 2000
각종 범용 컴퓨터 및 디지탈 신호처리에서 중요한 역할을 하는 16비트 정수형, 2의 보수 형태의 곱셈연산을 수행하기 위한 고속 승산기구조를 설계하고 시뮬레이션 하였다. 부분곱을 합하는 부분은 일반적으로 전체 곱셈기 처리 지연시간의 절반정도를 차지하므로 이 부분의 설계방법이 곱셈기의 궁극적인 속도향상에 직접적인 영향을 미친다. 부분곱의 개수를 줄이기 위하여 Booth encoder를 사용하였고, partial product(부분곱)의 덧셈시간을 줄이기 위하여 4:2 CSA(can save adder)와 3:2 CSA로 CSA tree를 구성 하였으며, 최종결과는 carry look- ahead tree로 얻어진다. Hyundai CMOS 0.35$\mu\textrm{m}$ 1-poly 4-metal 공정으로 layout하여 설계하였으며, 곱셈시간은 2.7ns(tipical case)이하로 측정되었다.
PDF

Design of a 64×64-Bit Modified Booth Multiplier Using Current-Mode CMOS Quarternary Logic Circuits (전류모드 CMOS 4치 논리회로를 이용한 64×64-비트 변형된 Booth 곱셈기 설계)

Kim, Jeong-Beom
- The KIPS Transactions:PartA
- /
- v.14A no.4
- /
- pp.203-208
- /
- 2007
This paper proposes a $64{\times}64$ Modified Booth multiplier using CMOS multi-valued logic circuits. The multiplier based on the radix-4 algorithm is designed with current mode CMOS quaternary logic circuits. Designed multiplier is reduced the transistor count by 64.4% compared with the voltage mode binary multiplier. The multiplier is designed with Samsung $0.35{\mu}m$ standard CMOS process at a 3.3V supply voltage and unit current $5{\mu}m$. The validity and effectiveness are verified through the HSPICE simulation. The voltage mode binary multiplier is achieved the occupied area of $7.5{\times}9.4mm^2$, the maximum propagation delay time of 9.8ns and the average power consumption of 45.2mW. This multiplier is achieved the maximum propagation delay time of 11.9ns and the average power consumption of 49.7mW. The designed multiplier is reduced the occupied area by 42.5% compared with the voltage mode binary multiplier.
https://doi.org/10.3745/KIPSTA.2007.14-A.4.203 인용 PDF KSCI

Low Space Complexity Bit-Parallel Shifted Polynomial Basis Multipliers using Irreducible Trinomials (삼항 기약다항식 기반의 저면적 Shifted Polynomial Basis 비트-병렬 곱셈기)

Chang, Nam-Su;Kim, Chang-Han
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.20 no.5
- /
- pp.11-22
- /
- 2010
Recently, Fan and Dai introduced a Shifted Polynomial Basis and construct a non-pipeline bit-parallel multiplier for $F_{2^n}$. As the name implies, the SPB is obtained by multiplying the polynomial basis 1, ${\alpha}$, ${\cdots}$, ${\alpha}^{n-1}$ by ${\alpha}^{-\upsilon}$. Therefore, it is easy to transform the elements PB and SPB representations. After, based on the Modified Shifted Polynomial Basis(MSPB), SPB bit-parallel Mastrovito type I and type II multipliers for all irreducible trinomials are presented. In this paper, we present a bit-parallel architecture to multiply in SPB. This multiplier have a space complexity efficient than all previously presented architecture when n ${\neq}$ 2k. The proposed multiplier has more efficient space complexity than the best-result when 1 ${\leq}$ k ${\leq}$ (n+1)/3. Also, when (n+2)/3 ${\leq}$ k < n/2 the proposed multiplier has more efficient space complexity than the best-result except for some cases.
https://doi.org/10.13089/JKIISC.2010.20.5.11 인용 PDF KSCI HTML

Search Result 342, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)