Search | Korea Science

Bit-slice Modular multiplication algorithm (비트 슬라이스 모듈러 곱셈 알고리즘)

류동렬;조경록;유영갑
- The Journal of Information Technology / v.3 no.1 / pp.61-72 / 2000
In this paper, we propose a bit-sliced modular multiplication algorithm and a bit-sliced modular multiplier design to meet the growing key sizes of the RSA public-key cryptosystem. The proposed algorithm is a modification of Walter's algorithm. The bit-sliced modular multiplier is easy to extend to larger operands and can be applied directly to the RSA public-key cryptosystem.

Radix-2 Booth-based Variable Precision Multiplier for Lightweight CNN Accelerators (경량 CNN 가속기를 위한 Radix-2 Booth 기반 가변 정밀도 곱셈기)

Guem, Duck-Hyun;Jeon, Seung-Jin;Choi, Jae-Young;Kim, Ji-Hyeok;Kim, Sunhee
- Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.494-496 / 2022
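Research on lightweight CNNs is under way to make deep learning usable on edge devices. Lightweight CNNs mostly use fixed-point arithmetic, and the precision varies from layer to layer. To support lightweight CNNs, this paper proposes a variable-precision multiplier whose precision can be selected according to the layer in use. The proposed multiplier is built by merging low-precision multipliers, and at low precision it raises efficiency through parallel processing. The multiplier was designed in Verilog HDL and its operation was verified in ModelSim. The design is expected to be applied efficiently in CNN accelerators whose precision differs by layer.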
https://doi.org/10.3745/PKIPS.y2022m05a.494
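
The abstract above names radix-2 Booth multiplication with a selectable precision but gives no implementation details. As a rough software sketch only (the function name `booth_radix2_multiply` and the `bits` parameter are illustrative assumptions, not taken from the paper), radix-2 Booth recoding of a signed multiplier can be written as:

```python
def booth_radix2_multiply(x: int, y: int, bits: int = 8) -> int:
    """Multiply two signed `bits`-wide integers using radix-2 Booth recoding.

    Illustrative sketch: `bits` stands in for the selectable precision.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operand does not fit in the selected precision")

    y_pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    product = 0
    prev = 0  # implicit bit y[-1] = 0
    for i in range(bits):
        cur = (y_pattern >> i) & 1
        digit = prev - cur  # Booth digit in {-1, 0, +1}
        product += digit * (x << i)  # add, skip, or subtract the shifted multiplicand
        prev = cur
    return product
```

Calling the function with `bits=4` or `bits=8` mimics choosing a precision per layer; the merging of low-precision multipliers described in the abstract is a hardware arrangement that this sketch does not model.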

Design of A High Performance 1-D Discrete Wavelet Transform Filter Using Pipelined Architecture (파이프라인 구조를 이용한 고성능 1 차원 이산 웨이블렛 변환 필터 설계)

Park, Tae-Geun;Song, Chang-Joo
- Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.711-714 / 2001
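In this paper, we design a high-performance 1-D discrete wavelet transform filter using a pipelined architecture. Because the input is downsampled (decimated) at each level, complexity is reduced by folding the hardware of each level so that multipliers and adders are shared. Specifically, the proposed architecture uses folded C.S.R. (Circular Shift Register) multipliers and adders at levels 2 and 3, which raises hardware utilization to 100% at every level. In addition, feeding odd and even samples in parallel yields a gain proportional to the degree of parallelism, in the same amount of time, compared with a single-input system. Exploiting the mirror-filter property to share coefficients between the high-pass and low-pass filters as far as possible halves the number of multipliers and adders. Finally, pipeline registers are inserted to shorten the critical path, giving a high-performance system.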

Low-Cost Elliptic Curve Cryptography Processor Based On Multi-Segment Multiplication (멀티 세그먼트 곱셈 기반 저비용 타원곡선 암호 프로세서)

LEE Dong-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.8 s.338 / pp.15-26 / 2005
In this paper, we propose an efficient $GF(2^m)$ multi-segment multiplier architecture and study its application to elliptic curve cryptography (ECC) processors. The multi-segment ECC datapath has a very small combinational multiplier for computing partial products, most of its internal data buses are word-sized, and it has only a single m-bit multiplexer and a single m-bit register. Hence, the resource requirements of the proposed ECC datapath shrink as the number of segments increases and the word size decreases, making it more resource-efficient than an ECC processor based on digit-serial multiplication. The resource requirement of an ECC processor implementation depends not only on the number of basic hardware components but also on the complexity of the interconnection among them. To show the realistic area efficiency of the proposed ECC processors, we implemented ECC processors based on both the proposed multi-segment multiplication and digit-serial multiplication and compared their FPGA resource usage. The experimental results show that the proposed multi-segment multiplication method allows ECC coprocessors to be implemented with about half the FPGA resources required by digit-serial multiplication.
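
The abstract does not spell out how the segments are scheduled or reduced, so the following is only a software illustration of the general idea: both operands are cut into word-sized segments, a small carry-less multiplier forms each partial product, and the accumulated result is reduced modulo the field polynomial. All function names, and the choice to reduce once at the end, are assumptions made for this sketch.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b (polynomial multiplication over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2m_reduce(value: int, modulus: int) -> int:
    """Reduce a GF(2) polynomial modulo the field polynomial `modulus`."""
    m = modulus.bit_length() - 1  # field degree
    while value.bit_length() > m:
        value ^= modulus << (value.bit_length() - 1 - m)
    return value


def gf2m_mul_multisegment(a: int, b: int, modulus: int, word_bits: int) -> int:
    """Multiply a and b in GF(2^m) using word_bits x word_bits partial products."""
    m = modulus.bit_length() - 1
    segments = -(-m // word_bits)  # ceil(m / word_bits)
    mask = (1 << word_bits) - 1
    acc = 0
    for i in range(segments):
        b_i = (b >> (i * word_bits)) & mask
        for j in range(segments):
            a_j = (a >> (j * word_bits)) & mask
            # Small word-sized multiplier, then shift into position and accumulate.
            acc ^= clmul(a_j, b_i) << ((i + j) * word_bits)
    return gf2m_reduce(acc, modulus)
```

With the field polynomial `0x11B` (degree 8), `gf2m_mul_multisegment(0x57, 0x83, 0x11B, 4)` uses four 4-bit by 4-bit partial products. A smaller `word_bits` means a smaller multiplier but more iterations, which mirrors the area trade-off the abstract describes.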

New High Speed Parallel Multiplier for Real Time Multimedia Systems (실시간 멀티미디어 시스템을 위한 새로운 고속 병렬곱셈기)

Cho, Byung-Lok;Lee, Mike-Myung-Ok
- The KIPS Transactions:PartA / v.10A no.6 / pp.671-676 / 2003
In this paper, we propose a new First Partial product Addition (FPA) architecture that adds a new compressor (or parallel counter) to the CSA tree used for partial-product addition in a fast parallel multiplier. The new compressor speeds up partial-product computation by about 20% compared with an existing parallel counter built from full adders. Using the novel FPA architecture, the new circuit reduces the bit width of the CLA that computes the final sum by N/2. A multiplication time of 5.14 ns is obtained for the $16 \times 16$ multiplier in $0.25\,\mu\mathrm{m}$ CMOS technology. The multiplier architecture is easily adapted to pipelined design and demonstrates high-speed performance.
https://doi.org/10.3745/KIPSTA.2003.10A.6.671

Hardware Design of Efficient Montgomery Multiplier for Low Area RSA (저면적 RSA를 위한 효율적인 Montgomery 곱셈기 하드웨어 설계)

Nti, Richard B.;Ryoo, Kwangki
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.575-577 / 2017
In public-key cryptography such as RSA, modular exponentiation is the most time-consuming operation. RSA's modular exponentiation can be computed by repeated modular multiplication. To attain high efficiency for RSA, fast modular multiplication algorithms have been proposed to speed up encryption and decryption. Montgomery multiplication is limited by the carry-propagation delay from the addition of long operands. In this paper, we propose a hardware structure that reduces the area of the Montgomery multiplication implementation for lightweight applications of RSA. Experimental results showed that the new design can achieve higher performance and reduce hardware area. Frequencies of 884.9 MHz and 250 MHz were achieved with 84K and 56K gates, respectively, using 90 nm technology.
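
For readers unfamiliar with the operation the abstract refers to, a minimal software sketch of bit-serial (radix-2) Montgomery multiplication follows. It is the textbook form, not the hardware structure proposed in the paper; the function names and the `n_bits` parameter are illustrative assumptions.

```python
def montgomery_multiply(a: int, b: int, modulus: int, n_bits: int) -> int:
    """Return a * b * 2^(-n_bits) mod modulus, for odd modulus < 2^n_bits."""
    if modulus % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    s = 0
    for i in range(n_bits):
        s += ((a >> i) & 1) * b  # add b when bit i of a is set
        if s & 1:
            s += modulus  # make s even so the halving below is exact
        s >>= 1
    if s >= modulus:  # s < 2 * modulus here, so one subtraction suffices
        s -= modulus
    return s


def modmul_via_montgomery(a: int, b: int, modulus: int, n_bits: int) -> int:
    """Ordinary a * b mod modulus, computed with two Montgomery multiplications."""
    r2 = (1 << (2 * n_bits)) % modulus  # R^2 mod N, with R = 2^n_bits
    a_mont = montgomery_multiply(a, r2, modulus, n_bits)  # a * R mod N
    return montgomery_multiply(a_mont, b, modulus, n_bits)  # a * b mod N
```

Each loop iteration adds long operands into `s`; in hardware that addition is where the carry-propagation delay mentioned in the abstract arises.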

Characteristic analysis of Modular Multipliers and Squarers for GF($2^m$) (유한 필드 GF($2^m$)상의 모듈러 곱셈기 및 제곱기 특성 분석)

한상덕;김창훈;홍춘표
- Journal of Korea Society of Industrial Information Systems / v.7 no.5 / pp.167-174 / 2002
This paper analyzes the characteristics of three multipliers and squarers in finite fields GF($2^m$) from the point of view of processing time and area complexity. First, we analyze the structures of three multipliers and squarers: 1) systolic array structure, 2) LFSR structure, and 3) CA structure. For the performance analysis, each multiplier and squarer was modeled in VHDL and synthesized for FPGA implementation. The simulation results show that the CA structure is the best from the point of view of processing time, and the LFSR structure is the best from the point of view of area complexity.

Efficient Architectures for Modular Exponentiation Using Montgomery Multiplier (Montgomery 곱셈기를 이용한 효율적인 모듈라 멱승기 구조)

하재철;문상재
- Journal of the Korea Institute of Information Security & Cryptology / v.11 no.5 / pp.63-74 / 2001
Modular exponentiation is an essential operation required for implementations of most public-key cryptosystems. This paper presents two architectures for modular exponentiation using the Montgomery modular multiplication algorithm combined with two binary exponentiation methods, the L-R (left-to-right) and R-L (right-to-left) algorithms. The proposed architectures make use of MUXes for efficient pre-computation and post-computation in Montgomery's algorithm. For an n-bit modulus, if a multiplication with m carry-processing clocks can be done in (n+m) clocks, the L-R type design requires (1.5n+5)(n+m) clocks on average for an exponentiation. The R-L type design takes (n+4)(n+m) clocks in the worst case.
https://doi.org/10.13089/JKIISC.2001.11.5.63
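
The two binary exponentiation methods named in the abstract can be sketched in software as follows. This is a generic illustration with assumed function names, and plain `%` arithmetic stands in for the Montgomery multiplier; the paper's MUX-based pre- and post-computation is not modeled.

```python
def modexp_left_to_right(base: int, exponent: int, modulus: int) -> int:
    """L-R binary method: scan exponent bits from most to least significant."""
    result = 1 % modulus
    for i in reversed(range(exponent.bit_length())):
        result = (result * result) % modulus  # square on every bit
        if (exponent >> i) & 1:
            result = (result * base) % modulus  # multiply only on 1 bits
    return result


def modexp_right_to_left(base: int, exponent: int, modulus: int) -> int:
    """R-L binary method: scan exponent bits from least to most significant."""
    result = 1 % modulus
    power = base % modulus  # holds base^(2^i) at step i
    while exponent:
        if exponent & 1:
            result = (result * power) % modulus
        # This squaring does not depend on the multiplication above.
        power = (power * power) % modulus
        exponent >>= 1
    return result
```

In the L-R loop the multiplication must wait for the squaring, so an n-bit exponent with about half its bits set costs roughly 1.5n sequential multiplications. In the R-L loop the multiplication and the squaring in one iteration are independent of each other, so they could run side by side on two multipliers. That is one plausible reading of the (1.5n+5) and (n+4) factors quoted in the abstract, not something the abstract states.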

An Efficient Integer Division Algorithm for High Speed FPGA (고속 FPGA 구현에 적합한 효율적인 정수 나눗셈 알고리즘)

Hong, Seung-Mo;Kim, Chong-Hoon
- Journal of the Institute of Electronics Engineers of Korea TC / v.44 no.2 / pp.62-68 / 2007
This paper proposes an efficient integer division algorithm for high-speed FPGAs that support built-in RAMs and multipliers. The integer division algorithm is iterative and uses a RAM-based LUT and multipliers, which minimizes the usage of logic fabric and connection resources. Compared with popular division algorithms such as division by subtraction or division by multiply-subtraction, the number of iterations is much smaller, so very low latency can be achieved with pipelined implementations. We implemented our algorithm on a Xilinx Virtex-4 FPGA in VHDL and achieved a 300 MSPS data rate for 17-bit integer division. The algorithm uses less than 1/6 of the logic slices, 1/4 of the built-in multiply-accumulate units, and 1/3 of the latency of other popular algorithms.

Design of the Adaptive Systolic Array Architecture for Efficient Sparse Matrix Multiplication (희소 행렬 곱셈을 효율적으로 수행하기 위한 유동적 시스톨릭 어레이 구조 설계)

Seo, Juwon;Kong, Joonho
- Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.24-26 / 2022
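Systolic arrays are widely used as a hardware structure for matrix multiplication, which accounts for most of the computation in AI workloads such as DNN training, but their efficiency drops sharply on highly sparse matrices because of unnecessary operations. The adaptive systolic array proposed in this paper reduces the execution cycles of highly sparse matrix multiplication through matrix condensing, weight switching, and a direct output path. We compared the performance of a conventional systolic array and the adaptive systolic array through simulation, and confirmed that when 8×8, 16×16, and 32×32 matrices are computed on systolic arrays of the same size, the required cycle count can be reduced by up to 12 cycles.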
https://doi.org/10.3745/PKIPS.y2022m11a.24

