Search | Korea Science

Low-Power Multiplication Processing Element Hardware to Support Parallel Convolutional Neural Network Processing (합성곱 신경망 병렬 연산처리를 지원하는 저전력 곱셈 프로세싱 엘리먼트 설계)

Eunpyoung Park;Jongsu Park
- Journal of Platform Technology / v.12 no.2 / pp.58-63 / 2024
In the image domain, CNNs perform many repetitive operations across many data-processing units, so when system resources are limited they tend to take a long time to train and consume a lot of power. In this paper, we propose a low-power, bus-based processing method that increases the exchange rate of multipliers and multiplicands within the convolution multiplier, targeting the activity that arises when it performs multiplication, the core operation of convolution. The design provides dedicated low-power shared processing support for convolutional neural networks and was implemented in Verilog-HDL on an Intel DE1-SoC FPGA board. Experiments on the MNIST handwritten-digit image database validated its performance by comparing the resulting exchange rate with that of the multiplier originally proposed by Shen.

Design of Systolic Multiplier/Squarer over Finite Field GF($2^m$) (유한 필드 GF($2^m$)상의 시스톨릭 곱셈기/제곱기 설계)

Yu, Gi-Yeong;Kim, Jeong-Jun
- Journal of KIISE: Computer Systems and Theory / v.28 no.6 / pp.289-300 / 2001
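In this paper, we propose the LSB-first multiplier, a new linear, problem-size (full-size) systolic array architecture that performs the modular multiplication $A(x)B(x) \bmod P(x)$ over the finite field GF($2^m$). A new bit-level recurrence equation is derived from the LSB-first modular multiplication algorithm, which processes the least significant bit (LSB) of operand $B(x)$ first. Applying a space-time transformation to this recurrence equation, whose data flow is regular, we design and analyze a new systolic multiplier. Compared with existing multipliers, the proposed multiplier improves area-time performance by 10% and 18%, respectively. Using the same design method, we also propose a new systolic multiplier/squarer that performs multiplication and squaring simultaneously. For exponentiation over the finite field, using the proposed systolic multiplier/squarer improves area-time performance by about 26% compared with using multipliers alone.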
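
As a software illustration of the LSB-first recurrence (not the systolic array itself), the sketch below scans $B(x)$ from its LSB, accumulating $A(x)x^i \bmod P(x)$ whenever bit $b_i$ is set. Packing polynomials into Python integers (bit $i$ = coefficient of $x^i$) and the AES field polynomial $x^8+x^4+x^3+x+1$ used in the example are assumptions for illustration.

```python
def gf2m_mul_lsb_first(a: int, b: int, p: int, m: int) -> int:
    """Return A(x)B(x) mod P(x) over GF(2^m), scanning B(x) from its LSB.

    Polynomials are packed into ints (bit i holds the coefficient of x^i).
    p must include the x^m term, e.g. 0x11B for x^8 + x^4 + x^3 + x + 1.
    a must already be reduced (a < 2^m).
    """
    result = 0
    for i in range(m):          # one iteration per bit of B(x), LSB first
        if (b >> i) & 1:
            result ^= a         # add (XOR) the current A(x) * x^i mod P(x)
        a <<= 1                 # A(x) <- A(x) * x
        if (a >> m) & 1:
            a ^= p              # cancel the x^m term using P(x)
    return result


print(hex(gf2m_mul_lsb_first(0x57, 0x83, 0x11B, 8)))
```

Because the update of $A(x)x^i$ does not depend on the accumulated result, both can proceed in parallel in each step, which is one reason LSB-first formulations map naturally onto regular hardware.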

Study on Implementation of a High-Speed Montgomery Modular Exponentiator (고속의 몽고메리 모듈라 멱승기의 구현에 관한 연구)

Kim, In-Seop;Kim, Young-Chul
- Proceedings of the Korea Information Processing Society Conference / 2002.11b / pp.901-904 / 2002
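The main operation of public-key cryptosystems, which are effective for encryption, authentication, and digital signatures, is modular exponentiation, which can be expressed as the repeated execution of modular multiplication. In this paper, we implement a modular exponentiator that performs modular multiplication efficiently using the Montgomery modular multiplication algorithm, and we show that using a CSA (carry-save adder) instead of a CPA (carry-propagate adder) to resolve the carry-propagation problem in Montgomery modular multiplication minimizes the delay incurred during exponentiation. To implement the Montgomery modular exponentiator, we used VHDL structural modeling and performed logic synthesis with Synopsys VSS and Design Analyzer, and we verified its performance through FPGA simulation with Mentor Graphics ModelSim and Xilinx Design Manager.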
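
The sketch below illustrates the two ideas in the abstract under simplifying assumptions: a radix-2 Montgomery multiplication used inside left-to-right square-and-multiply exponentiation, and a 3:2 carry-save adder showing why a CSA avoids carry propagation. The Montgomery loop here uses ordinary integer addition; a hardware design along the lines described would replace that addition with CSA stages. Function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 carry-save adder: x + y + z == s + c, computed without propagating carries."""
    s = x ^ y ^ z                               # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1      # per-bit carry, shifted to the next weight
    return s, c


def mont_mul(a, b, n, k):
    """Radix-2 Montgomery product a*b*2^(-k) mod n, for odd n < 2^k and a, b < n."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b   # add a_i * B (plain addition here; CSA stages in hardware)
        if s & 1:
            s += n                # make s even so the halving below is exact
        s >>= 1                   # divide by 2; invariant s < 2n
    return s - n if s >= n else s


def mont_exp(base, e, n):
    """Left-to-right square-and-multiply in the Montgomery domain, with R = 2^k."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)             # R^2 mod n, precomputed once per modulus
    x = mont_mul(base % n, r2, n, k)  # base * R mod n
    acc = (1 << k) % n                # 1 * R mod n
    for bit in bin(e)[2:]:
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)     # convert back out of the Montgomery domain


print(mont_exp(7, 560, 561) == pow(7, 560, 561))
```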

Scalable ECC Processor supporting multiple elliptic curves over prime field (소수체 상의 다중 타원곡선을 지원하는 Scalable ECC 프로세서)

Park, Byung-Gwan;Shin, Kyung-Wook
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.247-249 / 2017
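This paper describes the design of a scalable elliptic curve cryptography (ECC) processor that supports scalar multiplication on the P-192, P-224, P-256, and P-384 elliptic curves defined as standards by NIST. Projective coordinates are used to eliminate the division operation, which consumes substantial hardware resources, and finite-field operations over GF(p) such as addition, subtraction, and multiplication are supported. A word-based Montgomery multiplier allows multiplication to be performed with fixed hardware resources across fields of different sizes, with the number of computation cycles increasing or decreasing according to the field size. The scalable ECC processor was modeled in Verilog HDL and functionally verified with ModelSim. Synthesized for a Xilinx Virtex-5 FPGA device, it was implemented with 5,376 bits of RAM and 970 slices and has a maximum operating frequency of 55 MHz.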
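
To show how projective coordinates remove division from the main loop, here is a sketch of double-and-add scalar multiplication in Jacobian coordinates, where $(X, Y, Z)$ represents the affine point $(X/Z^2, Y/Z^3)$ and only one field inversion is needed at the end. The abstract does not say which projective system the processor uses; Jacobian coordinates and the toy curve $y^2 = x^3 + 2x + 3$ over GF(97) are assumptions chosen so the arithmetic is easy to check (they are not NIST curves).

```python
P_MOD = 97        # toy prime field (assumption, not a NIST prime)
A, B = 2, 3       # toy curve y^2 = x^3 + 2x + 3
INF = (1, 1, 0)   # point at infinity in Jacobian coordinates (Z = 0)


def jac_double(P, p=P_MOD, a=A):
    X, Y, Z = P
    if Z == 0 or Y == 0:
        return INF
    S = 4 * X * Y * Y % p
    M = (3 * X * X + a * pow(Z, 4, p)) % p
    X3 = (M * M - 2 * S) % p
    Y3 = (M * (S - X3) - 8 * pow(Y, 4, p)) % p
    Z3 = 2 * Y * Z % p
    return (X3, Y3, Z3)


def jac_add(P, Q, p=P_MOD, a=A):
    if P[2] == 0:
        return Q
    if Q[2] == 0:
        return P
    X1, Y1, Z1 = P
    X2, Y2, Z2 = Q
    Z1s, Z2s = Z1 * Z1 % p, Z2 * Z2 % p
    U1, U2 = X1 * Z2s % p, X2 * Z1s % p
    S1, S2 = Y1 * Z2s * Z2 % p, Y2 * Z1s * Z1 % p
    if U1 == U2:                       # same x: either P == Q or P == -Q
        return jac_double(P, p, a) if S1 == S2 else INF
    H, R = (U2 - U1) % p, (S2 - S1) % p
    H2 = H * H % p
    H3 = H2 * H % p
    X3 = (R * R - H3 - 2 * U1 * H2) % p
    Y3 = (R * (U1 * H2 - X3) - S1 * H3) % p
    Z3 = H * Z1 * Z2 % p
    return (X3, Y3, Z3)


def to_affine(P, p=P_MOD):
    X, Y, Z = P
    if Z == 0:
        return None                    # point at infinity
    z_inv = pow(Z, -1, p)              # the only inversion in the whole computation
    return (X * z_inv**2 % p, Y * z_inv**3 % p)


def scalar_mult(k, x, y, p=P_MOD, a=A):
    """Left-to-right double-and-add k * (x, y); no division inside the loop."""
    R, Q = INF, (x, y, 1)
    for bit in bin(k)[2:]:
        R = jac_double(R, p, a)
        if bit == "1":
            R = jac_add(R, Q, p, a)
    return to_affine(R, p)


print(scalar_mult(2, 3, 6))
```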

A Design of Efficient Modular Multiplication based on Montgomery Algorithm (효율적인 몽고메리 모듈러 곱셈기의 설계)

Park, Hye-Young;Yoo, Kee-Young
- Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.1003-1006 / 2004
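In this paper, we propose an efficient modular multiplier based on the Montgomery modular multiplication algorithm. The proposed multiplier is built on a programmable cellular automata (PCA) architecture, which reduces hardware complexity, and it uses the Montgomery algorithm to perform modular reduction during multiplication without general division, which minimizes time complexity. Because it is simple and effective in terms of both time and space, the proposed multiplier can be used efficiently as the underlying hardware for exponentiation or in error correcting code (ECC) computations.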
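
The division-free idea can be sketched at the bit level as follows. The abstract does not state the field; this sketch assumes the binary field GF($2^m$) (common for PCA-based multipliers and error correcting codes) and does not model the cellular automata. It computes $A(x)B(x)x^{-m} \bmod P(x)$, where each step makes the partial result divisible by $x$ and then divides exactly by shifting.

```python
def gf2m_montgomery(a: int, b: int, p: int, m: int) -> int:
    """Return A(x) * B(x) * x^(-m) mod P(x) over GF(2^m) without polynomial division.

    Polynomials are packed into ints (bit i = coefficient of x^i); p includes x^m
    and has constant term 1, so adding it clears the lowest bit when needed.
    """
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b       # C(x) += a_i * B(x)
        if c & 1:
            c ^= p       # add P(x) so that C(x) becomes divisible by x
        c >>= 1          # exact division by x; deg C(x) stays below m
    return c


# In GF(2^8) with P(x) = x^8 + x^4 + x^3 + x + 1, x^8 = 0x1B mod P(x),
# so multiplying by 0x1B in Montgomery form returns the other operand.
print(hex(gf2m_montgomery(0x1B, 0x57, 0x11B, 8)))
```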

GPU-based Sparse Matrix-Vector Multiplication Schemes for Random Walk with Restart: A Performance Study (랜덤워크 기법을 위한 GPU 기반 희소행렬 벡터 곱셈 방안에 대한 성능 평가)

Yu, Jae-Seo;Bae, Hong-Kyun;Kang, Seokwon;Yu, Yongseung;Park, Yongjun;Kim, Sang-Wook
- Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.96-97 / 2020
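RWR (Random Walk with Restart), one of the random-walk-based node ranking methods, repeatedly performs sparse matrix-vector multiplication and vector addition, and its execution time depends heavily on how the sparse matrix-vector multiplication is performed. In this paper, we conduct comparative experiments between GPU-based RWR methods that adopt CSR5 (Compressed Sparse Row 5)-based sparse matrix-vector multiplication and CSR-vector-based sparse matrix multiplication. Through these experiments, we analyze how RWR performance differs according to dataset characteristics and propose guidelines for selecting an appropriate sparse matrix-vector multiplication scheme.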
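
A minimal CPU sketch of the iteration described above, assuming the common RWR formulation $r \leftarrow (1-c)\,\tilde{A} r + c\,q$ with a column-normalized adjacency matrix $\tilde{A}$ stored in plain CSR form and a one-hot seed vector $q$. It is not a GPU kernel and does not implement CSR5; the restart probability 0.15 and the tolerance are illustrative assumptions.

```python
def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A in CSR form (row pointers, column indices, nonzero values)."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # CSR-vector GPU kernels typically assign one warp per row
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def rwr(indptr, indices, data, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Power iteration r <- (1 - c) * A r + c * q until the L1 change falls below tol."""
    n = len(indptr) - 1
    q = [0.0] * n
    q[seed] = 1.0
    r = q[:]
    for _ in range(max_iter):
        ar = csr_spmv(indptr, indices, data, r)                               # SpMV step
        r_next = [(1 - restart) * ar[i] + restart * q[i] for i in range(n)]  # vector addition
        if sum(abs(u - v) for u, v in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Directed 3-cycle 0 -> 1 -> 2 -> 0, column-normalized: row j holds edges i -> j.
scores = rwr([0, 1, 2, 3], [2, 0, 1], [1.0, 1.0, 1.0], seed=0)
print(scores)
```

Every iteration repeats the same SpMV, so the choice of sparse format and kernel directly scales total RWR run time, which is the comparison the study makes.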
https://doi.org/10.3745/PKIPS.y2020m11a.96

Efficient Implementation of Optimal Extension Fields Using Parallel Computation (병렬 연산을 이용한 최적 확장체의 효율적 구현)

이문규;박근수
- Proceedings of the Korean Information Science Society Conference / 2003.04a / pp.269-271 / 2003
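In this paper, we propose an efficient optimal extension field (OEF) arithmetic algorithm to improve the performance of elliptic curve cryptography. The proposed algorithm performs two subfield operations in parallel within a single execution of the CPU's integer multiply instruction, thereby speeding up multiplication, squaring, and inversion in the optimal extension field.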
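
One way to get two subfield products from one integer multiply is to pack two small subfield elements into separate lanes of a machine word and multiply by a shared operand; as long as each lane product fits in its lane, the halves do not interfere. The packing layout below, the 64-bit word with 32-bit lanes, and the toy subfield prime $p = 2^{13} - 1$ (a pseudo-Mersenne prime $2^n - c$ with $c = 1$) are assumptions for illustration, not necessarily the paper's scheme.

```python
SUB_BITS = 13
P = (1 << SUB_BITS) - 1  # 8191 = 2^13 - 1, toy OEF subfield prime (assumption)
WORD_BITS = 64           # assumed width of the CPU integer multiplier
LANE = 32                # each packed operand occupies one 32-bit lane


def reduce_mod_p(x: int) -> int:
    """Reduce x mod P = 2^13 - 1 with shifts, masks, and adds only (2^13 = 1 mod P)."""
    while x >> SUB_BITS:
        x = (x & P) + (x >> SUB_BITS)  # P doubles as the 13-bit mask
    return 0 if x == P else x


def two_products_one_mul(a0: int, a1: int, b: int):
    """Return (a0*b mod P, a1*b mod P) using a single integer multiplication."""
    assert 0 <= a0 < P and 0 <= a1 < P and 0 <= b < P
    packed = (a1 << LANE) | a0        # two subfield elements in one word
    prod = packed * b                 # the single multiply: a1*b*2^32 + a0*b
    assert prod < (1 << WORD_BITS)    # each lane product < 2^26, so no overflow or overlap
    lo = prod & ((1 << LANE) - 1)
    hi = prod >> LANE
    return reduce_mod_p(lo), reduce_mod_p(hi)


print(two_products_one_mul(1234, 5678, 4321))
```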

Adder-based Distributed Arithmetic DWT Processor Design (가산기-기반 분산연산 DWT 프로세서 설계)

김영진;장영진;이현수
- Proceedings of the Korean Information Science Society Conference / 2001.04a / pp.16-18 / 2001
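In DWT (Discrete Wavelet Transform) computation, the part that performs the most operations is the inner product of the coefficient values and the input values. Systolic, pipelined, and parallel architectures have been studied to reduce this inner-product computation efficiently, but these existing methods did not reduce the number of multiplications it requires. In this study, we eliminate multiplications by using adder-based distributed arithmetic and minimize the number of adders by sharing identical computation steps. In addition, scheduling is used to reuse a single 1-level decomposition module. As a result, the gate count was reduced by more than 50% compared with existing architectures, and speed was also improved.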
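
Distributed arithmetic rewrites the inner product $y = \sum_k c_k x_k$ bit by bit: for each input bit position $b$, the coefficients whose input has bit $b$ set are summed, and those partial sums are combined with shifts, so no multiplier is needed. The sketch below assumes signed 8-bit two's-complement inputs (the sign bit carries negative weight) and integer coefficients; the example filter $(-1, 2, 6, 2, -1)$ is the LeGall 5/3 low-pass filter scaled by 8, used only as an illustration.

```python
def da_inner_product(coeffs, xs, bits=8):
    """Compute sum(c * x) using only additions and shifts (distributed arithmetic).

    xs are signed two's-complement integers in [-2^(bits-1), 2^(bits-1)).
    """
    acc = 0
    for b in range(bits):
        partial = 0
        for c, x in zip(coeffs, xs):
            if (x >> b) & 1:       # bit b of x (Python >> is arithmetic for negatives)
                partial += c       # adder-based: accumulate the coefficient, no multiply
        if b == bits - 1:
            acc -= partial << b    # sign bit has weight -2^(bits-1)
        else:
            acc += partial << b
    return acc


lowpass = [-1, 2, 6, 2, -1]        # LeGall 5/3 low-pass taps scaled by 8 (illustrative)
print(da_inner_product(lowpass, [10, 20, 30, 40, 50]))
```

In hardware, partial sums such as $c_0 + c_4$ recur across bit positions (and symmetric filters repeat coefficients), so they can be formed once and shared, which is the kind of adder sharing the abstract describes.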

Montgomery Multiplier Supporting Dual-Field Modular Multiplication (듀얼 필드 모듈러 곱셈을 지원하는 몽고메리 곱셈기)

Kim, Dong-Seong;Shin, Kyung-Wook
- Journal of the Korea Institute of Information and Communication Engineering / v.24 no.6 / pp.736-743 / 2020
Modular multiplication is one of the most important arithmetic operations in public-key cryptography such as elliptic curve cryptography (ECC) and RSA, and the performance of the modular multiplier is a key factor in the performance of public-key cryptographic hardware. This paper describes an efficient hardware implementation of the word-based Montgomery modular multiplication algorithm. Our modular multiplier supports eleven field sizes for the prime field GF(p) and the binary field GF($2^k$) as defined by the SEC2 standard for ECC, making it suitable for lightweight hardware implementations of ECC processors. The proposed architecture pipelines the partial-product generation and addition stage with the modular-reduction stage, reducing the clock cycles required for a modular multiplication by 50%. The hardware operation of the modular multiplier was demonstrated by FPGA verification. Synthesized with a 65-nm CMOS cell library, it was realized with 33,635 gate equivalents, and its maximum operating clock frequency was estimated at 147 MHz.
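
The word-based algorithm processes one $w$-bit word of the multiplier per iteration: a partial-product addition followed by a reduction step that clears the low word and shifts it out. The sketch below covers only the GF(p) path; the binary-field mode (carry-less arithmetic) is omitted. The 32-bit word size and the use of the P-192 prime ($2^{192} - 2^{64} - 1$) as an example modulus are assumptions, and Python big integers stand in for the inner word-by-word loop over $a$ and $n$.

```python
def mont_mul_words(a, b, n, s, w=32):
    """Word-serial Montgomery product a*b*2^(-w*s) mod n, for odd n < 2^(w*s) and a, b < n."""
    mask = (1 << w) - 1
    n_prime = (-pow(n, -1, 1 << w)) & mask   # -n^(-1) mod 2^w, precomputed once per field
    t = 0
    for i in range(s):                       # one word of b per iteration
        b_i = (b >> (w * i)) & mask
        t += a * b_i                         # partial-product generation and addition
        m = ((t & mask) * n_prime) & mask    # choose m so the low word of t + m*n is zero
        t = (t + m * n) >> w                 # modular reduction: exact division by 2^w
    return t - n if t >= n else t            # t < 2n, so one conditional subtraction suffices


P192 = 2**192 - 2**64 - 1                    # example modulus: 192 bits = 6 words of 32 bits
a, b = P192 - 1, P192 - 2
r = mont_mul_words(a, b, P192, s=6)
print((r * pow(2, 192, P192)) % P192 == (a * b) % P192)
```

With a fixed word width, a larger field only raises the word count `s` (and hence the iteration count), which mirrors how such multipliers trade cycles for fixed hardware across field sizes.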
https://doi.org/10.6109/jkiice.2020.24.6.736

A New Multiplication Method Using Montgomery Algorithm (Montgomery 알고리듬을 이용한 새로운 곱셈 방식)

오중효;하재철;이국희;문상재
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 1996.11a / pp.160-164 / 1996
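The Montgomery algorithm is a method for performing modular arithmetic at high speed. However, it requires a preprocessing step that converts the operands into n-residue form. Because of the overhead of this residue conversion, it is inefficient for a single multiplication. In this paper, we propose a method for performing a single multiplication efficiently using the Montgomery algorithm.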
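
The conversion overhead can be made concrete. A naive single multiplication converts both operands into residue form, multiplies, and converts back, costing four Montgomery multiplications. A widely used shortcut, shown below, needs only two by applying the precomputed constant $R^2 \bmod n$ after the product: $\mathrm{REDC}(\mathrm{REDC}(ab) \cdot R^2) = ab \bmod n$. The abstract does not describe the paper's method, so this is a generic technique, not necessarily the proposed one; function names are hypothetical.

```python
def redc(t, n, k, n_prime):
    """Montgomery reduction: t * 2^(-k) mod n, for odd n < 2^k and 0 <= t < n * 2^k."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by 2^k
    u = (t + m * n) >> k
    return u - n if u >= n else u


def single_modmul(a, b, n):
    """a*b mod n with two Montgomery reductions instead of four (a, b < n, n odd)."""
    k = n.bit_length()                  # R = 2^k > n
    n_prime = (-pow(n, -1, 1 << k)) % (1 << k)
    r2 = pow(2, 2 * k, n)               # R^2 mod n, precomputed once per modulus
    t = redc(a * b, n, k, n_prime)      # a*b*R^(-1) mod n (operands never converted)
    return redc(t * r2, n, k, n_prime)  # a*b*R^(-1) * R^2 * R^(-1) = a*b mod n


print(single_modmul(45, 77, 97) == (45 * 77) % 97)
```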
