Integrated Search | Korea Science

Implementation of low power BSPE Core for deep learning hardware accelerators

조철원;이광엽;남기훈
- 전기전자학회논문지 / Vol. 24, No. 3 / pp. 895-900 / 2020
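In this paper, the BSPE replaces the conventional, power-hungry multiplication algorithm. A bit-serial multiplier is used to reduce hardware resources, and variable-width integer data is used to reduce memory usage. In addition, LOA (Lower-part OR Approximation) is applied to the MOA (Multi-Operand Adder) that sums the partial sums, which reduces the MOA's resource usage and power consumption. As a result, hardware resources and power are reduced by 44% and 42%, respectively, compared with the conventional MBS (Multiplication by Barrel Shifter). A hardware architecture design for the BSPE Core is also proposed.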
https://doi.org/10.7471/ikeee.2020.24.3.895
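The two techniques named in the abstract above, bit-serial multiplication and LOA, can be illustrated with a small software model. This is only a sketch under assumptions, not the paper's design: the function names are hypothetical, operands are assumed unsigned, and the carry into the exact upper part is taken from the AND of the top bits of the two lower parts, which is the commonly described LOA formulation.

```python
def bit_serial_multiply(a: int, b: int, width: int) -> int:
    """Multiply unsigned a by unsigned b, consuming one bit of b per step.

    Models a shift-and-add datapath: each step adds a shifted copy of a
    only when the current bit of b is 1.
    """
    acc = 0
    for i in range(width):
        if (b >> i) & 1:
            acc += a << i
    return acc


def loa_add(x: int, y: int, k: int) -> int:
    """Lower-part OR Approximation adder (assumed formulation).

    The low k bits are approximated with a bitwise OR; the remaining upper
    bits are added exactly. The carry into the upper part is the AND of the
    most significant bits of the two lower parts.
    """
    if k == 0:
        return x + y  # no approximate part, plain exact addition
    mask = (1 << k) - 1
    low = (x | y) & mask
    carry = (x >> (k - 1)) & (y >> (k - 1)) & 1
    high = (x >> k) + (y >> k) + carry
    return (high << k) | low
```

With `k = 0` the adder is exact; increasing `k` trades accuracy for a shorter exact carry chain, which is where the resource and power savings would come from in hardware.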

A DLL-Based Multi-Clock Generator Having Fast-Relocking and Duty-Cycle Correction Scheme for Low Power and High Speed VLSIs

황태진;연규성;전치훈;위재경
- 대한전자공학회논문지SD / Vol. 42, No. 2 / pp. 23-30 / 2005
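This paper proposes a DLL (delay-locked loop) based multi-clock generator for low-power, high-speed VLSI chips, featuring low standby power and fast relocking after the DLL is reactivated. The proposed architecture supports frequency multiplication through a frequency multiplier and includes a duty-cycle correction scheme that always produces a 50:50 duty ratio regardless of the duty ratio of the system clock. A DAC-based digital control scheme is used to guarantee fast relocking when the clock system switches from standby mode to operation mode and to store the analog locking information in a register as a digital code. The frequency multiplier used for clock multiplication adopts a multiphase feed-forward duty correction structure, which corrects the duty error of the output clock by phase mixing without added delay. The proposed generator provides a clock synchronized to the external clock for I/O data communication, as well as multiple high-speed and low-speed clocks for various IPs. It is designed in a $0.35\,\mu m$ CMOS process with an area of $1796\,\mu m \times 654\,\mu m$ and, at a supply voltage of 2.3 V, has a lock range of 75 MHz to 550 MHz, a maximum multiplied frequency of 800 MHz, and a static skew below 20 ps.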

Design and Implementation of a Power Aware Scalable Pipelined Booth Multiply & Accumulate Unit

신민혁;이한호
- 대한전자공학회:학술대회논문집 / 대한전자공학회 2006년도 하계종합학술대회 / pp. 573-574 / 2006
A low-power, power-aware, scalable pipelined Booth-recoded multiply-and-accumulate unit (PA-MAC) detects the dynamic range of its input operands and accordingly performs a 16-bit, 8-bit, or 4-bit multiply-and-accumulate operation. The multiplication mode is determined by the dynamic-range detection unit. Although the proposed PA-MAC occupies a larger area than a non-scalable MAC, it proves to be globally more power-efficient than a non-scalable MAC.
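The mode selection described above can be sketched in software as follows. This is a minimal behavioral model under assumptions: operands are taken to be signed two's-complement values of at most 16 bits, the function names are hypothetical, and neither the Booth recoding nor the pipelining of the actual unit is modeled.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's-complement number."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def select_mode(a: int, b: int) -> int:
    """Pick the narrowest multiplier width (4, 8, or 16) that holds both operands."""
    for bits in (4, 8, 16):
        if fits_signed(a, bits) and fits_signed(b, bits):
            return bits
    raise ValueError("operands exceed the 16-bit signed range")


def pa_mac(acc: int, a: int, b: int) -> tuple[int, int]:
    """Accumulate a*b and report which width the range detector chose."""
    mode = select_mode(a, b)
    return acc + a * b, mode
```

In hardware, the narrower modes would let the unused portion of the multiplier array stay idle, which is the intuition behind the power saving for small operands.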

Low-Power and Low-Hardware Bit-Parallel Polynomial Basis Systolic Multiplier over GF(2^m) for Irreducible Polynomials

Mathe, Sudha Ellison;Boppana, Lakshmi
- ETRI Journal / Vol. 39, No. 4 / pp. 570-581 / 2017
Multiplication in finite fields is used in many applications, especially in cryptography. It is a basic operation and the most computationally intensive of the finite-field operations. Several systolic multipliers that offer low hardware complexity or high speed have been proposed in the literature. In this paper, a bit-parallel polynomial basis systolic multiplier for generic irreducible polynomials is proposed based on a modified interleaved multiplication method. The hardware complexity and delay of the proposed multiplier are estimated, and a comparison with the corresponding multipliers available in the literature is presented. The proposed multiplier achieves a reduction in hardware complexity of up to 20% compared to the best of the corresponding multipliers for m = 163. The synthesis results of application-specific integrated circuit and field-programmable gate array implementations of the proposed multiplier are also presented. From the synthesis results, it is inferred that the proposed multiplier achieves low power consumption and low area complexity when compared to the best of the corresponding multipliers.
https://doi.org/10.4218/etrij.17.0116.0770
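For orientation, the standard MSB-first interleaved multiplication in a polynomial basis of GF(2^m) can be written in a few lines. This sketch shows the textbook method only, not the modified method or the systolic architecture the abstract refers to; the function name and the integer bit-vector encoding of field elements are assumptions made for illustration.

```python
def gf2m_mul_interleaved(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m), polynomial basis, MSB-first interleaving.

    Elements are integers whose bit i is the coefficient of x^i.
    poly is the irreducible polynomial including its x^m term.
    Reduction is interleaved with accumulation, so the intermediate
    result never grows beyond m + 1 bits.
    """
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1                 # multiply the running result by x
        if (c >> m) & 1:
            c ^= poly           # reduce modulo the irreducible polynomial
        if (b >> i) & 1:
            c ^= a              # add (XOR) a when the current bit of b is set
    return c
```

As a check, with the AES polynomial `0x11B` and `m = 8`, `gf2m_mul_interleaved(0x57, 0x83, 0x11B, 8)` gives `0xC1`, matching the worked example in FIPS 197.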

A 200-MHz@2.5-V Dual-Mode Multiplier for Single/Double-Precision Multiplications

이종남;박종화;신경욱
- 대한전자공학회:학술대회논문집 / 대한전자공학회 2000년도 하계종합학술대회 논문집(2) / pp. 149-152 / 2000
A dual-mode multiplier (DMM) that performs single- and double-precision multiplications has been designed. An algorithm for efficiently implementing double-precision multiplication with a single-precision multiplier is proposed, which partitions a double-precision multiplication into four single-precision sub-multiplications and computes them with sequential accumulations. Compared with conventional double-precision multipliers, our approach reduces the hardware complexity by about one third, resulting in small silicon area and low power dissipation at the expense of increased latency and throughput cycles.
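The partitioning idea can be sketched as follows, assuming unsigned mantissa operands split at a hypothetical bit position `half`. The function name and the choice of split point are illustrative assumptions; rounding, exponent handling, and the actual datapath widths of the DMM are not modeled.

```python
def split_multiply(a: int, b: int, half: int) -> int:
    """Compute a*b from four narrower sub-multiplications.

    Each operand is split into a high and a low part at bit position half.
    The four partial products are accumulated one after another, mimicking
    a single narrow multiplier that is reused over several cycles.
    """
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    acc = 0
    steps = (
        (a_lo, b_lo, 0),
        (a_hi, b_lo, half),
        (a_lo, b_hi, half),
        (a_hi, b_hi, 2 * half),
    )
    for x, y, shift in steps:
        acc += (x * y) << shift  # one sub-multiplication per step
    return acc
```

The sequential reuse of one narrow multiplier is what trades area for the extra latency mentioned in the abstract.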

A Low-area and Low-power 512-point Pipelined FFT Design Using Radix-2⁴-2³ for OFDM Applications

Yu, Jian;Cho, Kyung-Ju
- 한국정보전자통신기술학회논문지 / Vol. 11, No. 5 / pp. 475-480 / 2018
In OFDM-based systems, the FFT is a critical component since it occupies a large area and consumes considerable power. In this paper, we present a low hardware-cost and low-power 512-point pipelined FFT design method for OFDM applications. To reduce the number of twiddle factors and to keep the design architecture simple, the radix-$2^4$-$2^3$ algorithm is exploited. For twiddle factor multiplication, we propose a new canonical signed digit (CSD) complex multiplier design method to minimize the hardware cost. In a hardware implementation on an Intel FPGA, the proposed FFT design achieves about a 28% reduction in gate count and an 18% reduction in power consumption compared to previous approaches.
https://doi.org/10.17661/jkiiect.2018.11.5.475
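The reason CSD helps with constant (twiddle factor) multiplication is that a CSD representation has no two adjacent nonzero digits, so a constant multiplier needs fewer shifted additions or subtractions. The sketch below shows generic CSD encoding of a non-negative integer constant and a shift-add multiply; it is not the complex multiplier design proposed in the paper, the function names are hypothetical, and twiddle factors are assumed to have been quantized to integers beforehand.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or 1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n mod 4 == 1, -1 when n mod 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x: int, constant: int) -> int:
    """Multiply x by a constant using only shifts, additions, and subtractions."""
    result = 0
    for i, d in enumerate(to_csd(constant)):
        if d == 1:
            result += x << i
        elif d == -1:
            result -= x << i
    return result
```

For example, the constant 7 is encoded as 8 - 1, so multiplying by 7 takes one shift and one subtraction instead of two additions.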

A low-power VLSI architecture of 4D TCM decoder for ADSL

이금형;김재석
- 대한전자공학회:학술대회논문집 / 대한전자공학회 1999년도 추계종합학술대회 논문집 / pp. 871-874 / 1999
We propose a low-complexity multidimensional (M-D) TCM decoder VLSI architecture for ADSL systems. We share the subset decoder module by modifying the overall decoding procedure. We reduce power consumption by using the MSA (modulo set area) operation, which removes multiplication from the 4D metric calculation. The proposed TCM decoder also reduces chip area and can be adopted in high-speed xDSL systems.

Design of a Analog Multiplier for low-voltage low-power

이근호;설남오
- 대한전기학회:학술대회논문집 / 대한전기학회 2005년도 제36회 하계학술대회 논문집 D / pp. 3058-3060 / 2005
In this paper, CMOS four-quadrant analog multipliers for low-voltage, low-power applications are presented. The circuit approach is based on the characteristics of the LV (Low-Voltage) composite transistor, which is one of the useful analog building blocks. SPICE simulations are carried out to examine the performance of the designed multipliers. Simulation results are obtained using $0.25\,\mu m$ CMOS parameters with a 2 V power supply. The LV composite transistor can easily be extended to perform four-quadrant multiplication. The multiplier has a linear input range of up to $\pm 0.5$ V with a linearity error of less than 1%. The measured -3 dB bandwidth is 290 MHz and the power dissipation is $37\,\mu W$. The proposed multiplier is expected to be suitable for analog signal processing applications such as portable communication equipment, radio receivers, and hand-held movie cameras.

A Low-Complexity 128-Point Mixed-Radix FFT Processor for MB-OFDM UWB Systems

Cho, Sang-In;Kang, Kyu-Min
- ETRI Journal / Vol. 32, No. 1 / pp. 1-10 / 2010
In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultra-wideband systems. The proposed 128-point FFT processor employs both a modified radix-$2^4$ algorithm and a radix-$2^3$ algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 $\mu m$ CMOS technology with a supply voltage of 1.8 V. The hardware-efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.
https://doi.org/10.4218/etrij.10.0109.0232

Highly Accurate Approximate Multiplier using Heterogeneous Inexact 4-2 Compressors for Error-resilient Applications

Lee, Jaewoo;Kim, HyunJin
- 대한임베디드공학회논문지 / Vol. 16, No. 5 / pp. 233-240 / 2021
We propose a novel, highly accurate approximate multiplier using different types of inexact 4-2 compressors. The importance of low hardware costs leads us to develop approximate multiplication for error-resilient applications. Several rules are developed when selecting a topology for designing the proposed multiplier. Our highly accurate multiplier design considers the different error characteristics of adopted compressors, which achieves a good error distribution, including a low relative error of 0.02% in the 8-bit multiplication. Our analysis shows that the proposed multiplier significantly reduces power consumption and area by 45% and 26%, compared with the exact multiplier. Notably, a trade-off relationship between error characteristics and hardware costs can be achieved when considering those of existing highly accurate approximate multipliers. In the image blending, edge detection and image sharpening applications, the proposed 8-bit approximate multiplier shows better performance in terms of image quality metrics compared with other highly accurate approximate multipliers.
https://doi.org/10.14372/IEMEK.2021.16.5.233

Search results: 75 items, processing time 0.025 s
