Search | Korea Science

An Optimized Hybrid Radix MAC Design (최적화된 4진18진 혼합 MAC 설계)

정진우;김승철;이용주;이용석
- Proceedings of the IEEK Conference
- /
- 2002.06b
- /
- pp.173-176
- /
- 2002
This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs the radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of the radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.
PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진/8진 혼합 MAC 설계)

정진우;김승철;이용주;이용석
- Proceedings of the IEEK Conference
- /
- 2002.06a
- /
- pp.125-128
- /
- 2002
This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs tile radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of tile radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.
PDF

A Low-Complexity 128-Point Mixed-Radix FFT Processor for MB-OFDM UWB Systems

Cho, Sang-In;Kang, Kyu-Min
- ETRI Journal
- /
- v.32 no.1
- /
- pp.1-10
- /
- 2010
In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultra-wideband systems. The proposed 128-point FFT processor employs both a modified radix-$2^4$ algorithm and a radix-$2^3$ algorithm to significantly reduce the numbers of complex constant multipliers and complex booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 ${\mu}m$ CMOS technology with a supply voltage of 1.8 V. The hardware- efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.
https://doi.org/10.4218/etrij.10.0109.0232 인용 PDF KSCI

Design of a 64×64-Bit Modified Booth Multiplier Using Current-Mode CMOS Quarternary Logic Circuits (전류모드 CMOS 4치 논리회로를 이용한 64×64-비트 변형된 Booth 곱셈기 설계)

Kim, Jeong-Beom
- The KIPS Transactions:PartA
- /
- v.14A no.4
- /
- pp.203-208
- /
- 2007
This paper proposes a $64{\times}64$ Modified Booth multiplier using CMOS multi-valued logic circuits. The multiplier based on the radix-4 algorithm is designed with current mode CMOS quaternary logic circuits. Designed multiplier is reduced the transistor count by 64.4% compared with the voltage mode binary multiplier. The multiplier is designed with Samsung $0.35{\mu}m$ standard CMOS process at a 3.3V supply voltage and unit current $5{\mu}m$. The validity and effectiveness are verified through the HSPICE simulation. The voltage mode binary multiplier is achieved the occupied area of $7.5{\times}9.4mm^2$, the maximum propagation delay time of 9.8ns and the average power consumption of 45.2mW. This multiplier is achieved the maximum propagation delay time of 11.9ns and the average power consumption of 49.7mW. The designed multiplier is reduced the occupied area by 42.5% compared with the voltage mode binary multiplier.
https://doi.org/10.3745/KIPSTA.2007.14-A.4.203 인용 PDF KSCI

A Study on Multiplier Architectures Optimized for 32-bit RISC Processor with 3-Stage Pipeline (32비트 3단 파이프라인을 가진 RISC 프로세서에 최적화된 Multiplier 구조에 관한 연구)

정근영;박주성;김석찬
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.41 no.11
- /
- pp.123-130
- /
- 2004
This paper describes a multiplier architecture optimized for 32 bit RISC processor with 3-stage pipeline. The multiplier of ARM7, the target processor, is variably carried out on the execution stage of pipeline within 7 cycles. The included multiplier employs a modified Booth's algerian to produce 64 bit multiplication and addition product and it has 6 separate instructions. We analyzed several multiplication algorithm such as radix4-32${\times}$8, radix4-32${\times}$16 and radix8-32${\times}$32 to decide which multiplication architecture is most fit for a typical architecture of ARM7. VLSI area, cycle delay time and execution cycle number is the index of an efficient design and the final multiplier was designed on these indexes. To verify the operation of embedded multiplier, it was simulated with various audio algorithms.
PDF KSCI

Design of a Multi-Valued Arithmetic Processor with Encoder and Decoder (인코더, 디코오더를 가지는 다치 연산기 설계)

박진우;양대영;송홍복
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.2 no.1
- /
- pp.147-156
- /
- 1998
In this paper, an arithmetic processor using multi-valued logic is designed. For implementing of multi-valued logic circuits, we use current-mode CMOS circuits and design encoder which change binary voltage-mode signals to multi-valued current-mode signals and decoder which change results of arithmetic to binary voltage-mode signals. To reduce the number of partial product we use 4-radix SD number partial product generation algorithm that is an extension of the modified Booth's algorithm. We demonstrate the effectiveness of the proposed arithmetic circuits through SPICE simulation and Hardware emulation using FPGA chip.
PDF

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

서영호;김동욱
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.41 no.7
- /
- pp.53-61
- /
- 2004
In this paper, we proposed a new architecture of MAC(Multiplier-Accumulator) to operate high-speed multiplication-accumulation. We used the MBA(Modified radix-4 Booth Algorithm) which is based on the 1's complement number system, and CSA(Carry Save Adder) for addition of the partial products. During the addition of the partial product, the signed numbers with the 1's complement type after Booth encoding are converted in the 2's complement signed number in the CSA tree. Since 2-bit CLA(Carry Look-ahead Adder) was used in adding the lower bits of the partial product, the input bit width of the final adder and whole delay of the critical path were reduced. The proposed MAC was applied into the DWT(Discrete Wavelet Transform) filtering operation for JPEG2000, and it showed the possibility for the practical application. Finally we identified the improved performance according to the comparison with the previous architecture in the aspect of hardware resource and delay.
PDF KSCI

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

구대성;김필중;김종빈
- Journal of the Institute of Electronics Engineers of Korea TE
- /
- v.39 no.4
- /
- pp.355-355
- /
- 2002
In this paper, we designed a FPU(floating point unit) that it is very important and requires of high density when digital audio is designed. Almost audio system must support the multi-channel and required for high quality. A floating point arithmetic function in MPEG-2 AAC that implemented by hardware is able to realtime decoding when DSP realization. The reason is that MPEG-2 AAC is compatible to the Audio field of MPEG-4 and afterwards. We designed a FPU by hardware to increase the speed of a floating point unit with much calculation part in the MPEG-2 AAC Decoder. A FPU is composed of a multiplier and an adder. A multiplier used the Radix-4 Booth algorithm and an adder adopted 1's complement method for speed up. A form of a floating point unit has 8bit of exponent part and 24bit of mantissa. It's compatible with the IEEE single precision format and adopted a pipeline architecture to increase the speed of a processor. All of sub blocks are based on ISO/IEC 13818-7 standard. The algorithm is tested by C language and the design does by use of VHDL(VHSIC Hardware Description Language). The maximum operation speed is 23.2MHz and the stable operation speed is 19MHz.
KSCI

Asynchronous 16bit Multiplier with micropipelined structure (마이크로파이프라인 구조의 16bit 비동기 곱셈기)

장미숙;이유진;김학윤;이우석;최호용
- Proceedings of the IEEK Conference
- /
- 2000.06b
- /
- pp.145-148
- /
- 2000
A 16bit asynchronous multiplier has been designed using micropipelind structure with 2 phase and data bundling. And 4-radix modified Booth algorithm, CPlatch(Cature-Pass latch) and modified 4-2 counters have adopted in this design. It is implemented in 0.65$\mu\textrm{m}$ double-poly/double-metal CMOS technology by using 12,074 transistors with core size of 1.4${\times}$1.8$\textrm{mm}^2$. And our design results in a computation rate 55MHz a supply voltage of 3.3V.
PDF

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

Seo, Young-Ho;Kim, Dong-Wook
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.45 no.4
- /
- pp.94-104
- /
- 2008
In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.
PDF KSCI

Search Result 26, Processing Time 0.02 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)