Search | Korea Science

Design of 32-bit Floating Point Multiplier for FPGA (FPGA를 위한 32비트 부동소수점 곱셈기 설계)

Xuhao Zhang; Dae-Ik Kim
- The Journal of the Korea Institute of Electronic Communication Sciences / v.19 no.2 / pp.409-416 / 2024
As demand grows for floating-point operations in high-speed signal processing and logic operations, the speed of the floating-point unit becomes a key factor in overall system performance. This paper studies the performance characteristics of different floating-point multiplier schemes. Each scheme compresses the partial products into carry and sum form and then uses a carry look-ahead adder to obtain the result. The floating-point multipliers are described in Verilog HDL and evaluated with the Intel Quartus II CAD tool, then analyzed and compared in terms of area, speed, and power consumption. The FMAX of modified Booth encoding with a Wallace tree is 33.96 MHz, which is 2.04 times faster than Booth encoding, 1.62 times faster than modified Booth encoding, and 1.04 times faster than Booth encoding with a Wallace tree. Furthermore, compared to modified Booth encoding, modified Booth encoding with a Wallace tree reduces area by 24.88% and power consumption by 2.5%.
https://doi.org/10.13067/JKIECS.2024.19.2.409
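
The "carry and sum" compression mentioned in this abstract can be illustrated with a small Python model. This is only a sketch under assumptions, not the paper's Verilog design: it multiplies unsigned significands (24 bits, as in an IEEE 754 single-precision value with its hidden bit, which the abstract does not state explicitly), uses plain AND-array partial products rather than Booth encoding, and lets Python's `+` stand in for the final carry look-ahead adder. The names `carry_save_add` and `multiply_csa` are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied to whole words: x + y + z == s + c."""
    s = x ^ y ^ z                            # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1   # carries, moved one column left
    return s, c


def multiply_csa(a, b, width=24):
    """Unsigned a * b, where b fits in `width` bits (assumed operand width)."""
    # One partial-product row per bit of b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    rows += [0] * max(0, 2 - len(rows))      # make sure two rows exist
    # Compress three rows into two until only a sum row and a carry row remain.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*rows[i:i + 3]))
        reduced.extend(rows[full:])          # rows that did not fill a group
        rows = reduced
    # Final carry-propagating addition (a carry look-ahead adder in hardware).
    return rows[0] + rows[1]
```

No carry propagates sideways inside `carry_save_add`, which is why a tree of such compressors is fast in hardware; only the last addition needs a carry-propagating adder.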

A High-Performance ECC Processor Supporting NIST P-521 Elliptic Curve (NIST P-521 타원곡선을 지원하는 고성능 ECC 프로세서)

Yang, Hyeon-Jun; Shin, Kyung-Wook
- Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.548-555 / 2022
This paper describes the hardware implementation of elliptic curve cryptography (ECC), which is used as a core operation in the elliptic curve digital signature algorithm (ECDSA). The ECC processor supports eight operation modes (four point operations and four modular operations) on the NIST P-521 curve. To minimize the computational complexity of point scalar multiplication (PSM), the radix-4 Booth encoding scheme and the modified Jacobian coordinate system were adopted, based on a complexity analysis of five PSM algorithms and four coordinate systems. Modular multiplication was implemented using a modified 3-way Toom-Cook multiplication and a modified fast reduction algorithm. The ECC processor was implemented on an xczu7ev FPGA device to verify hardware operation. It used 101,921 LUTs, 18,357 flip-flops, and 101 DSP blocks, and was estimated to achieve about 370 PSM operations per second at a maximum clock frequency of 45 MHz.
https://doi.org/10.6109/jkiice.2022.26.4.548
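
To make the modular multiplication step in this abstract concrete, here is a Python sketch of the textbook 3-way Toom-Cook multiplication followed by the standard fold-and-subtract reduction for the P-521 prime, p = 2^521 - 1. The abstract refers to *modified* versions of both algorithms, which are not reproduced here. The 174-bit limb size, the evaluation points (0, 1, -1, -2, infinity), and the names `toom3_mul` and `reduce_p521` are assumptions made for illustration.

```python
P521 = (1 << 521) - 1   # the NIST P-521 field prime
LIMB = 174              # assumed split: three limbs cover 522 bits


def toom3_mul(a, b, limb=LIMB):
    """Textbook Toom-Cook 3-way product of non-negative integers."""
    mask = (1 << limb) - 1
    a0, a1, a2 = a & mask, (a >> limb) & mask, a >> (2 * limb)
    b0, b1, b2 = b & mask, (b >> limb) & mask, b >> (2 * limb)

    def ev(c0, c1, c2, x):
        return c0 + c1 * x + c2 * x * x

    # Five smaller products: the operand polynomials evaluated at five points.
    r0 = a0 * b0
    r1 = ev(a0, a1, a2, 1) * ev(b0, b1, b2, 1)
    rm1 = ev(a0, a1, a2, -1) * ev(b0, b1, b2, -1)
    rm2 = ev(a0, a1, a2, -2) * ev(b0, b1, b2, -2)
    rinf = a2 * b2

    # Interpolation; every division below is exact.
    t3 = (rm2 - r1) // 3
    t1 = (r1 - rm1) // 2
    t2 = rm1 - r0
    c3 = (t2 - t3) // 2 + 2 * rinf
    c2 = t2 + t1 - rinf
    c1 = t1 - c3

    # Recombine the five coefficients at their limb positions.
    return (r0 + (c1 << limb) + (c2 << (2 * limb))
            + (c3 << (3 * limb)) + (rinf << (4 * limb)))


def reduce_p521(x):
    """x mod (2^521 - 1) for 0 <= x < 2^1042, by folding the high half."""
    r = (x & P521) + (x >> 521)   # uses 2^521 == 1 (mod p)
    while r >= P521:
        r -= P521
    return r
```

Usage: `reduce_p521(toom3_mul(a, b))` gives `a * b mod p` for field elements `a` and `b`. The five limb-sized products replace the nine that schoolbook multiplication of three limbs would need.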

Low-Power Multiplier Using Input Data Partition (입력 데이터 분할을 이용한 저전력 부스 곱셈기 설계)

Park Jongsu; Kim Jinsang; Cho Won-Kyung
- The Journal of Korean Institute of Communications and Information Sciences / v.30 no.11A / pp.1092-1097 / 2005
In this paper, we propose a low-power Booth multiplier that reduces the switching activity of partial products during multiplication. The radix-4 Booth algorithm produces zero-valued Booth-encoded products when the input contains runs of consecutive equal bits (all 0s or all 1s). Partial products are therefore more likely to be zero when the input with the smaller effective dynamic range is used as the multiplier rather than the multiplicand. The proposed multiplier divides a multiplication into several multiplications on operands with fewer bits than the original inputs, computes each one independently with Booth encoding, and finally adds the results. This gives the proposed multiplier a higher chance of producing zero-valued encoded products, so a low-power multiplier with less switching activity can be implemented. Implementation results show that the proposed multiplier reduces power dissipation by up to about 20% compared with a previous Booth multiplier.
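
The property this abstract relies on, that runs of equal bits recode to zero digits, can be checked with a short Python sketch of standard radix-4 Booth recoding. The function name `booth_radix4_digits` and the 16-bit width are assumptions for illustration; the paper's input-partitioning scheme itself is not modeled.

```python
def booth_radix4_digits(x, bits=16):
    """Radix-4 Booth digits of a `bits`-wide two's-complement value.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first.
    `bits` must be even; sum(d * 4**k) over the digits equals x.
    """
    u = x & ((1 << bits) - 1)   # two's-complement bit pattern
    ext = u << 1                # append the implicit 0 below the LSB
    digits = []
    for i in range(0, bits, 2):
        below = (ext >> i) & 1          # bit i-1 of x
        low = (ext >> (i + 1)) & 1      # bit i of x
        high = (ext >> (i + 2)) & 1     # bit i+1 of x
        digits.append(below + low - 2 * high)
    return digits


def nonzero_rows(x, bits=16):
    """Number of partial-product rows that are not zero when x is the multiplier."""
    return sum(1 for d in booth_radix4_digits(x, bits) if d != 0)
```

For example, `booth_radix4_digits(3)` is `[-1, 1, 0, 0, 0, 0, 0, 0]`: the long run of leading zeros recodes to zero digits, so only two of the eight partial-product rows are active. A small negative value behaves the same way because its leading bits form a run of 1s.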

Fast Motion Estimation Algorithm Using Motion Vectors of Neighboring Blocks (인접블록의 움직임벡터를 이용한 고속 움직임추정 방식)

So Hyeon-Ho; Kim Jinsang; Cho Won-Kyung; Kim Young-Soo; Suh Doug Young
- The Journal of Korean Institute of Communications and Information Sciences / v.30 no.12C / pp.1256-1261 / 2005

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

서영호; 김동욱
- Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.7 / pp.53-61 / 2004
In this paper, we propose a new MAC (Multiplier-Accumulator) architecture for high-speed multiply-accumulate operations. We use the MBA (Modified radix-4 Booth Algorithm) based on the 1's complement number system, and a CSA (Carry Save Adder) to add the partial products. During partial-product addition, the 1's complement signed numbers produced by Booth encoding are converted into 2's complement signed numbers in the CSA tree. Because a 2-bit CLA (Carry Look-ahead Adder) is used to add the lower bits of the partial products, the input bit width of the final adder and the overall critical-path delay are reduced. The proposed MAC was applied to the DWT (Discrete Wavelet Transform) filtering operation for JPEG2000, demonstrating its potential for practical use. Finally, a comparison with the previous architecture confirmed improved performance in terms of hardware resources and delay.
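
One way to read the 1's complement step in this abstract: a negative Booth multiple is formed by simple bit inversion, and the missing "+1" that would make it a 2's complement value is added as a separate correction bit inside the adder tree. The Python sketch below models only that arithmetic identity. It is an interpretation under assumptions, not the paper's architecture: there is no CSA tree or 2-bit CLA here, and the names `booth_radix4_digits` and `mac_ones_complement` and the 16-bit width are hypothetical.

```python
def booth_radix4_digits(x, bits):
    """Radix-4 Booth digits (least significant first) of a signed value."""
    ext = (x & ((1 << bits) - 1)) << 1   # bit pattern with implicit 0 appended
    return [((ext >> i) & 1) + ((ext >> (i + 1)) & 1) - 2 * ((ext >> (i + 2)) & 1)
            for i in range(0, bits, 2)]


def mac_ones_complement(acc, a, b, bits=16):
    """Return acc + a * b, where b is a signed `bits`-wide multiplier."""
    total = acc
    for k, digit in enumerate(booth_radix4_digits(b, bits)):
        multiple = abs(digit) * a        # 0, a, or 2a (a shift in hardware)
        if digit < 0:
            row = ~multiple              # 1's complement: invert every bit
            correction = 1               # deferred +1, added with the rows
        else:
            row, correction = multiple, 0
        # Row and correction bit enter the sum at the same column weight.
        total += (row << (2 * k)) + (correction << (2 * k))
    return total
```

Deferring the `+1` keeps the partial-product generator free of carry propagation; the correction bits are just extra inputs to the adder tree.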

Asynchronous Multiplier with Parallel Array Structure (병렬배열구조를 사용한 비동기 곱셈기)

Park, Chan-Ho; Choe, Byeong-Su; Lee, Dong-Ik
- Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.5 / pp.87-94 / 2002
In this paper, an asynchronous array multiplier with a parallel array structure is introduced. The parallel array structure speeds up computation while lowering power consumption. An asymmetric parallel array structure is used to minimize the average computation time of the asynchronous multiplier. Simulation shows that this structure reduces computation time by 55% compared to conventional Booth encoding array structures, and that the multiplier with the proposed array structure shows a 40% reduction in computation time with relatively lower power consumption.

Design of a Truncated Floating-Point Multiplier for Graphic Accelerator of Mobile Devices (모바일 그래픽 가속기용 부동소수점 절사 승산기 설계)

Cho, Young-Sung; Lee, Yong-Hwan
- Journal of the Korea Institute of Information and Communication Engineering / v.11 no.3 / pp.563-569 / 2007
As mobile communication and semiconductor technology continue to improve, mobile content that requires high-level graphics, such as multimedia services and 2D/3D graphics, is now being offered. Mobile chips must have a small die area and low power consumption. In this paper, we design a truncated floating-point multiplier that is useful for 2D/3D vector graphics in mobile devices. The truncated multiplier is based on the radix-4 Booth encoding algorithm, and a truncation algorithm is used to achieve small area and low power. The average percent error of the multiplier is as small as 0.00003%, which is negligible for mobile applications. Synthesis with a 0.35 µm CMOS cell library shows that the gate count of the truncated multiplier is only 33.8 percent of that of a conventional radix-4 Booth multiplier.
https://doi.org/10.6109/jkiice.2007.11.3.563
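
The general idea of a truncated multiplier can be sketched in Python as follows. The abstract does not specify the paper's truncation algorithm, so this model makes its own assumptions: unsigned operands, plain AND-array partial products instead of radix-4 Booth rows, and no error-compensation constant. The name `truncated_multiply` and the parameters `n` (operand width) and `k` (number of dropped low-order columns) are hypothetical.

```python
def truncated_multiply(a, b, n=8, k=6):
    """Approximate unsigned n-bit a * b, ignoring product columns below k.

    Partial-product bit a[i] & b[j] sits in column i + j. Bits in columns
    lower than k are never generated, which is where the hardware savings
    of a truncated multiplier come from.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total


def max_truncation_error(k):
    """Upper bound on the dropped amount: column c holds at most c + 1 bits."""
    return sum((c + 1) << c for c in range(k))
```

With `k = 0` the function returns the exact product. For larger `k` the result never exceeds the exact product, and the shortfall is bounded by `max_truncation_error(k)`; practical designs usually add a constant or data-dependent correction to center this error.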

A new scheme for VLSI implementation of fast parallel multiplier using 2x2 submultipliers and true 4:2 compressors with no carry propagation (부분곱의 재정렬과 4:2 변환기법을 이용한 VLSI 고속 병렬 곱셈기의 새로운 구현 방법)

이상구; 전영숙
- Journal of the Korean Institute of Telematics and Electronics C / v.34C no.10 / pp.27-35 / 1997
In this paper, we propose a new scheme for generating partial products in a fast parallel VLSI multiplier. It adopts a new encoding method that halves the number of partial products by using 2x2 submultipliers and rearranging the primitive partial products. A true 4-input CSA can be achieved by appropriately rearranging the primitive partial products from the 2x2 submultipliers, using a newly proposed theorem on the binary number system. A 16-bit x 16-bit multiplier was designed using the proposed method and simulated, showing that the method has speed and area comparable to Booth's encoding method. A much smaller and faster multiplier could be obtained with further optimization. The proposed scheme can be easily extended to multipliers with wider inputs.
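
As a rough illustration of building a multiplier from 2x2 submultipliers, the Python sketch below splits both operands into 2-bit digits and sums the 4-bit primitive products at their column weights. It does not reproduce the paper's rearrangement of primitive partial products or its 4:2 compression. The gate-level equations in `mul2x2` are the usual ones for a 2-bit by 2-bit product, and both function names are hypothetical.

```python
def mul2x2(x, y):
    """Gate-level 2-bit by 2-bit unsigned product (a 4-bit result)."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    p0 = x0 & y0
    p1 = (x1 & y0) ^ (x0 & y1)
    carry = (x1 & y0) & (x0 & y1)
    p2 = (x1 & y1) ^ carry
    p3 = (x1 & y1) & carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def multiply_with_2x2(a, b, width=16):
    """Unsigned width-bit product assembled from 2x2 primitive products."""
    digits = width // 2
    total = 0
    for j in range(digits):                  # one row per 2-bit digit of b
        bj = (b >> (2 * j)) & 3
        for i in range(digits):
            ai = (a >> (2 * i)) & 3
            total += mul2x2(ai, bj) << (2 * (i + j))
    return total
```

Because each 2-bit digit of `b` contributes one row, a 16-bit operand yields 8 rows instead of the 16 rows of a bit-by-bit AND array.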

Design of a $54{\times}54$-bit Multiplier Based on a Improved Conditional Sum Adder (개선된 조건 합 가산기를 이용한 $54{\times}54$-bit 곱셈기의 설계)

Lee, Young-Chul; Song, Min-Kyu
- Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.1 / pp.67-74 / 2000
In this paper, a $54{\times}54$-bit multiplier based on an improved conditional sum adder is proposed. To reduce the multiplication time, high compression-rate compressors without Booth's encoding and a 108-bit conditional sum adder with a separated carry generation block are developed. Furthermore, a design technique based on pass-transistor logic is used to improve the multiplication time and the power consumption by about 5% compared to those of a conventional design. With a $0.65{\mu}m$, single-poly, triple-metal CMOS process, the chip size is $6.60{\times}6.69\;mm^2$ and the multiplication time is 135.ns at a 3.3V power supply.
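
The conditional sum principle behind the final adder can be sketched in Python. This is the textbook form, not the paper's improved version with a separated carry generation block: every block computes its result for both possible carry-ins, and the carry out of the lower half then selects which upper-half result to use. The name `conditional_sum` and the recursive halving are assumptions for illustration.

```python
def conditional_sum(a, b, width):
    """Add two `width`-bit values for both possible carry-ins.

    Returns (sum0, cout0, sum1, cout1): the width-bit sum and carry-out
    assuming carry-in 0, then the same assuming carry-in 1.
    """
    if width == 1:
        # A single bit position: both outcomes come straight from gates.
        return a ^ b, a & b, (a ^ b) ^ 1, a | b
    lo_w = width // 2
    hi_w = width - lo_w
    mask = (1 << lo_w) - 1
    ls0, lc0, ls1, lc1 = conditional_sum(a & mask, b & mask, lo_w)
    hs0, hc0, hs1, hc1 = conditional_sum(a >> lo_w, b >> lo_w, hi_w)
    # The lower half's carry-out acts as a multiplexer select for the upper half.
    us0, uc0 = (hs1, hc1) if lc0 else (hs0, hc0)
    us1, uc1 = (hs1, hc1) if lc1 else (hs0, hc0)
    return ls0 | (us0 << lo_w), uc0, ls1 | (us1 << lo_w), uc1


def add108(a, b):
    """108-bit addition, the width of the final adder for a 54x54 product."""
    s, c, _, _ = conditional_sum(a, b, 108)
    return s | (c << 108)
```

Both halves are computed independently and only multiplexers sit on the path between them, so delay grows with the logarithm of the width rather than linearly as in a ripple-carry adder.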

Parameterized IP Core of Complex-Number Multiplier (파라미터화된 복소수 승산기 IP 코어)

양대성; 이승기; 신경욱
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2001.05a / pp.307-310 / 2001
A parameterized complex-number multiplier (PCMUL) core IP (Intellectual Property), which can be used as an essential arithmetic unit in baseband signal processing of digital communication systems, is described. The bit-width of the multiplier is parameterized in the range of 8-b~24-b and is user-selectable in 2-b steps. The PCMUL_GEN, a core generator with a GUI, generates VHDL code of a CMUL core for a specified bit-width. The IP is based on redundant binary (RB) arithmetic and a new radix-4 Booth encoding/decoding scheme proposed in this paper. This results in a simplified internal structure as well as a high-speed, low-power, and area-efficient implementation. The designed IP was verified using a Xilinx FPGA board.

Search Result 14, Processing Time 0.023 seconds
