• Title/Summary/Keyword: Floating Point Multiplier

Search Result 24, Processing Time 0.021 seconds

Design of a Truncated Floating-Point Multiplier for Graphic Accelerator of Mobile Devices (모바일 그래픽 가속기용 부동소수점 절사 승산기 설계)

  • Cho, Young-Sung;Lee, Yong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.3
    • /
    • pp.563-569
    • /
    • 2007
  • As the mobile communication and the semiconductor technology is improved continuously, mobile contents such as the multimedia service and the 2D/3D graphics which require high level graphics are serviced recently. Mobile chips should consume small die area and low power. In this paper, we design a truncated floating-point multiplier that is useful for the 2D/3D vector graphics in mobile devices. The truncated multiplier is based on the radix-4 Booth's encoding algorithm and a truncation algorithm is used to achieve small area and low power. The average percent error of the multiplier is as small as 0.00003% and neglectable for mobile applications. The synthesis result using 0.35um CMOS cell library shows that the number of gates for the truncated multiplier is only 33.8 percent of the conventional radix-4 Booth's multiplier.

A design of Floating Point Arithmetic Unit for Geometry Operation of Mobile 3D Graphic Processor (모바일 3D 그래픽 프로세서의 지오메트리 연산을 위한 부동 소수점 연산기 구현)

  • Lee, Jee-Myong;Lee, Chan-Ho
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.711-714
    • /
    • 2005
  • We propose floating point arithmetic units for geometry operation of mobile 3D graphic processor. The proposed arithmetic units conform to the single precision format of IEEE standard 754-1985 that is a standard of floating point arithmetic. The rounding algorithm applies the nearest toward zero form. The proposed adder/subtraction unit and multiplier have one clock cycle latency, and the inversion unit has three clock cycle latency. We estimate the required numbers of arithmetic operation for Viewing transformation. The first stage of geometry operation is composed with translation, rotation and scaling operation. The translation operation requires three addition and the rotation operation needs three addition and six multiplication. The scaling operation requires three multiplication. The viewing transformation is performed in 15 clock cycles. If the adder and the multiplier have their own in/out ports, the viewing transformation can be done in 9 clock cycles. The error margin of proposed arithmetic units is smaller than $10^{-5}$ that is the request in the OpenGL standard. The proposed arithmetic units carry out operations in 100MHz clock frequency.

  • PDF

Single-Chip Controller Design for Piezoelectric Actuators using FPGA (FPGA를 이용한 압전소자 작동기용 단일칩 제어기 설계)

  • Yoon, Min-Ho;Park, Jungkeun;Kang, Taesam
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.22 no.7
    • /
    • pp.513-518
    • /
    • 2016
  • The piezoelectric actuating device is known for its large power density and simple structure. It can generate a larger force than a conventional actuator and has also wide bandwidth with fast response in a compact size. To control the piezoelectric actuator, we need an analog signal conditioning circuit as well as digital microcontrollers. Conventional microcontrollers are not equipped with an analog part and need digital-to-analog converters, which makes the system bulky compared with the small size of piezoelectric devices. To overcome these weaknesses, we are developing a single-chip controller that can handle analog and digital signals simultaneously using mixed-signal FPGA technology. This gives more flexibility than traditional fixed-function microcontrollers, and the control speed can be increased greatly due to the parallel processing characteristics of the FPGA. In this paper, we developed a floating-point multiplier, PWM generator, 80-kHz power control loop, and 1-kHz position feedback control loop using a single mixed-signal FPGA. It takes only 50 ns for single floating-point multiplication. The PWM generator gives two outputs to control the charging and discharging of the high-voltage output capacitor. Through experimentation and simulation, it is demonstrated that the designed control loops work properly in a real environment.

Design and MPW Implementation of 3D Graphics Floating Point Ips (3차원 그래픽용 부동 소수점 연산기 IP 설계 및 MPW 구현)

  • Lee, Jung-Woo;Kim, Ki-Chul
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.987-988
    • /
    • 2006
  • This paper presents a design and MPW implementation of 3D Graphics Floating Point IPs. Designed IPs include adder, subtractor, multiplier, divider, and reciprocal unit. The IPs have pipelined structures. The IPs meet the accuracy required in OpenGL ES. The operation frequency of the IPs is 100MHz. The IPs can be efficiently used in 3D graphics accelerators.

  • PDF

A New Multiplication Architecture for DSP Applications

  • Son, Nguyen-Minh;Kim, Jong-Soo;Choi, Jae-Ha
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.12 no.2
    • /
    • pp.139-144
    • /
    • 2011
  • The modern digital logic technology does not yet satisfy the speed requirements of real-time DSP circuits due to synchronized operation of multiplication and accumulation. This operation degrades DSP performance. Therefore, the double-base number system (DBNS) has emerged in DSP system as an alternative methodology because of fast multiplication and hardware simplicity. In this paper, authors propose a novel multiplication architecture. One operand is an output of a flash analog-to-digital converter (ADC) in DBNS format, while the other operand is a coefficient in the IEEE standard floating-point number format. The DBNS digital output from ADC is produced through a new double base number encoder (DBNE). The multiplied output is in the format of the IEEE standard floating-point number (FPNS). The proposed circuits process multiplication and conversion together. Compared to a typical multiplier that uses the FPNS, the proposed multiplier also consumes 45% less gates, and 44% faster than the FPNS multiplier on Spartan-3 FPGA board. The design is verified with FIR filter applications.

A low-cost compensated approximate multiplier for Bfloat16 data processing on convolutional neural network inference

  • Kim, HyunJin
    • ETRI Journal
    • /
    • v.43 no.4
    • /
    • pp.684-693
    • /
    • 2021
  • This paper presents a low-cost two-stage approximate multiplier for bfloat16 (brain floating-point) data processing. For cost-efficient approximate multiplication, the first stage implements Mitchell's algorithm that performs the approximate multiplication using only two adders. The second stage adopts the exact multiplication to compensate for the error from the first stage by multiplying error terms and adding its truncated result to the final output. In our design, the low-cost multiplications in both stages can reduce hardware costs significantly and provide low relative errors by compensating for the error from the first stage. We apply our approximate multiplier to the convolutional neural network (CNN) inferences, which shows small accuracy drops with well-known pre-trained models for the ImageNet database. Therefore, our design allows low-cost CNN inference systems with high test accuracy.

An exact floating point square root calculator using multiplier (곱셈기를 이용한 정확한 부동소수점 제곱근 계산기)

  • Cho, Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.8
    • /
    • pp.1593-1600
    • /
    • 2009
  • There are two major algorithms to find a square root of floating point number, one is the Newton_Raphson algorithm and GoldSchmidt algorithm which calculate it approximately by iterating multiplications and the other is SRT algorithm which calculates it exactly by iterating subtractions. This paper proposes an exact floating point square root algorithm using only multiplication. At first an approximate inverse square root is calculated by Newton_Raphson algorithm, and then an exact square root algorithm by reducing an error in it and a compensation algorithm of it are proposed. The proposed algorithm is verified to calculate all of numbers in a single precision floating point number and 1 billion random numbers in a double precision floating point number. The proposed algorithm requires only the multipliers without another hardware, so it can be widely used in an embedded system and mobile production which requires an efact square root of floating point number.

Design of a high-performance floating-point unit adopting a new divide/square root implementation (새로운 제산/제곱근기를 내장한 고성능 부동 소수점 유닛의 설계)

  • Lee, Tae-Young;Lee, Sung-Youn;Hong, In-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.12
    • /
    • pp.79-90
    • /
    • 2000
  • In this paper, a high-performance floating point unit, which is suitable for high-performance superscalar microprocessors and supports IEEE 754 standard, is designed. Floating-point arithmetic unit (AU) supports all denormalized number processing through hardware, while eliminating the additional delay time due to the denormalized number processing by proposing the proposed gradual underflow prediction (GUP) scheme. Contrary to the existing fixed-radix implementations, floating-point divide/square root unit adopts a new architecture which determines variable length quotient bits per cycle. The new architecture is superior to the SRT implementations in terms of performance and design complexity. Moreover, sophisticated exception prediction scheme enables precise exception to be implemented with ease on various superscalar microprocessors, and removes the stall cycles in division. Designed floating-point AU and divide/square root unit are integrated with and instruction decoder, register file, memory model and multiplier to form a floating-point unit, and its function and performance is verified.

  • PDF

Design of Floating-Point Multiplier for Mobile Graphics Application (모바일 그래픽스 응용을 위한 부동소수점 승산기의 설계)

  • Choi, Byeong-Yoon;Salcic, Zoran
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.3
    • /
    • pp.547-554
    • /
    • 2008
  • In this paper, two-stage pipelined floating-point multiplier (FP-MUL) is designed. The FP-MUL processor supports single precision multiplication for 3D graphic APIs, such as OpenGL and Direct3D and has area-efficient and low-latency architecture via saturated arithmetic, area-efficient sticky-bit generator, and flagged prefix adder. The FP-MUL has about 4-ns delay time under $0.13{\mu}m$ CMOS standard cell library and consists of about 7,500 gates. Because its maximum performance is about 250 MFLOPS, it can be applicable to mobile 3D graphics application.

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.4
    • /
    • pp.355-355
    • /
    • 2002
  • In this paper, we designed a FPU(floating point unit) that it is very important and requires of high density when digital audio is designed. Almost audio system must support the multi-channel and required for high quality. A floating point arithmetic function in MPEG-2 AAC that implemented by hardware is able to realtime decoding when DSP realization. The reason is that MPEG-2 AAC is compatible to the Audio field of MPEG-4 and afterwards. We designed a FPU by hardware to increase the speed of a floating point unit with much calculation part in the MPEG-2 AAC Decoder. A FPU is composed of a multiplier and an adder. A multiplier used the Radix-4 Booth algorithm and an adder adopted 1's complement method for speed up. A form of a floating point unit has 8bit of exponent part and 24bit of mantissa. It's compatible with the IEEE single precision format and adopted a pipeline architecture to increase the speed of a processor. All of sub blocks are based on ISO/IEC 13818-7 standard. The algorithm is tested by C language and the design does by use of VHDL(VHSIC Hardware Description Language). The maximum operation speed is 23.2MHz and the stable operation speed is 19MHz.