• Title/Summary/Keyword: Floating Point Number

Fast Algorithms for Computing Floating-Point Reciprocal Cube Root Functions

  • Leonid Moroz;Volodymyr Samotyy;Cezary Walczyk
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.6
    • /
    • pp.84-90
    • /
    • 2023
  • In this article, the problem of computing floating-point reciprocal cube root functions is considered. Our new algorithms for this task decrease the number of arithmetic operations used for computing $1/{\sqrt[3]{x}}$. A new approach to the selection of magic constants is presented in order to minimize the computation time for reciprocal cube roots of floating-point arguments. The underlying theory enables partitioning of the base argument range $x\in[1,8)$ into 3 segments, which in turn increases the accuracy of the initial function approximation and decreases the number of iterations to one. The three best algorithms were implemented and carefully tested on a 32-bit microcontroller with an ARM core. Their custom C implementations compared favourably with an algorithm based on the cbrtf(x) function from the C <math.h> library on three different hardware platforms. As a result, a new fast approximation algorithm for the function $1/{\sqrt[3]{x}}$ was determined that outperforms all the other algorithms in terms of computation time and cycle count.
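
As a rough illustration of the magic-constant approach (not the paper's optimized algorithm), the sketch below seeds $1/\sqrt[3]{x}$ with the textbook bit-level constant and refines it with one Newton-Raphson step; the paper instead selects per-segment optimized constants over $x\in[1,8)$, which this sketch does not reproduce.

```c
#include <stdint.h>
#include <string.h>

/* Hedged sketch of a magic-constant reciprocal cube root for x > 0.
 * 0x54AAAAAA is the naive derivation of the constant, not the
 * paper's optimized, segment-dependent choice. */
static float rcbrt_approx(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);              /* reinterpret float bits */
    i = 0x54AAAAAAu - i / 3u;              /* initial guess y ~ x^(-1/3) */
    float y;
    memcpy(&y, &i, sizeof y);
    /* one Newton-Raphson step for f(y) = y^(-3) - x */
    y = y * (4.0f - x * y * y * y) * (1.0f / 3.0f);
    return y;
}
```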

Design of a high-performance floating-point unit adopting a new divide/square root implementation

  • Lee, Tae-Young;Lee, Sung-Youn;Hong, In-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.12
    • /
    • pp.79-90
    • /
    • 2000
  • In this paper, a high-performance floating-point unit which is suitable for high-performance superscalar microprocessors and supports the IEEE 754 standard is designed. The floating-point arithmetic unit (AU) supports all denormalized-number processing in hardware, while the proposed gradual underflow prediction (GUP) scheme eliminates the additional delay caused by denormalized-number processing. Contrary to existing fixed-radix implementations, the floating-point divide/square root unit adopts a new architecture that determines a variable number of quotient bits per cycle. The new architecture is superior to SRT implementations in terms of performance and design complexity. Moreover, a sophisticated exception prediction scheme enables precise exceptions to be implemented with ease on various superscalar microprocessors and removes the stall cycles in division. The designed floating-point AU and divide/square root unit are integrated with an instruction decoder, register file, memory model and multiplier to form a floating-point unit, and its function and performance are verified.

MLP Design Method Optimized for Hidden Neurons on FPGA

  • Kyoung Dong-Wuk;Jung Kee-Chul
    • The KIPS Transactions:PartB
    • /
    • v.13B no.4 s.107
    • /
    • pp.429-438
    • /
    • 2006
  • Neural networks (NNs) are applied to a wide variety of nonlinear problems in several areas, such as image processing and pattern recognition. Although NNs can be simulated in software, many potential NN applications require real-time processing, so they need to be implemented in hardware. Hardware implementations of multi-layer perceptrons (MLPs), one of several kinds of NNs, usually use fixed-point arithmetic because of its simpler logic and shorter processing time compared to floating-point arithmetic. However, a fixed-point MLP has the drawback that it cannot reuse MLP software based on floating-point arithmetic. We propose a design method for MLPs with a floating-point, fully-pipelined architecture whose processing speed is proportional to the number of hidden nodes. The numbers of input and output nodes of an MLP are generally constrained by the given problem, but the number of hidden nodes can be optimized from user experience. Thus our design method uses an optimized number of hidden nodes in order to improve the processing speed, especially in fields with repeated processing such as image processing and pattern recognition.
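
For orientation, a minimal floating-point forward pass is sketched below; the paper's contribution is a fully pipelined FPGA realization whose throughput scales with the hidden-node count, not this sequential C code, and the layer sizes and sigmoid activation here are illustrative assumptions.

```c
#include <math.h>

#define NI 4   /* input nodes  (fixed by the problem)  */
#define NH 8   /* hidden nodes (the tunable parameter) */
#define NO 2   /* output nodes (fixed by the problem)  */

/* Illustrative single-hidden-layer MLP forward pass; in the paper's
 * architecture the hidden-layer loop is what gets fully pipelined. */
void mlp_forward(const float x[NI],
                 const float w1[NH][NI], const float b1[NH],
                 const float w2[NO][NH], const float b2[NO],
                 float y[NO])
{
    float h[NH];
    for (int j = 0; j < NH; j++) {            /* hidden layer */
        float s = b1[j];
        for (int i = 0; i < NI; i++)
            s += w1[j][i] * x[i];
        h[j] = 1.0f / (1.0f + expf(-s));      /* sigmoid activation */
    }
    for (int k = 0; k < NO; k++) {            /* output layer */
        float s = b2[k];
        for (int j = 0; j < NH; j++)
            s += w2[k][j] * h[j];
        y[k] = s;
    }
}
```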

A Variable Latency Goldschmidt's Floating Point Number Square Root Computation

  • Kim, Sung-Gi;Song, Hong-Bok;Cho, Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.1
    • /
    • pp.188-198
    • /
    • 2005
  • The Goldschmidt iterative algorithm finds a floating-point square root by performing a fixed number of multiplications. In this paper, a variable latency Goldschmidt's square root algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the square root of a floating-point number $F$, the algorithm repeats the following operations: $R_i=\frac{3-e_r-X_i}{2}$, $X_{i+1}=X_i\times R_i^2$, $Y_{i+1}=Y_i\times R_i$, $i\in\{0,1,2,\ldots,n-1\}$ with the initial values $X_0=Y_0=T^2\times F$, $T=\frac{1}{\sqrt{F}}+e_t$. The bits to the right of $p$ fractional bits in intermediate multiplication results are truncated, and this truncation error is less than $e_r=2^{-p}$. The value of $p$ is 28 for single-precision and 58 for double-precision floating point. Let $X_i=1{\pm}e_i$; then $X_{i+1}=1-e_{i+1}$, where $e_{i+1}<\frac{3e_i^2}{4}{\mp}\frac{e_i^3}{4}+4e_r$. If $|X_i-1|<2^{\frac{-p+2}{2}}$, then $e_{i+1}<8e_r$, which is less than the smallest number representable in floating point, so $\sqrt{F}$ is approximated by $\frac{Y_{i+1}}{T}$. Since the number of multiplications performed by the proposed algorithm depends on the input values, the average number of multiplications per operation is derived from many reciprocal square root tables ($T=\frac{1}{\sqrt{F}}+e_t$) of varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm performs multiplications only until the error becomes smaller than a given value, it can be used to improve the performance of a square root unit. Also, it can be used to construct optimized approximate reciprocal square root tables. The results of this paper can be applied to many areas that utilize floating-point numbers, such as digital signal processing, computer graphics, multimedia, and scientific computing.
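
A minimal sketch of the variable-latency loop follows; double arithmetic stands in for the paper's truncated multiplier, a small relative seed error replaces the table lookup for $T$, and the tolerance replaces the $2^{\frac{-p+2}{2}}$ threshold.

```c
#include <math.h>

/* Variable-latency Goldschmidt square root sketch: iterate only
 * until X is close enough to 1, then read off sqrt(F) = Y/T. */
double goldschmidt_sqrt(double F, double tol)
{
    double T = (1.0 + 1e-3) / sqrt(F); /* stand-in for T = 1/sqrt(F) + e_t */
    double X = T * T * F;              /* X0 = Y0 = T^2 * F */
    double Y = X;
    while (fabs(X - 1.0) > tol) {
        double R = (3.0 - X) / 2.0;    /* R_i = (3 - e_r - X_i)/2 */
        X = X * R * R;                 /* X_{i+1} = X_i * R_i^2 -> 1 */
        Y = Y * R;                     /* Y_{i+1} = Y_i * R_i */
    }
    return Y / T;                      /* sqrt(F) ~= Y_n / T */
}
```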

A Variable Latency Newton-Raphson's Floating Point Number Reciprocal Square Root Computation

  • Kim Sung-Gi;Cho Gyeong-Yeon
    • The KIPS Transactions:PartA
    • /
    • v.12A no.5 s.95
    • /
    • pp.413-420
    • /
    • 2005
  • The Newton-Raphson iterative algorithm finds a floating-point reciprocal square root by performing a fixed number of multiplications. In this paper, a variable latency Newton-Raphson's reciprocal square root algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the reciprocal square root of a floating-point number $F$, the algorithm repeats the following operation: $X_{i+1}=\frac{X_i(3-e_r-FX_i^2)}{2}$, $i\in\{0,1,2,\ldots,n-1\}$ with the initial value $X_0=\frac{1}{\sqrt{F}}{\pm}e_0$. The bits to the right of $p$ fractional bits in intermediate multiplication results are truncated, and this truncation error is less than $e_r=2^{-p}$. The value of $p$ is 28 for single-precision and 58 for double-precision floating point. Let $X_i=\frac{1}{\sqrt{F}}{\pm}e_i$; then $X_{i+1}=\frac{1}{\sqrt{F}}-e_{i+1}$, where $e_{i+1}<\frac{3\sqrt{F}e_i^2}{2}{\mp}\frac{Fe_i^3}{2}+2e_r$. If $\left|\frac{3-e_r-FX_i^2}{2}-1\right|<2^{\frac{-p+2}{2}}$, then $e_{i+1}<8e_r$, which is less than the smallest number representable in floating point, so $X_{i+1}$ approximates $\frac{1}{\sqrt{F}}$. Since the number of multiplications performed by the proposed algorithm depends on the input values, the average number of multiplications per operation is derived from many reciprocal square root tables ($X_0=\frac{1}{\sqrt{F}}{\pm}e_0$) of varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm performs multiplications only until the error becomes smaller than a given value, it can be used to improve the performance of a reciprocal square root unit. Also, it can be used to construct optimized approximate reciprocal square root tables. The results of this paper can be applied to many areas that utilize floating-point numbers, such as digital signal processing, computer graphics, multimedia, and scientific computing.
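
A minimal sketch of the variable-latency loop follows; double arithmetic stands in for the truncated multiplier, and the relative seed error and tolerance are illustrative stand-ins for the table lookup and the stopping threshold on the correction factor.

```c
#include <math.h>

/* Variable-latency Newton-Raphson reciprocal square root sketch:
 * iterate only while the correction factor is far from 1. */
double nr_rsqrt(double F, double tol)
{
    double X = (1.0 + 1e-3) / sqrt(F); /* stand-in for X0 = 1/sqrt(F) + e_0 */
    double R;
    do {
        R = (3.0 - F * X * X) / 2.0;   /* correction factor -> 1 */
        X = X * R;                     /* X_{i+1} = X_i(3 - F*X_i^2)/2 */
    } while (fabs(R - 1.0) > tol);
    return X;                          /* X ~= 1/sqrt(F) */
}
```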

A Variable Latency Newton-Raphson's Floating Point Number Reciprocal Computation

  • Kim Sung-Gi;Cho Gyeong-Yeon
    • The KIPS Transactions:PartA
    • /
    • v.12A no.2 s.92
    • /
    • pp.95-102
    • /
    • 2005
  • The Newton-Raphson iterative algorithm for finding a floating-point reciprocal, which is widely used for floating-point division, calculates the reciprocal by performing a fixed number of multiplications. In this paper, a variable latency Newton-Raphson's reciprocal algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To find the reciprocal of a floating-point number $F$, the algorithm repeats the following operation: $X_{i+1}=X_i(2-e_r-FX_i)$, $i\in\{0,1,2,\ldots,n-1\}$ with the initial value $X_0=\frac{1}{F}{\pm}e_0$. The bits to the right of $p$ fractional bits in intermediate multiplication results are truncated, and this truncation error is less than $e_r=2^{-p}$. The value of $p$ is 27 for single-precision and 57 for double-precision floating point. Let $X_i=\frac{1}{F}{\pm}e_i$; then $X_{i+1}=\frac{1}{F}-e_{i+1}$, where $e_{i+1}<Fe_i^2+2e_r$, which after enough iterations becomes less than the smallest number representable in floating point. So $X_{i+1}$ approximates $\frac{1}{F}$. Since the number of multiplications performed by the proposed algorithm depends on the input values, the average number of multiplications per operation is derived from many reciprocal tables ($X_0=\frac{1}{F}{\pm}e_0$) of varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm performs multiplications only until the error becomes smaller than a given value, it can be used to improve the performance of a reciprocal unit. Also, it can be used to construct optimized approximate reciprocal tables. The results of this paper can be applied to many areas that utilize floating-point numbers, such as digital signal processing, computer graphics, multimedia, and scientific computing.
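
A minimal sketch of the variable-latency loop follows, with the same stand-ins as above (double arithmetic for the truncated multiplier, a small relative seed error for the table lookup).

```c
#include <math.h>

/* Variable-latency Newton-Raphson reciprocal sketch: iterate only
 * while the correction factor is far from 1. */
double nr_recip(double F, double tol)
{
    double X = (1.0 + 1e-3) / F;       /* stand-in for X0 = 1/F + e_0 */
    double R;
    do {
        R = 2.0 - F * X;               /* correction factor -> 1 */
        X = X * R;                     /* X_{i+1} = X_i(2 - F*X_i) */
    } while (fabs(R - 1.0) > tol);
    return X;                          /* X ~= 1/F */
}
```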

An Efficient Median Filter Algorithm for Floating-point Images

  • Kim, Jin Wook
    • Journal of IKEEE
    • /
    • v.26 no.2
    • /
    • pp.240-248
    • /
    • 2022
  • Floating-point images, which express pixel information as real numbers, are used for HDR images. There has been a variety of research on efficient median filter algorithms, but most of it applies to 8-bit-depth images, and only a few algorithms, including Gil and Werman's, are applicable to floating-point images. In this paper, we propose a median filter algorithm that works efficiently on floating-point images by improving Kim's algorithm, which in turn improved Gil and Werman's. Experimental results show that the execution time is improved by about 10% compared to Kim's algorithm, by reducing redundant work on the repeatedly used binary search tree and by applying an inverted index.
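
For contrast with the tree-based algorithms discussed above, a naive per-window baseline is sketched below; it sorts each window with qsort, is not the paper's algorithm, and leaves the r-pixel border untouched for brevity.

```c
#include <stdlib.h>

static int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);              /* total order for finite floats */
}

/* Naive (2r+1)x(2r+1) median filter on a W x H float image:
 * O(w^2 log w) per pixel, versus the amortized tree-based schemes
 * the paper improves on. Border pixels are skipped. */
void median_filter(const float *in, float *out, int W, int H, int r)
{
    int side = 2 * r + 1;
    float *win = malloc((size_t)(side * side) * sizeof *win);
    if (!win) return;
    for (int y = r; y < H - r; y++)
        for (int x = r; x < W - r; x++) {
            int n = 0;
            for (int dy = -r; dy <= r; dy++)       /* gather window */
                for (int dx = -r; dx <= r; dx++)
                    win[n++] = in[(y + dy) * W + (x + dx)];
            qsort(win, (size_t)n, sizeof *win, cmp_float);
            out[y * W + x] = win[n / 2];           /* median */
        }
    free(win);
}
```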

A Variable Latency Goldschmidt's Floating Point Number Divider

  • Kim Sung-Gi;Song Hong-Bok;Cho Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.2
    • /
    • pp.380-389
    • /
    • 2005
  • The Goldschmidt iterative algorithm computes a floating-point division by performing a fixed number of multiplications. In this paper, a variable latency Goldschmidt's divide algorithm is proposed that performs multiplications a variable number of times until the error becomes smaller than a given value. To calculate a floating-point division $\frac{N}{F}$, multiply both the numerator and the denominator by $T=\frac{1}{F}+e_t$, so that it becomes $\frac{TN}{TF}=\frac{N_0}{F_0}$. The algorithm then repeats the following operations: $R_i=2-e_r-F_i$, $N_{i+1}=N_i\times R_i$, $F_{i+1}=F_i\times R_i$, $i\in\{0,1,\ldots,n-1\}$. The bits to the right of $p$ fractional bits in intermediate multiplication results are truncated, and this truncation error is less than $e_r=2^{-p}$. The value of $p$ is 29 for single-precision and 59 for double-precision floating point. Let $F_i=1{\pm}e_i$; then $F_{i+1}=1-e_{i+1}$. If $|F_i-1|<2^{\frac{-p+3}{2}}$, then $e_{i+1}<16e_r$, which is less than the smallest number representable in floating point, so $N_{i+1}$ approximates $\frac{N}{F}$. Since the number of multiplications performed by the proposed algorithm depends on the input values, the average number of multiplications per operation is derived from many reciprocal tables ($T=\frac{1}{F}+e_t$) of varying sizes. The superiority of this algorithm is proved by comparing this average number with the fixed number of multiplications of the conventional algorithm. Since the proposed algorithm performs multiplications only until the error becomes smaller than a given value, it can be used to improve the performance of a divider. Also, it can be used to construct optimized approximate reciprocal tables. The results of this paper can be applied to many areas that utilize floating-point numbers, such as digital signal processing, computer graphics, multimedia, and scientific computing.
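
A minimal sketch of the variable-latency loop follows; double arithmetic stands in for the paper's truncated multiplier, and the relative seed error and stopping tolerance are illustrative stand-ins for the table lookup and the representability threshold.

```c
#include <math.h>

/* Variable-latency Goldschmidt division sketch: iterate only until
 * the denominator is close enough to 1. */
double goldschmidt_div(double N, double F, double tol)
{
    double T = (1.0 + 1e-3) / F;       /* stand-in for T = 1/F + e_t */
    double n = T * N, f = T * F;       /* N0/F0 = TN/TF, with f -> 1 */
    while (fabs(f - 1.0) > tol) {
        double R = 2.0 - f;            /* R_i = 2 - e_r - F_i */
        n *= R;                        /* N_{i+1} = N_i * R_i */
        f *= R;                        /* F_{i+1} = F_i * R_i */
    }
    return n;                          /* n ~= N/F */
}
```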

Error Corrected K'th order Goldschmidt's Floating Point Number Division

  • Cho, Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.10
    • /
    • pp.2341-2349
    • /
    • 2015
  • The commonly used Goldschmidt floating-point division algorithm performs two multiplications in one iteration. In this paper, an error-corrected K'th-order Goldschmidt floating-point division algorithm, which performs K multiplications in one iteration, is proposed. Since the number of multiplications performed by the proposed algorithm depends on the input values, the average number of multiplications per operation in single-precision and double-precision dividers is derived from many reciprocal tables of varying sizes. In addition, an error correction algorithm, consisting of one multiplication and a decision, is proposed to obtain an exact result from the divider. Since the proposed algorithm performs multiplications only until the error becomes smaller than a given value, it can be used to improve the performance of a division unit. Also, it can be used to construct optimized approximate reciprocal tables.
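
The K'th-order step rests on the identity $(1-e)(1+e+e^2+\cdots+e^{K-1})=1-e^K$; the sketch below builds that factor with a Horner loop, while the paper's table seeds and its final error-correction multiply-and-decide step are not reproduced here.

```c
#include <assert.h>
#include <math.h>

/* K'th-order Goldschmidt division sketch (K >= 2): each iteration
 * spends about K multiplications and drives f = 1 - e to 1 - e^K.
 * Seed and tolerance are illustrative stand-ins. */
double goldschmidt_div_k(double N, double F, int K, double tol)
{
    assert(K >= 2);                        /* K = 1 would never converge */
    double T = (1.0 + 1e-3) / F;           /* stand-in for a table seed */
    double n = T * N, f = T * F;
    while (fabs(f - 1.0) > tol) {
        double e = 1.0 - f;                /* f = 1 - e */
        double R = 1.0;
        for (int k = 1; k < K; k++)        /* Horner: 1 + e(1 + e(...)) */
            R = 1.0 + e * R;
        n *= R;                            /* quotient estimate */
        f *= R;                            /* f_new = 1 - e^K */
    }
    return n;                              /* n ~= N/F */
}
```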

Floating Power Supply Based on Bootstrap Operation for Three-Level Neutral-Point-Clamped Voltage-Source Inverter

  • Nguyen, Qui Tu Vo;Lee, Dong-Choon
    • Proceedings of the KIPE Conference
    • /
    • 2011.11a
    • /
    • pp.3-4
    • /
    • 2011
  • This paper presents a survey of floating power supplies based on bootstrap operation for three-level voltage-source inverters. The floating power supply for the upper switches is achieved by a bootstrap capacitor charged during the on-time of the switch beneath it. Hence, the large number of bulky isolated DC/DC power supplies, one per gate driver, is reduced. PSpice simulation results show the behavior of the bootstrap devices and the performance of the bootstrap capacitor voltage.
