Search | Korea Science

Jaehong Lee;Duksu Kim
- Journal of Broadcast Engineering / v.28 no.1 / pp.21-30 / 2023
Computer-generated holography requires far more computation and memory than conventional image processing. We implemented the diffraction calculation with low-precision and mixed-precision floating-point numbers and compared the processing time and hologram quality across double, single, and bfloat16 precision. bfloat16 runs 5.94x faster than double precision and 1.52x faster than single precision. bfloat16 also yields lower PSNR and SSIM and higher MSE than the other precisions; however, this has no significant effect on the reconstructed images. These results show that low-precision formats such as bfloat16 can be used for computer-generated holography.
https://doi.org/10.5909/JBE.2023.28.1.21
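The abstract does not say how bfloat16 values are produced or how MSE and PSNR are computed. As a minimal sketch, assuming bfloat16 is emulated by rounding a float32 bit pattern to its upper 16 bits (round-to-nearest-even) and quality is scored with the standard MSE/PSNR definitions, the precision loss can be measured as below. bfloat16 keeps float32's 8-bit exponent but only 7 stored mantissa bits, which is why its error metrics are worse. The helper names, the `peak` parameter, and the cosine test signal are hypothetical, not the authors' code.

```python
import math
import struct

def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 (1 sign, 8 exponent, 7 stored mantissa bits).

    Hypothetical emulation helper: x is first packed as float32, then the
    low 16 bits are dropped with round-to-nearest-even. NaN/Inf inputs are
    not handled specially.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                        # lowest bit that is kept
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000     # round, then truncate
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def mse(ref: list[float], test: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)

def psnr(ref: list[float], test: list[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

# Hypothetical test signal standing in for one row of a hologram, in [-1, 1].
signal = [math.cos(0.37 * k) for k in range(256)]
signal_bf16 = [to_bfloat16(v) for v in signal]
print(f"MSE  = {mse(signal, signal_bf16):.3e}")
print(f"PSNR = {psnr(signal, signal_bf16):.1f} dB")
```

Emulating the format in software like this isolates the accuracy question from the speed question; the 5.94x and 1.52x speedups reported above depend on hardware support and cannot be reproduced by such a sketch.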

Kim, H.J.;Han, J.H.;Kwon, Y.S.
- Electronics and Telecommunications Trends / v.37 no.1 / pp.53-62 / 2022
As transformer-based neural networks grow in size, lightweight algorithms and efficient AI accelerators have been developed to train these huge networks within a practical design time. In this article, we survey state-of-the-art research on low-precision computational algorithms, especially floating-point formats, and on their hardware accelerators. We describe the trends by focusing on the work of two leading research groups, IBM and Seoul National University, which have deep expertise in both AI algorithms and hardware architecture. For low-precision algorithms, we summarize two efficient floating-point formats from the main research stream, hybrid FP8 and radix-4 FP4, together with accuracy-preserving training algorithms. We also describe an AI processor architecture with a low-bit mixed-precision computing unit that includes an integer engine.
https://doi.org/10.22648/ETRI.2022.J.370106
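The abstract names the formats but not their bit layouts. To make "low precision" concrete, here is a hypothetical quantizer that rounds a value to an arbitrary binary format with a sign bit, `exp_bits` exponent bits, and `man_bits` stored mantissa bits. It assumes an IEEE 754-style bias, subnormals, a reserved all-ones exponent code, and saturation instead of overflow, so its ranges may differ from specific FP8 specifications. The (4, 3) and (5, 2) exponent/mantissa splits in the usage lines are an illustrative assumption resembling 8-bit formats discussed in the hybrid FP8 literature, not values taken from this article; radix-4 FP4, which uses a different exponent base, is not modeled.

```python
import math

def quantize_fp(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a hypothetical binary floating-point
    format with 1 sign bit, `exp_bits` exponent bits and `man_bits` stored
    mantissa bits.

    Assumptions: IEEE 754-style bias, subnormals supported, the all-ones
    exponent code reserved (as IEEE does for Inf/NaN), and saturation to the
    largest finite value instead of overflow to Inf.
    """
    if x == 0 or math.isnan(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                          # exponent of smallest normal
    max_exp = bias                              # top exponent code reserved
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    sign = math.copysign(1.0, x)
    a = abs(x)
    if math.isinf(a):
        return sign * max_val
    e = math.frexp(a)[1] - 1                    # a lies in [2**e, 2**(e+1))
    e = max(e, min_exp)                         # subnormals reuse min_exp spacing
    step = 2.0 ** (e - man_bits)                # gap between neighbouring values
    q = round(a / step) * step                  # round() is half-to-even
    return sign * min(q, max_val)

# Compare two hypothetical 8-bit splits on a few sample values.
for v in (0.3, 3.14159, 1000.0):
    print(v, quantize_fp(v, 4, 3), quantize_fp(v, 5, 2))
```

The trade-off the sketch exposes is the one hybrid schemes exploit: more exponent bits widen the representable range, while more mantissa bits shrink the rounding error within that range.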

이종남;박종화;신경욱
- Proceedings of the IEEK Conference / 2000.06b / pp.149-152 / 2000
A dual-mode multiplier (DMM) that performs both single- and double-precision multiplication has been designed. We propose an algorithm that efficiently implements double-precision multiplication with a single-precision multiplier: the double-precision multiplication is partitioned into four single-precision sub-multiplications, which are computed with sequential accumulation. Compared with conventional double-precision multipliers, our approach reduces hardware complexity by about one third, resulting in a smaller silicon area and lower power dissipation at the expense of increased latency and throughput cycles.
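The abstract gives the decomposition but not the operand widths. Writing each significand as a high and a low half, a = a_hi·2^k + a_lo, the product expands to a_hi·b_hi·2^(2k) + (a_hi·b_lo + a_lo·b_hi)·2^k + a_lo·b_lo, which is four narrower multiplications. A minimal Python sketch of that idea, assuming a hypothetical split of each 53-bit double significand into a 26-bit high part and a 27-bit low part; real hardware would pick widths to fit its multiplier array and would handle signs, exponents, rounding, and special values separately:

```python
import math

SPLIT = 27                              # hypothetical: low 27 bits, high 26 bits
LOW_MASK = (1 << SPLIT) - 1

def significand(x: float) -> tuple[int, int]:
    """Return (m, e) with x == m * 2**e and m a 53-bit integer.
    Assumes x is a positive, normal double."""
    f, e = math.frexp(x)                # x == f * 2**e with 0.5 <= f < 1
    return int(f * 2 ** 53), e - 53     # scaling by a power of two is exact

def split_multiply(a: int, b: int) -> int:
    """Exact product of two 53-bit integers built from four narrower
    sub-multiplications, accumulated one per step."""
    a_hi, a_lo = a >> SPLIT, a & LOW_MASK
    b_hi, b_lo = b >> SPLIT, b & LOW_MASK
    sub_products = [                    # (operand, operand, left shift)
        (a_lo, b_lo, 0),
        (a_lo, b_hi, SPLIT),
        (a_hi, b_lo, SPLIT),
        (a_hi, b_hi, 2 * SPLIT),
    ]
    acc = 0
    for x, y, shift in sub_products:    # sequential accumulation
        acc += (x * y) << shift
    return acc

def multiply_double(x: float, y: float) -> float:
    """Multiply two positive normal doubles via split_multiply, then round
    the exact integer product back to a double (round-to-nearest-even)."""
    mx, ex = significand(x)
    my, ey = significand(y)
    exact = split_multiply(mx, my)      # 105- or 106-bit exact product
    return math.ldexp(float(exact), ex + ey)

print(multiply_double(1.1, 3.7), 1.1 * 3.7)
```

The loop makes the latency cost visible: one narrow multiplier reused four times replaces one wide multiplier, trading cycles for area, which matches the trade-off the abstract reports.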