Quality Analysis on Computer Generated Hologram Depending on the Precision on Diffraction Computation

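  • Jaehong Lee (Department of Computer Engineering, Korea University of Technology and Education) ;
  • Duksu Kim (Department of Computer Engineering, Korea University of Technology and Education)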
  • Received : 2022.12.02
  • Accepted : 2023.01.13
  • Published : 2023.01.30

Abstract

Computer-generated holography (CGH) requires far more computation and memory than conventional image processing. We apply low-precision and mixed-precision floating-point arithmetic to the diffraction calculation and analyze how hologram generation time and quality change with precision. We compare diffraction computed in double precision, single precision, and bfloat16. With bfloat16, the diffraction calculation runs up to 5.94x faster than double precision and 1.52x faster than single precision. Measured by MSE, PSNR, and SSIM, the error of the diffraction calculation grows as precision decreases: bfloat16 yields lower PSNR and SSIM and higher MSE than the other precisions. However, the reconstructed images show no significant qualitative degradation. These results indicate that low-precision formats such as bfloat16 can be used for computer-generated holography.
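The kind of comparison described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a naive 1D DFT as a stand-in for the FFT-based diffraction step, a chirp-like test signal, and software emulation of single precision and bfloat16 by rounding every intermediate value. The helper names (`to_single`, `to_bfloat16`, `dft_magnitude`, `mse`, `psnr`) are hypothetical. SSIM is omitted for brevity.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 keeps the upper 16 bits of the float32 bit pattern.
    NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def dft_magnitude(signal, quantize):
    """Naive DFT magnitude with every intermediate rounded by `quantize`.

    Assumption: rounding after each multiply and add approximates
    computing entirely in the target precision.
    """
    n = len(signal)
    out = []
    for k in range(n):
        re = im = 0.0
        for t, s in enumerate(signal):
            angle = -2.0 * math.pi * k * t / n
            s_q = quantize(s)
            re = quantize(re + quantize(s_q * quantize(math.cos(angle))))
            im = quantize(im + quantize(s_q * quantize(math.sin(angle))))
        out.append(math.hypot(re, im) / n)
    return out


def mse(ref, test):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr(ref, test):
    """Peak signal-to-noise ratio in dB, using max(ref) as the peak."""
    err = mse(ref, test)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max(ref) ** 2 / err)


# Chirp-like test signal (a hypothetical input, loosely resembling a zone pattern).
n = 32
signal = [0.5 + 0.5 * math.cos(math.pi * (t - n / 2) ** 2 / n) for t in range(n)]

reference = dft_magnitude(signal, lambda v: v)      # double precision
single = dft_magnitude(signal, to_single)           # emulated float32
bf16 = dft_magnitude(signal, to_bfloat16)           # emulated bfloat16

mse_single, mse_bf16 = mse(reference, single), mse(reference, bf16)
psnr_single, psnr_bf16 = psnr(reference, single), psnr(reference, bf16)
print(f"single  : MSE={mse_single:.3e}  PSNR={psnr_single:.1f} dB")
print(f"bfloat16: MSE={mse_bf16:.3e}  PSNR={psnr_bf16:.1f} dB")
```

Under these assumptions the bfloat16 result should show a higher MSE and a lower PSNR than the single-precision result, which is the trend the abstract reports. The sketch says nothing about speed, since the reported speedups depend on hardware support for the lower-precision formats.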

Acknowledgement

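This research was supported in 2022 by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Education) (No. 2021R1I1A3048263, 40%); by the Regional Innovation Strategy project based on local government-university cooperation (2021RIS-004, 20%); and by the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (Ministry of Science and ICT) (No. 2019-0-00001, 40%).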
