Acknowledgement
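This research was carried out in 2022 with support from the Basic Research Program of the National Research Foundation of Korea, funded by the Korean government (Ministry of Education) (No. 2021R1I1A3048263, 40%); the Local Government-University Cooperation-Based Regional Innovation Project (2021RIS-004, 20%); and the Institute of Information & Communications Technology Planning & Evaluation, funded by the Korean government (Ministry of Science and ICT) (No. 2019-0-00001, 40%).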
References
- R. Corda, D. Giusto, A. Liotta, W. Song, and C. Perra, "Recent Advances in the Processing and Rendering Algorithms for Computer-Generated Holography," Electronics, vol. 8, no. 5, 2019. doi: https://doi.org/10.3390/electronics8050556.
- T. Shimobaba and T. Ito, Computer Holography: Acceleration Algorithms and Hardware Implementations. CRC Press, 2019.
- K. Matsushima, Introduction to Computer Holography: Creating Computer-Generated Holograms as the Ultimate 3D Image. Springer Nature, 2020.
- Y. Wang et al., "Hardware implementations of computer-generated holography: a review," Optical Engineering, vol. 59, no. 10, 2020. doi: https://doi.org/10.1117/1.Oe.59.10.102413.
- Y. Yuan, G. Xun, K. Jia, and A. Zhang, "A Multi-view Deep Learning Method for Epileptic Seizure Detection using Short-time Fourier Transform," presented at the Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017. doi: https://doi.org/10.1145/3107411.3107419.
- R. N. Bracewell, The Fourier Transform and Its Applications. McGraw-Hill, New York, 1986.
- Y. Ogata, T. Endo, N. Maruyama, and S. Matsuoka, "An efficient, model-based CPU-GPU heterogeneous FFT library," presented at the 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008. doi: https://doi.org/10.1109/IPDPS.2008.4536163.
- L. Gu, J. Siegel, and X. Li, "Using GPUs to compute large out-of-card FFTs," presented at the Proceedings of the international conference on Supercomputing, Tucson, Arizona, USA, 2011. doi: https://doi.org/10.1145/1995896.1995937.
- S. Chen and X. Li, "A hybrid GPU/CPU FFT library for large FFT problems," 2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC), 6-8 Dec. 2013. doi: https://doi.org/10.1109/PCCC.2013.6742796.
- A. Gholami, J. Hill, D. Malhotra, and G. Biros, "AccFFT: A library for distributed-memory FFT on CPU and GPU architectures," arXiv preprint arXiv:1506.07933, 2015.
- Z. Zhao and Y. Zhao, "The Optimization of FFT Algorithm Based with Parallel Computing on GPU," 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 12-14 Oct. 2018. doi: https://doi.org/10.1109/IAEAC.2018.8577843.
- D. Takahashi, Fast Fourier transform algorithms for parallel computers. Springer, 2019.
- H. Kang, J. Lee, and D. Kim, "HI-FFT: Heterogeneous Parallel In-Place Algorithm for Large-Scale 2D-FFT," IEEE Access, vol. 9, pp. 120261-120273, 2021. doi: https://doi.org/10.1109/ACCESS.2021.3108404.
- J. Lee, H. Kang, H.-j. Yeom, S. Cheon, J. Park, and D. Kim, "Out-of-core GPU 2D-shift-FFT algorithm for ultra-high-resolution hologram generation," Opt. Express, vol. 29, no. 12, pp. 19094-19112, 2021. doi: https://doi.org/10.1364/OE.422266.
- D. Sharp, M. Stoyanov, S. Tomov, and J. Dongarra, "A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms," 2021 IEEE High Performance Extreme Computing Conference (HPEC), 20-24 Sept. 2021, pp. 1-5. doi: https://doi.org/10.1109/HPEC49654.2021.9622811.
- A. Ayala, S. Tomov, M. Stoyanov, A. Haidar, and J. Dongarra, "Performance Analysis of Parallel FFT on Large Multi-GPU Systems," 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022. doi: https://doi.org/10.1109/IPDPSW55747.2022.00072.
- Intel® oneAPI Math Kernel Library, https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html#gs.i3mm29 (accessed Nov. 24, 2022).
- cuFFT :: CUDA Toolkit Documentation, https://docs.nvidia.com/cuda/cufft/index.html (accessed Nov. 24, 2022).
- rocFFT API, https://docs.amd.com/bundle/rocFFT-release-rocm-rel-5.2/page/library.html (accessed Nov. 24, 2022).
- M. Frigo and S. G. Johnson, "The Design and Implementation of FFTW3," Proceedings of the IEEE, vol. 93, no. 2, pp. 216-231, 2005. doi: https://doi.org/10.1109/JPROC.2004.840301.
- M. Frigo and S. G. Johnson, "FFTW: an adaptive software architecture for the FFT," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 15 May 1998, vol. 3, pp. 1381-1384. doi: https://doi.org/10.1109/ICASSP.1998.681704.
- S. Wang and P. Kanwar, "BFloat16: The secret to high performance on Cloud TPUs," Google Cloud Blog, vol. 4,
- N. Burgess, J. Milanovic, N. Stephens, K. Monachopoulos, and D. Mansell, "Bfloat16 Processing for Neural Networks," 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), 2019. doi: https://doi.org/10.1109/ARITH.2019.00022.
- Y. Kikuchi, K. Fujita, T. Ichimura, M. Hori, and L. Maddegedara, "Calculation of Cross-correlation Function Accelerated by Tensor Cores with TensorFloat-32 Precision on Ampere GPU," Computational Science - ICCS 2022 (Lecture Notes in Computer Science), 2022, ch. 37, pp. 277-290. doi: https://doi.org/10.1007/978-3-031-08754-7_37.
- "IEEE Standard for Floating-Point Arithmetic," IEEE Std 754-2019 (Revision of IEEE 754-2008), pp. 1-84, 2019. doi: https://doi.org/10.1109/IEEESTD.2019.8766229.
- T. Shimobaba, N. Masuda, and T. Ito, "Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane," Opt. Lett., vol. 34, no. 20, pp. 3133-3135, 2009. doi: https://doi.org/10.1364/OL.34.003133.
- M. Seznec, N. Gac, A. Ferrari, and F. Orieux, "A Study on Convolution using Half-Precision Floating-Point Numbers on GPU for Radio Astronomy Deconvolution," 2018 IEEE International Workshop on Signal Processing Systems (SiPS), 21-24 Oct. 2018, pp. 170-175. doi: https://doi.org/10.1109/SiPS.2018.8598342.
- C. Maass, M. Baer, and M. Kachelriess, "CT image reconstruction with half precision floating-point values," Med Phys, vol. 38 Suppl 1, p. S95, Jul 2011. doi: https://doi.org/10.1118/1.3528218.
- N. M. Ho and W. F. Wong, "Exploiting half precision arithmetic in Nvidia GPUs," 2017 IEEE High Performance Extreme Computing Conference (HPEC), 12-14 Sept. 2017, pp. 1-7. doi: https://doi.org/10.1109/HPEC.2017.8091072.
- P. Luszczek, J. Kurzak, I. Yamazaki, and J. Dongarra, "Towards numerical benchmark for half-precision floating point arithmetic," 2017 IEEE High Performance Extreme Computing Conference (HPEC), 12-14 Sept. 2017, pp. 1-5. doi: https://doi.org/10.1109/HPEC.2017.8091031.
- A. Abdelfattah, S. Tomov, and J. Dongarra, "Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs," 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2019. doi: https://doi.org/10.1109/ScalA49573.2019.00008.
- A. Abdelfattah, S. Tomov, and J. Dongarra, "Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs," 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019. doi: https://doi.org/10.1109/IPDPS.2019.00022.
- D. Blinder and P. Schelkens, "Fast Low-Precision Computer-Generated Holography on GPU," Applied Sciences, vol. 11, no. 13, 2021. doi: https://doi.org/10.3390/app11136235.
- S. Markidis, S. W. D. Chien, E. Laure, I. B. Peng, and J. S. Vetter, "NVIDIA Tensor Core Programmability, Performance & Precision," 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018. doi: https://doi.org/10.1109/IPDPSW.2018.00091.
- L. Pisha and L. Ligowski, "Accelerating non-power-of-2 size Fourier transforms with GPU Tensor Cores," 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021. doi: https://doi.org/10.1109/IPDPS49936.2021.00059.
- A. Sorna, X. Cheng, E. D. Azevedo, K. Won, and S. Tomov, "Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware," 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), 17-20 Dec. 2018, pp. 3-7. doi: https://doi.org/10.1109/HiPCW.2018.8634417.
- B. Li, S. Cheng, and J. Lin, "tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores," 2021 IEEE International Conference on Cluster Computing (CLUSTER), 2021. doi: https://doi.org/10.1109/Cluster48925.2021.00035.
- S. Durrani et al., "Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles," 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2021. doi: https://doi.org/10.1109/PACT52795.2021.00032.
- D. H. Mugler, "The Centered Discrete Fourier Transform and a parallel implementation of the FFT," 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 22-27 May 2011, pp. 1725-1728. doi: https://doi.org/10.1109/ICASSP.2011.5946834.
- J. G. Vargas-Rubio and B. Santhanam, "On the multiangle centered discrete fractional Fourier transform," IEEE Signal Processing Letters, vol. 12, no. 4, pp. 273-276, 2005. doi: https://doi.org/10.1109/lsp.2005.843762.
- M. Abdellah, "cufftShift: high performance CUDA-accelerated FFT-shift library," Proceedings of the High Performance Computing Symposium, Tampa, Florida, 2014.
- E. Agustsson and R. Timofte, "NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 126-135. doi: https://doi.org/10.1109/CVPRW.2017.150.