Acknowledgement
This research was supported by the Supercomputer Development Leading Program of the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) [Project No. 2021M3H6A1017683].
References
- D.K. Panda et al., "The MVAPICH project: Transforming research into high-performance MPI library for HPC community," J. Comput. Sci., vol. 52, 2021, article no. 101208.
- K.S. Jin, S.M. Lee, and Y.C. Kim, "Adaptive and optimized agent placement scheme for parallel agent-based simulation," ETRI J., vol. 44, no. 2, 2021.
- B. Andjelkovic et al., "Grid-enabled parallel simulation based on parallel equation formulation," ETRI J., vol. 32, no. 4, 2010, pp. 555-565. https://doi.org/10.4218/etrij.10.0109.0197
- M. Gao et al., "Proteome-scale deployment of protein structure prediction workflows on the Summit supercomputer," arXiv preprint, CoRR, 2022, arXiv: 2201.10024.
- A. Acharya et al., "Supercomputer-based ensemble docking drug discovery pipeline with application to COVID-19," J. Chem. Inf. Model., vol. 60, no. 12, 2020, pp. 5832-5852. https://doi.org/10.1021/acs.jcim.0c01010
- M. Tolstykh et al., "SL-AV model: Numerical weather prediction at extra-massively parallel supercomputer," in Russian Supercomputing Days, Springer, Cham, Switzerland, 2018, pp. 379-387.
- V. Khryashchev et al., "Comparison of different convolutional neural network architectures for satellite image segmentation," in Proc. Conf. Open Innov. Assoc. (FRUCT), (Bologna, Italy), Nov. 2018, pp. 172-179.
- M.P. Katz et al., "Preparing nuclear astrophysics for exascale," in Proc. SC20: Int. Conf. High Perform. Comput., Netw., Storage Analysis (Atlanta, GA, USA), Nov. 2020, pp. 1-12.
- Wikipedia, Shared Memory, https://en.wikipedia.org/wiki/Shared_memory
- Wikipedia, Distributed Memory, https://en.wikipedia.org/wiki/Distributed_memory
- Wikipedia, Distributed Shared Memory, https://en.wikipedia.org/wiki/Distributed_shared_memory
- Z. Huang et al., "VODCA: View-oriented, distributed, cluster-based approach to parallel computing," in Proc. IEEE Int. Symp. Cluster Comput. Grid (CCGRID'06), (Singapore), May 2006.
- Argonne, MPICH: A High-Performance, Portable Implementation of MPI, https://www.anl.gov/mcs/mpich-a-highperformance-portable-implementation-of-mpi
- The Open MPI Project, Open MPI: Open Source High Performance Computing, https://www.open-mpi.org/
- The Ohio State University, MVAPICH: MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE, http://mvapich.cse.ohio-state.edu/
- IBM, IBM Spectrum MPI, https://www.ibm.com/products/spectrum-mpi
- Microsoft, Microsoft MPI, https://docs.microsoft.com/en-us/message-passing-interface/microsoft-mpi
- Intel, Intel MPI Library, https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpilibrary.html#gs.pcbhj9
- S.H. Cho and Y.H. Kim, "A fast transmission of mobile agents using binomial trees," The KIPS Trans.: Part A, vol. 9A, no. 3, 2002, pp. 341-350. https://doi.org/10.3745/KIPSTA.2002.9A.3.341
- H. Zhao and J. Canny, "Butterfly mixing: Accelerating incremental-update algorithms on clusters," in Proc. SIAM Int. Conf. Data Min. (SDM), SIAM, Philadelphia, PA, USA, 2013, pp. 785-793.
- J.-H. Lee and D.-S. Han, "Improving MPICH-G2 collective communication performance via packet-level parallel data transfer," in Proc. KIISE Fall Conf., vol. 30, no. 2, 2003.
- R. Thakur et al., "Optimization of collective communication operations in MPICH," Int. J. High Perform. Comput. Appl., vol. 19, no. 1, 2005, pp. 49-66. https://doi.org/10.1177/1094342005051521
- M. Chaarawi et al., "A tool for optimizing runtime parameters of Open MPI," in Recent Advances in Parallel Virtual Machine and Message Passing Interface, vol. 5205, Springer, Berlin, Heidelberg, Germany, 2008, pp. 210-217.
- E. Nuriyev and A. Lastovetsky, "Accurate runtime selection of optimal MPI collective algorithms using analytical performance modelling," arXiv preprint, CoRR, 2020, arXiv: 2004.11062.
- J. Pjesivac-Grbovic et al., "MPI collective algorithm selection and quadtree encoding," Parallel Comput., vol. 33, no. 9, 2007, pp. 613-623. https://doi.org/10.1016/j.parco.2007.06.005
- S. Hunold et al., "Predicting MPI collective communication performance using machine learning," in Proc. IEEE Int. Conf. Clust. Comput. (CLUSTER), (Kobe, Japan), Sept. 2020.
- J.M. Hashmi et al., "Design and characterization of shared address space MPI collectives on modern architectures," in Proc. IEEE/ACM Int. Symp. Clust., Cloud Grid Comput. (CCGRID), (Larnaca, Cyprus), May 2019.
- Google, XPMEM: Cross-Process Memory Mapping, 2011, https://code.google.com/archive/p/xpmem/
- Google, Google Code Archive XPMEM, https://code.google.com/archive/p/xpmem/
- S. Chakraborty et al., "SHMEMPMI: Shared memory-based PMI for improved performance and scalability," in Proc. IEEE/ACM Int. Symp. Clust., Cloud Grid Comput. (CCGrid), (Cartagena, Colombia), May 2016.
- P. Balaji et al., "PMI: A scalable parallel process-management interface for extreme-scale systems," in European MPI Users' Group Meeting, Springer, Berlin, Heidelberg, Germany, 2010, pp. 31-41.
- R.L. Graham et al., "Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction," in Proc. Int. Workshop Commun. Optim. HPC (COMHPC), (Salt Lake City, UT, USA), Nov. 2016.
- NVIDIA, NVIDIA Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), https://docs.nvidia.com/networking/display/sharpv214
- J. Stern et al., "Accelerating MPI_Reduce with FPGAs in the network," in Proc. Workshop on Exascale MPI, 2017.
- P. Haghi et al., "FPGAs in the network and novel communicator support accelerate MPI collectives," in Proc. IEEE High Perform. Extreme Comput. Conf. (HPEC), (Waltham, MA, USA), Sept. 2020.
- MVAPICH, OSU Collective MPI Benchmarks, http://mvapich.cse.ohio-state.edu/benchmarks/
- S. Kumar et al., "Optimization of MPI collective operations on the IBM Blue Gene/Q supercomputer," Int. J. High Perform. Comput. Appl., vol. 28, no. 4, 2014, pp. 450-464. https://doi.org/10.1177/1094342014552086
- J. Liu, A.R. Mamidala, and D.K. Panda, "Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support," in Proc. Int. Parallel Distrib. Process. Symp., (Santa Fe, NM, USA), Apr. 2004.
- T. Hoefler, C. Siebert, and W. Rehm, "A practically constant-time MPI broadcast algorithm for large-scale InfiniBand clusters with multicast," in Proc. Int. Parallel Distrib. Process. Symp., (Long Beach, CA, USA), Mar. 2007, pp. 1-8.
- S. Aga et al., "Compute caches," in Proc. IEEE Int. Symp. High Perform. Comput. Archit. (HPCA), (Austin, TX, USA), Feb. 2017.
- S. Jung et al., "A crossbar array of magnetoresistive memory devices for in-memory computing," Nature, vol. 601, 2022, pp. 211-216. https://doi.org/10.1038/s41586-021-04196-6
- J. Huang et al., "Active-routing: Compute on the way for near-data processing," in Proc. IEEE Int. Symp. High Perform. Comput. Archit. (HPCA), (Washington, DC, USA), Feb. 2019.
- M. Torabzadehkashi et al., "Catalina: In-storage processing acceleration for scalable big data analytics," in Proc. Euromicro Int. Conf. Parallel, Distrib. Netw.-Based Process. (PDP), (Pavia, Italy), Feb. 2019.
- GitHub, Faiss, https://github.com/facebookresearch/faiss
- Texmex, Datasets for approximate nearest neighbor search, http://corpus-texmex.irisa.fr
- S.W. Jun et al., "BlueDBM: An appliance for big data analytics," in Proc. ACM/IEEE Annu. Int. Symp. Comput. Archit. (ISCA), (Portland, OR, USA), June 2015.
- B. Gu et al., "Biscuit: A framework for near-data processing of big data workloads," ACM SIGARCH Comput. Archit. News, vol. 44, no. 3, 2016, pp. 153-165. https://doi.org/10.1145/3007787.3001154
- S.C. Kim et al., "In-storage processing of database scans and joins," Inf. Sci., vol. 327, 2016, pp. 183-200. https://doi.org/10.1016/j.ins.2015.07.056
- Xilinx, SoCs, MPSoCs & RFSoCs, https://www.xilinx.com/products/silicon-devices/soc.html
- Apache Hadoop, Hadoop MapReduce Tutorial, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
- S.Y. Kim et al., "Trends in CCIX interconnect and memory expansion technology," Electron. Telecommun. Trends, vol. 37, no. 1, 2022, pp. 42-52. https://doi.org/10.22648/ETRI.2022.J.370105