Comparison of Parallel Computation Performances for 3D Wave Propagation Modeling using a Xeon Phi x200 Processor

  • Lee, Jongwoo (Department of Energy Resources Engineering, Pukyong National University);
  • Ha, Wansoo (Department of Energy Resources Engineering, Pukyong National University)
  • Received: 2018.07.31
  • Reviewed: 2018.09.11
  • Published: 2018.11.30

Abstract

In this study, we performed 3D wave propagation modeling on a Xeon Phi x200 processor and compared its parallel computation performance with that of a conventional Xeon CPU. Unlike the first-generation Xeon Phi coprocessor, codenamed Knights Corner, the second-generation Xeon Phi x200 processor can run an operating system directly, so it requires no additional communication between the on-board memory and the main memory. With its large main memory and high-bandwidth memory, the Xeon Phi x200 processor can also run large-scale computations independently. To compare parallel performance, we performed the modeling using MPI (Message Passing Interface) and OpenMP (Open Multi-Processing). Numerical examples using the SEG/EAGE salt model showed that the Xeon Phi, with its large number of computational cores and high-bandwidth memory, achieved modeling performance 2.69 to 3.24 times faster than that of the 12-core CPU.

Fig. 1. Block diagram of a tile (Jeffers et al., 2016, used with permission).

Fig. 2. Block diagram showing overview of Xeon Phi x200 Architecture (Jeffers et al., 2016, used with permission).

Fig. 3. A time-domain modeling algorithm.
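
As a rough illustration of the kind of time-domain algorithm Fig. 3 refers to, the following minimal Python sketch advances a 3D acoustic wavefield with an explicit finite-difference scheme. It is not the authors' implementation: the second-order stencil, the constant velocity, the zero-valued boundaries, and all names (`flat_index`, `model_shot`) are assumptions chosen for brevity.

```python
def flat_index(i, j, k, ny, nz):
    """Map 3D grid indices to an index into a flat list."""
    return (i * ny + j) * nz + k


def model_shot(nx, ny, nz, nt, dt, h, velocity, src, wavelet):
    """Advance a 3D acoustic wavefield nt steps and return the final field.

    Assumptions (not from the paper): constant velocity, second-order
    finite differences in time and space, and a wavefield fixed at zero
    on the grid boundary. The source must be an interior grid point.
    """
    n = nx * ny * nz
    prev = [0.0] * n
    curr = [0.0] * n
    coef = (velocity * dt / h) ** 2
    src_idx = flat_index(src[0], src[1], src[2], ny, nz)

    for it in range(nt):
        nxt = [0.0] * n
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                for k in range(1, nz - 1):
                    c = flat_index(i, j, k, ny, nz)
                    # 7-point Laplacian stencil (grid spacing folded into coef)
                    laplacian = (
                        curr[c + ny * nz] + curr[c - ny * nz]
                        + curr[c + nz] + curr[c - nz]
                        + curr[c + 1] + curr[c - 1]
                        - 6.0 * curr[c]
                    )
                    nxt[c] = 2.0 * curr[c] - prev[c] + coef * laplacian
        nxt[src_idx] += dt * dt * wavelet[it]  # inject the source term
        prev, curr = curr, nxt
    return curr


# Small demo: impulsive source at the center of an 11 x 11 x 11 grid.
# velocity * dt / h = 0.4, below the 3D stability limit of 1/sqrt(3)
# for this second-order scheme.
wavelet = [1.0] + [0.0] * 4
field = model_shot(11, 11, 11, nt=5, dt=0.002, h=10.0,
                   velocity=2000.0, src=(5, 5, 5), wavelet=wavelet)
```

The triple loop over grid points is the part that a production code would vectorize and share among threads; the pure-Python loops here only make the update rule explicit.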

Fig. 4. Parallelization using MPI processes (left) and OpenMP threads (right). The number on each star indicates the rank of the process that performs that shot simulation. The number on each grid block indicates the ID of the thread that calculates the wavefield in that block.
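
The two levels of parallelism in Fig. 4 can be sketched in plain Python without any MPI or OpenMP dependency. The helper names (`shots_for_rank`, `block_ranges`) are hypothetical, and both the cyclic shot assignment and the near-equal contiguous blocks are assumptions, since the caption does not state how shots or grid blocks are actually distributed.

```python
def shots_for_rank(n_shots, n_ranks, rank):
    """Shot indices handled by one process (cyclic assignment, assumed)."""
    return list(range(rank, n_shots, n_ranks))


def block_ranges(n_planes, n_threads):
    """Split n_planes grid planes into contiguous, near-equal blocks.

    Returns one (start, stop) pair per thread, with stop exclusive. This
    resembles a static loop schedule, but the exact split chosen by a
    real threading runtime may differ.
    """
    base, extra = divmod(n_planes, n_threads)
    ranges = []
    start = 0
    for tid in range(n_threads):
        size = base + (1 if tid < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Demo: 10 shots over 4 processes, 100 grid planes over 8 threads.
for rank in range(4):
    print("rank", rank, "-> shots", shots_for_rank(10, 4, rank))
for tid, (start, stop) in enumerate(block_ranges(100, 8)):
    print("thread", tid, "-> planes", start, "to", stop - 1)
```

Shots are independent of one another, so the process-level split needs no communication during time stepping, whereas threads working on neighboring blocks of one shot read each other's boundary values from shared memory.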

Fig. 5. Speed-ups using OpenMP with respect to the calculation times using one CPU core.
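
Speed-up of the kind plotted in Fig. 5 is conventionally the one-core calculation time divided by the parallel calculation time. The short sketch below computes speed-up and parallel efficiency; the function names and the timings in the demo are illustrative placeholders, not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speed-up relative to the one-core calculation time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Fraction of ideal linear speed-up achieved with n_workers."""
    return speedup(t_serial, t_parallel) / n_workers


# Placeholder timings in seconds (hypothetical, not from the paper).
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}
for n_workers, seconds in timings.items():
    print(n_workers,
          speedup(timings[1], seconds),
          efficiency(timings[1], seconds, n_workers))
```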

Table 1. Comparison of calculation times (s) depending on the number of OpenMP threads, the precision, and the order of the FDM.

Table 2. Calculation times without high bandwidth memory on the Xeon Phi processor.

Table 3. Calculation times using both MPI and OpenMP.

References

  1. Abdelkhalek, R., Calandra, H., Coulaud, O., Roman, J., and Latu, G., 2009, Fast seismic modeling and reverse time migration on a GPU cluster, 2009 International Conference on High Performance Computing & Simulation, 36-43.
  2. Heinecke, A., Breuer, A., Bader, M., and Dubey, P., 2016, High order seismic simulations on the Intel Xeon Phi processor (Knights Landing), 2016 International Conference on High Performance Computing & Simulation, 343-362.
  3. Intel, 2018, https://ark.intel.com/compare/81908,94033 (accessed August 20, 2018)
  4. Jeffers, J., Reinders, J., and Sodani, A., 2016, Intel Xeon Phi processor high performance programming: Knights Landing edition, Morgan Kaufmann, 3-145.
  5. Jo, S.-H., and Ha, W., 2018, 3D time-domain wave propagation modeling using high-performance Python libraries, J. Korea Inst. Mineral Mining Eng., 55(3), 213-218 (in Korean with English abstract).
  6. Kim, A., Ryu, D., and Ha, W., 2016, Time-domain 3D wave propagation modeling and memory management using graphics processing units, Geophys. and Geophys. Explor., 19(3), 145-152 (in Korean with English abstract). https://doi.org/10.7582/GGE.2016.19.3.145
  7. Lu, L., Renwei, D., Hongwei, L., and Hong, L., 2015, 3D hybrid-domain full waveform inversion on GPU, Comput. Geosci., 83, 27-36. https://doi.org/10.1016/j.cageo.2015.06.017
  8. Min, D.-J., Pyun, S., Ha, W., Kwak, S., Chung, W., and Shin, C., 2016, Numerical Analysis for Geophysics, CIR, Seoul, Korea, 37-52.
  9. Pacheco, P., 2011, An introduction to parallel programming, Morgan Kaufmann, 15-82.
  10. Reinders, J., and Jeffers, J., 2015, High performance parallelism pearls: Multicore and Many-core Programming Approaches, Morgan Kaufmann, 377-396.
  11. Rodriguez, S., Farre, P., Rosas, C., and Hanzich, M., 2017, Evaluating directive-based programming models on Wave Propagation Kernels, 79th EAGE Conference and Exhibition 2017-Workshops, Paris, France.
  12. Ryu, D., Jo, S. H., and Ha, W., 2017, Parallelizing 3D frequency-domain acoustic wave propagation modeling using a Xeon Phi coprocessor, Geophys. and Geophys. Explor., 20(3), 129-136 (in Korean with English abstract). https://doi.org/10.7582/GGE.2017.20.3.129
  13. Sourouri, M., and Birger Raknes, E., 2017, Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors, 19th EGU General Assembly Conference Abstracts, 19, 7790p.
  14. Tobin, J., Breuer, A., Heinecke, A., Yount, C., and Cui, Y., 2017, Accelerating seismic simulations using the Intel Xeon Phi Knights Landing processor, 2017 International Supercomputing Conference, High Performance Computing, 139-157.
  15. Wikipedia, 2018, https://en.wikipedia.org/wiki/Stencil_code (accessed July 30, 2018)