Accelerating the Sweep3D for a Graphic Processor Unit

  • Gong, Chunye (Dept. of Computer Sciences, National University of Defense Technology)
  • Liu, Jie (Dept. of Computer Sciences, National University of Defense Technology)
  • Chen, Haitao (Dept. of Computer Sciences, National University of Defense Technology)
  • Xie, Jing (School of Information, Xi'an University of Finance and Economics)
  • Gong, Zhenghu (Dept. of Computer Sciences, National University of Defense Technology)
  • Received : 2010.09.30
  • Accepted : 2011.02.22
  • Published : 2011.03.31

Abstract

As a powerful and flexible processor, the Graphic Processing Unit (GPU) is well suited to many high-performance computing applications. Sweep3D, which deterministically solves a single-group, time-independent discrete ordinates (Sn) neutron transport problem on a 3D Cartesian geometry, represents the key part of a real ASCI application. The wavefront process used for parallel computation in Sweep3D limits the number of concurrent threads on the GPU. In this paper, we present multi-dimensional optimization methods for Sweep3D that can be efficiently implemented on the fine-grained parallel architecture of the GPU. Our results show that the overall performance of Sweep3D on the CPU-GPU hybrid platform can be improved by up to 4.38 times compared with the CPU-based implementation.
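
To illustrate why a wavefront sweep limits concurrency, the following minimal sketch counts how many cells can be processed independently at each step of a single-octant diagonal sweep. It assumes the simplest dependency pattern, in which cell (i, j, k) depends on its three upstream neighbours, so that all cells with the same i + j + k form one wavefront. The function name `wavefront_sizes` and the 4x4x4 grid are hypothetical choices for illustration; this is not the paper's implementation.

```python
from collections import Counter
from itertools import product


def wavefront_sizes(nx, ny, nz):
    """Return the number of cells on each diagonal wavefront i + j + k = d.

    Assumption: cell (i, j, k) depends only on (i-1, j, k), (i, j-1, k)
    and (i, j, k-1), so cells sharing the same index sum are independent.
    """
    counts = Counter(
        i + j + k for i, j, k in product(range(nx), range(ny), range(nz))
    )
    # Index sums run from 0 to (nx - 1) + (ny - 1) + (nz - 1).
    return [counts[d] for d in range(nx + ny + nz - 2)]


if __name__ == "__main__":
    sizes = wavefront_sizes(4, 4, 4)
    print(sizes)       # cells available per sweep step
    print(max(sizes))  # peak number of independent cells
```

Under these assumptions, the first and last wavefronts contain a single cell each, and even the widest wavefront covers only a fraction of the grid, which is why the available parallelism per step is far below the total cell count.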
