Fast GPU Implementation for the Solution of Tridiagonal Matrix Systems

삼중대각행렬 시스템 풀이의 빠른 GPU 구현

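  • 김영희 (Department of Computer Science, Kyungpook National University)
  • 이성기 (School of Electrical Engineering and Computer Science, Kyungpook National University)
  • Published: 2005.12.01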

Abstract

As computer hardware has improved, GPUs (graphics processing units) have come to offer very high memory bandwidth and computational power, which has led to their use in general-purpose computation. In particular, GPU implementations of compute-intensive physics-based simulations are being actively studied. Tridiagonal matrix systems arise repeatedly from finite-difference approximations when solving the differential equations that underlie physics simulations, so solving them quickly is an important research topic for physics-based simulation. We propose a fast GPU implementation for solving tridiagonal matrix systems. Specifically, we implement the cyclic reduction (also known as odd-even reduction) algorithm, a popular choice on vector processors. Our implementation achieves a considerable performance improvement over the Thomas method, which is well known for solving tridiagonal matrix systems on the CPU, and over the conjugate gradient method, which has shown good results on the GPU. We tested the proposed method on heat conduction, advection-diffusion, and shallow water simulations, which ran at more than 35 frames per second on a 1024x1024 grid.
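To make the abstract's starting point concrete, the following is a minimal CPU sketch, not the authors' code, of how a finite-difference discretization produces a tridiagonal system and how the Thomas method solves it. It assumes a 1D heat equation advanced by one backward-Euler step with zero boundary values; the function names `thomas_solve` and `implicit_heat_step` and the parameter `r = alpha * dt / dx**2` are illustrative choices that do not come from the paper.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] do not affect the result. No pivoting is done, so the
    matrix is assumed to be diagonally dominant.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: each row depends on the previous one,
    # which is why this method is inherently sequential.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution, again one row at a time.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_heat_step(u, r):
    """One backward-Euler step of u_t = alpha * u_xx on interior points.

    r = alpha * dt / dx**2. Values outside the array are assumed to be
    zero (an illustrative boundary condition).
    """
    n = len(u)
    a = [-r] * n             # sub-diagonal
    b = [1.0 + 2.0 * r] * n  # main diagonal
    c = [-r] * n             # super-diagonal
    return thomas_solve(a, b, c, u)
```

Calling `implicit_heat_step([0.0, 0.0, 1.0, 0.0, 0.0], 0.5)` spreads the initial spike symmetrically to its neighbours. The two loops in `thomas_solve` carry a dependency from one row to the next, which makes the method a natural CPU baseline but a poor fit for a data-parallel processor.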

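With the rapid advance of computer hardware, graphics processing units (GPUs) now offer tremendous memory bandwidth and arithmetic capability and are widely used for general-purpose computation; in particular, GPU implementations of compute-intensive physics-based simulations are being actively studied. Tridiagonal matrix systems are linear systems that frequently arise from finite-difference approximation when solving the differential equations underlying physics-based simulation, so their fast solution is an important research area from the standpoint of physics-based simulation. This paper proposes a method for solving tridiagonal matrix systems quickly on the GPU. We implemented on the GPU the cyclic reduction (or odd-even reduction) algorithm, which is widely used to solve tridiagonal matrix systems on vector processors. Compared with the Thomas method, a well-known method for solving tridiagonal matrix systems, and with the conjugate gradient method, which has performed well for linear system solving on the GPU, the proposed method achieved a considerable performance improvement. In addition, when the proposed method was used in GPU implementations of physics-based simulations governed by the heat conduction, advection-diffusion, and shallow water equations, it achieved more than 35 frames per second on a 1024x1024 grid.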
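The cyclic reduction (odd-even reduction) idea named in the abstract can be sketched as follows. This is a sequential, recursive Python illustration of the textbook algorithm under the assumption of a diagonally dominant system with no pivoting; it is not the authors' GPU implementation, and the function name `cyclic_reduction` and the coefficient layout are illustrative. The point to notice is that the iterations inside each loop are independent of one another, so on a parallel processor each level could be computed in a single pass.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic (odd-even) reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] are treated as zero. Assumes nonzero pivots,
    e.g. a diagonally dominant matrix.
    """
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]
    a = [0.0] + list(a[1:])   # first row has no left neighbour
    c = list(c[:-1]) + [0.0]  # last row has no right neighbour

    # Reduction: eliminate the even-indexed unknowns from every
    # odd-indexed row. Each iteration is independent of the others.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            a_next, c_next, d_next = a[i + 1], c[i + 1], d[i + 1]
        else:
            gamma = a_next = c_next = d_next = 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_next)
        rc.append(-gamma * c_next)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_next)

    # The odd-indexed unknowns satisfy a tridiagonal system of half the size.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover each even-indexed unknown from its
    # already-known odd neighbours. Again, iterations are independent.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

For n unknowns the recursion has about log2(n) levels, compared with the n dependent steps of a row-by-row elimination, at the cost of more arithmetic in total. That trade-off is why the algorithm suits vector and data-parallel hardware.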
