CUDA-based Parallel Bi-Conjugate Gradient Matrix Solver for BioFET Simulation

  • Park, Tae-Jung (Computer Graphics Lab., Dept. of Computer Science, Korea University)
  • Woo, Jun-Myung (Dept. of Electrical Engineering, Seoul National University)
  • Kim, Chang-Hun (Computer Graphics Lab., Dept. of Computer Science, Korea University)
  • Received : 2010.09.16
  • Accepted : 2010.12.30
  • Published : 2011.01.25

Abstract

We present a parallel bi-conjugate gradient (Bi-CG) matrix solver for large-scale Bio-FET simulations. The solver runs on recent graphics processing units (GPUs), which provide large-scale parallel processing at very low cost. The method focuses on solving the Poisson equation in parallel. This step demands massive computational resources and lengthens total simulation time, not only in semiconductor device simulation but also in other fields such as computational fluid dynamics (CFD) and heat transfer simulation. In our tests on a 3D finite difference method (FDM) problem, the solver is up to around 30 times faster than traditional methods running on a single-core CPU. The method is implemented and tested on Tesla-architecture GPUs in NVIDIA's CUDA (Compute Unified Device Architecture) environment, which enables general-purpose parallel programming on GPUs. Unlike similar GPU-based approaches, which usually use 32-bit single-precision floating-point arithmetic, we use 64-bit double-precision operations to improve the convergence of the Bi-CG solver. CUDA applications are relatively easy to write but very hard to optimize, so we also discuss the optimization strategy for the proposed method.
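To make the structure of the solver concrete, the following is a minimal serial sketch of the Bi-CG iteration applied to a 3D FDM Poisson problem. It is not the authors' CUDA implementation. The function names (`laplacian_3d`, `dot`, `bicg`), the unit grid spacing, the zero Dirichlet boundary, and the tolerance are all assumptions chosen for illustration. Python floats are 64-bit, which matches the double-precision setting of the abstract.

```python
import math

def laplacian_3d(u, n):
    """Apply the 7-point FDM operator -Laplacian (unit spacing, zero
    Dirichlet boundary) to a flattened n*n*n grid. Hypothetical helper."""
    out = [0.0] * (n * n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                idx = (i * n + j) * n + k
                s = 6.0 * u[idx]
                if i > 0:
                    s -= u[idx - n * n]
                if i < n - 1:
                    s -= u[idx + n * n]
                if j > 0:
                    s -= u[idx - n]
                if j < n - 1:
                    s -= u[idx + n]
                if k > 0:
                    s -= u[idx - 1]
                if k < n - 1:
                    s -= u[idx + 1]
                out[idx] = s
    return out

def dot(a, b):
    """Inner product; on a GPU this step is a parallel reduction."""
    return math.fsum(x * y for x, y in zip(a, b))

def bicg(matvec, matvec_t, b, tol=1e-10, max_iter=500):
    """Unpreconditioned Bi-CG for A x = b, starting from x = 0.
    matvec applies A, matvec_t applies A transposed."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for x = 0
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    rt = list(r)           # shadow residual
    p, pt = list(r), list(rt)
    rho = dot(rt, r)
    for it in range(1, max_iter + 1):
        q = matvec(p)
        qt = matvec_t(pt)
        denom = dot(pt, q)
        if denom == 0.0:
            raise ArithmeticError("Bi-CG breakdown")
        alpha = rho / denom
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho_new = dot(rt, r)
        if rho_new == 0.0:
            raise ArithmeticError("Bi-CG breakdown")
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x, max_iter

# Usage: solve on a small 4x4x4 grid with a unit source at one cell.
n = 4
rhs = [0.0] * (n ** 3)
rhs[(1 * n + 1) * n + 1] = 1.0
apply_a = lambda v: laplacian_3d(v, n)
solution, iterations = bicg(apply_a, apply_a, rhs)
print("iterations:", iterations)
```

The uniform 7-point operator used here is symmetric, so the same function serves as both `matvec` and `matvec_t`, and the iteration coincides with plain conjugate gradient. A nonsymmetric system would need a separate transpose product. In a GPU version, the stencil product and the inner products are the natural candidates for parallel kernels, with the inner products implemented as reductions.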
