
Accelerating Medical Image Processing on Integrated GPU Using OpenCL

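  • Kim Beom-Jun (김범준), Department of Computer Engineering, Inha University
  • Shin Byeong-Seok (신병석), Department of Computer Engineering, Inha University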
  • Received : 2017.01.18
  • Accepted : 2017.05.30
  • Published : 2017.06.01

Abstract

Various filters are applied to noisy or low-resolution medical images to improve their quality. This is essential both for reducing the patient's radiation dose and for making better use of existing imaging equipment. Conventionally, such filtering is performed on the CPU of a PC. However, the CPU performance of the PCs typically used in hospitals is not sufficient to apply a range of operations and filters to high-resolution images of the human body and produce results in real time. In this paper, we analyze the architecture and performance of the Intel GPU integrated into the CPU and, based on this analysis, propose a method for image filtering that uses OpenCL parallel processing. This makes it possible to apply complex, computationally expensive filters to medical images and generate high-quality results in real time.
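The approach rests on the fact that a neighbourhood filter computes each output pixel independently, so OpenCL can assign one work-item per pixel. As a minimal sketch, assuming a simple 3x3 box (mean) filter purely as a stand-in (the abstract does not name the paper's actual filters), the per-pixel kernel body and the 2-D launch can be modelled sequentially in Python. `filter_pixel` and `box_filter_3x3` are hypothetical names, and no OpenCL API is called here.

```python
from typing import List

Image = List[List[float]]


def filter_pixel(src: Image, x: int, y: int) -> float:
    """Body of a hypothetical per-pixel kernel: one OpenCL work-item would
    compute exactly this value for output pixel (x, y)."""
    height, width = len(src), len(src[0])
    total, count = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            # Neighbours outside the image are skipped, so border pixels
            # average over fewer samples.
            if 0 <= nx < width and 0 <= ny < height:
                total += src[ny][nx]
                count += 1
    return total / count


def box_filter_3x3(src: Image) -> Image:
    """Sequential stand-in for a 2-D NDRange launch: every output pixel
    depends only on the read-only input, so all calls are independent and
    could run in parallel on a GPU."""
    height, width = len(src), len(src[0])
    return [[filter_pixel(src, x, y) for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    image = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    for row in box_filter_3x3(image):
        print(" ".join(f"{v:5.2f}" for v in row))
```

In an actual OpenCL version, the body of `filter_pixel` would become a `__kernel` function that reads its coordinates from `get_global_id(0)` and `get_global_id(1)`, and the two outer loops would be replaced by a single `clEnqueueNDRangeKernel` call with a two-dimensional global work size of width x height.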

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
