• Title/Summary/Keyword: Cuda

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In semiconductor processing, simulation is used to detect defects by analyzing the behavior of impurities through physical-quantity calculations on the internal elements. The Finite-Difference Time-Domain (FDTD) algorithm is used for this simulation. As semiconductors built from nanoscale elements advance, the simulation size keeps growing. Problems arise when a single processor such as a CPU or GPU cannot run the simulation because the matrix is too large, or when a computer consisting of multiple processors cannot handle a massive FDTD simulation. Parallel and distributed computing has been studied to address these problems. However, past studies used only a single type of processor. A GPU is fast but has limited memory, whereas a CPU is slower than a GPU. To solve this problem, we implemented a computing model that can handle an FDTD simulation of any size on a cluster of heterogeneous processors. We tested the simulation using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed performance by measuring the total execution time and the time spent in specific parts of the simulation in each test.
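
The abstract shows no code, so the following is only a minimal sketch of the idea it describes. A 1D FDTD grid is split into contiguous chunks, and neighbouring chunks swap one boundary value per half-step, which is the role point-to-point MPI messages would play on a real cluster. The "ranks" are simulated as Python lists in a single process. The names `fdtd_single`, `fdtd_decomposed`, `source` and the constant `COURANT` are assumptions made for illustration and are not taken from the paper.

```python
import math

COURANT = 0.5  # assumed normalised update coefficient (hypothetical value)


def source(t):
    """Hypothetical Gaussian pulse injected at the grid centre."""
    return math.exp(-((t - 30) / 10.0) ** 2)


def fdtd_single(n, steps):
    """Reference run: the whole grid lives on one processor."""
    ez = [0.0] * n
    hy = [0.0] * n
    for t in range(steps):
        for i in range(n - 1):
            hy[i] += COURANT * (ez[i + 1] - ez[i])
        for i in range(1, n):
            ez[i] += COURANT * (hy[i] - hy[i - 1])
        ez[n // 2] += source(t)
    return ez


def fdtd_decomposed(n, steps, ranks):
    """Same update, with the grid split into `ranks` chunks (requires n >= ranks)."""
    bounds = [(r * n // ranks, (r + 1) * n // ranks) for r in range(ranks)]
    ez = [[0.0] * (hi - lo) for lo, hi in bounds]
    hy = [[0.0] * (hi - lo) for lo, hi in bounds]
    src = n // 2
    for t in range(steps):
        # Exchange 1: each rank receives the first Ez of its right neighbour.
        ez_halo = [ez[r + 1][0] if r + 1 < ranks else None for r in range(ranks)]
        for r in range(ranks):
            e, h = ez[r], hy[r]
            m = len(e)
            for i in range(m - 1):
                h[i] += COURANT * (e[i + 1] - e[i])
            if ez_halo[r] is not None:
                h[m - 1] += COURANT * (ez_halo[r] - e[m - 1])
        # Exchange 2: each rank receives the last Hy of its left neighbour.
        hy_halo = [hy[r - 1][-1] if r > 0 else None for r in range(ranks)]
        for r in range(ranks):
            e, h = ez[r], hy[r]
            for i in range(1, len(e)):
                e[i] += COURANT * (h[i] - h[i - 1])
            if hy_halo[r] is not None:
                e[0] += COURANT * (h[0] - hy_halo[r])
        # Only the rank that owns the source cell injects the pulse.
        for r, (lo, hi) in enumerate(bounds):
            if lo <= src < hi:
                ez[r][src - lo] += source(t)
    return [v for chunk in ez for v in chunk]
```

Under these assumptions, `fdtd_decomposed(200, 150, 3)` should return the same field as `fdtd_single(200, 150)` whatever the number of chunks, which mirrors the correctness check the abstract reports. Uneven chunk sizes would be one way to give a slower processor less work, although the abstract does not say how the authors balance the load.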

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun;Kim, Cheol-Hong
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.10-21 / 2012
  • With the development of the unified shader core architecture, the role of the GPU has evolved from graphics-specific processing to general-purpose processing. In particular, methods for executing general-purpose parallel applications on GPUs have been researched intensively, since parallel applications can use the parallel hardware architecture efficiently. However, current GPU architectures have limitations in executing general-purpose parallel applications, because the GPU is not yet specialized for general-purpose computing. To improve GPU performance for such applications, the GPU architecture must evolve. In this work, we analyze GPU performance while varying the number of cores and the clock frequency. Our simulation results show that GPU performance improves by up to 125.8% as the number of cores increases and by up to 16.2% as the clock frequency increases. However, the improvement saturates even as the number of cores and the clock frequency continue to increase, because limited memory bandwidth prevents data from being supplied to the GPU fast enough. Consequently, to achieve high performance efficiency on a GPU, computational resources must be considered more carefully.
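
The saturation effect described above can be illustrated with a roofline-style toy model. This is a swapped-in textbook model, not the simulator used in the paper. The function name `attainable_gflops`, the parameter `flops_per_cycle`, and every number in the usage note are hypothetical.

```python
def attainable_gflops(cores, clock_ghz, bandwidth_gbs, flops_per_byte,
                      flops_per_cycle=2):
    """Roofline-style estimate: the lower of the compute and memory limits.

    cores * clock_ghz * flops_per_cycle is the assumed compute ceiling;
    bandwidth_gbs * flops_per_byte is the rate at which memory can feed it.
    """
    compute_peak = cores * clock_ghz * flops_per_cycle
    memory_peak = bandwidth_gbs * flops_per_byte
    return min(compute_peak, memory_peak)


def sweep_cores(core_counts, clock_ghz, bandwidth_gbs, flops_per_byte):
    """Return (cores, estimate) pairs to show where adding cores stops helping."""
    return [(c, attainable_gflops(c, clock_ghz, bandwidth_gbs, flops_per_byte))
            for c in core_counts]
```

With an assumed 100 GB/s of bandwidth and 4 floating-point operations per byte, the memory ceiling in this model is 400 GFLOP/s. `sweep_cores([64, 128, 256, 512], 1.0, 100.0, 4.0)` then grows from 128 to 256 and stays at 400 for the last two entries, so extra cores or a higher clock no longer raise the estimate once the memory limit is reached.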

Implementation of FFT on Massively Parallel GPU for DVB-T Receiver (DVB-T 수신기를 위한 대규모 병렬처리 GPU 기반의 FFT 구현)

  • Lee, Kyu Hyung;Heo, Seo Weon
    • Journal of Broadcast Engineering / v.18 no.2 / pp.204-214 / 2013
  • Recently, much research has been conducted on implementing signal processing and communication systems in software using the massively parallel processing capability of the GPU. In this work, we focus on reducing the software simulation time of the 2K/8K FFT in DVB-T by using a GPU. We first estimate the processing time on a CPU of the DVB-T system, which is one of the standards for DTV transmission. We then implement the FFT in software on NVIDIA's massively parallel GPU. We apply a stream processing method to reduce the overhead of data transfer between the CPU and GPU, a coalescing method to reduce global memory access time, and a data structure design method to maximize shared memory usage. The results show that, when applied to the DVB-T 2K/8K mode FFT, the proposed method is approximately 20 to 30 times as fast as the CPU-based FFT processor and approximately 1.8 times as fast as the CUFFT library (version 2.1) provided by NVIDIA.
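
For orientation, here is a minimal CPU-side sketch of an iterative radix-2 Cooley-Tukey FFT, the kind of transform used for the 2K (2048-point) and 8K (8192-point) modes. It is not the authors' GPU kernel and shows none of the stream, coalescing, or shared-memory techniques. It only makes visible that the butterflies inside one stage are independent of each other, which is what allows them to be spread over many GPU threads. The function name `fft_radix2` is an assumption.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation so each stage can work in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; all butterflies within one stage are independent.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0.0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a
```

As a sanity check under these assumptions, a 2048-sample complex tone at bin 3 should produce a spectrum whose only significant value is approximately 2048 at index 3.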

A Parallel Processing Technique for Large Spatial Data (대용량 공간 데이터를 위한 병렬 처리 기법)

  • Park, Seunghyun;Oh, Byoung-Woo
    • Spatial Information Research / v.23 no.2 / pp.1-9 / 2015
  • A graphics processing unit (GPU) contains many arithmetic logic units (ALUs). Because these ALUs can be used in parallel, the GPU processes data efficiently. Spatial data require many geographic coordinates to represent the shapes of features on a map. The coordinates are usually stored as geodetic longitude and latitude. To display a map in a 2-dimensional Cartesian coordinate system, the geodetic longitude and latitude must be converted to the Universal Transverse Mercator (UTM) coordinate system. Both the conversion and the rendering of the converted coordinates to the screen involve complex floating-point computations. In this paper, we propose a parallel processing technique that performs the conversion and the rendering on the GPU to improve performance. Large spatial data are stored on disk in files. To process them efficiently, we propose a technique that merges the spatial data files into one large file and accesses it as a memory-mapped file. We implemented the proposed technique and ran an experiment with 747,302,971 points of the TIGER/Line spatial data. In the experiment, the coordinate conversion with the GPU was 30.16 times faster than the CPU-only method, and the rendering was 80.40 times faster than with the CPU.
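
The file-handling half of the technique (merge many data files into one, then access it through a memory map) can be sketched as follows. The record layout of two little-endian doubles per point, the file names, the sample coordinates, and the helper names `merge_files`, `read_points` and `demo` are all assumptions for illustration. The abstract does not specify the on-disk format, and the GPU conversion to UTM is not shown.

```python
import mmap
import os
import shutil
import struct
import tempfile

POINT = struct.Struct("<dd")  # assumed record: (longitude, latitude) as doubles


def merge_files(paths, merged_path):
    """Concatenate the input files, in order, into one large file."""
    with open(merged_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def read_points(merged_path, first, count):
    """Read `count` points starting at point index `first` through a memory map."""
    with open(merged_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [POINT.unpack_from(mm, (first + i) * POINT.size)
                    for i in range(count)]


def demo():
    """Write two tiny hypothetical files, merge them, and read across the seam."""
    files = {
        "a.bin": [(-77.0365, 38.8977), (-122.4194, 37.7749)],
        "b.bin": [(-87.6298, 41.8781)],
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, points in files.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                for point in points:
                    f.write(POINT.pack(*point))
            paths.append(path)
        merged = os.path.join(tmp, "merged.bin")
        merge_files(paths, merged)
        return read_points(merged, 1, 2)
```

Because the operating system pages the mapped file in on demand, a reader can index any point by offset without loading the whole merged file, which is the property that makes the approach attractive for data sets far larger than the sample used here.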