• Title/Summary/Keyword: GPGPU computing

Search Result 87, Processing Time 0.026 seconds

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.1
    • /
    • pp.395-399
    • /
    • 2019
  • In order to support various parallel computing systems, it is necessary to extend LLVM IR to more efficiently support vector / matrix and to design LLVM IR to machine code as a new algorithm. As shown in the IR example, RISC instruction generation is naturally generated because the RISC instruction is basically composed of the RISC instruction, and the vector instruction is also not supported. There is a need for new IR structures, command generation algorithms and related extensions to support vector / matrix more robustly. To do this, it is important to map each instruction in the LLVM IR to the appropriate instruction in the target architecture (vector / matrix) (instruction selection algorithm). It is necessary to understand the meaning of LLVM IR command, to compare the meaning of each instruction of the target architecture with syntax, and to select the instruction that matches the pattern to make mapping efficient.

Fast Hologram Generating of 3D Object with Super Multi-Light Source using Parallel Distributed Computing (병렬 분산 컴퓨팅을 이용한 초다광원 3차원 물체의 홀로그램 고속 생성)

  • Song, Joongseok;Kim, Changseob;Park, Jong-Il
    • Journal of Broadcast Engineering
    • /
    • v.20 no.5
    • /
    • pp.706-717
    • /
    • 2015
  • The computer generated hologram (CGH) method is the technology which can generate a hologram by using only a personal computer (PC) commonly used. However, the CGH method requires a huge amount of calculational time for the 3D object with a super multi-light source or a high-definition hologram. Hence, some solutions are obviously necessary for reducing the computational complexity of a CGH algorithm or increasing the computing performance of hardware. In this paper, we propose a method which can generate a digital hologram of the 3D object with a super multi-light source using parallel distributed computing. The traditional methods has the limitation of improving CGH performance by using a single PC. However, the proposed method where a server PC efficiently uses the computing power of client PCs can quickly calculate the CGH method for 3D object with super multi-light source. In the experimental result, we verified that the proposed method can generate the digital hologram with 1,5361,536 resolution size of 3D object with 157,771 light source in 121 ms. In addition, in the proposed method, we verify that the proposed method can reduce generation time of a digital hologram in proportion to the number of client PCs.

Accelerating Self-Similarity-Based Image Super-Resolution Using OpenCL

  • Jun, Jae-Hee;Choi, Ji-Hoon;Lee, Dae-Yeol;Jeong, Seyoon;Cho, Suk-Hee;Kim, Hui-Yong;Kim, Jong-Ok
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.1
    • /
    • pp.10-15
    • /
    • 2015
  • This paper proposes the parallel implementation of a self-similarity based image SR (super-resolution) algorithm using OpenCL. The SR algorithm requires tremendous computations to search for a similar patch. This becomes a bottleneck for the real-time conversion from a FHD image to UHD. Therefore, it is imperative to accelerate the processing speed of SR algorithms. For parallelization, the SR process is divided into several kernels, and memory optimization is performed. In addition, two GPUs are used for further acceleration. The experimental results shows that a GPGPU implementation can speed up over 140 times compared to a single-core CPU. Furthermore, it was confirmed experimentally that utilizing two GPUs can speed up the execution time proportionally, up to 277 times.

GPU-accelerated Lattice Boltzmann Simulation for the Prediction of Oil Slick Movement in Ocean Environment (GPU 가속 기술을 이용한 격자 볼츠만법 기반 원유 확산 과정 시뮬레이션)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Korean Journal of Computational Design and Engineering
    • /
    • v.18 no.6
    • /
    • pp.399-406
    • /
    • 2013
  • This paper describes a new simulation technique for advection-diffusion phenomena over the sea surface using the lattice Boltzmann method (LBM), capable of predicting oil dispersion from tankers. The LBM is used to solve the pollutant transport problem within the framework of the ocean environment. The sea space is represented by the lattices, where each lattice has the information on oil transportation. Since dispersed oils (i.e., oil droplets) at sea are transported by convection due to waves, buoyancy, and turbulent diffusion, the conservation of mass and many physical oil transport rules were used in the prediction model. Since the LBM is modeled using the uniform lattices and simple rules, it can be easily accelerated by the parallel mechanism, for example, GPU-accelerated method. The proposed model using the LBM is used to simulate a simple pollution event with the oil pollutants of 10,000 kL. The simulation results indicate that the LBM method accelerated with the GPU is 6 times faster than that without the GPU.

Real-time Full-view 3D Human Reconstruction using Multiple RGB-D Cameras

  • Yoon, Bumsik;Choi, Kunwoo;Ra, Moonsu;Kim, Whoi-Yul
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.4
    • /
    • pp.224-230
    • /
    • 2015
  • This manuscript presents a real-time solution for 3D human body reconstruction with multiple RGB-D cameras. The proposed system uses four consumer RGB/Depth (RGB-D) cameras, each located at approximately $90^{\circ}$ from the next camera around a freely moving human body. A single mesh is constructed from the captured point clouds by iteratively removing the estimated overlapping regions from the boundary. A cell-based mesh construction algorithm is developed, recovering the 3D shape from various conditions, considering the direction of the camera and the mesh boundary. The proposed algorithm also allows problematic holes and/or occluded regions to be recovered from another view. Finally, calibrated RGB data is merged with the constructed mesh so it can be viewed from an arbitrary direction. The proposed algorithm is implemented with general-purpose computation on graphics processing unit (GPGPU) for real-time processing owing to its suitability for parallel processing.

Simulation of Deformable Objects using GLSL 4.3

  • Sung, Nak-Jun;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.8
    • /
    • pp.4120-4132
    • /
    • 2017
  • In this research, we implement a deformable object simulation system using OpenGL's shader language, GLSL4.3. Deformable object simulation is implemented by using volumetric mass-spring system suitable for real-time simulation among the methods of deformable object simulation. The compute shader in GLSL 4.3 which helps to access the GPU resources, is used to parallelize the operations of existing deformable object simulation systems. The proposed system is implemented using a compute shader for parallel processing and it includes a bounding box-based collision detection solution. In general, the collision detection is one of severe computing bottlenecks in simulation of multiple deformable objects. In order to validate an efficiency of the system, we performed the experiments using the 3D volumetric objects. We compared the performance of multiple deformable object simulations between CPU and GPU to analyze the effectiveness of parallel processing using GLSL. Moreover, we measured the computation time of bounding box-based collision detection to show that collision detection can be processed in real-time. The experiments using 3D volumetric models with 10K faces showed the GPU-based parallel simulation improves performance by 98% over the CPU-based simulation, and the overall steps including collision detection and rendering could be processed in real-time frame rate of 218.11 FPS.

Development of Diffusive Wave Rainfall-Runoff Model Based on CUDA FORTRAN (CUDA FORTEAN기반 확산파 강우유출모형 개발)

  • Kim, Boram;Kim, Hyeong-Jun;Yoon, Kwang Seok
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.287-287
    • /
    • 2021
  • 본 연구에서는 CUDA(Compute Unified Device Architecture) 포트란을 이용하여 확산파 강우 유출모형을 개발하였다. CUDA 포트란은 그래픽 처리 장치(Graphic Processing Unit: GPU)에서 수행하는 병렬 연산 알고리즘을 포트란 언어를 사용하여 작성할 수 있도록 하는 GPU상의 범용계산(General-Purpose Computing on Graphics Processing Units: GPGPU) 기술이다. GPU는 그래픽 처리 작업에 특화된 다수의 산술 논리 장치(Arithmetic Logic Unit: ALU)로 구성되어 있어서 중앙 처리 장치(Central Processing Unit: CPU)보다 한 번에 더 많은 연산 수행이 가능하다. 이에 따라, CUDA 포트란기반 확산파모형은 분포형 강우유출모형의 수치모의 연산시간을 단축시킬 수 있다. 분포형모형의 지배방정식은 확산파모형과 Green-Ampt모형으로 구성되었고, 확산파모형은 유한체적법을 이용하여 이산화 하였다. CUDA 포트란기반 확산파모형의 정확성은 기존 연구된 수리실험 결과 및 CPU기반 강우유출모형과 비교하였으며, 연산소요시간에 대한 효율성은 CPU기반 확산파모형과 비교하였다. 그 결과 CUDA 포트란기반 확산파모형의 결과는 수리실험 결과 및 CPU기반 강우유출모형의 결과와 유사한 결과를 나타냈다. 또한, 연산소요시간은 CPU 기반 확산파모형의 연산소요시간보다 단축되었으며, 본 연구에 사용된 장비를 기준으로 최대 100배 정도 단축되었다.

  • PDF

A Study on High Speed Image Rotation Algorithm using CUDA (CUDA를 이용한 고속 영상 회전 알고리즘에 관한 연구)

  • Kwon, Hee-Choul;Cho, Hyung-Jin;Kwon, Hee-Yong
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.5
    • /
    • pp.1-6
    • /
    • 2016
  • Image rotation is one of main pre-processing step in image processing or image pattern recognition. It is implemented with rotation matrix multiplication. However it requires lots of floating point arithmetic operations and trigonometric function calculations, so it takes long execution time. We propose a new high speed image rotation algorithm without two major time-consuming operations. It use just 2 shear translation operations, so it is very fast. In addition, we apply a parallel computing technique with CUDA. CUDA is a massively parallel computing architecture using prevailed GPU recently. As GPU is a dedicated graphic processor, it is exellent for parallel processing of pixels. We compare the proposed algorithm with the conventional rotation one with various size images. Experimental results show that the proposed algorithm is superior to the conventional rotation ones.

Analysis of Impact of Correlation Between Hardware Configuration and Branch Handling Methods Executing General Purpose Applications (범용 응용프로그램 실행 시 하드웨어 구성과 분기 처리 기법에 따른 GPU 성능 분석)

  • Choi, Hong Jun;Kim, Cheol Hong
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.3
    • /
    • pp.9-21
    • /
    • 2013
  • Due to increased computing power and flexibility of GPU, recent GPUs execute general purpose parallel applications as well as graphics applications. Programmers can use GPGPU by using the APIs from GPU vendors. Unfortunately, computational resources of GPU are not fully utilized when executing general purpose applications because of frequent branch instructions. To handle the branch problem, several warp formations have been proposed. Intuitively, we expect that the warp formations providing higher computational resource utilization show higher performance. Contrary to our expectations, according to simulation results, the performance of the warp formation providing better utilization is lower than that of the warp formation providing worse utilization. This is because warp formation providing high utilization causes serious memory bottleneck due to increased memory request. Therefore, warp formation providing high computation utilization cannot guarantee high performance without proper hardware resources. For this reason, we will analyze the correlation between hardware configuration and warp formation. Our simulation results present the guideline to solve the underutilization problem due to branch instructions when designing recent GPU.

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors (이종 멀티코어 프로세서에서 분할된 공유 LLC가 성능에 미치는 영향 분석)

  • Moon, Min Goo;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.15 no.2
    • /
    • pp.39-49
    • /
    • 2019
  • Recently, CPU-GPU integrated heterogeneous multicore processors have been widely used for improving the performance of computing systems. Heterogeneous multicore processors integrate CPUs and GPUs on a single chip where CPUs and GPUs share the LLC(Last Level Cache). This causes a serious cache contention problem inside the processor, resulting in significant performance degradation. In this paper, we propose the partitioned LLC architecture to solve the cache contention problem in heterogeneous multicore processors. We analyze the performance impact varying the LLC size of CPUs and GPUs, respectively. According to our simulation results, the bigger the LLC size of the CPU, the CPU performance improves by up to 21%. However, the GPU shows negligible performance difference when the assigned LLC size increases. In other words, the GPU is less likely to lose the performance when the LLC size decreases. Because the performance degradation due to the LLC size reduction in GPU is much smaller than the performance improvement due to the increase of the LLC size of the CPU, the overall performance of heterogeneous multicore processors is expected to be improved by applying partitioned LLC to CPUs and GPUs. In addition, if we develop a memory management technique that can maximize the performance of each core in the future, we can greatly improve the performance of heterogeneous multicore processors.