• Title/Summary/Keyword: GPU algorithm


Domain decomposition for GPU-Based continuous energy Monte Carlo power reactor calculation

  • Choi, Namjae;Joo, Han Gyu
    • Nuclear Engineering and Technology / v.52 no.11 / pp.2667-2677 / 2020
  • A domain decomposition (DD) scheme for GPU-based Monte Carlo (MC) calculation, which is essential for whole-core depletion, is introduced within the framework of the modified history-based tracking algorithm. Since GPU-offloaded MC calculations suffer from limited memory capacity, employing domain-decomposed MC (DDMC) is unavoidable for the simulation of depleted cores, which require large storage for hundreds of newly generated isotopes. First, an automated domain decomposition algorithm named wheel clustering is devised such that each subdomain contains nearly the same number of fuel assemblies. Second, an inner-outer iteration algorithm allowing overlapped computation and communication is introduced, which enables boundary neutron transactions during the tracking of interior neutrons. Third, a bank update scheme is presented that incorporates the boundary sources in a manner suited to the particular data structures of the GPU-based neutron tracking algorithm. The DDMC method is verified and demonstrated on 3D full-core problems: the APR1400 fresh core and a mock-up depleted core. It is confirmed that the DDMC method performs comparably with the standard MC method, and that the domain decomposition scheme is essential for carrying out full 3D MC depletion calculations with limited GPU memory capacity.

Precise Sweep Volume Computation Accelerated by GPU (GPU 가속을 이용한 정밀한 스웹 볼륨 경계 계산)

  • Lee, Hyunho;Kyung, Minho
    • Journal of the Korea Computer Graphics Society / v.21 no.1 / pp.13-21 / 2015
  • We present a robust GPU algorithm for constructing the sweep volume boundary of a triangular mesh model. The surfaces swept by the geometric entities of a triangular mesh object are first approximated by a set of triangles, whose envelope forms the outer boundary of the sweep volume. We find the envelope by computing the arrangement of the triangle set and extracting its outermost boundary. To ensure robustness of the algorithm, we adopt random perturbation of sweep vertices and interval arithmetic with multiple precision levels. The algorithm is implemented to perform most of the computation on the GPU, and as a result it runs two orders of magnitude faster than other algorithms.

Efficient Computation of Isosurface Curvatures on GPUs Based on the de Boor Algorithm (드 부어 알고리즘을 이용한 GPU에서의 효율적인 등가면 곡률 계산)

  • Kim, Minho
    • Journal of the Korea Computer Graphics Society / v.23 no.3 / pp.47-54 / 2017
  • In this paper, we propose an improved curvature-based GPU (graphics processing unit) isosurface ray-casting technique. Our method adopts the fast evaluation method proposed by Sigg et al. [1] to find the isosurface, but replaces the computation of the gradient and Hessian with the de Boor algorithm. In this way, we can reduce the number of additional texture fetches from 84 to 27, thus improving the performance by up to ${\approx}30\%$, depending on the platform.
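The abstract does not describe how the de Boor algorithm is mapped onto texture fetches, so the following is only a minimal sketch of the textbook one-dimensional de Boor recurrence in Python. The function name `de_boor` and its argument layout are illustrative assumptions, not the paper's GPU implementation.

```python
def de_boor(x, knots, coeffs, degree):
    """Evaluate a 1D B-spline at x with the textbook de Boor recurrence.

    Illustrative sketch only: `knots` is a non-decreasing knot vector and
    `coeffs` holds len(knots) - degree - 1 control values.
    """
    # Locate the knot span k such that knots[k] <= x < knots[k + 1].
    k = degree
    while k < len(knots) - degree - 2 and x >= knots[k + 1]:
        k += 1

    # Copy the degree + 1 control values that influence this span.
    d = [coeffs[j + k - degree] for j in range(degree + 1)]

    # Repeated linear interpolation; each pass lowers the degree by one.
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            left = knots[j + k - degree]
            right = knots[j + 1 + k - r]
            alpha = (x - left) / (right - left)
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[degree]


# Uniform cubic spline whose control values equal the Greville abscissae,
# so the spline reproduces the identity function on [3, 4].
knots = [0, 1, 2, 3, 4, 5, 6, 7]
value = de_boor(3.5, knots, [2.0, 3.0, 4.0, 5.0], 3)
```

The recurrence uses only linear interpolations between neighbouring coefficients, which is the kind of operation GPU texture units perform in hardware.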

Memory-Efficient Belief Propagation for Stereo Matching on GPU (GPU 에서의 고속 스테레오 정합을 위한 메모리 효율적인 Belief Propagation)

  • Choi, Young-Kyu;Williem, Williem;Park, In Kyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.11a / pp.52-53 / 2012
  • Belief propagation (BP) is a commonly used global energy minimization algorithm for solving the stereo matching problem in 3D reconstruction. However, it requires large memory bandwidth and data size. In this paper, we propose a novel memory-efficient BP algorithm for stereo matching on the graphics processing unit (GPU). The data size and transfer bandwidth are significantly reduced by storing only a part of the whole message. To maintain the accuracy of the matching result, the local messages are reconstructed using the shared memory available on the GPU. Experimental results show almost an order-of-magnitude reduction in global memory consumption and a 21 to 46% saving in memory bandwidth compared to the conventional algorithm. The implementation on a recent GPU achieves a 22.8-fold speedup in execution time compared to execution on a CPU.
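For orientation, the conventional min-sum message update that BP-based stereo matching builds on can be sketched as follows. This shows the standard update only, not the paper's memory-reduced variant. The truncated-linear smoothness cost, the parameter values, and the name `update_message` are assumptions made for illustration.

```python
def update_message(data_cost, incoming, smooth_weight=1.0, truncation=2.0):
    """Conventional min-sum BP message from pixel p to a neighbour q.

    data_cost[d] : matching cost of disparity d at pixel p
    incoming     : messages into p from its other neighbours (not from q)
    The truncated-linear smoothness term is an illustrative assumption.
    """
    labels = range(len(data_cost))

    # h(d_p): data cost plus every incoming message evaluated at d_p.
    h = [data_cost[d] + sum(m[d] for m in incoming) for d in labels]

    # For each disparity of q, minimise over the disparities of p.
    out = [
        min(h[dp] + min(smooth_weight * abs(dp - dq), truncation)
            for dp in labels)
        for dq in labels
    ]

    # Subtract the minimum so message values stay bounded over iterations.
    lowest = min(out)
    return [v - lowest for v in out]


message = update_message([0, 5, 5], incoming=[])
```

Every pixel keeps one such vector per neighbour and per iteration, which is why the message storage dominates memory use in the conventional algorithm.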


The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services / v.18 no.3 / pp.29-36 / 2017
  • Nowadays, 3D dynamic simulation is closely related to many industries. In the past, physically-based 3D simulation was used mainly in car crash or construction-related fields, but today it also plays an important role in movies and games. Many mathematical computations are needed to represent a 3D object realistically, but it is difficult to process such a large amount of calculation in real time on a CPU. Recently, with advanced graphics hardware and improved architectures, the GPU can be used for general-purpose computation as well as for graphics. Many approaches using the GPU have been applied in various research fields. In this paper, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders, in order to optimize the performance of GPU-based cloth simulation. Cloth simulation is implemented with the spring-centric algorithm and the node-centric algorithm using GPU parallel computing with the compute shader of GLSL 4.3. We compare the performance of these algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the experimental results are reported as the average FPS. The experimental results show that the node-centric algorithm runs faster than the spring-centric algorithm.
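The abstract does not define the two algorithms, so the sketch below assumes the usual reading of the terms: a spring-centric pass assigns one unit of work to each spring and scatters its force to both endpoints, while a node-centric pass assigns one unit of work to each node and gathers the forces of its attached springs. It is written as sequential Python rather than GLSL, and all function names and the 2D Hooke spring model are illustrative assumptions.

```python
import math


def spring_force(pa, pb, rest, k):
    """Hooke force acting on endpoint a of a spring joining a and b."""
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return (0.0, 0.0)
    scale = k * (length - rest) / length
    return (scale * dx, scale * dy)


def forces_spring_centric(pos, springs, k):
    """One work item per spring: scatter the force to both endpoints.

    On a GPU the two writes below can collide between threads and would
    need atomic adds or a separate accumulation pass.
    """
    forces = [[0.0, 0.0] for _ in pos]
    for a, b, rest in springs:
        fx, fy = spring_force(pos[a], pos[b], rest, k)
        forces[a][0] += fx
        forces[a][1] += fy
        forces[b][0] -= fx
        forces[b][1] -= fy
    return forces


def forces_node_centric(pos, springs, k):
    """One work item per node: gather forces from its attached springs.

    Each node writes only its own entry, so no write conflicts arise.
    """
    adjacency = [[] for _ in pos]
    for a, b, rest in springs:
        adjacency[a].append((b, rest))
        adjacency[b].append((a, rest))
    forces = []
    for i, p in enumerate(pos):
        fx = fy = 0.0
        for j, rest in adjacency[i]:
            gx, gy = spring_force(p, pos[j], rest, k)
            fx += gx
            fy += gy
        forces.append([fx, fy])
    return forces


positions = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
springs = [(0, 1, 1.0), (1, 2, 1.0)]
scattered = forces_spring_centric(positions, springs, k=10.0)
gathered = forces_node_centric(positions, springs, k=10.0)
```

Both passes produce the same forces. They differ in how work is divided among threads and whether writes conflict, which is one plausible reason for the performance gap the abstract reports.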

Optimization of Lightweight Encryption Algorithm (LEA) using Threads and Shared Memory of GPU (GPU의 스레드와 공유메모리를 이용한 LEA 최적화 방안)

  • Park, Moo Kyu;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.4 / pp.719-726 / 2015
  • As big-data and cloud security technologies become popular, much research has recently been conducted on faster and lighter encryption. As a result, the National Security Research Institute developed the Lightweight Encryption Algorithm (LEA), a lightweight and fast block cipher. To date, there have been various studies on speeding up LEA using the GPU rather than the conventional CPU. However, it is difficult to find any guideline on how to use the GPU efficiently for LEA. Therefore, we introduce a guideline that explains how to design and implement an optimized LEA on the GPU.

Parallel Design and Implementation of Shot Boundary Detection Algorithm (샷 경계 탐지 알고리즘의 병렬 설계와 구현)

  • Lee, Joon-Goo;Kim, SeungHyun;You, Byoung-Moon;Hwang, DooSung
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.76-84 / 2014
  • As the number of high-density videos increases, parallel processing approaches are necessary to process large-scale video data. When a video processing method requires thousands of simple operations, GPU-based parallel processing is preferred to CPU-based parallel processing because it reduces the time and space complexities of the given computation problem. This paper studies the parallel design and implementation of a shot-boundary detection algorithm. The proposed shot-boundary detection algorithm uses pixel brightness comparisons and global histogram data among the blocks of frames, and the computation of these data is highly parallel. To maximize the parallel execution of these operations, the computations of the pixel brightness and histogram are designed in parallel and implemented on an NVIDIA GPU. The GPU-based shot detection method is tested with 10 videos from the video collection of the National Archive of Korea. In experiments, the detection rate is similar, but the computation time is about 10 times faster than that of the CPU-based algorithm.
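As a rough illustration of histogram-based shot-boundary detection, the sketch below compares global brightness histograms of consecutive frames and reports a boundary when they differ strongly. The bin count, the threshold, the normalisation, and the function names are assumptions. The paper's block-wise pixel comparisons and GPU kernels are not reproduced here.

```python
def brightness_histogram(frame, bins=16):
    """Histogram of 8-bit brightness values (0..255) for one frame."""
    hist = [0] * bins
    for value in frame:
        hist[value * bins // 256] += 1
    return hist


def detect_shot_boundaries(frames, threshold=0.5, bins=16):
    """Return indices i where frame i starts a new shot.

    The difference is the L1 distance between consecutive histograms,
    scaled to [0, 1]; the threshold value is an illustrative assumption.
    """
    boundaries = []
    if not frames:
        return boundaries
    previous = brightness_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = brightness_histogram(frames[i], bins)
        l1 = sum(abs(a - b) for a, b in zip(previous, current))
        if l1 / (2 * len(frames[i])) > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


dark = [10] * 64      # a flat dark frame, flattened to a pixel list
bright = [240] * 64   # a flat bright frame
cuts = detect_shot_boundaries([dark, dark, bright, bright])
```

Each histogram depends only on its own frame and each bin update is a simple increment, which is the kind of independent, simple work that suits a GPU.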

Robust GPU-based intersection algorithm for a large triangle set (GPU를 이용한 대량 삼각형 교차 알고리즘)

  • Kyung, Min-Ho;Kwak, Jong-Geun;Choi, Jung-Ju
    • Journal of the Korea Computer Graphics Society / v.17 no.3 / pp.9-19 / 2011
  • Computing triangle-triangle intersections is a fundamental task required for many 3D geometric problems. We propose a novel robust GPU algorithm to efficiently compute intersections in a large triangle set. The algorithm has three stages: k-d tree construction, triangle pair generation, and exact intersection computation. All three stages are executed on the GPU, except for unsafe triangle pairs. Unsafe triangle pairs are robustly handled by CLP (controlled linear perturbation) on a CPU thread. They are identified by floating-point filtering while exact intersection is computed on the GPU. Many triangles crossing a split plane are duplicated in k-d tree construction, which later produces many redundant triangle pairs. To eliminate them efficiently, we use a split index that can determine the redundancy of a pair by a simple bitwise operation. We applied the proposed algorithm to computing 3D Minkowski sum boundaries to verify its efficiency and robustness.

Implementation and Performance Evaluation of a Video-Equipped Real-Time Fire Detection Method at Different Resolutions using a GPU (GPU를 이용한 다양한 해상도의 비디오기반 실시간 화재감지 방법 구현 및 성능평가)

  • Shon, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.20 no.1 / pp.1-10 / 2015
  • In this paper, we propose an efficient parallel implementation of a widely used, complex four-stage fire detection algorithm on a graphics processing unit (GPU) and analyze its performance. We use videos at seven different resolutions (QVGA, VGA, SVGA, XGA, SXGA+, UXGA, QXGA) as inputs to the four-stage fire detection algorithm, and compare the performance of the GPU-based approach with that of the CPU implementation at each resolution. Experimental results using five different fire videos at seven resolutions indicate that the proposed GPU implementation outperforms the CPU implementation in execution time, taking 25.11 ms per frame for the UXGA resolution video and thus satisfying the real-time requirement (30 frames per second, 30 fps) of the fire detection algorithm.
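The real-time claim can be checked with simple arithmetic: at 30 fps each frame has a budget of 1000 / 30 ms, and the reported 25.11 ms per frame fits within it. The short sketch below performs this check. The helper names are illustrative assumptions.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / target_fps


def achieved_fps(ms_per_frame):
    """Frame rate implied by a measured per-frame processing time."""
    return 1000.0 / ms_per_frame


budget = frame_budget_ms(30)    # about 33.3 ms per frame
rate = achieved_fps(25.11)      # about 39.8 frames per second
meets_real_time = 25.11 <= budget
```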

An efficient acceleration algorithm of GPU ray tracing using CUDA (CUDA를 이용한 효과적인 GPU 광선추적 가속 알고리즘)

  • Ji, Joong-Hyun;Yun, Dong-Ho;Ko, Kwang-Hee
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.469-474 / 2009
  • This paper proposes a real-time ray tracing system that uses an optimized kd-tree traversal and a ray/triangle intersection algorithm. Previous kd-tree traversal algorithms search for the upper nodes in a bottom-up manner. In that approach, we need to revisit already visited parent nodes or use redundant memory after failing to find an intersected primitive in a leaf node, so ray tracing of relatively complex scenes becomes more difficult. The new algorithm keeps stacks in the GPU's local memory under the CUDA framework, which eliminates the problems of the previous algorithms. After traversing a node, we perform the latest CPU-based ray/triangle intersection algorithm, the Plücker coordinate test, which is further accelerated by massive parallelism on CUDA. The Plücker test can drastically reduce the computational cost since it does not use barycentric coordinates but only a simple test based on the relations between a ray and the triangle edges. The entire system consists of a single-ray kernel and is implemented without complicated synchronization or ray packets. Our experiments show that the new algorithm is roughly twice as fast as the previous one.
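A minimal Python sketch of the Plücker coordinate test follows, assuming the standard formulation. The ray and each directed triangle edge are converted to Plücker coordinates, and the ray's supporting line passes through the triangle when the three permuted inner products share a sign. The helper names are illustrative. The sketch tests the infinite line only, so a full ray tracer would also have to confirm that the hit lies in front of the ray origin. It is not the paper's CUDA kernel.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker(p, q):
    """Plücker coordinates (direction, moment) of the line from p to q."""
    return (sub(q, p), cross(p, q))


def side(l1, l2):
    """Permuted inner product; its sign tells how l1 passes around l2."""
    return dot(l1[0], l2[1]) + dot(l2[0], l1[1])


def line_hits_triangle(origin, direction, a, b, c):
    """True if the ray's supporting line passes through triangle abc.

    Only edge relations are used; no barycentric coordinates or hit
    distance are computed. A line coplanar with the triangle yields
    all-zero products and is reported as a hit by this simple version.
    """
    tip = (origin[0] + direction[0],
           origin[1] + direction[1],
           origin[2] + direction[2])
    ray = plucker(origin, tip)
    s = [side(ray, plucker(a, b)),
         side(ray, plucker(b, c)),
         side(ray, plucker(c, a))]
    return all(v >= 0 for v in s) or all(v <= 0 for v in s)


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = line_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = line_hits_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
```

The test needs only dot and cross products with no division, which makes it easy to run for many rays in parallel.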
