• Title/Summary/Keyword: GPU acceleration technique


Development and run time assessment of the GPU accelerated technique of a 2-Dimensional model for high resolution flood simulation in wide area (광역 고해상도 홍수모의를 위한 2차원 모형의 GPU 가속기법 개발 및 실행시간 평가)

  • Choi, Yun Seok;Noh, Hui Seong;Choi, Cheon Kyu
    • Journal of Korea Water Resources Association, v.55 no.12, pp.991-998, 2022
  • The purpose of this study is to develop a GPU (Graphics Processing Unit) acceleration technique for a 2-dimensional model and to assess its effectiveness for high resolution flood simulation over a wide area. In this study, the GPU acceleration technique was implemented with CUDA in the G2D (Grid based 2-Dimensional land surface flood model) model, which uses an implicit scheme and a uniform square grid. The technique was applied to flood simulation in Jinju-si. The spatial resolution of the simulation domain is 10 m × 10 m, and the number of cells to be calculated is 5,090,611. The flood period caused by Typhoon Mitag in October 2019 was simulated. Radar rainfall data were applied to the source term, and the measured discharge of Namgang-Dam (Ilryu-moon) and the measured stream flow of Jinju-si (Oksan-gyo) were applied to the boundary conditions. The 2-dimensional flood model was able to reproduce the measured water level in the Nam-gang (Riv.). The GPU acceleration technique produced faster flood simulations than both serial and parallel simulation on a CPU (Central Processing Unit). This study can contribute to the development of GPU acceleration techniques for 2-dimensional flood models using implicit schemes and to land surface flood simulation over wide areas.
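
Because G2D uses an implicit scheme on a uniform square grid, each time step involves an iterative solve over all cells, which maps naturally onto one-thread-per-cell GPU parallelism. Below is a minimal CUDA sketch of one Jacobi sweep for such a solve; the coefficient and variable names (aP, aE, hOld, etc.) are assumptions for illustration, not the actual G2D formulation or kernel.

```cuda
// Hypothetical sketch: one Jacobi sweep for an implicit update of water depth h
// on a uniform square grid, one thread per cell. The coefficients aP, aE, aW,
// aN, aS and right-hand side b are assumed to have been assembled from the
// governing equations each time step. This is NOT the actual G2D discretization.
__global__ void jacobiSweep(const float* __restrict__ hOld, float* __restrict__ hNew,
                            const float* __restrict__ aP, const float* __restrict__ aE,
                            const float* __restrict__ aW, const float* __restrict__ aN,
                            const float* __restrict__ aS, const float* __restrict__ b,
                            int nx, int ny)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;  // boundary cells handled separately

    int c = j * nx + i;
    // Jacobi iteration for the implicit system: h_p = (b - sum(a_nb * h_nb)) / a_p
    float nb = aE[c] * hOld[c + 1]  + aW[c] * hOld[c - 1]
             + aN[c] * hOld[c + nx] + aS[c] * hOld[c - nx];
    hNew[c] = (b[c] - nb) / aP[c];
}

// Host side (sketch): one thread per cell, swap hOld/hNew and repeat until convergence.
// dim3 block(16, 16);
// dim3 grid((nx + 15) / 16, (ny + 15) / 16);
// jacobiSweep<<<grid, block>>>(dHold, dHnew, dAp, dAe, dAw, dAn, dAs, dB, nx, ny);
```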

GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems, v.16 no.6, pp.1359-1371, 2020
  • When transmitting and receiving a large amount of data, reliable data communication is crucial for the normal operation of a device and for preventing abnormal operation caused by errors. Therefore, in this paper, it is assumed that an error correction code (ECC) that can detect and correct errors by itself is used in an environment where massive data is received sequentially. Because an embedded system has limited resources, such as a low-performance processor or a small memory, it requires efficient operation of applications. In this paper, we propose an accelerated ECC-decoding technique that uses the graphics processing unit (GPU) built into the embedded system when receiving a large amount of data. In the matrix-vector multiplication at the core of the Hamming code used for the ECC operation, the matrix is expressed in compressed sparse row (CSR) format, and a sparse matrix-vector product is used. The multiplication is performed in a GPU kernel, and the Hamming-code computation is also accelerated so that the ECC operation can be performed in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, the NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.
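
The core operation described here, the Hamming parity check as a sparse matrix-vector product over GF(2) with the matrix stored in CSR format, can be sketched as a CUDA kernel with one thread per syndrome bit. The kernel below is a hedged illustration under that assumption; the variable names and data layout are not taken from the paper.

```cuda
// Hypothetical sketch of the syndrome computation s = H * r over GF(2), with the
// parity-check matrix H in compressed sparse row (CSR) form. Since H is binary,
// only the column indices are needed: each row's "product" reduces to XOR-ing the
// received bits selected by that row. One thread per syndrome bit.
__global__ void csrSyndrome(const int* __restrict__ rowPtr,    // CSR row pointers (numRows + 1)
                            const int* __restrict__ colIdx,    // CSR column indices
                            const unsigned char* __restrict__ recv,     // received codeword bits (0/1)
                            unsigned char* __restrict__ syndrome,       // output syndrome bits
                            int numRows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= numRows) return;

    unsigned char acc = 0;
    for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
        acc ^= recv[colIdx[k]];          // addition mod 2
    syndrome[row] = acc;                 // non-zero syndrome => error detected
}
// For a Hamming code, a non-zero syndrome identifies the erroneous bit position,
// which a follow-up kernel (not shown) can flip to correct the error.
```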

Acceleration of Feature-Based Image Morphing Using GPU (GPU를 이용한 특징 기반 영상모핑의 가속화)

  • Kim, Eun-Ji;Yoon, Seung-Hyun;Lee, Jieun
    • Journal of the Korea Computer Graphics Society, v.20 no.2, pp.13-24, 2014
  • In this study, a graphics-processing-unit (GPU)-based acceleration technique is proposed for feature-based image morphing. The technique uses the depth buffer of the graphics hardware to efficiently calculate the shortest distance between a pixel and the control lines. The pairs of control lines between the source image and the destination image are determined by the user's input, and the distance function of each control line is rendered using two rectangles and two cones. The distance between each pixel and its nearest control line is stored in the depth buffer through the graphics pipeline and is used to conduct the morphing operation efficiently. The pixel-level morphing operation is parallelized using the compute unified device architecture (CUDA) to reduce the morphing time. We demonstrate the efficiency of the proposed technique with several experimental results.
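
The quantity the depth-buffer trick produces, the shortest distance from each pixel to its nearest control line, can also be written directly as a per-pixel CUDA kernel. The following sketch computes that distance by brute force over the control segments, purely to make the per-pixel parallelism concrete; the paper's actual approach renders rectangles and cones through the graphics pipeline instead, and all identifiers here are assumptions.

```cuda
// Hypothetical sketch: each thread (pixel) finds the shortest distance to a set of
// control line segments, the same quantity the paper obtains via the depth buffer.
struct Segment { float2 p, q; };    // control line endpoints

__device__ float distToSegment(float2 x, Segment s)
{
    float2 d  = make_float2(s.q.x - s.p.x, s.q.y - s.p.y);
    float2 px = make_float2(x.x - s.p.x, x.y - s.p.y);
    float len2 = d.x * d.x + d.y * d.y;
    float t = len2 > 0.f ? fminf(fmaxf((px.x * d.x + px.y * d.y) / len2, 0.f), 1.f) : 0.f;
    float2 c = make_float2(s.p.x + t * d.x - x.x, s.p.y + t * d.y - x.y);
    return sqrtf(c.x * c.x + c.y * c.y);
}

__global__ void nearestControlLine(const Segment* __restrict__ lines, int numLines,
                                   float* __restrict__ minDist, int* __restrict__ minIdx,
                                   int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float best = 1e30f; int bestIdx = -1;
    float2 p = make_float2((float)x, (float)y);
    for (int i = 0; i < numLines; ++i) {
        float d = distToSegment(p, lines[i]);
        if (d < best) { best = d; bestIdx = i; }
    }
    minDist[y * width + x] = best;    // what the depth buffer would hold
    minIdx [y * width + x] = bestIdx; // nearest control line, used for the warp
}
```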

An Acceleration Technique of Terrain Rendering using GPU-based Chunk LOD (GPU 기반의 묶음 LOD 기법을 이용한 지형 렌더링의 가속화 기법)

  • Kim, Tae-Gwon;Lee, Eun-Seok;Shin, Byeong-Seok
    • Journal of Korea Multimedia Society, v.17 no.1, pp.69-76, 2014
  • It is hard to render massive terrain data in real time even on recent graphics hardware. To process massive terrain data, mesh simplification methods such as continuous Level-of-Detail (LOD) are commonly used. However, existing GPU-based methods that use a quad-tree structure, such as geometry splitting, produce many vertices while traversing the quad-tree and must retransmit those vertices to the GPU on each traversal. They also suffer from increased tree size because the tree structure is stored in textures. To solve these problems, we propose a GPU-based chunked LOD technique for real-time terrain rendering. We restrict the depth of the tree search and generate chunks with the tessellator on the GPU. With our method, the terrain can be rendered efficiently by generating the chunks on the GPU, and the computation time for tree traversal is reduced.
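
A rough CUDA sketch of the chunk-level LOD selection is shown below: one thread per chunk derives a refinement level from its distance to the camera, bounded by a maximum tree depth. The actual surface refinement in the paper is done by the hardware tessellator, so this sketch only illustrates the selection logic, and the formula and names are assumptions.

```cuda
// Hypothetical sketch of per-chunk LOD selection: one thread per terrain chunk
// computes a refinement level from the chunk's distance to the camera, clamped by
// a maximum tree depth (the "restricted tree search depth" in the abstract).
__global__ void selectChunkLod(const float3* __restrict__ chunkCenter, // chunk centers (world space)
                               float3 cameraPos,
                               float  baseError,     // distance scale for the error budget
                               int    maxDepth,      // depth limit of the chunk tree
                               int*   __restrict__ lodLevel,
                               int    numChunks)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numChunks) return;

    float dx = chunkCenter[c].x - cameraPos.x;
    float dy = chunkCenter[c].y - cameraPos.y;
    float dz = chunkCenter[c].z - cameraPos.z;
    float dist = sqrtf(dx * dx + dy * dy + dz * dz);

    // Closer chunks get a deeper (finer) level; distant chunks fall back to level 0.
    int coarsen = (int)floorf(log2f(fmaxf(dist / baseError, 1.0f)));
    lodLevel[c] = max(0, maxDepth - coarsen);
}
```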

Acceleration of 2D Image Based Flow Visualization using GPU (GPU를 이용한 2차원 영상 기반 유동 가시화 기법의 가속)

  • Lee, Joong-Youn
    • Proceedings of the Korea Contents Association Conference, 2007.11a, pp.543-546, 2007
  • Flow visualization is a visualization technique that expresses vector data visually using 2D or 3D graphics, with the aim of helping people easily find and understand features of the vector data. Image Based Flow Visualization (IBFV) is one of the fastest dense, integration-based flow visualization techniques. In this paper, IBFV is implemented and accelerated on a commodity GPU. In particular, the mesh advection step is accelerated in the vertex program.
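
The mesh-advection step that IBFV performs in the vertex program can be sketched as a CUDA kernel that moves each mesh vertex along the flow field for one time step. The sketch below uses a nearest-neighbor lookup into the vector field and made-up names; it is an illustration of the idea, not the paper's shader code.

```cuda
// Hypothetical sketch of IBFV mesh advection: each thread moves one mesh vertex
// forward along the flow field for a single time step. The advected mesh is then
// used to warp the previous frame before blending in noise (not shown). The
// bilinear lookup into the vector field is simplified to nearest-neighbor here.
__global__ void advectMesh(const float2* __restrict__ vertexIn,   // vertex positions in [0,1]^2
                           float2* __restrict__ vertexOut,
                           const float2* __restrict__ flow,       // vector field sampled on a grid
                           int gridW, int gridH,
                           float dt, int numVertices)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVertices) return;

    float2 p = vertexIn[v];
    int gx = min(gridW - 1, max(0, (int)(p.x * gridW)));
    int gy = min(gridH - 1, max(0, (int)(p.y * gridH)));
    float2 vel = flow[gy * gridW + gx];

    vertexOut[v] = make_float2(p.x + dt * vel.x, p.y + dt * vel.y);
}
```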


Digital Image based Real-time Sea Fog Removal Technique using GPU (GPU를 이용한 영상기반 고속 해무제거 기술)

  • Choi, Woon-sik;Lee, Yoon-hyuk;Seo, Young-ho;Choi, Hyun-jun
    • Journal of the Korea Institute of Information and Communication Engineering, v.20 no.12, pp.2355-2362, 2016
  • Sea fog removal is an important issue in both computer vision and image processing. Sea fog or haze removal is widely used in many fields, such as automatic control systems, CCTV, and image recognition. Color image dehazing techniques have been studied extensively, and the dark channel prior (DCP) technique in particular has been widely used. This paper proposes a fast and efficient GPU-based dark channel prior method to remove sea fog from a single digital image. We implement a basic parallel program and then optimize it to obtain an acceleration of more than 250 times. While parallelizing and optimizing the algorithm, we improve parts of the original serial program and of the basic parallel program according to the characteristics of each step. The proposed GPU algorithm and implementation can serve as useful pre-processing in many systems, such as safe ship navigation, topographical surveying, and intelligent vehicles.
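
The per-pixel core of the DCP pipeline, computing the dark channel as the minimum of R, G, and B over a local window, is the kind of operation the paper parallelizes on the GPU. The CUDA kernel below is a hedged sketch of that step only; transmission estimation and scene-radiance recovery are omitted, and the image layout and names are assumptions.

```cuda
// Hypothetical sketch of the dark channel computation: each thread takes the
// minimum of R, G, B over a local window around its pixel. The later DCP steps
// (atmospheric light, transmission, radiance recovery) follow the same
// per-pixel pattern and are not shown.
__global__ void darkChannel(const uchar3* __restrict__ rgb,    // input image, row-major
                            unsigned char* __restrict__ dark,  // output dark channel
                            int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned int m = 255;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int xx = min(width  - 1, max(0, x + dx));   // clamp at the image border
            int yy = min(height - 1, max(0, y + dy));
            uchar3 c = rgb[yy * width + xx];
            unsigned int v = min((unsigned int)c.x, min((unsigned int)c.y, (unsigned int)c.z));
            m = min(m, v);
        }
    }
    dark[y * width + x] = (unsigned char)m;
}
```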

Grid Acceleration Structure for Efficiently Tracing the Secondary Rays in Dynamic Scenes on Mobile Platforms (모바일 환경에서의 동적 장면의 효율적인 이차 광선 추적을 위한 격자 가속 구조)

  • Seo, Woong;Choi, Byeongjun;Ihm, Insung
    • Journal of KIISE, v.44 no.6, pp.573-580, 2017
  • Despite the recent remarkable advances in the computing power of mobile devices, heat and battery constraints still restrict their performance, particularly compared with PCs. Therefore, when applying ray tracing for high-quality rendering, it is worthwhile to consider a method that traces only the secondary rays while the effects of the primary rays are generated through rasterization-based OpenGL ES rendering. Since most of the rendering time in such a method is spent processing secondary rays, a new volume-grid technique for dynamic scenes is proposed here that improves the tracing performance of secondary rays with low coherence. The proposed method models all possible spatial secondary rays with a fixed number of sampling rays, thereby alleviating the need to visit every cell along a ray in a uniform grid. A hybrid rendering pipeline that speeds up overall rendering performance by exploiting both the CPU and GPU of the mobile device is also presented.
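
To make the cost being attacked concrete, the sketch below shows standard uniform-grid traversal (3D-DDA) for one secondary ray per thread, which visits every cell along the ray. This is the baseline behavior the proposed sampled-ray volume grid is designed to avoid, not the paper's method, and all names and the grid layout are assumptions.

```cuda
// Hypothetical sketch of standard uniform-grid traversal (Amanatides-Woo 3D-DDA):
// the thread steps through every cell along its ray and would test the triangles
// stored in each non-empty cell. Assumes the grid is anchored at the origin and
// the ray origin lies inside the grid.
__global__ void traceSecondaryRays(const float3* __restrict__ origin,
                                   const float3* __restrict__ dir,
                                   const int* __restrict__ cellCount,  // triangles per cell
                                   int* __restrict__ hitCell,          // first non-empty cell (-1 if none)
                                   int nx, int ny, int nz, float cellSize, int numRays)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRays) return;

    float3 o = origin[r], d = dir[r];
    int ix = (int)(o.x / cellSize), iy = (int)(o.y / cellSize), iz = (int)(o.z / cellSize);
    int sx = d.x >= 0 ? 1 : -1, sy = d.y >= 0 ? 1 : -1, sz = d.z >= 0 ? 1 : -1;

    // Parametric distance to the next cell boundary and per-cell increment per axis.
    float nextX = (ix + (sx > 0 ? 1 : 0)) * cellSize;
    float nextY = (iy + (sy > 0 ? 1 : 0)) * cellSize;
    float nextZ = (iz + (sz > 0 ? 1 : 0)) * cellSize;
    float tMaxX = d.x != 0.f ? (nextX - o.x) / d.x : 1e30f;
    float tMaxY = d.y != 0.f ? (nextY - o.y) / d.y : 1e30f;
    float tMaxZ = d.z != 0.f ? (nextZ - o.z) / d.z : 1e30f;
    float tDx = d.x != 0.f ? cellSize / fabsf(d.x) : 1e30f;
    float tDy = d.y != 0.f ? cellSize / fabsf(d.y) : 1e30f;
    float tDz = d.z != 0.f ? cellSize / fabsf(d.z) : 1e30f;

    hitCell[r] = -1;
    while (ix >= 0 && iy >= 0 && iz >= 0 && ix < nx && iy < ny && iz < nz) {
        int cell = (iz * ny + iy) * nx + ix;
        if (cellCount[cell] > 0) { hitCell[r] = cell; break; }   // triangle tests would go here
        if (tMaxX <= tMaxY && tMaxX <= tMaxZ) { ix += sx; tMaxX += tDx; }
        else if (tMaxY <= tMaxZ)              { iy += sy; tMaxY += tDy; }
        else                                  { iz += sz; tMaxZ += tDz; }
    }
}
```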

Image Identifier based on Local Feature's Histogram and Acceleration Technique using GPU (지역 특징 히스토그램 기반 영상식별자와 GPU 가속화)

  • Jeon, Hyeok-June;Seo, Yong-Seok;Hwang, Chi-Jung
    • Journal of KIISE: Computing Practices and Letters, v.16 no.9, pp.889-897, 2010
  • Recently, large-scale image database systems have demanded fast search, high accuracy, and efficient storage. An image identifier (descriptor), which measures the similarity of two images, plays an important role in such systems. Extraction methods for image identifiers can be roughly classified into local and global methods. In this paper, the proposed image identifier, LFH (Local Feature's Histogram), is obtained as a histogram of robust and distinctive local descriptors (features) accumulated over a sub-division of the local region. LFH not only has the properties of both a local and a global descriptor, but also allows fast and accurate distance computation. Additionally, we propose a way to extract LFH on the GPU (OpenGL and GLSL). In experiments, we compared LFH with SIFT (a local method) and EHD (a global method) in terms of storage requirements, extraction and retrieval time, and accuracy.
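
The histogram accumulation behind an LFH-style descriptor can be sketched as a GPU kernel in which each thread bins one local feature by its spatial sub-region and quantized codeword. The paper extracts LFH with OpenGL and GLSL; the CUDA kernel below is used only as an illustration, and the sub-region grid and codebook parameters are assumptions.

```cuda
// Hypothetical sketch of LFH-style histogram accumulation: each thread takes one
// local feature (position plus quantized codeword index) and atomically increments
// the bin for the spatial sub-region it falls into.
__global__ void accumulateLFH(const float2* __restrict__ featPos,  // feature positions in [0,1]^2
                              const int*    __restrict__ featWord, // quantized descriptor index
                              int* __restrict__ histogram,         // [subRegions * subRegions * numWords]
                              int numFeatures, int subRegions, int numWords)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= numFeatures) return;

    int rx = min(subRegions - 1, (int)(featPos[f].x * subRegions));
    int ry = min(subRegions - 1, (int)(featPos[f].y * subRegions));
    int bin = (ry * subRegions + rx) * numWords + featWord[f];
    atomicAdd(&histogram[bin], 1);   // one count per feature, per sub-region, per codeword
}
// Two images are then compared by a distance (e.g., L1) between their normalized
// histograms, which is what gives LFH its global-descriptor character.
```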

Quad Tree Based 2D Smoke Super-resolution with CNN (CNN을 이용한 Quad Tree 기반 2D Smoke Super-resolution)

  • Hong, Byeongsun;Park, Jihyeok;Choi, Myungjin;Kim, Changhun
    • Journal of the Korea Computer Graphics Society, v.25 no.3, pp.105-113, 2019
  • Physically based fluid simulation takes a long time at high resolution. To address this, studies have used deep learning to compensate for the limitations of low-resolution fluid simulation. Among them, super-resolution, which converts low-resolution simulation data to high resolution, is being actively studied. However, traditional techniques process the entire space, including regions with no density data, so they are inefficient in terms of overall simulation speed and run out of GPU memory as the input resolution increases. In this paper, we propose a new method that partitions and classifies 2D smoke simulation data using a quad tree, a spatial partitioning method, and performs super-resolution only on the required regions. This technique accelerates the simulation by computing only the necessary regions, and processing the divided input data also mitigates the GPU memory problem.
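
The classification step, deciding which quad-tree tiles actually contain smoke and therefore need super-resolution, can be sketched as a simple GPU reduction per tile. The CUDA kernel below is a hedged illustration of that idea only; the CNN itself is not shown, and the tile size, threshold, and names are assumptions.

```cuda
// Hypothetical sketch of tile classification: the low-resolution density field is
// cut into tiles (quad-tree leaves), and each thread block decides whether its tile
// contains any smoke, so that only occupied tiles are sent to the super-resolution
// network. Launch with block dimensions (tileSize, tileSize), one block per tile.
__global__ void classifyTiles(const float* __restrict__ density,  // low-res density field
                              int* __restrict__ tileOccupied,     // 1 if the tile needs super-resolution
                              int width, int height, float threshold)
{
    int tx = blockIdx.x * blockDim.x + threadIdx.x;
    int ty = blockIdx.y * blockDim.y + threadIdx.y;

    __shared__ int occupied;
    if (threadIdx.x == 0 && threadIdx.y == 0) occupied = 0;
    __syncthreads();

    if (tx < width && ty < height && density[ty * width + tx] > threshold)
        atomicOr(&occupied, 1);     // any thread with smoke marks the tile
    __syncthreads();

    if (threadIdx.x == 0 && threadIdx.y == 0)
        tileOccupied[blockIdx.y * gridDim.x + blockIdx.x] = occupied;
}
// Only tiles with tileOccupied == 1 would be passed to the super-resolution CNN.
```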

Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing (터커 분해 및 은닉층 병렬처리를 통한 임베디드 시스템의 다중 DNN 가속화 기법)

  • Kim, Ji-Min;Kim, In-Mo;Kim, Myung-Sun
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.6, pp.842-849, 2022
  • With the development of deep learning technology, DNNs are increasingly used in embedded systems such as unmanned vehicles, drones, and robots. Typically, in an autonomous driving system, it is crucial to run several DNNs, which have high accuracy but large computational loads, at the same time. However, running multiple DNNs simultaneously on an embedded system with relatively low performance increases the inference time. This can cause abnormal behavior because the action corresponding to the inference result is not performed in time. To solve this problem, the solution proposed in this paper first reduces the computation by applying Tucker decomposition to DNN models with large computational loads, and then runs the DNN models in parallel as much as possible at the granularity of hidden layers on the GPU. Experimental results show that DNN inference time decreases by up to 75.6% compared with the case before applying the proposed technique.
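
The hidden-layer-level parallelism can be sketched with CUDA streams: layers of different models are enqueued on separate streams so that small (e.g., Tucker-compressed) layers of one model can overlap with layers of another on the GPU. The sketch below uses a toy dense-layer kernel and made-up buffers; it illustrates the scheduling idea only, not the paper's implementation.

```cuda
#include <cuda_runtime.h>

// Toy dense layer with ReLU, one thread per output neuron. Sizes and weights are
// placeholders; the Tucker decomposition itself is not shown here.
__global__ void denseRelu(const float* __restrict__ in, const float* __restrict__ W,
                          float* __restrict__ out, int inDim, int outDim)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= outDim) return;
    float acc = 0.f;
    for (int i = 0; i < inDim; ++i) acc += W[o * inDim + i] * in[i];
    out[o] = fmaxf(acc, 0.f);   // ReLU
}

// Host-side sketch: each model gets its own stream. Successive hidden layers of the
// same model stay ordered within their stream, while layers of different models may
// overlap on the GPU when one layer alone does not fill the device.
void runTwoModels(float** bufA, float** wA, const int* dimA, int layersA,
                  float** bufB, float** wB, const int* dimB, int layersB)
{
    cudaStream_t sA, sB;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);

    for (int l = 0; l < layersA; ++l)
        denseRelu<<<(dimA[l + 1] + 255) / 256, 256, 0, sA>>>(
            bufA[l], wA[l], bufA[l + 1], dimA[l], dimA[l + 1]);

    for (int l = 0; l < layersB; ++l)
        denseRelu<<<(dimB[l + 1] + 255) / 256, 256, 0, sB>>>(
            bufB[l], wB[l], bufB[l + 1], dimB[l], dimB[l + 1]);

    cudaStreamSynchronize(sA);
    cudaStreamSynchronize(sB);
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
}
```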