• Title/Summary/Keyword: GPU Parallel Processing

Search Result 226, Processing Time 0.027 seconds

Parallel Processing of Multi-Core Processor and GPUs in Projection Step for Efficient Fluid Simulation (효율적인 유체 시뮬레이션을 위한 투영 단계에서의 멀티 코어 프로세서와 그래픽 프로세서의 병렬처리)

  • Kim, Sun-Tae;Jung, Hwi-Ryong;Hong, Jeong-Mo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.6
    • /
    • pp.48-54
    • /
    • 2013
  • In these days, the state-of-art technologies employ the heterogeneous parallelization of CPU and GPU for fluid simulations in the field of computer graphics. In this paper, we present a novel CPU-GPU parallel algorithm that solves projection step of fluid simulation more efficiently than existing sequential CPU-GPU processing. Fluid simulation that requires high computational resources can be carried out efficiently by the proposed method.

From WiFi to WiMAX: Efficient GPU-based Parameterized Transceiver across Different OFDM Protocols

  • Li, Rongchun;Dou, Yong;Zhou, Jie;Li, Baofeng;Xu, Jinbo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.8
    • /
    • pp.1911-1932
    • /
    • 2013
  • Orthogonal frequency-division multiplexing (OFDM) has become a popular modulation scheme for wireless protocols because of its spectral efficiency and robustness against multipath interference. Although the components of various OFDM protocols are functionally similar, they remain distinct because of the characteristics of the environment. Recently, graphics processing units (GPUs) have been used to accelerate the signal processing of the physical layer (PHY) because of their great computational power, high development efficiency, and flexibility. In this paper, we describe the implementation of parameterized baseband modules using GPUs for two different OFDM protocols, namely, 802.11a and 802.16. First, we introduce various modules in the modulator/demodulator parts of the transmitter and receiver and analyze the computational complexity of each module. We then describe the integration of the GPU-based baseband modules of the two protocols using the parameterized method. GPU-based implementations are addressed to explain how to accelerate the baseband processing to archive real-time throughput. Finally, the performance results of each signal processing module are evaluated and analyzed. The experiments show that the GPU-based 802.11a and 802.16 PHY meet the real-time requirement and demonstrate good bit error ratio (BER) performance. The performance comparison indicates that our GPU-based implemented modules have better flexibility and throughput to the current ones.

A dynamic analysis algorithm for RC frames using parallel GPU strategies

  • Li, Hongyu;Li, Zuohua;Teng, Jun
    • Computers and Concrete
    • /
    • v.18 no.5
    • /
    • pp.1019-1039
    • /
    • 2016
  • In this paper, a parallel algorithm of nonlinear dynamic analysis of three-dimensional (3D) reinforced concrete (RC) frame structures based on the platform of graphics processing unit (GPU) is proposed. Time integration is performed using Newmark method for nonlinear implicit dynamic analysis and parallelization strategies are presented. Correspondingly, a parallel Preconditioned Conjugate Gradients (PCG) solver on GPU is introduced for repeating solution of the equilibrium equations for each time step. The RC frames were simulated using fiber beam model to capture nonlinear behaviors of concrete and reinforcing bars. The parallel finite element program is developed utilizing Compute Unified Device Architecture (CUDA). The accuracy of the GPU-based parallel program including single precision and double precision was verified in comparison with ABAQUS. The numerical results demonstrated that the proposed algorithm can take full advantage of the parallel architecture of the GPU, and achieve the goal of speeding up the computation compared with CPU.

GPU-Based Parallel Collision Detection for Deformable Objects (변형 물체를 위한 GPU 기반 병렬 충돌 감지)

  • Sung, Nak-Jun;Kim, Min Sang;Hong, Min;Choi, Yoo-Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.1
    • /
    • pp.25-32
    • /
    • 2018
  • Due to heavy computational cost, deformable object simulation requires more effective collision detection method than rigid body simulation. However, when the CPU-based collision detection algorithm is purely applied to the GPU environment, the collision detection algorithm and the data structure optimized for the GPU environment are essential because the performance of the GPU can not be used properly. Therefore, we propose a GPU-based parallel collision detection algorithm for mass-spring system which is widely used for deformable object representation in this paper. The proposed method uses a parallel algorithm and data structure to reduce collision detection cost through GPU-based curling algorithm using AABB-Octree structure. In this paper, we prove the effectiveness of the proposed method by comparing the intersection test of all triangle pairs in parallel. The results of experimental tests show that the proposed method improves the performance by about 24% on average. Therefore, it is expected that the proposed method can improve the performance of real-time simulation for deformable objects.

Implementation of $2{\times}2$ MIMO LTE Base Station using GPU for SDR System (GPU를 이용한 SDR 시스템 용 LTE MIMO 기지국 기능 구현)

  • Lee, Seung Hak;Kim, Kyung Hoon;Ahn, Chi Young;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.4
    • /
    • pp.91-98
    • /
    • 2012
  • This paper implements 2X2 MIMO Long Term Evolution (LTE) base station using Software defined radio (SDR) technology. The implemented base station system processes baseband signals on a Graphics Processor Unit(GPU). GPU is a high-speed parallel processor which provides very important advantage of using a very powerful C-based programming environment that is Compute Unified Device Architecture (CUDA). The implemented software-based base station system processes baseband signals through GPU. It utilizes USRP2 as its RF transceiver. In order to guarantee a real-time processing of LTE baseband signals, we have adopted well-known signal processing algorithms such as frame synchronization algorithms, ML detection, etc. using GPU operating in parallel processing.

Performance Evaluation of the GPU Architecture Executing Parallel Applications (병렬 응용프로그램 실행 시 GPU 구조에 따른 성능 분석)

  • Choi, Hong-Jun;Kim, Cheol-Hong
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.5
    • /
    • pp.10-21
    • /
    • 2012
  • The role of GPU has evolved from graphics-specific processing to general-purpose processing with the development of unified shader core architecture. Especially, execution methods for general-purpose parallel applications using GPU have been researched intensively, since the parallel hardware architecture can be utilized efficiently when the parallel applications are executed. However, current GPU architecture has limitations in executing general-purpose parallel applications, since the GPU is not specialized for general-purpose computing yet. To improve the GPU performance when general-purpose parallel applications are executed, the GPU architecture should be evolved. In this work, we analyze the GPU performance according to the architecture varying the number of cores and clock frequency. Our simulation results show that the GPU performance improves by up to 125.8% and 16.2% as the number of cores increases and the clock frequency increases, respectively. However, note that the improvement of the GPU performance is saturated even though the number of cores increases and the clock frequency increases continuously, since the data cannot be provided to the GPU due to the limit of memory bandwidth. Consequently, to accomplish high performance effectiveness on GPU, computational resources must be more suitably considered.

Implementation of Lattice Reduction-aided Detector using GPU on SDR System (SDR 시스템에서 GPU를 사용한 Lattice Reduction-aided 검출기 구현)

  • Kim, Tae Hyun;Leem, Hyun Seok;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.7 no.3
    • /
    • pp.55-61
    • /
    • 2011
  • This paper presents an implementation of Lattice Reduction (LR)-aided detector for Multiple-Input Multiple-Output (MIMO) system using Graphics Processing Unit (GPU). GPU is a parallel processor which has a number of Arithmetic Logic Units (ALUs), thus, it can minimize the operation time of LR algorithm through the parallelization using multiple threads in the GPU. Through the implemented LR-aided detector, we verify that the LR-aided detector operates a lot faster than Maximum Likelihood (ML) detector. The implemented LR-aided detector has been applied to WiMAX system to show the feasibility of its real-time processing. In addition, we demonstrate that the processing time can be reduced at the cost of 3dB SNR loss by limiting the repeating loop in Lenstra-Lenstra-Lovasz (LLL) algorithm which is frequently used in LR-aided detector.

Frequency Hopping Signal Analysis Using High-Speed Parallel Processing (고속 병렬처리 기법을 활용한 주파수 도약 신호 분석)

  • Lee, Kwang-Yong;Yoon, Hyun-Chul;Lee, Hyeon-Hwi
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.25 no.2
    • /
    • pp.251-254
    • /
    • 2014
  • In this paper, we studied a technique of extracting a Frequency Hopping(FH) signal for analysis using high-speed parallel processing structure. Unlike fixed frequency signal, FH signal is difficult to detect and analyze because FH systems use many random frequencies instead of a single carrier frequency. To solve this problem we designed a method that analyze FH signal using high-speed parallel processing. In order to apply parallel processing, we use CUDA using GPU and compare single processing with prarallel processing. As a result, using CUDA on a GPU is about 8.53 times faster than single processing.

Parallel Processing Algorithm of JPEG2000 Using GPU (GPU를 이용한 JPEG2000 병렬 알고리즘)

  • Lee, Dong-Ha;Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.6
    • /
    • pp.1075-1080
    • /
    • 2008
  • Most modem computers or game consoles are well equipped with powerful graphics processing units(GPUs) to accelerate graphics operations. However, since the graphics engines in these GPUs are specially designed for graphics operations, we could not take advantage of their computing power for more general nongraphic operations. In this paper, we studied the GPUs graphics engine in order to accelerate the image processing capability. Specifically, we implemented a JPEC2000 decoding/encoding framework that involves both OpenMP and GPU. Initial experimental results show that significant speed-up can be achieved by utilizing the GPU power.

A Parallel Processing Method for Partial Nodes in R*-tree Using GPU (GPU를 활용한 R*-tree에서의 부분 노드 병렬 처리 방법)

  • Kim, Seong;Oh, Byoung-Woo
    • Spatial Information Research
    • /
    • v.20 no.6
    • /
    • pp.139-144
    • /
    • 2012
  • The R*-tree manages hierarchical nodes for efficient access of spatial data. We propose a method that maintains partial nodes of R*-tree in the GPU memory to improve efficiency using parallel processing. The proposed method attempts to load as many nodes as possible to the GPU memory. The new nodes are inserted to manage the rest of R*-tree nodes in the main memory. The experimental result shows that the proposed method is more efficient than the main memory based R*-tree.