Search | Korea Science

Analyzing Fine-Grained Resource Utilization for Efficient GPU Workload Allocation (GPU 작업 배치의 효율화를 위한 자원 이용률 상세 분석)

Park, Yunjoo;Shin, Donghee;Cho, Kyungwoon;Bahn, Hyokyung
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.19 no.1
- /
- pp.111-116
- /
- 2019
Recently, GPU expands application domains from graphic processing to various kinds of parallel workloads. However, current GPU systems focus on the maximization of each workload's parallelism through simplified control rather than considering various workload characteristics. This paper classifies the resource usage characteristics of GPU workloads into computing-bound, memory-bound, and dependency-latency-bound, and quantifies the fine-grained bottleneck for efficient workload allocation. For example, we identify the exact bottleneck resources such as single function unit, double function unit, or special function unit even for the same computing-bound workloads. Our analysis implies that workloads can be allocated together if fine-grained bottleneck resources are different even for the same computing-bound workloads, which can eventually contribute to efficient workload allocation in GPU.
https://doi.org/10.7236/JIIBC.2019.19.1.111 인용 PDF KSCI HTML

Implememtation of Fast Rasterizer processing using GPGPU based on SIMT structure (SIMT 구조 기반 GPGPU를 이용한 고속 Rasterizer 구현)

Kim, Chiyong
- Journal of IKEEE
- /
- v.21 no.3
- /
- pp.276-279
- /
- 2017
In this paper, SIMT structure based GPGPU (General Purpose Computing on Graphics Processing Units) is used for accelerating the Rasterizer which constitutes the screen of the display device in pixel unit. The GPU has a large number of ALUs, and the processing is very fast because of parallel processing. Therefore, in this paper, we implemented a rasterizer that generates a 3D graphics model using a CPU that performs operations sequentially and a GPU that performs operations in parallel. We confirmed that proposed rasterizer in this paper is 1.45 times better than rasterizer using Intel CPU when generating one frame.
https://doi.org/10.7471/ikeee.2017.21.3.276 인용 PDF KSCI

Real-time Stereo Video Generation using Graphics Processing Unit (GPU를 이용한 실시간 양안식 영상 생성 방법)

Shin, In-Yong;Ho, Yo-Sung
- Journal of Broadcast Engineering
- /
- v.16 no.4
- /
- pp.596-601
- /
- 2011
In this paper, we propose a fast depth-image-based rendering method to generate a virtual view image in real-time using a graphic processor unit (GPU) for a 3D broadcasting system. Before the transmission, we encode the input 2D+depth video using the H.264 coding standard. At the receiver, we decode the received bitstream and generate a stereo video using a GPU which can compute in parallel. In this paper, we apply a simple and efficient hole filling method to reduce the decoder complexity and reduce hole filling errors. Besides, we design a vertical parallel structure for a forward mapping process to take advantage of the single instruction multiple thread structure of GPU. We also utilize high speed GPU memories to boost the computation speed. As a result, we can generate virtual view images 15 times faster than the case of CPU-based processing.
https://doi.org/10.5909/JEB.2011.16.4.596 인용 PDF KSCI

Real-Time GPU Technique for Extracting Mesh Isosurfaces from BCC Volume Datasets (BCC 볼륨 데이터로부터 실시간으로 메시 형태의 등가면을 추출하는 GPU 기법)

Kim, Hyunjun;Kim, Minho
- Journal of the Korea Computer Graphics Society
- /
- v.26 no.4
- /
- pp.17-26
- /
- 2020
We present a real-time GPU(Graphic Processing Unit) marching tetrahedra technique that extracts isosurfaces in the indexed mesh format from BCC(Body Centered Cubic) volume datasets. Compared to classical marching tetrahedra, our method shows better performance with little memory overhead. Our technique is composed of five stages. In the first stage, which needs to be done only once, we build min/max blocks that is to be used for empty space skipping to boost the performance. Next, we extract active blocks that contain the current isovalue. In the next two stages, we extract the edges and cells that contain the isosurface and then the final triangular mesh is generated in the last stage. When applied 512³ or higher resolution volume dataset, our technique shows up to 5 times speed improvement compared to the classical marching tetrahedra algorithm.
https://doi.org/10.15701/kcgs.2020.26.4.17 인용 PDF KSCI

Implementation of PSO(Particle Swarm Optimization) Algorithm using Parallel Processing of GPU (GPU의 병렬 처리 기능을 이용한 PSO(Particle Swarm Optimization) 알고리듬 구현)

Kim, Eun-Su;Kim, Jo-Hwan;Kim, Jong-Wook
- Proceedings of the KIEE Conference
- /
- 2008.10b
- /
- pp.181-182
- /
- 2008
본 논문에서는 연산 최적화 알고리듬 중 PSO(Particle Swarm Optimization) 알고리듬을 NVIDIA사(社)에서 제공한 CUDA(Compute Unified Device Architecture)를 이용하여 새롭게 구현하였다. CUDA는 CPU가 아닌 GPU(Graphic Processing Unit)의 다양한 병렬 처리 능력을 사용해 복잡한 컴퓨팅 문제를 해결하는 소프트웨어 개발을 가능케 하는 기술이다. 이 기술을 연산 최적화 알고리듬 중 PSO에 적용함으로써 알고리듬의 수행 속도를 개선하였다. CUDA를 적용한 PSO 알고리듬의 검증을 위해 언어 기반으로 프로그래밍하고 다양한 Test Function을 통해 시뮬레이션 하였다. 그리고 기존의 PSO 알고리듬과 비교 분석하였다. 또한 알고리듬의 성능 향상으로 여러 가지 최적화 분야에 적용 할 수 있음을 보인다.
PDF

Acceleration of computation speed for elastic wave simulation using a Graphic Processing Unit (그래픽 프로세서를 이용한 탄성파 수치모사의 계산속도 향상)

Nakata, Norimitsu;Tsuji, Takeshi;Matsuoka, Toshifumi
- Geophysics and Geophysical Exploration
- /
- v.14 no.1
- /
- pp.98-104
- /
- 2011
Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields including shear components. Therefore, it is suitable for exploration of the responses of elastic bodies. To overcome the long duration of the calculations, we use a Graphic Processing Unit (GPU) to accelerate the elastic wave simulation. Because a GPU has many processors and a wide memory bandwidth, we can use it in a parallelised computing architecture. The GPU board used in this study is an NVIDIA Tesla C1060, which has 240 processors and a 102 GB/s memory bandwidth. Despite the availability of a parallel computing architecture (CUDA), developed by NVIDIA, we must optimise the usage of the different types of memory on the GPU device, and the sequence of calculations, to obtain a significant speedup of the computation. In this study, we simulate two- (2D) and threedimensional (3D) elastic wave propagation using the Finite-Difference Time-Domain (FDTD) method on GPUs. In the wave propagation simulation, we adopt the staggered-grid method, which is one of the conventional FD schemes, since this method can achieve sufficient accuracy for use in numerical modelling in geophysics. Our simulator optimises the usage of memory on the GPU device to reduce data access times, and uses faster memory as much as possible. This is a key factor in GPU computing. By using one GPU device and optimising its memory usage, we improved the computation time by more than 14 times in the 2D simulation, and over six times in the 3D simulation, compared with one CPU. Furthermore, by using three GPUs, we succeeded in accelerating the 3D simulation 10 times.
https://doi.org/10.7582/GGE.2011.14.1.098 인용 PDF KSCI

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

Son, Dong Oh;Choi, Hong Jun;Kim, Cheol Hong
- Journal of the Korea Society of Computer and Information
- /
- v.25 no.2
- /
- pp.11-19
- /
- 2020
Modern GPU can execute general purpose computation on the graphic processing unit, and provide high performance by exploiting many core on GPU. To run AES algorithm efficiently, parallel computational resources are required. However, computational resource of CPU architecture are not enough to cryptographic algorithm such as AES whereas GPU architecture has mass parallel computation resources. Therefore, this paper reduce the time to execute AES by employing parallel computational resource on GPGPU. Unfortunately, AES cannot utilize computational resource on GPGPU since it isn't suitable to GPGPU architecture. In this paper, IPC based dynamic SM management technique are proposed to efficiently execute AES on GPGPU. IPC based dynamic SM management can increase and decrease the number of active SMs by using IPC in run-time. According to simulation results, proposed technique improve the performance by increasing resource utilization compared to baseline GPGPU architecture. The results show that AES improve the performance by 41.2% on average.
https://doi.org/10.9708/jksci.2020.25.02.011 인용 PDF KSCI

Proposal of 3D Graphic Processor Using Multi-Access Memory System (Multi-Access Memory System을 이용한 3D 그래픽 프로세서 제안)

Lee, S-Ra-El;Kim, Jae-Hee;Ko, Kyung-Sik;Park, Jong-Won
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.19 no.4
- /
- pp.119-128
- /
- 2019
Due to the nature of the 3D graphics processor system, many mathematical calculations are required and parallel processing research using GPU (Graphics Processing Unit) is being performed for high-speed processing. In this paper, we propose a 3D graphics processor using MAMS, a parallel processor that does not use cache memory, to solve the GPU problem of increasing bandwidth caused by cache memory miss and the problem that 3D shader processing speed is not constant. The 3D graphics processor using MAMS proposed in this paper designed Vertex shader, Pixel shader, Tiling and Rasterizing structure using DirectX command analysis, the FPGA(Xilinx Virtex6@100MHz) board for MAMS was constructed and designed using Verilog. We compared the processing time of the developed FPGA (100Mhz) and nVidia GeForce GTX 660 (980Mhz), the processing time using GTX 660 was not constant and suing MAMS was constant.
https://doi.org/10.7236/JIIBC.2019.19.4.119 인용 PDF KSCI HTML

GPU-Accelerated Password Cracking of PDF Files

Kim, Keon-Woo;Lee, Sang-Su;Hong, Do-Won;Ryou, Jae-Cheol
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.5 no.11
- /
- pp.2235-2253
- /
- 2011
Digital document file such as Adobe Acrobat or MS-Office is encrypted by its own ciphering algorithm with a user password. When this password is not known to a user or a forensic inspector, it is necessary to recover the password to open the encrypted file. Password cracking by brute-force search is a perfect approach to discover the password but a time consuming process. This paper presents a new method of speeding up password recovery on Graphic Processing Unit (GPU) using a Compute Unified Device Architecture (CUDA). PDF files are chosen as a password cracking target, and the Abode Acrobat password recovery algorithm is examined. Experimental results show that the proposed method gives high performance at low cost, with a cluster of GPU nodes significantly speeding up the password recovery by exploiting a number of computing nodes. Password cracking performance is increased linearly in proportion to the number of computing nodes and GPUs.
https://doi.org/10.3837/tiis.2011.11.021 인용 PDF KSCI

Trends of Hardware Acceleration Technology in Wed Browser (HW 가속 기반 웹 고속화 기술동향)

Lee, J.H.;Cho, H.W.;Kim, D.H.;Lee, H.S.;Yoon, S.J.;Ryu, C.;Cho, C.S.
- Electronics and Telecommunications Trends
- /
- v.31 no.4
- /
- pp.65-76
- /
- 2016
특정 제조사의 단말 또는 운영체제에 의존성이 없는 플랫폼 독립적인 웹은 높은 이식성, 소프트웨어의 재활용, 개발 생산성, 풍부한 개발자 존재, 유지 보수 면에서 장점을 가지나, 화려한 UI/UX를 제공하는 네이티브 응용에 비해 낮은 성능으로 웹 기반의 응용 개발 및 보급이 크게 활성화되지 못했다. 한편 데스크톱은 물론 모바일 단말의 멀티코어 기반 Graphic Processing Unit(GPU), CPU 탑재 등 HW 고사양화와 웹 응용에서도 HW 가속 기능을 활용할 수 있는 표준 제공으로 성능 제약을 극복할 수 있게 되었다. 본고에서는 GPU 발전동향을 살펴보고, 고속 렌더링 및 병렬 연산처리를 요구하는 웹 응용이 GPU기반 HW 가속 기능을 활용할 수 있는 크로노스 그룹의 그래픽 가속(Web Graphics Library: WebGL) 및 컴퓨팅(Web Computing Language: WebCL) 지원 표준 규격을 정리한다. 또한, 최근 차세대 GPU Application Programming Interface(API)로 발표된 Vulkan에 대해 알아보고, 웹 고속화 기술에 적용 가능성에 대해 전망한다.
PDF

Search Result 81, Processing Time 0.024 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)