• Title/Summary/Keyword: 다중 GPGPU (multi-GPGPU)

Thread Block Scheduling for Multi-Workload Environments in GPGPU (다중 워크로드 환경을 위한 GPGPU 스레드 블록 스케줄링)

  • Park, Soyeon;Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.71-76 / 2022
  • Round-robin is widely used to schedule large-scale parallel workloads on the computing units of a GPGPU. Round-robin is easy to implement because it allocates tasks to each computing unit in sequence, but it does not balance load well across computing units in multi-workload environments such as the cloud. In this paper, we propose a new thread block scheduling policy to resolve this problem. The proposed policy manages the thread blocks generated by various GPGPU workloads in multiple queues organized by computation load. For each computing unit, it selects a thread block from the queue that can make maximal use of the unit's remaining resources, which maximizes resource utilization and induces load balance between computing units. Through simulation experiments under various load environments, we show that the proposed policy improves GPGPU performance by 24.8% on average compared to Round-robin.
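
The queue-selection idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single scalar `demand` per thread block, the `thresholds` that define the load classes, and the function names `make_queues`, `pick_block`, and `fill_unit` are all assumptions made for illustration. The sketch buckets blocks into queues by demand and, for one computing unit, repeatedly takes the head of the heaviest queue that still fits in the remaining capacity.

```python
from collections import deque


def make_queues(blocks, thresholds):
    """Bucket (name, demand) thread blocks into queues by load class.

    thresholds is a hypothetical ascending tuple; a block lands in the
    queue whose index equals the number of thresholds its demand exceeds.
    """
    queues = [deque() for _ in range(len(thresholds) + 1)]
    for name, demand in blocks:
        level = sum(demand > t for t in thresholds)
        queues[level].append((name, demand))
    return queues


def pick_block(queues, remaining):
    """Pop the head of the heaviest queue whose head fits, else None."""
    for queue in reversed(queues):
        if queue and queue[0][1] <= remaining:
            return queue.popleft()
    return None


def fill_unit(queues, capacity):
    """Greedily pack one computing unit until no queue head fits."""
    placed, remaining = [], capacity
    while True:
        block = pick_block(queues, remaining)
        if block is None:
            return placed, remaining
        placed.append(block[0])
        remaining -= block[1]


if __name__ == "__main__":
    blocks = [("a", 2), ("b", 7), ("c", 4), ("d", 1), ("e", 6)]
    queues = make_queues(blocks, thresholds=(3, 5))
    print(fill_unit(queues, capacity=10))
```

A real scheduler would track several resources per block (registers, shared memory, thread slots) rather than one number; the single-capacity model is used here only to keep the selection rule visible.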

A High Speed Hologram Generation Method Using Scheduling of Multi-GPGPU and Multi-Processor (다중 프로세서와 다중 GPGPU의 스케줄링을 이용한 고속 홀로그램 생성 방법)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2017.06a / pp.213-214 / 2017
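  • Generating a hologram requires a large amount of computation, so a high-speed hologram generation method is needed. In this paper, we propose and implement an acceleration method that uses scheduling of multiple processors and multiple GPGPUs. Multiple processors are used to separate the input and output stages, which reduces synchronization operations, and buffers are used in the schedule to reduce the waiting time between kernels. When implemented with two NVIDIA GTX680 (Kepler architecture) GPUs, the method reduces computation time by about 70% compared with the method proposed in previous work.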

Fast Hologram Generation Method Using Scheduling of Multi-GPGPUs (다중 GPGPU의 스케쥴링을 이용한 고속 홀로그램 생성 방법)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2016.06a / pp.364-365 / 2016
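  • Computer-generated holograms (CGH) involve an enormous amount of computation, so a high-speed generation method is needed to produce high-resolution holograms. In this paper, we propose an acceleration method based on scheduling across multiple GPGPUs. First, we accelerate computation with a scheduling technique that uses shared memory within the kernel; second, we schedule using peer-to-peer (P2P) data transfers between GPGPUs. Using two NVIDIA GTX680 GPGPUs, we confirmed a speed improvement of about 50% over the existing method.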

Parallel Range Query Processing with R-tree on Multi-GPUs (다중 GPU를 이용한 R-tree의 병렬 범위 질의 처리 기법)

  • Ryu, Hongsu;Kim, Mincheol;Choi, Wonik
    • Journal of KIISE / v.42 no.4 / pp.522-529 / 2015
  • Ever since the R-tree was proposed to index multi-dimensional data, many efforts have been made to improve its query performance. One common approach is to parallelize query processing on multi-core architectures. To this end, a GPU-based R-tree has recently been proposed. However, even though a GPU-based R-tree can improve query performance, it is limited in its ability to handle large volumes of data because GPUs have limited physical memory. To address this problem, we propose MGR-tree (Multi-GPU R-tree), which can manage large volumes of data by distributing nodes across multiple GPUs. Our experiments show that MGR-tree is up to 9.1 times faster than a sequential search on a GPU and up to 1.6 times faster than a conventional GPU-based R-tree.
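
As a rough illustration of answering a range query over data spread across several devices, the Python sketch below deals bounding rectangles into one partition per simulated GPU, scans each partition independently, and merges the hits. It is an assumption-laden stand-in rather than the MGR-tree itself: it replaces tree traversal with a linear scan per partition, and the names `overlaps`, `partition`, and `range_query`, as well as the round-robin split, are hypothetical.

```python
def overlaps(a, b):
    """True if axis-aligned rectangles (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition(rects, num_devices):
    """Deal (id, box) entries round-robin into one list per simulated device."""
    return [rects[i::num_devices] for i in range(num_devices)]


def range_query(partitions, query):
    """Scan every partition independently and merge the matching ids."""
    hits = []
    for part in partitions:  # each iteration stands in for one GPU
        hits.extend(rid for rid, box in part if overlaps(box, query))
    return sorted(hits)


if __name__ == "__main__":
    rects = [
        (0, (0.0, 0.0, 1.0, 1.0)),
        (1, (2.0, 2.0, 3.0, 3.0)),
        (2, (0.5, 0.5, 2.5, 2.5)),
        (3, (5.0, 5.0, 6.0, 6.0)),
    ]
    print(range_query(partition(rects, 2), (1.0, 1.0, 2.0, 2.0)))
```

Because each partition is scanned without reference to the others, the per-device work can run concurrently, and the total data set only has to fit in the combined memory of the devices rather than in any single one.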

Analyzing delay of Kernel function owing to GPU memory input from multiple VMs in RPC-based GPU virtualization environments (RPC 기반 GPU 가상화 환경에서 다중 가상머신의 GPU 메모리 입력으로 인한 커널 함수의 지연 문제 분석)

  • Kang, Jihun;Kim, Soo Kyun
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.541-542 / 2021
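  • To support high-performance computing, cloud computing environments provide users with virtual machines that have a GPU (Graphics Processing Unit) assigned, so that users can run high-performance applications. In a conventional computing environment, a single user has exclusive use of the GPU, so problems caused by resource contention are relatively rare. In a cloud environment, where multiple independent users share computing resources, resource contention causes users to affect one another's performance. In this paper, we analyze the kernel function execution delay caused by contention for GPU memory input when multiple virtual machines perform GPGPU (General-Purpose computing on Graphics Processing Units) tasks in an RPC (Remote Procedure Call)-based GPU virtualization environment in which several virtual machines share a single GPU.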

Implementation of Parallel Computer Generated Hologram Using Multi-GPGPU (다중 GPGPU를 이용한 컴퓨터 생성 홀로그램의 병렬화 구현)

  • Seo, Young-Ho;Lee, Yoon-Hyuk;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.5 / pp.1177-1186 / 2014
  • A computer-generated hologram (CGH) is produced by mathematically modeling an optical phenomenon on a digital computer. Because this requires a huge amount of computational power, a fast, high-performance technique is needed. In this paper, we propose two parallelizations of the CGH calculation. The first parallelizes the CGH algorithm within a single GPU (graphics processing unit), and the second parallelizes it across multiple GPUs. The proposed algorithm was implemented on a GTX780 Ti GPU. It calculates a 1,024×1,024 hologram with 10K object points in about 24 ms.
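
To make the two levels of parallelism concrete, here is a small Python sketch under stated assumptions. The abstract does not give the fringe formula, so the sketch uses the common point-source summation, in which each pixel accumulates `amplitude * cos(k * r)` over all object points. The split of the hologram into contiguous row bands, one per simulated GPU, is likewise an assumption rather than the authors' partitioning, and the function names are hypothetical.

```python
import math


def hologram_rows(points, rows, width, pitch, wavelength):
    """Fringe values for the given pixel rows (point-source summation)."""
    k = 2.0 * math.pi / wavelength  # wavenumber
    result = []
    for y in rows:
        line = []
        for x in range(width):
            total = 0.0
            for px, py, pz, amplitude in points:
                # Distance from the object point to this hologram pixel.
                r = math.sqrt((x * pitch - px) ** 2 + (y * pitch - py) ** 2 + pz ** 2)
                total += amplitude * math.cos(k * r)
            line.append(total)
        result.append(line)
    return result


def split_rows(height, num_devices):
    """Contiguous row ranges, at most one per simulated GPU."""
    step = max(1, -(-height // num_devices))  # ceiling division
    return [range(s, min(s + step, height)) for s in range(0, height, step)]


def hologram(points, height, width, pitch, wavelength, num_devices):
    """Compute each row band independently, then stack the bands in order."""
    image = []
    for band in split_rows(height, num_devices):  # each band maps to one GPU
        image.extend(hologram_rows(points, band, width, pitch, wavelength))
    return image


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.1, 1.0), (1e-4, 2e-4, 0.12, 0.5)]
    print(hologram(pts, 4, 4, 1e-5, 633e-9, 2)[0])
```

Every pixel depends only on the object points, so the inner loops correspond to work that a single GPU can spread over its threads, while the row bands are independent units that can be handed to separate GPUs without communication.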

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal / v.3 no.1 / pp.33-38 / 2014
  • Even though microprocessor performance continues to improve, improving the performance of the computing system as a whole has become difficult owing to drawbacks such as increased power consumption. To address this problem, general-purpose computing on graphics processing units (GPGPU), which executes general-purpose applications on graphics processing units (GPUs), specialized parallel-processing devices, has attracted attention. However, the characteristics of graphics applications differ substantially from those of general-purpose applications. As a result, GPUs cannot fully exploit their abundant computational resources when they execute general-purpose applications, because of various constraints. When designing GPUs for GPGPU, the memory system is important for exploiting the GPU effectively, since general-purpose applications typically require more memory accesses than graphics applications. In particular, external memory accesses, which have long latency, impose a large overhead on GPU performance. GPU performance should therefore improve if a hierarchical memory architecture that reduces the number of external memory accesses is applied. For this reason, we analyze GPU performance under different hierarchical cache architectures when executing various benchmarks.

Multi-GPU based Fast Multi-view Depth Map Generation Method (다중 GPU 기반의 고속 다시점 깊이맵 생성 방법)

  • Ko, Eunsang;Ho, Yo-Sung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.11a / pp.236-239 / 2014
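  • Producing 3D video requires depth information along with color images from multiple viewpoints. However, the ToF cameras used to acquire depth information have low resolution, and at most three can be used at once because of the frequency of their infrared signals. Upsampling the depth information is therefore essential for using it together with the color images. Upsampling proceeds by 3D-warping the depth information to the color camera position and filling the empty regions with a joint bilateral filter (JBF). Upsampling is time-consuming, but it can be performed quickly on graphics processing units (GPUs). In this paper, we propose a method that quickly generates multi-view depth maps through parallel execution on multiple GPUs. The multi-GPU parallel execution uses CUDA, a platform for general-purpose computing on GPUs (GPGPU). In experiments with three GPUs, the proposed method generated multi-view depth maps at 35 frames per second.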

A Execution Performance Analysis of Applications using Multi-Process Service over GPU (다중 프로세스 서비스를 이용한 GPU 응용 동시 실행 성능 분석)

  • Kim, Se-Jin;Oh, Ji-Sun;Kim, Yoonhee
    • KNOM Review / v.22 no.1 / pp.60-67 / 2019
  • Graphics Processing Units (GPUs) achieve high performance by executing relatively uniform computations in parallel. General-Purpose GPU (GPGPU) technology has advanced to provide concurrent kernel execution for multiple, diverse applications, but its support for resource sharing and planning is still limited. NVIDIA recently introduced Multi-Process Service (MPS), which allows kernels from different applications to execute concurrently. However, the benefit of MPS depends on the characteristics of the applications and the order in which they execute. This paper presents a performance analysis of diverse real-world scientific applications. Based on the analysis, we show that it is important to identify the characteristics of co-running applications and to schedule multiple applications using profiling in order to maximize the benefit of MPS.

A design of GPU container co-execution framework measuring interference among applications (GPU 컨테이너 동시 실행에 따른 응용의 간섭 측정 프레임워크 설계)

  • Kim, Sejin;Kim, Yoonhee
    • KNOM Review / v.23 no.1 / pp.43-50 / 2020
  • As General-Purpose Graphics Processing Units (GPGPUs) have come to play an essential role in high-performance computing, several cloud service providers offer GPU services. Most container-based cluster orchestration platforms in cloud environments allocate an integer number of GPUs to each job and do not allow a node to be shared with other jobs. In this case, the resource utilization of a GPU node may be low if a job does not intensively use many cores or a large amount of GPU memory. GPU virtualization offers opportunities to realize kernel concurrency and share resources. However, performance may vary depending on the characteristics of the concurrently running applications and on the interference among them caused by resource contention on a node. This paper proposes a GPU container co-execution framework, based on Kubernetes, a container orchestration platform, that creates and runs multiple servers in order to measure the interference that may arise from sharing GPU resources. Performance changes under different scheduling policies were investigated by executing several jobs on a GPU. The results show that optimal scheduling is not possible when only GPU memory and computing resource usage are considered. Interference caused by co-execution among applications is measured using the framework.