• Title/Summary/Keyword: GPU

Search results: 964

GPU Resource Contention Management Technique for Concurrent GPU Tasks in Container Environments Sharing a GPU (GPU를 공유하는 컨테이너 환경에서 GPU 작업의 동시 실행을 위한 GPU 자원 경쟁 관리기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.10
    • /
    • pp.333-344
    • /
    • 2022
  • In a container-based cloud environment, multiple containers can share a graphics processing unit (GPU), and GPU sharing can minimize idle time of GPU resources and improve resource utilization. However, in a cloud environment a GPU, unlike the CPU or memory, cannot logically multiplex its computing resources to give each user an isolated share. In addition, containers occupy GPU resources only while performing GPU operations, and resource usage cannot be predicted because the timing and size of each container's GPU operations are not known in advance. Because containers can use GPU resources without restriction at any point in time, and because GPU tasks are handled as a black box inside the GPU, managing resource contention when multiple containers run GPU tasks simultaneously is very difficult. In this paper, we propose a container management technique to prevent the performance degradation caused by resource contention when multiple containers execute GPU tasks simultaneously. We analyze this degradation problem and demonstrate the efficiency of the proposed container management technique through experiments.
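
The abstract notes that each container's GPU usage is opaque because the timing and size of GPU operations are unknown and tasks execute as a black box inside the device. As context, the following minimal sketch shows how device-level usage can be observed from the host with the NVML Python bindings (pynvml); it illustrates the monitoring problem only and is not the paper's management technique.

```python
# Sketch: observe device-level GPU usage per process with pynvml.
# Illustrative only; the paper does not disclose its implementation.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU memory: {mem.used / 2**20:.0f} / {mem.total / 2**20:.0f} MiB used")

# Each container's GPU task appears as a compute process on the device;
# only coarse, device-level information like this is visible from outside.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    print(f"pid={proc.pid} usedGpuMemory={proc.usedGpuMemory}")

pynvml.nvmlShutdown()
```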

Analysis of Job Scheduling and the Efficiency for Multi-core Mobile GPU (멀티코어형 모바일 GPU의 작업 분배 및 효율성 분석)

  • Lim, Hyojeong;Han, Donggeon;Kim, Hyungshin
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.7
    • /
    • pp.4545-4553
    • /
    • 2014
  • Mobile GPUs have driven the rapid development of smartphone graphics technology, and most recent smartphones are equipped with a high-performance multi-core GPU. How efficiently a multi-core mobile GPU can be utilized is therefore a critical issue for improving smartphone performance. However, most current research has focused on single-core mobile GPUs; studies of multi-core mobile GPUs are rare. In this paper, the job scheduling patterns and the efficiency of multi-core mobile GPUs are analyzed. The profiling results show that, despite the higher number of GPU cores, the total processing time for certain graphics applications increased. In addition, when the GPU processes 3D games, a substantial amount of overhead is caused by communication not only between the CPU and GPU but also within the GPU cores. These results confirm that more active research on multi-core mobile GPUs is needed to optimize present mobile GPUs.

GPU Memory Management Technique to Improve the Performance of GPGPU Task of Virtual Machines in RPC-Based GPU Virtualization Environments (RPC 기반 GPU 가상화 환경에서 가상머신의 GPGPU 작업 성능 향상을 위한 GPU 메모리 관리 기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.10 no.5
    • /
    • pp.123-136
    • /
    • 2021
  • Remote procedure call (RPC)-based graphics processing unit (GPU) virtualization is one technology for sharing a GPU among multiple user virtual machines. However, in a cloud environment, general GPUs, unlike CPUs or memory, do not provide resource isolation that can limit the resource usage of each virtual machine. In particular, in an RPC-based virtualization environment, the GPU tasks executed by each virtual machine run as separate processes, so the lack of resource isolation causes performance degradation due to resource contention. GPU memory contention further accelerates this degradation as the resource demand of the virtual machines increases, and fairness drops because equal performance between virtual machines cannot be guaranteed. In an RPC-based GPU virtualization environment, this paper analyzes the performance degradation caused by resource contention when the GPU memory requirements of virtual machines exceed the available GPU memory capacity, and proposes a GPU memory management technique to solve this problem. Experiments show that the proposed GPU memory management technique can improve the performance of GPGPU tasks.

A Performance Study on CPU-GPU Data Transfers of NVIDIA Tegra and Tesla GPUs (NVIDIA Tegra와 Tesla GPU에서의 CPU-GPU 데이터 전송성능 연구)

  • Kwon, Oh-Kyoung;Gu, Gibeom
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.39-42
    • /
    • 2021
  • Recently, as GPU performance has improved, GPUs have come into widespread use in HPC and artificial intelligence, but GPU programming remains a major obstacle in terms of difficulty. In particular, because host memory and GPU memory must be managed separately, research on both convenience and performance is active, and various CPU-GPU memory transfer programming methods have been proposed. This study compares the performance of CPU-GPU data transfer techniques on NVIDIA Tegra devices and an NVIDIA SMX-based V100 GPU card. In particular, NVIDIA Tegra devices provide unified CPU-GPU memory, so they show performance characteristics different from conventional GPU devices with respect to CPU-GPU memory transfer methods. The experimental workload used for the comparison is a two-dimensional matrix transpose, which is frequently used in HPC applications. Through experiments, we compare, for each GPU device, the difference in GPU kernel performance according to the CPU-GPU memory transfer method, the difference in transfer performance between page-locked (pinned) and pageable memory, and finally the overall performance.
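
The comparison described above (pageable versus page-locked host memory for host-to-device copies) can be sketched in a few lines. The snippet below uses Python with Numba's CUDA bindings purely as an illustration; it is not the benchmark code used in the paper, which times a 2D matrix transpose workload.

```python
# Sketch: compare pageable vs. page-locked (pinned) host-to-device copies.
import time
import numpy as np
from numba import cuda

N = 4096
pageable = np.random.rand(N, N).astype(np.float32)     # ordinary host buffer
pinned = cuda.pinned_array((N, N), dtype=np.float32)   # page-locked host buffer
pinned[:] = pageable

d_buf = cuda.device_array((N, N), dtype=np.float32)

def time_h2d(host_buf):
    cuda.synchronize()
    start = time.perf_counter()
    d_buf.copy_to_device(host_buf)
    cuda.synchronize()
    return time.perf_counter() - start

print(f"pageable H2D: {time_h2d(pageable):.4f} s")
print(f"pinned   H2D: {time_h2d(pinned):.4f} s")
```

On Tegra-class devices, where the CPU and GPU share physical memory, mapped or managed allocations can avoid the explicit copy altogether, which is why the abstract reports different performance characteristics for those devices.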

Analyzing the Performance of Training Tasks Based on TensorFlow's GPU Memory Usage Manner in Container Environments (컨테이너 환경에서 텐서플로의 GPU 메모리 사용방식에 따른 학습 작업의 성능 분석)

  • Jihun Kang;Joon-Min Gil
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.60-62
    • /
    • 2023
  • Artificial intelligence training tasks are computationally intensive and require a GPU (Graphics Processing Unit), a high-performance computing device, and the performance of the GPU directly affects the execution performance of training tasks. In the case of TensorFlow, which is widely used for artificial intelligence workloads, when computations are performed on a GPU, GPU memory is by default managed so that a single training task occupies almost the entire GPU memory area. This approach is used to prevent fragmentation of GPU memory, the least scalable of the computing resources, but once one training task occupies the GPU, other processes cannot use it regardless of the actual GPU memory usage. In particular, for relatively small workloads such as transfer learning or small-scale training, most of the total GPU memory capacity is wasted. In this paper, we show that TensorFlow's default GPU memory usage manner makes it impossible to run multiple training tasks concurrently in a container environment, and we verify whether preventing GPU memory fragmentation is a meaningful performance factor by comparing actual GPU memory usage and training execution time with and without a limit on GPU memory usage.
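
The two configurations compared above — TensorFlow's default behavior of reserving nearly all GPU memory for a single process versus an explicit memory limit — can be expressed with TensorFlow's public configuration API. The snippet below is a minimal sketch; the 2048 MiB cap is an arbitrary illustrative value, not one taken from the paper.

```python
# Sketch: keep one TensorFlow process from reserving the whole GPU so that
# several training jobs (e.g., in separate containers) can share it.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Option A: allocate GPU memory on demand instead of reserving it all.
    # (Mutually exclusive with Option B; must be set before GPU initialization.)
    # tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option B: hard-cap this process at 2048 MiB of GPU memory
    # (illustrative value), leaving the rest for other processes.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)],
    )
```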

Intelligent Face Recognition and Tracking System to Distribute GPU Resources using CUDA (쿠다를 사용하여 GPU 리소스를 분배하는 지능형 얼굴 인식 및 트래킹 시스템)

  • Kim, Jae-Heong;Lee, Seung-Ho
    • Journal of IKEEE
    • /
    • v.22 no.2
    • /
    • pp.281-288
    • /
    • 2018
  • In this paper, we propose an intelligent face recognition and tracking system that distributes GPU resources using CUDA. The proposed system consists of five steps: a GPU allocation algorithm that distributes GPU resources optimally, face area detection and face recognition using deep learning, real-time face tracking, and PTZ camera control. Unlike approaches that bind each thread to a fixed GPU, the GPU allocation algorithm distributes multi-GPU resources flexibly according to each GPU's activity level, which enables stable and efficient use of multiple GPUs. To evaluate the proposed system, we compared it with a system that does not distribute GPU resources. The system without resource distribution showed unstable operation, whereas the proposed system operated stably and utilized resources effectively, demonstrating its utility.
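
The key idea in the abstract, assigning work to GPUs according to their current activity level rather than a fixed thread-to-GPU mapping, can be sketched with the NVML Python bindings; the function below is illustrative only and is not the paper's CUDA implementation.

```python
# Sketch: choose the least-busy GPU for the next worker, instead of a
# fixed thread-to-GPU assignment.
import pynvml

def least_busy_gpu() -> int:
    pynvml.nvmlInit()
    try:
        loads = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            loads.append((util.gpu, i))   # (% GPU activity, device index)
        return min(loads)[1]              # index of the least-active GPU
    finally:
        pynvml.nvmlShutdown()

# A worker thread would then run its kernels on the selected device,
# e.g. numba.cuda.select_device(least_busy_gpu()).
```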

A design of GPU container co-execution framework measuring interference among applications (GPU 컨테이너 동시 실행에 따른 응용의 간섭 측정 프레임워크 설계)

  • Kim, Sejin;Kim, Yoonhee
    • KNOM Review
    • /
    • v.23 no.1
    • /
    • pp.43-50
    • /
    • 2020
  • As the general-purpose graphics processing unit (GPGPU) has recently come to play an essential role in high-performance computing, several cloud service providers offer GPU services. Most container-based cluster orchestration platforms in cloud environments allocate an integer number of GPUs to a job and do not allow a node to be shared with other jobs. In this case, the resource utilization of a GPU node can be low if a job does not intensively use many GPU cores or a large amount of GPU memory. GPU virtualization brings opportunities to realize kernel concurrency and share resources. However, performance may vary depending on the characteristics of the applications running concurrently and the interference among them caused by resource contention on a node. This paper proposes a GPU container co-execution framework, built on the Kubernetes container orchestration platform, that creates and runs multiple servers in order to measure the interference that may occur when GPU resources are shared. Performance changes under different scheduling policies were investigated by executing several jobs on the GPU. The results show that optimal scheduling is not possible when only GPU memory and compute usage are considered. The interference caused by co-executing applications is measured using the framework.
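
For context, the whole-GPU allocation model that the abstract contrasts with GPU sharing is expressed as an extended resource request in the pod specification. The sketch below uses the official Kubernetes Python client; the image name and the `nvidia.com/gpu` resource key assume the common NVIDIA device-plugin setup and are illustrative, not taken from the paper.

```python
# Sketch: request one whole GPU for a pod (the integer-GPU allocation model).
from kubernetes import client

gpu_container = client.V1Container(
    name="gpu-job",
    image="nvidia/cuda:12.2.0-base-ubuntu22.04",  # illustrative image
    command=["python", "train.py"],               # hypothetical workload
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}            # one whole GPU, not a share
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-job"),
    spec=client.V1PodSpec(containers=[gpu_container], restart_policy="Never"),
)
# client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```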

Analysis of the CPU/GPU Temperature and Energy Efficiency depending on Executed Applications (응용프로그램 실행에 따른 CPU/GPU의 온도 및 컴퓨터 시스템의 에너지 효율성 분석)

  • Choi, Hong-Jun;Kang, Seung-Gu;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.5
    • /
    • pp.9-19
    • /
    • 2012
  • As the clock frequency increases, CPU performance improves continuously, but power and thermal problems in the CPU become more serious. For this reason, utilizing the GPU to reduce the workload of the CPU has become one of the most popular approaches in recent high-performance computer systems. The GPU is a specialized processor originally designed for graphics processing. Recently, technologies such as CUDA, which make GPU resources easier to use, have become popular, improving overall system performance by utilizing the CPU and GPU together across various kinds of applications. In this work, we analyze the temperature and energy efficiency of a computer system in which the CPU and GPU are utilized simultaneously, to identify possible problems in upcoming high-performance computer systems. According to our experimental results, the temperatures of both the CPU and GPU increase when an application is executed on the GPU; when it is executed on the CPU, the CPU temperature increases while the GPU temperature remains unchanged. The computer system shows better energy efficiency when utilizing the GPU, because the throughput of the GPU is much higher than that of the CPU. However, the system temperature tends to rise more readily when applications run on the GPU, because the GPU consumes more power than the CPU.

Implementation of a 3D Graphics Simulator for GP-GPU (GP-GPU 개발을 위한 3차원 그래픽 시뮬레이터 구현)

  • Yeo, Dong-young;Kim, Woo-young;Jung, Hyung-Ki;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.337-340
    • /
    • 2009
  • The performance of the GPU (Graphics Processing Unit), a hardware accelerator for 3D graphics processing, has been improving constantly. Although the GPU was introduced as an efficient way to handle complex graphics applications, its resources are rarely utilized to their full extent. The GP-GPU (general-purpose GPU), which supports general-purpose operations in addition to graphics operations on the GPU, has drawn attention because its resources can be distributed and controlled effectively. In this paper, a simulator was implemented that provides a virtual GP-GPU environment and supports program design and debugging. Through this, the co-design development environment supports fast and reliable simultaneous design and verification for building a 3D graphics display interface.


Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.7
    • /
    • pp.1-11
    • /
    • 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since the GPU can provide optimized computational resources for massive parallel processing. Unfortunately, GPGPU does not fully exploit the computational resources on the GPU when executing general-purpose applications, because those applications cannot always be optimized for the GPU architecture. Therefore, we provide a research guideline for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that degrade GPU performance. In this paper, to clearly classify the causes of these factors, GPU core status is defined as five states: fully active, partially active, idle, memory stall, and GPU core stall. All states except fully active cause performance degradation. We evaluate the ratio of each GPU core state depending on the characteristics of the benchmarks to find the specific reasons that degrade GPU performance. According to our simulation results, the partially active, idle, memory stall, and GPU core stall states are induced by computational resource underutilization, low parallelism, high memory request rates, and structural hazards, respectively.
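
The five core states defined in the abstract map naturally to a simple classification; the sketch below merely restates them as an enumeration, with the cause of each non-active state taken from the simulation results above.

```python
# Sketch: the five GPU-core states used to attribute performance loss.
from enum import Enum, auto

class GpuCoreStatus(Enum):
    FULLY_ACTIVE = auto()    # all computational resources busy (no loss)
    PARTIAL_ACTIVE = auto()  # computational resource underutilization
    IDLE = auto()            # low parallelism
    MEMORY_STALL = auto()    # waiting on high memory request rates
    CORE_STALL = auto()      # structural hazards
```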