• Title/Summary/Keyword: GPU process (GPU 프로세스)

The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

  • Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.507-512 / 2014
  • Recently, distributed computing has begun to use both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit) to improve performance and to cope with the dark-silicon problem, in which power constraints prevent all transistors from being used at once. In an integrated graphics processor, the CPU and GPU share memory and the last-level cache (LLC). However, there are no LLC access rules between the CPU and GPU, so when GPU and CPU processes run at the same time, both suffer from contention on the LLC. This paper presents evidence for the need for cache partitioning and outlines a cache-partitioning design based on page coloring that allocates part of the L3 cache exclusively to the GPU process in order to guarantee its performance.
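
A minimal sketch of the page-coloring idea the paper builds on: the physical page number determines which cache sets a page can map to, so an allocator that hands the GPU process and CPU processes pages of disjoint colors partitions the LLC between them. All cache parameters below are hypothetical placeholders, not the paper's configuration.

```python
# Hypothetical cache parameters, for illustration only.
PAGE_SIZE  = 4096          # 4 KiB pages
LINE_SIZE  = 64            # cache line size in bytes
CACHE_SIZE = 8 * 1024**2   # 8 MiB shared LLC
WAYS       = 16            # associativity

num_sets   = CACHE_SIZE // (LINE_SIZE * WAYS)     # sets in the LLC
num_colors = (num_sets * LINE_SIZE) // PAGE_SIZE  # distinct page colors

def page_color(phys_addr: int) -> int:
    """Cache color of the page containing phys_addr.

    The color is formed by the set-index bits that lie above the page
    offset, so pages with different colors never share cache sets.
    """
    page_number = phys_addr // PAGE_SIZE
    return page_number % num_colors

# A page-coloring allocator could then reserve a fixed subset of colors
# for GPU-process pages, giving it a private slice of the L3 cache.
```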

Analyzing the performance of training tasks based on GPU memory use manner of TensorFlow in Container environments (컨테이너 환경에서 텐서플로의 GPU 메모리 사용방식에 따른 학습 작업의 성능 분석)

  • Jihun Kang;Joon-Min Gil
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.60-62 / 2023
  • AI training workloads are computation-intensive and therefore require a high-performance accelerator such as a GPU (Graphics Processing Unit), whose capability directly affects training performance. TensorFlow, one of the most widely used frameworks for AI workloads, by default manages GPU memory so that a single training job occupies almost the entire GPU memory when running computations on the GPU. This policy prevents fragmentation of GPU memory, the least scalable computing resource, but once one training job occupies the GPU, no other process can use it, regardless of how much GPU memory is actually in use. In particular, for relatively small jobs such as transfer learning or small-scale training, most of the GPU memory is wasted. In this paper, we confirm that TensorFlow's default GPU memory policy makes it impossible to run multiple training jobs concurrently in a container environment, and we compare actual GPU memory usage and training time with and without a GPU memory limit to verify whether preventing GPU memory fragmentation is a significant factor in performance.
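
For reference, TensorFlow 2.x exposes two ways to keep a single job from occupying the whole GPU, roughly corresponding to the limited and unlimited configurations compared in the paper; the 2048 MiB budget below is an arbitrary example, not the paper's setting.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: grow allocations on demand instead of reserving
    # (almost) all GPU memory up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2: hard-cap this process at a fixed budget (2 GiB here),
    # leaving the remainder for other containers/processes.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])
```

Both calls must run before the GPU is first used, i.e. before any op touches the device.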

GPU Memory Management Technique to Improve the Performance of GPGPU Task of Virtual Machines in RPC-Based GPU Virtualization Environments (RPC 기반 GPU 가상화 환경에서 가상머신의 GPGPU 작업 성능 향상을 위한 GPU 메모리 관리 기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.123-136 / 2021
  • RPC (Remote Procedure Call)-based GPU (Graphics Processing Unit) virtualization is one technology for sharing a GPU among multiple user virtual machines. However, in a cloud environment, unlike CPU or memory, commodity GPUs provide no resource-isolation mechanism that can limit the resource usage of virtual machines. In particular, in an RPC-based virtualization environment, the GPU tasks of each virtual machine run as separate processes, so the lack of resource isolation causes performance degradation through resource contention. GPU memory contention accelerates this degradation as the virtual machines' resource demands grow, and fairness suffers because equal performance between virtual machines cannot be guaranteed. This paper analyzes the performance degradation caused by resource contention in an RPC-based GPU virtualization environment when the virtual machines' GPU memory requirements exceed the available GPU memory capacity, and proposes a GPU memory management technique to solve this problem. Experiments show that the proposed technique improves the performance of GPGPU tasks.
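
The paper's mechanism is not reproduced here, but a toy admission-control broker illustrates the general idea of keeping the sum of granted GPU memory below physical capacity, so that virtual machines queue instead of thrashing; all names below are invented for illustration.

```python
import threading

class GpuMemoryBroker:
    """Toy sketch (not the paper's implementation): serializes GPU
    memory grants so the total over all VMs never exceeds capacity."""

    def __init__(self, capacity_mb: int):
        self.capacity = capacity_mb
        self.in_use = 0
        self.cv = threading.Condition()

    def acquire(self, vm_id: str, mb: int) -> None:
        with self.cv:
            # Block the requesting VM instead of letting its allocation
            # fail or thrash once demand exceeds physical GPU memory.
            while self.in_use + mb > self.capacity:
                self.cv.wait()
            self.in_use += mb

    def release(self, vm_id: str, mb: int) -> None:
        with self.cv:
            self.in_use -= mb
            self.cv.notify_all()
```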

Scheduling of Artificial Intelligence Workloads in Cloud Environments Using Genetic Algorithms (유전 알고리즘을 이용한 클라우드 환경의 인공지능 워크로드 스케줄링)

  • Seokmin Kwon;Hyokyung Bahn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.3 / pp.63-67 / 2024
  • Recently, artificial intelligence (AI) workloads spanning industries such as smart logistics, FinTech, and entertainment are being executed on the cloud. In this paper, we address the scheduling of diverse AI workloads on a multi-tenant cloud system composed of heterogeneous GPU clusters. Traditional scheduling lowers GPU utilization in such environments and significantly degrades system performance. To resolve these issues, we present a new scheduling approach based on genetic-algorithm optimization, implemented within a process-based event simulation framework. Trace-driven simulations with diverse AI workload traces collected from Alibaba's MLaaS cluster demonstrate that the proposed scheduler significantly improves GPU utilization compared with conventional scheduling.
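
As a rough illustration of genetic-algorithm scheduling (not the paper's implementation), the sketch below encodes a schedule as a job-to-GPU assignment and evolves it toward a lower makespan on a hypothetical heterogeneous cluster; job lengths and GPU speeds are made up.

```python
import random

jobs = [3, 7, 2, 8, 5, 1, 9, 4]   # hypothetical job lengths (hours)
gpu_speed = [1.0, 1.0, 2.0]       # heterogeneous cluster: GPU 2 is 2x faster

def makespan(chrom):
    """Finish time of the slowest GPU under this job-to-GPU assignment."""
    load = [0.0] * len(gpu_speed)
    for job, gpu in zip(jobs, chrom):
        load[gpu] += job / gpu_speed[gpu]
    return max(load)

def evolve(pop_size=50, generations=200, mut_rate=0.1):
    pop = [[random.randrange(len(gpu_speed)) for _ in jobs]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)                 # lower makespan = fitter
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(len(jobs))  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):        # random mutation
                if random.random() < mut_rate:
                    child[i] = random.randrange(len(gpu_speed))
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = evolve()
print(best, makespan(best))
```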

An Implementation of Graphic Offloading Computing using GPU Virtualization based on API Remoting on a Server-based Software Service (서버 기반 SW 서비스에서 API 리모팅 기반의 GPU 가상화를 이용한 그래픽 분할 실행의 구현)

  • Choi, Won-Hyuk;Kim, Won-Young
    • Journal of Internet Computing and Services / v.12 no.6 / pp.53-62 / 2011
  • In this paper, we introduce a graphics offloading method that uses GPU virtualization to provide demanding software, such as 3D applications, as an online software service. When the offloaded software runs in the server's software virtualization environment, its graphics workload is processed on the client's GPU through GPU virtualization, while its data workload is processed on the server's CPU. To achieve this, we propose a method for rendering graphics information on the client-side GPU using API remoting. We show better performance than server-based rendering when serving offloaded software containing dynamic 3D graphics whose displayed images change frequently over the network. We also describe how to virtualize the offloaded software at the process level and manage the clients' configuration information so as to reduce the server's load when serving multiple clients.
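
A bare-bones sketch of the API-remoting pattern: the server marshals each graphics call and the client replays it on its local GPU. The wire format and function names here are invented for illustration; a real system would forward an actual graphics API such as OpenGL or Direct3D.

```python
import json
import socket

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed")
        data += chunk
    return data

# Server side: instead of executing a draw call locally, marshal it
# and forward it to the client that owns the display GPU.
def remote_call(sock: socket.socket, func_name: str, *args) -> None:
    msg = json.dumps({"fn": func_name, "args": args}).encode()
    sock.sendall(len(msg).to_bytes(4, "big") + msg)

# Client side: receive marshalled calls and replay them on the local GPU
# via a dispatch table, e.g. {"draw_arrays": gl_wrapper.draw_arrays, ...}.
def serve_calls(sock: socket.socket, dispatch_table: dict) -> None:
    while True:
        header = _recv_exact(sock, 4)
        body = _recv_exact(sock, int.from_bytes(header, "big"))
        call = json.loads(body)
        dispatch_table[call["fn"]](*call["args"])
```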

A GPU-based Filter Algorithm for Noise Improvement in Realtime Ultrasound Images (실시간 초음파 영상에서 노이즈 개선을 위한 GPU 기반의 필터 알고리즘)

  • Cho, Young-Bok;Woo, Sung-Hee
    • Journal of Digital Contents Society / v.19 no.6 / pp.1207-1212 / 2018
  • Ultrasound imaging sends ultrasonic pulses and builds a diagnostic image from the reflected waves. When the signal weakens, noise appears and slight differences in brightness occur. In addition, ultrasound images fluctuate with the patient's breathing and exhibit motion changes in real time. Such noise is difficult to recognize and diagnose visually during analysis. In this paper, morphological features are extracted automatically from acquired ultrasound images using image processing techniques. We implemented a fast GPU-based filter using a cloud big-data processing platform. With the GPU-based high-performance filter, the algorithm ran 4.7 times faster than its CPU-based counterpart, and the PSNR was 37.2 dB, very close to the original image.
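
For reference, the PSNR figure quoted above is computed from the mean squared error between the original and the filtered image; a standard NumPy implementation:

```python
import numpy as np

def psnr(original: np.ndarray, filtered: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((original.astype(np.float64) - filtered) ** 2)
    if mse == 0:
        return float("inf")     # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```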

A Execution Performance Analysis of Applications using Multi-Process Service over GPU (다중 프로세스 서비스를 이용한 GPU 응용 동시 실행 성능 분석)

  • Kim, Se-Jin;Oh, Ji-Sun;Kim, Yoonhee
    • KNOM Review / v.22 no.1 / pp.60-67 / 2019
  • Graphics Processing Units (GPUs) achieve high performance by executing relatively uniform computations in parallel. General-Purpose GPU (GPGPU) technology has advanced to allow kernels from multiple, diverse applications to execute at the same time, but support for resource sharing and scheduling remains limited. NVIDIA recently introduced the Multi-Process Service (MPS), which allows kernels from different applications to execute concurrently. However, the benefit of MPS depends on the characteristics of the applications and the order in which they run. This paper presents a performance analysis of diverse real-world scientific applications. Based on the analysis, we show that identifying the characteristics of co-running applications and scheduling multiple applications via profiling are essential to maximizing the benefit of MPS.
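
A minimal sketch of co-running two CUDA applications under MPS, assuming the standard nvidia-cuda-mps-control daemon and the conventional pipe/log directories; the application paths are placeholders.

```python
import os
import subprocess

# Start the MPS control daemon (normally done once per node, by the
# user that owns the GPU). The directories below are conventional defaults.
env = dict(os.environ,
           CUDA_MPS_PIPE_DIRECTORY="/tmp/nvidia-mps",
           CUDA_MPS_LOG_DIRECTORY="/tmp/nvidia-mps-log")
subprocess.run(["nvidia-cuda-mps-control", "-d"], env=env, check=True)

# Launch two CUDA applications; with the daemon running, their kernels
# are funneled through one MPS server and may overlap on the GPU.
procs = [subprocess.Popen([app], env=env) for app in ("./app_a", "./app_b")]
for p in procs:
    p.wait()

# Shut the daemon down (equivalent to: echo quit | nvidia-cuda-mps-control).
subprocess.run(["nvidia-cuda-mps-control"], input=b"quit", env=env, check=True)
```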

Object Tracking Based on Gaussian Mixture Model Algorithm by Using Cuda (Cuda를 이용한 가우시언 믹스처 모델 기반 객체 추적 알고리즘)

  • Kim, In-Su;Choi, Hyung-Il
    • Proceedings of the Korean Society of Computer Information Conference / 2011.01a / pp.273-275 / 2011
  • This paper proposes a Gaussian mixture-based shadow removal algorithm for effective object tracking, together with a model that reduces the computation time of an existing object tracking algorithm by using NVIDIA's CUDA (Compute Unified Device Architecture), a GPGPU (General Purpose GPU) architecture. The system is a GPU-accelerated object tracking algorithm based on a Gaussian mixture model: it reduces total computation time by appropriately dividing the work between the CPU and the GPU when separating foreground from background, and it maximizes system throughput for object segmentation and tracking in high-resolution images. After object extraction, a Kalman filter, a prediction model, is used for effective tracking.
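
The CUDA implementation is not reproduced here, but the algorithmic pipeline (Gaussian-mixture background model with shadow removal, followed by Kalman-filter prediction) can be sketched with OpenCV's CPU API; the input file name is a placeholder.

```python
import cv2
import numpy as np

# MOG2 is OpenCV's Gaussian-mixture background model; with
# detectShadows=True shadow pixels are marked 127 in the mask,
# so they can be removed before tracking.
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

kalman = cv2.KalmanFilter(4, 2)  # state: (x, y, dx, dy); measured: (x, y)
kalman.transitionMatrix = np.array(
    [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kalman.measurementMatrix = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kalman.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kalman.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

cap = cv2.VideoCapture("input.avi")   # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    mask[mask == 127] = 0             # drop shadow pixels
    moments = cv2.moments(mask, binaryImage=True)
    if moments["m00"] > 0:            # measured object centroid
        cx = moments["m10"] / moments["m00"]
        cy = moments["m01"] / moments["m00"]
        kalman.correct(np.array([[cx], [cy]], np.float32))
    predicted = kalman.predict()      # predicted position for tracking
```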

Implementation of real-time FD-OCT system based on asynchronous triple buffering and parallel processing using GPU (GPU 병렬처리와 비동기 트리플 버퍼를 적용한 실시간 FD-OCT 시스템 구현)

  • Jeon, Jun-Young;Kim, Young-Bong
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.858-860 / 2014
  • With recent advances in image processing techniques and hardware, the medical field uses a variety of imaging systems for diagnosis. OCT in particular has attracted much attention because it can acquire high-resolution images of human tissue and measure blood flow velocity at the same time, making it applicable to many areas of medicine. As more algorithms and filters are applied to obtain ever sharper OCT images, faster processing is required. In this paper, on a system with at least a dual-core CPU, the data-processing and rendering modules were decoupled into asynchronous threads through a triple buffer, and data processing was accelerated with GPU-based parallel processing. As a result, sharp real-time OCT images could be observed during optical camera capture.
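
A minimal sketch of the asynchronous triple-buffering pattern described above, with invented names: the processing thread always has a free back buffer to write into, and the rendering thread always reads the newest completed frame, so neither waits on the other.

```python
import threading

class TripleBuffer:
    """Three buffers rotate between a writer and a reader without
    either thread ever blocking on the other's work."""

    def __init__(self, make_buffer):
        self.buffers = [make_buffer() for _ in range(3)]
        self.back, self.middle, self.front = 0, 1, 2
        self.fresh = False                  # is a new frame waiting?
        self.lock = threading.Lock()

    def write_buffer(self):                 # data/GPU thread writes here
        return self.buffers[self.back]

    def publish(self):                      # data/GPU thread: frame done
        with self.lock:
            self.back, self.middle = self.middle, self.back
            self.fresh = True

    def read_buffer(self):                  # render thread: newest frame
        with self.lock:
            if self.fresh:
                self.front, self.middle = self.middle, self.front
                self.fresh = False
        return self.buffers[self.front]
```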

An Efficient k-D tree Traversal Algorithm for Ray Tracing on a GPU (GPU상에서 동작하는 Ray Tracing을 위한 효과적인 k-D tree 탐색 알고리즘)

  • Kang, Yoon-Sig;Park, Woo-Chan;Seo, Choong-Won;Yang, Sung-Bong
    • Journal of KIISE: Computer Systems and Theory / v.35 no.3 / pp.133-140 / 2008
  • This paper proposes an effective k-D tree traversal algorithm for ray tracing on a GPU. The previous GPU-based k-D tree traversal algorithm searches bottom-up, from a leaf to the root, after failing to find a ray-intersected primitive in the leaf node. During the bottom-up search, that algorithm decides from the parent node whether the current node has been visited, so it revisits parent nodes that were already visited and repeats bounding-box intersection tests. The new traversal algorithm avoids these duplicate visits to sibling and parent nodes by efficiently determining, during the bottom-up search, whether the sibling node has already been visited, and it performs bounding-box intersection tests only for nodes that have not yet been tested. Our experiments show that the new algorithm is about 30% faster than the previous one.
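
A rough sketch of the bottom-up phase under stated assumptions (a simple AABB slab test and a caller-supplied routine for the top-down descent into a subtree): at each level the sibling subtree is the only branch not yet visited, so one box test per level suffices and previously visited nodes are never re-tested.

```python
from dataclasses import dataclass

@dataclass
class Node:
    bbox: tuple                 # ((xmin, ymin, zmin), (xmax, ymax, zmax))
    left: "Node" = None
    right: "Node" = None
    parent: "Node" = None

def hits_box(ray_o, ray_d, bbox) -> bool:
    """Standard slab test: does the ray intersect the AABB?"""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        if ray_d[axis] == 0.0:
            if not (bbox[0][axis] <= ray_o[axis] <= bbox[1][axis]):
                return False
            continue
        t1 = (bbox[0][axis] - ray_o[axis]) / ray_d[axis]
        t2 = (bbox[1][axis] - ray_o[axis]) / ray_d[axis]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def bottom_up(ray_o, ray_d, leaf, descend):
    """Climb from a missed leaf toward the root. The sibling subtree is
    the only unvisited branch at each level, so it gets exactly one box
    test; nodes already visited on the way down are never re-tested."""
    prev, node = leaf, leaf.parent
    while node is not None:
        sibling = node.right if prev is node.left else node.left
        if sibling is not None and hits_box(ray_o, ray_d, sibling.bbox):
            hit = descend(ray_o, ray_d, sibling)   # top-down phase
            if hit is not None:
                return hit
        prev, node = node, node.parent             # keep climbing
    return None
```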