• Title/Summary/Keyword: parallel workloads (병렬 워크로드)


Analyzing Fine-Grained Resource Utilization for Efficient GPU Workload Allocation (GPU 작업 배치의 효율화를 위한 자원 이용률 상세 분석)

  • Park, Yunjoo; Shin, Donghee; Cho, Kyungwoon; Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.111-116 / 2019
  • Recently, GPUs have expanded their application domains from graphics processing to various kinds of parallel workloads. However, current GPU systems focus on maximizing each workload's parallelism through simplified control rather than considering diverse workload characteristics. This paper classifies the resource usage of GPU workloads as computing-bound, memory-bound, or dependency-latency-bound, and quantifies the fine-grained bottleneck of each for efficient workload allocation. For example, we identify the exact bottleneck resource, such as the single function unit, double function unit, or special function unit, even among workloads that are all computing-bound. Our analysis implies that workloads can be allocated together if their fine-grained bottleneck resources differ, even when both are computing-bound, which can ultimately contribute to efficient workload allocation on GPUs.
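The co-location rule this abstract describes can be sketched briefly. The resource names (`single_fu`, `double_fu`, `special_fu`, `memory`) and the utilization figures below are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch: two workloads may be placed together when their
# fine-grained bottleneck resources differ, even if both are coarsely
# "computing-bound". Resource names and numbers are hypothetical.

def bottleneck(utilization: dict) -> str:
    """Return the resource with the highest utilization."""
    return max(utilization, key=utilization.get)

def can_colocate(w1: dict, w2: dict) -> bool:
    """Two workloads are co-locatable when their bottlenecks differ."""
    return bottleneck(w1) != bottleneck(w2)

# All three workloads are computing-bound, but on different function units.
wa = {"single_fu": 0.92, "double_fu": 0.30, "special_fu": 0.10, "memory": 0.25}
wb = {"single_fu": 0.20, "double_fu": 0.88, "special_fu": 0.15, "memory": 0.30}
wc = {"single_fu": 0.95, "double_fu": 0.10, "special_fu": 0.05, "memory": 0.20}

print(can_colocate(wa, wb))  # different bottlenecks -> True
print(can_colocate(wa, wc))  # both bottlenecked on single_fu -> False
```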

Analysis of Parallel and Distributed File System Workloads on Tachyon Cluster System (타키온 클러스터 시스템의 병렬 분산 파일 시스템 워크로드 분석)

  • Cho, Hyeyoung; Kim, Sungho; Lee, Sik
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.113-114 / 2009
  • As the application domains of cluster systems become more diverse and complex, the need for workload analysis that can predict actual usage patterns is growing as a way to use large-scale cluster systems more efficiently. In this paper, we analyze the parallel and distributed file system workloads on a large-scale cluster system currently in operation with 188 compute nodes and 3,008 CPUs.

Thread Block Scheduling for GPGPU based on Fine-Grained Resource Utilization (상세 자원 이용률에 기반한 병렬 가속기용 스레드 블록 스케줄링)

  • Bahn, Hyokyung; Cho, Kyungwoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.49-54 / 2022
  • With the recent widespread adoption of general-purpose GPUs (GPGPUs) in cloud systems, maximizing resource utilization through multitasking on a GPGPU has become an important issue. In this article, we show that resource allocation based on the coarse classification of workloads into computing-bound and memory-bound is not sufficient for high resource utilization, and we present a new thread block scheduling policy for GPGPUs that makes use of the fine-grained resource utilization of each workload. Unlike previous approaches, the proposed policy reduces scheduling overhead by separating profiling from scheduling, and maximizes resource utilization by co-locating workloads with different bottleneck resources. Through simulations under various virtual machine scenarios, we show that the proposed policy improves GPGPU throughput by 130.6% on average and up to 161.4%.

Log Generation for File Access on PVFS2 (PVFS2 를 위한 파일 접근 로그 생성)

  • Cha, Kwangho; Cho, Hyeyoung; Kim, Sungho
    • Proceedings of the Korea Information Processing Society Conference / 2007.11a / pp.675-677 / 2007
  • As the application domains of cluster systems become more diverse and complex, the need for workload analysis that can predict actual usage patterns is growing as a way to use large-scale cluster systems more efficiently. Workload analysis involves a variety of tasks, including dynamic analysis of file-level accesses. In this paper, we explore a way to enable file-level access logging in PVFS2, a widely deployed experimental parallel file system, and examine how this capability can be used.

Thread Block Scheduling for Multi-Workload Environments in GPGPU (다중 워크로드 환경을 위한 GPGPU 스레드 블록 스케줄링)

  • Park, Soyeon; Cho, Kyung-Woon; Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.71-76 / 2022
  • Round-robin is widely used for scheduling large-scale parallel workloads across the computing units of a GPGPU. Round-robin is easy to implement, as it allocates tasks to each computing unit sequentially, but it does not achieve good load balance between computing units in multi-workload environments such as the cloud. In this paper, we propose a new thread block scheduling policy to resolve this situation. The proposed policy manages the thread blocks generated by various GPGPU workloads in multiple queues based on their computation loads, and tries to maximize the resource utilization of each computing unit by selecting a thread block from the queue that can best use the remaining resources, thereby inducing load balance between computing units. Through simulation experiments under various load environments, we show that the proposed policy improves GPGPU performance by 24.8% on average compared to Round-robin.
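The queue-based selection outlined above can be illustrated with a toy scheduler. The queue thresholds and the abstract "load units" below are hypothetical, not the paper's values:

```python
# Toy version of the multi-queue thread block scheduling idea: thread
# blocks are queued by computation load, and each computing unit picks
# the queued block that best fills its remaining capacity.

from collections import deque

def enqueue(queues, block_load, thresholds=(4, 8)):
    """Place a thread block (its load, in abstract units) into a queue:
    queues[0] = light, queues[1] = medium, queues[2] = heavy."""
    for i, t in enumerate(thresholds):
        if block_load <= t:
            queues[i].append(block_load)
            return
    queues[-1].append(block_load)

def pick_block(queues, remaining):
    """Scan from the heaviest queue down and pop the first head block
    that still fits the unit's remaining capacity (maximizes utilization)."""
    for q in reversed(queues):
        if q and q[0] <= remaining:
            return q.popleft()
    return None  # nothing fits

queues = [deque(), deque(), deque()]
for load in [2, 7, 10, 3, 9]:
    enqueue(queues, load)

# A unit with 8 units of remaining capacity takes the heaviest fitting block.
print(pick_block(queues, remaining=8))  # 7
print(pick_block(queues, remaining=8))  # 2
```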

Job Packing Technique in Backfilling Scheduler using Genetic Algorithm (유전자 알고리즘을 이용한 Backfilling 스케줄러의 작업 패킹 기법)

  • Lee, HyoYoung; Lee, Dongwoo; Ramakrishna, R.S.
    • Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.46-48 / 2005
  • This paper discusses optimizing a job-packing technique based on EASY Backfilling, a scheduling algorithm for parallel computing systems. A genetic algorithm is employed as the search technique for composing an optimal job set, performing job packing efficiently while assigning extra weight to jobs that request few nodes, so that the concurrent execution of many jobs is considered first. Since the scheduling policy directly affects system performance, experiments measuring the average wait time of each workload under varying system loads confirm that the proposed technique improves overall parallel computing performance.
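The search procedure described above can be sketched as a small genetic algorithm. The job sizes, fitness weighting, and GA parameters below are illustrative assumptions, not taken from the paper:

```python
# Minimal genetic-algorithm sketch of the backfilling job-packing idea:
# evolve a bit-vector selecting which waiting jobs to backfill into the
# free nodes, with a fitness bonus for jobs requesting few nodes so that
# many small jobs run concurrently.

import random

random.seed(0)

JOBS = [5, 1, 2, 8, 1, 3]   # nodes requested by each waiting job
FREE_NODES = 10

def fitness(mask):
    used = sum(n for bit, n in zip(mask, JOBS) if bit)
    if used > FREE_NODES:   # infeasible packing
        return -1.0
    # Reward node usage, plus a per-job bonus that favors small jobs.
    bonus = sum(1.0 / n for bit, n in zip(mask, JOBS) if bit)
    return used + bonus

def evolve(pop_size=30, generations=60, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in JOBS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(JOBS))  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([1 - g if random.random() < p_mut else g
                             for g in child])     # bit-flip mutation
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(best, sum(n for bit, n in zip(best, JOBS) if bit))
```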


A Scheme on High-Performance Caching and High-Capacity File Transmission for Cloud Storage Optimization (클라우드 스토리지 최적화를 위한 고속 캐싱 및 대용량 파일 전송 기법)

  • Kim, Tae-Hun; Kim, Jung-Han; Eom, Young-Ik
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.8C / pp.670-679 / 2012
  • The recent spread of cloud computing has increased the amount of stored data and made the cost of storing it grow rapidly. Accordingly, data and service requests from users also increase the load on cloud storage. Many studies have tried to provide low-cost, high-performance schemes for distributed file systems, but most of them handle parallel and random data accesses, as well as frequent small workloads, poorly. Recently, improving the performance of distributed file systems through caching technology has received much attention. In this paper, we propose CHPC (Cloud storage High-Performance Caching), a framework providing parallel caching, distributed caching, and proxy caching in distributed file systems. This study compares the proposed framework with existing cloud systems with regard to reducing the server's disk I/O, preventing server-side bottlenecks, deduplicating the page caches of each client, and improving overall IOPS. As a result, we show some optimization possibilities for cloud storage systems through evaluations and comparisons with conventional methods.

Efficient Hardware Transactional Memory Scheme for Processing Transactions in Multi-core In-Memory Environment (멀티코어 인메모리 환경에서 트랜잭션을 처리하기 위한 효율적인 HTM 기법)

  • Jang, Yeonwoo; Kang, Moonhwan; Yoon, Min; Chang, Jaewoo
    • KIISE Transactions on Computing Practices / v.23 no.8 / pp.466-472 / 2017
  • Hardware Transactional Memory (HTM) has greatly changed the parallel programming paradigm for transaction processing. Since Intel recently proposed Transactional Synchronization Extensions (TSX), a number of studies based on HTM have been conducted. However, existing studies support conflict prediction for only a single cause of transaction aborts and provide a uniform TSX configuration for all workloads. To solve these problems, we propose an efficient hardware transactional memory scheme for processing transactions in multi-core in-memory environments. First, the proposed scheme determines whether to use Software Transactional Memory (STM) or serial execution as the fallback path of HTM, using a prediction matrix that collects information on previously executed transactions. Second, the proposed scheme performs efficient transaction processing according to the characteristics of a given workload by providing a retry policy based on machine learning algorithms. Finally, in an experimental evaluation using the Stanford Transactional Applications for Multi-Processing (STAMP) benchmark, the proposed scheme shows 10~20% better performance than existing schemes.
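The prediction-matrix fallback selection can be sketched as follows. The abort-cause names and thresholds are assumptions for illustration, and the paper's machine-learning-based retry policy is reduced here to a simple count comparison:

```python
# Hedged sketch of fallback selection for HTM: record the abort causes
# observed for each transaction in a small prediction matrix, and choose
# the next path (retry HTM, fall back to STM, or serialize) from that
# history. Cause names and thresholds are illustrative assumptions.

from collections import defaultdict

# prediction_matrix[txn_id][cause] -> abort count
prediction_matrix = defaultdict(lambda: defaultdict(int))

def record_abort(txn_id, cause):
    prediction_matrix[txn_id][cause] += 1

def choose_fallback(txn_id, max_retries=3):
    counts = prediction_matrix[txn_id]
    total = sum(counts.values())
    if total < max_retries:
        return "retry_htm"   # transient conflict: try HTM again
    if counts.get("capacity", 0) > counts.get("conflict", 0):
        return "serial"      # capacity aborts will not go away on retry
    return "stm"             # persistent conflicts: software TM fallback

record_abort("t1", "conflict")
print(choose_fallback("t1"))     # few aborts so far -> retry_htm
for _ in range(3):
    record_abort("t2", "capacity")
print(choose_fallback("t2"))     # capacity-dominated -> serial
```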

Hybrid Transactional Memory using Sampling-based Retry Policy in Multi-Core Environment (멀티코어 환경에서 샘플링 기반 재시도 정책을 이용한 하이브리드 트랜잭셔널 메모리)

  • Kang, Moon-Hwan; Jang, Yeon-Woo; Yoon, Min; Chang, Jae-Woo
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.2 / pp.49-61 / 2017
  • Transactional Memory (TM) has greatly changed the parallel programming paradigm for transaction processing and is classified into STM, HTM, and HyTM according to its hardware or software framework. However, existing studies provide a static retry policy for all workloads. To solve this problem, we propose a hybrid transactional memory scheme using a sampling-based adaptive retry policy in multi-core environments. First, the proposed scheme determines whether to use STM or HTM according to the characteristics of a transaction, or executes HTM and STM concurrently by using a Bloom filter. Second, the proposed scheme provides an adaptive retry policy for HTM according to the characteristics of the transactions in each workload. Finally, in an experimental evaluation using STAMP, the proposed scheme shows 10~20% better performance than existing schemes.

Parallel Interworking Model and Performance Analysis of a Connection Control Protocol for Broadband Access Network (광대역 액세스 망을 위한 연결 제어 프로토콜의 병렬형 연동 모델과 성능 분석)

  • Kim, Chun-Hui; Cha, Yeong-Uk; Kim, Jae-Geun; Han, Gi-Jun
    • Journal of KIISE: Computer Systems and Theory / v.26 no.12 / pp.1529-1538 / 1999
  • In a B-ISDN environment, various subscribers are concentrated into a service node via a broadband access network. SG13 of ITU-T is standardizing a connection control protocol that provides dynamic allocation and release of ATM resources between an access network and a service node. To establish an ATM connection, interworking functions between the connection control protocol and the signaling protocol are performed at the service node; ITU-T adopts the sequential interworking model, in which the procedures of the two protocols are interworked sequentially, incurring a large connection setup delay. In this paper, we propose a parallel interworking model that minimizes the overall connection setup delay introduced by the access network. Through performance analysis, we compare the proposed parallel model with the existing sequential model in terms of connection setup delay and completion ratio. The workload parameters of the RACE MAGIC project are applied to the analysis and simulation.