• Title/Summary/Keyword: thread scheduling


Evaluation Of The Content-Based Packet Scheduling Policies On The Multithreaded Multiprocessor Network System

  • Yim Kangbin
    • Proceedings of the IEEK Conference / summer / pp.39-41 / 2004
  • In this paper, I propose a thread scheduling policy for faster packet processing on network processors with a multithreaded multiprocessor architecture. To implement the proposed policy, I derived several basic parameters related to thread scheduling and added a new parameter representing the packet contents and the features of the multithreaded architecture. Through an empirical study using a network processor, I show that the proposed scheduling policy provides better throughput and load balancing than the commonly used thread scheduling policy.


Multi-thread Scheduling for the Network Processor (네트워크 프로세서를 위한 다중 쓰레드 스케줄링)

  • Yim, Kang-Bin;Park, Jun-Ku;Jung, Gi-Hyun;Choi, Kyung-Hee
    • The KIPS Transactions:PartC / v.11C no.3 / pp.337-344 / 2004
  • In this paper, we propose a thread scheduling algorithm for faster packet processing on network processors with a multithreaded multiprocessor architecture. To implement the proposed algorithm, we derived several basic parameters related to thread scheduling and added a new parameter representing the packet contents and the multithreaded architecture. Through an empirical study using a simulator, we show that the proposed scheduling algorithm provides better throughput and load balancing than the general thread scheduling algorithm.

Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / v.7 no.1 / pp.67-80 / 2013
  • The worst-case execution time (WCET) of each real-time task in multicore processors with shared caches can be significantly affected by inter-thread cache interferences. The worst-case inter-thread cache interferences are dependent on how tasks are scheduled to run on different cores. Therefore, there is a circular dependence between real-time task scheduling, the worst-case inter-thread cache interferences, and WCET in multicore processors, which is not the case for single-core processors. To address this challenging problem, we present an offline real-time scheduling approach for multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model. Our evaluation indicates that the enhanced scheduling approach is more likely to generate feasible and safe schedules with stricter timing constraints in multicore real-time systems.

SimTBS: Simulator For GPGPU Thread Block Scheduling (SimTBS: GPGPU 스레드블록 스케줄링 시뮬레이터)

  • Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.4 / pp.87-92 / 2020
  • Although a GPGPU (General-Purpose GPU) can maximize performance by parallelizing a task across tens of thousands of threads, those threads are internally grouped into thread blocks, the base unit for processing and resource allocation. A thread block scheduler is a specialized hardware component that allocates thread blocks to GPGPU processing hardware in a round-robin manner. However, round-robin is a sequential allocation policy and is not optimized for GPGPU resource utilization. In this paper, we propose a thread block scheduler model that can analyze and quantify the performance of various thread block scheduling policies. Experimental results from a simulator implementing our model show that the legacy hardware thread block scheduling does not behave well when the workload becomes heavy.
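
The round-robin allocation described in this abstract can be illustrated with a minimal sketch. It is not SimTBS itself: the function name `round_robin`, the per-block load values, and the two-unit setup are all assumptions chosen only to show why a purely sequential policy can leave computing units unevenly loaded.

```python
def round_robin(block_loads, num_units):
    """Assign block i to unit i % num_units and return the total load per unit.

    block_loads: hypothetical per-thread-block costs (arbitrary units).
    """
    per_unit = [0] * num_units
    for i, load in enumerate(block_loads):
        per_unit[i % num_units] += load
    return per_unit


# Alternating heavy and light blocks: the sequential policy sends every
# heavy block to unit 0 and every light block to unit 1.
loads = round_robin([4, 1, 4, 1, 4, 1], num_units=2)
print(loads)
```

With these made-up loads, one unit receives all the heavy blocks, which is the kind of imbalance a load-aware policy would try to avoid.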

An IPC-based Dynamic Cooperative Thread Array Scheduling Scheme for GPUs

  • Son, Dong Oh;Kim, Jong Myon;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information / v.21 no.2 / pp.9-16 / 2016
  • Recently, many research groups have focused on GPGPUs in order to improve the performance of computing systems. GPGPUs can execute general-purpose applications as well as graphics applications by using parallel GPU hardware resources. GPGPUs can process thousands of threads based on warp scheduling and cooperative thread array (CTA) scheduling. In this paper, we use the traditional CTA scheduler to assign varying numbers of CTAs to streaming multiprocessors (SMs). According to our simulation results, statically increasing the number of CTAs assigned to an SM does not improve performance. To solve this problem in traditional CTA scheduling schemes, we propose a new IPC-based dynamic CTA scheduling scheme. Compared to traditional CTA scheduling schemes, the proposed dynamic CTA scheduling scheme can increase GPU performance by up to 13.1%.

Shortest-Frame-First Scheduling Algorithm of Threads On Multithreaded Models (다중스레드 모델에서 최단 프레임 우선 스레드 스케줄링 알고리즘)

  • Sim, Woo-Ho;Yoo, Weon-Hee;Yang, Chang-Mo
    • Journal of KIISE:Software and Applications / v.27 no.5 / pp.575-582 / 2000
  • Because the FIFO thread scheduling used in existing multithreaded models does not consider locality in programs, it may degrade execution performance through frequent context-switching overhead and delayed execution of relatively short frames. Quantum unit scheduling improves performance slightly, but it still suffers from lower processor utilization and longer delays because it depends heavily on the priority of the quantum units. In this paper, we propose the shortest-frame-first (SFF) thread scheduling algorithm. Our algorithm selects and schedules the frame that is expected to take the shortest execution time, using thread size and synchronization information analyzed at compile time, from which the relative execution time of each frame can be estimated. Using the SFF thread scheduling algorithm on multithreaded models, we can expect faster execution, better processor utilization, increased throughput, and shorter waiting times compared to FIFO scheduling.

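
The shortest-frame-first idea in the abstract above can be sketched as a priority queue ordered by estimated cost. This is only an illustration under stated assumptions: the `Frame` class, the `est_cost` field (standing in for the compile-time estimate), and the `mean_wait` helper are hypothetical names, and the cost values are made up.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Frame:
    est_cost: int                       # assumed compile-time cost estimate
    name: str = field(compare=False)    # excluded from ordering


def sff_order(frames):
    """Return frame names in shortest-estimated-frame-first order."""
    heap = list(frames)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


def mean_wait(costs):
    """Average waiting time when frames run back to back in the given order."""
    waited, clock = 0, 0
    for cost in costs:
        waited += clock
        clock += cost
    return waited / len(costs)


frames = [Frame(8, "A"), Frame(2, "B"), Frame(5, "C")]
fifo_costs = [f.est_cost for f in frames]   # arrival order
sff_costs = sorted(fifo_costs)              # shortest first
print(sff_order(frames), mean_wait(fifo_costs), mean_wait(sff_costs))
```

On this toy input the shortest-first order gives a lower average waiting time than arrival order, which matches the direction of the benefit the abstract claims; the real algorithm also uses synchronization information that this sketch ignores.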

Thread Block Scheduling for Multi-Workload Environments in GPGPU (다중 워크로드 환경을 위한 GPGPU 스레드 블록 스케줄링)

  • Park, Soyeon;Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.71-76 / 2022
  • Round-robin is widely used for scheduling large-scale parallel workloads on the computing units of a GPGPU. Round-robin is easy to implement because it allocates tasks to each computing unit sequentially, but it does not balance load well between computing units in multi-workload environments such as the cloud. In this paper, we propose a new thread block scheduling policy to resolve this situation. The proposed policy manages thread blocks generated by various GPGPU workloads in multiple queues based on their computation loads. It tries to maximize the resource utilization of each computing unit by selecting a thread block from the queue that can maximally utilize the remaining resources, thereby inducing load balance between computing units. Through simulation experiments under various load environments, we show that the proposed policy improves GPGPU performance by 24.8% on average compared to round-robin.
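
The queue selection step described above can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's policy: resources are reduced to a single integer, each queue is keyed by a hypothetical per-block demand, and `pick_block` and `fill_unit` are invented names.

```python
from collections import deque


def pick_block(queues, free):
    """Pick a block from the non-empty queue with the largest demand that fits.

    queues: dict mapping per-block resource demand (int) -> deque of block ids.
    Returns (demand, block_id), or None if nothing fits in `free`.
    """
    fitting = [d for d, q in queues.items() if q and d <= free]
    if not fitting:
        return None
    demand = max(fitting)
    return demand, queues[demand].popleft()


def fill_unit(queues, capacity):
    """Greedily fill one computing unit; return placed blocks and leftover."""
    free, placed = capacity, []
    while (choice := pick_block(queues, free)) is not None:
        demand, block = choice
        placed.append(block)
        free -= demand
    return placed, free


queues = {1: deque(["s0", "s1"]), 3: deque(["m0"]), 4: deque(["l0"])}
print(fill_unit(queues, capacity=8))
```

Choosing the largest demand that still fits is one simple reading of "maximally utilize the remaining resources"; a real scheduler would track several resource types, such as registers, shared memory, and thread slots.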

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)

  • Thuan, Do Cong;Choi, Yong;Kim, Jong Myon;Kim, Cheol Hong
    • KIPS Transactions on Computer and Communication Systems / v.6 no.5 / pp.219-230 / 2017
  • General-Purpose Graphics Processing Units (GPGPUs) build on a massively parallel architecture and apply multithreading technology to exploit parallelism. With programming models such as CUDA and OpenCL, GPGPUs are becoming the leading platform for exploiting the plentiful thread-level parallelism of parallel applications. Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for numerous general-purpose applications. One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long-latency instructions, resulting in lost opportunities to improve performance. This paper studies the effects of the hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that can alleviate the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency, and then schedules the warps with long latency before the warps with short latency. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that dynamically reduces the number of streaming multiprocessors to which thread blocks are assigned when contention at the memory and interconnection network is high. Based on our experiments on a 15-streaming multiprocessor GPGPU platform, the proposed warp scheduling policy provides an average IPC improvement of 7.5% over the baseline round-robin warp scheduling policy. This paper also shows that GPGPU performance can be improved by approximately 8.9% on average when the two proposed techniques are combined.
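
The two-group warp ordering described in this abstract can be sketched as below. The abstract does not say how latency is measured or where the boundary lies, so the `threshold` parameter, the latency numbers, and the function name `schedule_warps` are assumptions made purely for illustration.

```python
def schedule_warps(warps, threshold):
    """Order warp ids so that long-latency warps are issued first.

    warps: list of (warp_id, latency) pairs with hypothetical latencies.
    threshold: assumed cutoff separating long from short latency.
    Order within each group follows the input order.
    """
    long_group = [w for w, lat in warps if lat >= threshold]
    short_group = [w for w, lat in warps if lat < threshold]
    return long_group + short_group


warps = [(0, 4), (1, 400), (2, 8), (3, 350)]
print(schedule_warps(warps, threshold=100))
```

Issuing the long-latency warps first gives their outstanding operations more time to overlap with the short-latency warps that follow, which is the latency-hiding effect the abstract aims for.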

Adaptive Memory Controller for High-performance Multi-channel Memory

  • Kim, Jin-ku;Lim, Jong-bum;Cho, Woo-cheol;Shin, Kwang-Sik;Kim, Hoshik;Lee, Hyuk-Jun
    • JSTS:Journal of Semiconductor Technology and Science / v.16 no.6 / pp.808-816 / 2016
  • As the number of CPU/GPU cores and IPs in an SoC increases and applications demand ever more memory bandwidth, simultaneously achieving good throughput and fairness in the memory system among interfering applications is very challenging. Recent works proposed priority-based thread scheduling and channel partitioning to improve throughput and fairness. However, combining these different approaches degrades performance and fairness. In this paper, we analyze the problems incurred when combining priority-based scheduling and channel partitioning, and propose dynamic priority thread scheduling and an adaptive channel partitioning method. In addition, we propose dynamic address mapping to further optimize the proposed scheme. On average, combining the proposed methods enhances weighted speedup and fairness for memory-intensive applications by 4.2% and 10.2% over TCM, or by 19.7% and 19.9% over FR-FCFS, while the proposed scheme requires 8% less space than TCM.

Kernel Thread Scheduling in Real-Time Linux for Wearable Computers

  • Kang, Dong-Wook;Lee, Woo-Joong;Park, Chan-Ik
    • ETRI Journal / v.29 no.3 / pp.270-280 / 2007
  • In Linux, real-time tasks are supported by separating real-time task priorities from non-real-time task priorities. However, this separation of priority ranges may not be effective when real-time tasks make system calls that are handled by kernel threads. Thus, Linux is considered a soft real-time system. Moreover, kernel threads are configured with static priorities for the sake of throughput. This static assignment causes trouble when real-time tasks need kernel threads to handle their system calls, because kernel threads do not discriminate between real-time and non-real-time tasks. We present a dynamic kernel thread scheduling mechanism with a weighted average priority inheritance protocol (PIP), a variation of the PIP. The scheduling algorithm assigns proper priorities to kernel threads at runtime by monitoring the activities of user-level real-time tasks. Experimental results show that the algorithm can greatly improve the unexpected execution latency of real-time tasks.

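
The abstract above does not give the formula behind weighted average priority inheritance, so the following is only one plausible reading and is not taken from the paper. It assumes that a larger number means a more urgent priority, that each waiting real-time task carries a hypothetical weight, and that a kernel thread never drops below its base priority; `inherited_priority` is an invented name.

```python
def inherited_priority(waiting, base_priority):
    """Weighted average of the priorities of tasks waiting on a kernel thread.

    waiting: list of (task_priority, weight) pairs (both assumed inputs).
    base_priority: the kernel thread's static priority, used as a floor.
    """
    total_weight = sum(w for _, w in waiting)
    if total_weight == 0:
        return base_priority            # nobody is waiting: keep static priority
    average = sum(p * w for p, w in waiting) / total_weight
    return max(base_priority, round(average))


# A heavily weighted real-time task (priority 90) and a light one (priority 10).
print(inherited_priority([(90, 3), (10, 1)], base_priority=20))
```

Unlike classic priority inheritance, which would raise the thread to the highest waiting priority, an average reflects the mix of requesters; how the actual protocol derives its weights is left to the paper.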