• Title/Summary/Keyword: thread scheduling

Performance Evaluation of ARX Thread Library in Java Virtual Machine (자바 가상 머신을 통한 ARX 쓰레드 라이브러리의 성능 측정)

  • 서양민;박정근;김기정;홍기정
    • Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.157-159 / 1998
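  • Threads are well suited to expressing concurrency in a program, and because they reduce synchronization and context-switching costs compared with the process model, they are replacing conventional multi-process programming. Providing a multithreading environment is now essential in an operating system, and operating system support is needed for good performance. The ARX real-time operating system supports user-level multithreading and, to improve thread performance and enable real-time scheduling at user level, provides techniques such as dynamic virtual stack binding and scheduling event upcalls. In this paper, we measure the performance of the ARX thread library through a Java virtual machine and compare it with the multithread libraries of other operating systems. The experimental results confirm that the thread library provided by ARX outperforms those of the other operating systems.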

The thread scheduling method based on the priority of threads on the multithread models (다중 스레드 모델에서 스레드 우선 순위에 따른 스레드 스케쥴링 기법)

  • 이정호;고훈준;양창모;유원희
    • Proceedings of the Korean Information Science Society Conference / 2000.10c / pp.659-661 / 2000
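  • The multithreaded model emerged by combining the locality of the von Neumann model with the parallelism of the dataflow model. Its goal is to raise processor utilization by overlapping communication time with computation time. Most existing multithreaded models schedule threads in FIFO or FILO order. To raise processor utilization and reduce processor idle time, this paper proposes that when threads performing remote function calls or remote memory references (hereafter, remote threads) and computation threads are enabled at the same time, executing the remote threads first is effective in reducing processor latency and increasing parallelism. To implement this, the continuation vector (CV) inside a frame is split into two: a CCV (call continuation vector) and an LCV (local continuation vector). When threads are enabled, remote threads are stored in the CCV and computation threads in the LCV; the threads stored in the CCV are executed first, followed by those in the LCV.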
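
The CCV/LCV ordering described in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Frame` class, its method names, and the thread names are hypothetical, and FIFO order within each vector is assumed.

```python
from collections import deque


class Frame:
    """Hypothetical frame whose continuation vector is split in two."""

    def __init__(self):
        self.ccv = deque()  # call continuation vector: remote threads
        self.lcv = deque()  # local continuation vector: computation threads

    def enable(self, name, is_remote):
        """Store an enabled thread in the CCV if remote, else in the LCV."""
        (self.ccv if is_remote else self.lcv).append(name)

    def run_order(self):
        """Drain the CCV first, then the LCV (FIFO within each, assumed)."""
        order = []
        while self.ccv:
            order.append(self.ccv.popleft())
        while self.lcv:
            order.append(self.lcv.popleft())
        return order


frame = Frame()
frame.enable("compute_a", is_remote=False)
frame.enable("remote_load_x", is_remote=True)
frame.enable("compute_b", is_remote=False)
frame.enable("remote_call_f", is_remote=True)
order = frame.run_order()
print(order)
```

Issuing the remote threads first lets their communication latency overlap with the computation threads that run afterwards, which is the effect the abstract attributes to the scheme.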

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.72-83 / 2017
  • GPUs provide high throughput by executing many warps in parallel to hide latency. The miss status holding registers (MSHRs) of the L1 data cache track cache miss requests until the required data is returned from lower-level memory. In recent GPUs, excessive requests for cache resources cause reservation failures, which leave GPU resources underutilized. In this paper, we propose a new warp scheduling technique that reduces stall cycles when MSHR resources run short. The cache miss rate of each warp is predicted based on the observation that a warp shows a similar miss rate over long periods. When the MSHRs are full, warps with low miss rates and computation-intensive warps are given high priority for issue. Our proposal improves GPU performance by using cache resources more efficiently, based on cache miss rate prediction and monitoring of the MSHR entries. According to our experimental results, the proposed technique reduces reservation fail cycles by 25.7% and increases IPC by 6.2% compared with a loose round-robin scheduler.
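
The selection rule in the abstract above can be illustrated with a small Python model. It is only a sketch: the abstract does not specify the predictor or the hardware interface, so the running hit/miss counters, the `Warp` class, and the `pick_warp` function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    wid: int
    hits: int = 0
    misses: int = 0

    @property
    def miss_rate(self):
        """Observed miss rate, used here as the predicted miss rate (assumption)."""
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def pick_warp(ready, last_issued, mshr_used, mshr_entries):
    """Return the id of the next warp to issue, or None if none is ready."""
    if not ready:
        return None
    if mshr_used >= mshr_entries:
        # MSHRs full: favour the warp least likely to need a new MSHR entry.
        return min(ready, key=lambda w: (w.miss_rate, w.wid)).wid
    # Otherwise fall back to round-robin: next ready warp id after the last one.
    ids = sorted(w.wid for w in ready)
    for wid in ids:
        if wid > last_issued:
            return wid
    return ids[0]


warps = [Warp(0, hits=1, misses=9), Warp(1, hits=9, misses=1), Warp(2, hits=5, misses=5)]
print(pick_warp(warps, last_issued=0, mshr_used=4, mshr_entries=4))
print(pick_warp(warps, last_issued=1, mshr_used=2, mshr_entries=4))
```

With the MSHRs full, the low-miss-rate warp is chosen regardless of round-robin position; with free entries, the ordinary rotation applies.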

Implementation and Performance analysis of a Framework to Support Real-Time of Robot Components (로봇 컴포넌트에 실시간성을 지원하기 위한 프레임워크 구현 및 성능분석)

  • Choi, Chan-Woo;Cho, Moon-Haeng;Park, Seong-Jong;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.9 no.4 / pp.81-94 / 2009
  • In ubiquitous environments, real-time features are necessary to ensure the QoS of intelligent service robots. In this paper, we design and implement a real-time framework that provides these features for intelligent service robots. The framework, which supports real-time scheduling services, is implemented on a general-purpose operating system, and it addresses the problem that the scheduler of such an operating system cannot support real-time features. This paper also proposes real-time scheduling services to guarantee the QoS of real-time robot applications. We implemented the proposed framework on the Windows operating system and conducted performance experiments. The experimental results show that the framework improves thread response times while incurring a slight performance overhead of $62{\mu}s$.

${\mu}TMO$ Model based Real-Time Operating System for Sensor Network (${\mu}TMO$ 모델 기반 실시간 센서 네트워크 운영체제)

  • Yi, Jae-An;Heu, Shin;Choi, Byoung-Kyu
    • Journal of KIISE:Computer Systems and Theory / v.34 no.12 / pp.630-640 / 2007
  • As the range of sensor network applications widens, new application areas that require real-time operation are emerging, such as military systems and radioactivity detection. However, existing research has focused on effective resource management, and existing sensor network operating systems cannot support real-time applications. In this paper, we propose the ${\mu}TMO$ model, a lightweight version of the real-time distributed object model TMO. We design the real-time sensor network operating system ${\mu}TMO-NanoQ+$, which is based on ETRI's sensor network operating system Nano-Q+. We modify the timer module of Nano-Q+ to support high resolution and apply the Context Switch Threshold and Power Aware scheduling techniques to realize a lightweight EDF-based scheduler. We also implement ITC-Channel, a channel-based communication mechanism, and WTMT, a periodic thread management module.
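
As a rough illustration of an EDF scheduler with a context switch threshold, consider the Python sketch below. The abstract does not define the Context Switch Threshold technique, so the rule used here (preempt only when the earliest ready deadline beats the running thread's deadline by more than a threshold) is one plausible reading and is an assumption, as are the `EdfScheduler` class and the thread names.

```python
import heapq


class EdfScheduler:
    """Earliest-deadline-first dispatch with a hypothetical switch threshold."""

    def __init__(self, switch_threshold=0):
        self.ready = []        # min-heap of (absolute_deadline, name)
        self.running = None    # (absolute_deadline, name) or None
        self.threshold = switch_threshold

    def release(self, name, deadline):
        """Make a thread ready with the given absolute deadline."""
        heapq.heappush(self.ready, (deadline, name))

    def dispatch(self):
        """Choose the thread to run next and return its name (or None)."""
        if self.running is None:
            if self.ready:
                self.running = heapq.heappop(self.ready)
        elif self.ready and self.ready[0][0] + self.threshold < self.running[0]:
            # Assumed rule: switch only if the gain exceeds the threshold.
            preempted = self.running
            self.running = heapq.heappop(self.ready)
            heapq.heappush(self.ready, preempted)
        return self.running[1] if self.running else None


sched = EdfScheduler(switch_threshold=5)
sched.release("sense", deadline=100)
print(sched.dispatch())
sched.release("radio", deadline=97)   # earlier, but within the threshold
print(sched.dispatch())
sched.release("alarm", deadline=50)   # much earlier: preempts
print(sched.dispatch())
```

Under this reading, the threshold trades strict EDF ordering for fewer context switches, which matters on sensor nodes where each switch costs time and energy.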

A Parallelization Technique with Integrated Multi-Threading for Video Decoding on Multi-core Systems

  • Hong, Jung-Hyun;Kim, Won-Jin;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.10 / pp.2479-2496 / 2013
  • Increasing demand for Full High-Definition (FHD) video and Ultra High-Definition (UHD) video services has led to active research on high-speed video processing. Widespread deployment of multi-core systems has accelerated studies on high-resolution video processing based on parallelization of multimedia software. Although parallelizing a specific decoding step may partially improve decoding performance, such partial parallelization may not yield sufficient performance improvement. In particular, entropy decoding has often been considered separately from other decoding steps because it cannot be parallelized easily. In this paper, we propose a parallelization technique called Integrated Multi-Threaded Parallelization (IMTP), which considers parallelization of the entropy decoding step together with the other decoding steps in an integrated fashion. We used the Simultaneous Multi-Threading (SMT) technique with appropriate thread scheduling techniques to achieve the best performance for the entire decoding process. The proposed IMTP method achieves a speedup of up to 3.35 times in total decoding time over a conventional decoding technique for H.264/AVC videos.

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.6 / pp.325-334 / 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

Realization of CAT Interface supporting Multitask (다중처리를 지원하는 CAT 인터페이스에 관한 연구)

  • 전동근;노승환;차균현
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.12 / pp.1423-1436 / 1992
  • In this paper, a CAT interface supporting multitasking is realized. To interface a computer with measuring instruments, a GPIB card is designed and implemented. The control and display software is written in C++ using OOP and a GUI. A spectrum analyzer and a power meter are chosen as the instruments to be controlled. A total of 9 modules are configured to manage the various resources and each module in the integrated system. In addition, for cases where several instruments are used, the system is capable of multitasking so that the instruments can exchange data with one another. The multitasking is implemented under a time-sharing DOS environment, with a thread-based method used for processing and the round-robin method for scheduling. Provided that proper software modules for other instruments are integrated, the system can control more measuring instruments simultaneously from the computer. Users can save time and reduce errors even without expert knowledge.
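
Round-robin scheduling, as named in the abstract above, can be sketched in a few lines of Python. The task names, work amounts, and the `round_robin` function are hypothetical and only illustrate the general technique, not the system described in the paper.

```python
from collections import deque


def round_robin(tasks, quantum):
    """Run tasks in fixed time slices and return the (name, units_run) timeline.

    tasks maps a task name to its remaining work in time units.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Unfinished work goes to the back of the queue.
            queue.append((name, remaining - ran))
    return timeline


timeline = round_robin({"spectrum_analyzer": 5, "power_meter": 2}, quantum=2)
print(timeline)
```

Each instrument task gets an equal slice in turn, so a long-running task cannot starve the others.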

Java Garbage Collection for a Small Interactive System (소규모 대화형 시스템을 위한 자바 가비지 콜렉션)

  • 권혜은;김상훈
    • Journal of KIISE:Software and Applications / v.29 no.12 / pp.957-965 / 2002
  • Garbage collection in the CLDC typically employs a stop-the-world GC algorithm that performs a complete garbage collection when needed. This technique is unsuitable for interactive Java embedded systems because it can lead to long and unpredictable delays. In this paper, we present a garbage collection algorithm that reduces the average delay time and supports the interactive environment. Our garbage collector is composed of an allocator and a collector. The allocator determines the allocation position in the free list according to object size, and the collector uses an incremental mark-sweep algorithm. The garbage collector is called periodically by the thread scheduling policy, and the allocator allocates objects in the marked state during a collection cycle. We also introduce a color toggle mechanism that changes the meaning of the bit patterns at the end of the collection cycle. We compared the performance of our implementation with a stop-the-world mark-sweep GC. The experimental results show that our algorithm reduces the average delay time and provides uniformly low response times.
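
The combination of incremental marking, allocation in the marked state, and a color toggle can be illustrated with the Python sketch below. It is a simplified model under stated assumptions, not the paper's collector: the `Heap` and `Obj` classes are hypothetical, a single mark bit stands in for the bit patterns, the sweep runs in one step, and the write barrier that a real incremental collector needs is omitted.

```python
class Obj:
    def __init__(self, name, color):
        self.name = name
        self.color = color   # single mark bit (assumption)
        self.refs = []


class Heap:
    def __init__(self):
        self.live_color = 1  # bit value that currently means "marked"
        self.objects = []
        self.gray = []       # marked objects whose children are not yet scanned
        self.collecting = False

    def alloc(self, name):
        # During a cycle, allocate in the marked state so the object survives.
        color = self.live_color if self.collecting else 1 - self.live_color
        obj = Obj(name, color)
        self.objects.append(obj)
        return obj

    def start_cycle(self, roots):
        self.collecting = True
        for root in roots:
            self._shade(root)

    def _shade(self, obj):
        if obj.color != self.live_color:
            obj.color = self.live_color
            self.gray.append(obj)

    def step(self, budget):
        """Scan up to `budget` objects; return True once the cycle is finished."""
        if not self.collecting:
            return True
        while budget > 0 and self.gray:
            for child in self.gray.pop().refs:
                self._shade(child)
            budget -= 1
        if self.gray:
            return False
        # Sweep unmarked objects, then toggle instead of clearing mark bits.
        self.objects = [o for o in self.objects if o.color == self.live_color]
        self.live_color = 1 - self.live_color
        self.collecting = False
        return True


heap = Heap()
a, b, c = heap.alloc("a"), heap.alloc("b"), heap.alloc("c")
a.refs.append(b)
heap.start_cycle([a])
first = heap.step(budget=1)    # scans a, leaves b for the next slice
d = heap.alloc("d")            # allocated marked during the cycle
second = heap.step(budget=1)   # scans b, sweeps c, toggles the color
print(first, second, [o.name for o in heap.objects])
```

After the toggle, every survivor reads as unmarked without its bit being touched, which is why the end of a cycle costs no extra pass over the heap in this model.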