• Title/Summary/Keyword: kernel threads

Kernel Thread Scheduling in Real-Time Linux for Wearable Computers

  • Kang, Dong-Wook; Lee, Woo-Joong; Park, Chan-Ik
    • ETRI Journal / v.29 no.3 / pp.270-280 / 2007
  • In Linux, real-time tasks are supported by separating real-time task priorities from non-real-time task priorities. However, this separation of priority ranges may not be effective when real-time tasks make system calls that are handled by kernel threads. Thus, Linux is considered a soft real-time system. Moreover, kernel threads are configured with static priorities for the sake of throughput. This static assignment causes trouble for real-time tasks that need kernel threads to handle their system calls, because kernel threads do not discriminate between real-time and non-real-time tasks. We present a dynamic kernel thread scheduling mechanism with a weighted average priority inheritance protocol (PIP), a variation of the PIP. The scheduling algorithm assigns proper priorities to kernel threads at runtime by monitoring the activities of user-level real-time tasks. Experimental results show that the algorithm can greatly reduce the unexpected execution latency of real-time tasks.
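
The abstract does not state how the weighted average is computed. As a rough illustration only, the sketch below assumes that a kernel thread takes the average of the priorities of the tasks waiting on it, weighted by each task's number of pending requests. The function name, the weights, the rounding rule, and the "larger number means more urgent" convention are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the weights and the rounding rule are assumptions,
# not the formula used in the paper.

def weighted_average_priority(requests, default_priority=0):
    """Return a priority for a kernel thread from its pending requests.

    requests: list of (task_priority, pending_count) pairs, where a larger
    task_priority means a more urgent task (an assumed convention).
    """
    total = sum(count for _, count in requests)
    if total == 0:
        return default_priority  # nothing pending: fall back to a static value
    weighted = sum(prio * count for prio, count in requests)
    return round(weighted / total)


# One real-time task (priority 90) with 3 pending requests and one
# non-real-time task (priority 10) with 1 pending request.
print(weighted_average_priority([(90, 3), (10, 1)]))
```

With these assumed weights, a kernel thread serving mostly real-time requests drifts toward the real-time priority range instead of staying at a fixed value.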

Fixed Time Synchronous IPC in Zephyr Kernel (Zephyr 커널에서 고정 시간 동기식 IPC 구현)

  • Jung, Jooyoung; Kim, Eunyoung; Shin, Dongha
    • IEMEK Journal of Embedded Systems and Applications / v.12 no.4 / pp.205-212 / 2017
  • The Linux Foundation recently announced a real-time kernel for IoT applications called Zephyr. The Zephyr kernel provides synchronous and asynchronous IPC for data communication between threads. Synchronous IPC is useful for programming multiple threads that need to execute synchronously, since the sender thread is blocked until the data is delivered to the receiver thread and both threads can know that the transfer has completed. In general, 'IPC execution time' is defined as the time between the moment the sender thread sends data and the moment the receiver thread receives it. In a real-time kernel like Zephyr, it is especially important that the 'IPC execution time' of synchronous IPC be fixed. However, we have found that the execution time of synchronous IPC in the Zephyr kernel increases in proportion to the number of threads executing in the kernel. In this paper, we propose a method to implement fixed-time synchronous IPC in the Zephyr kernel using the Direct Thread Switching (DTS) technique. With this technique, the receiver thread executes directly after the sender thread sends data, during the remaining time slice of the sender thread, so a fixed IPC execution time can be achieved even when the number of threads executing in the kernel increases. We implemented synchronous IPC using DTS in the Zephyr kernel and found that the IPC execution time is always 389 cycles, which is relatively small and fixed.

A Model for Reducing Priority Inversion in Real Time Server System (실시간 서버 시스템에서 우선 순위 반전현상을 감소하기 위한 모델)

  • Choe, Dae-Su; Im, Jong-Gyu; Gu, Yong-Wan
    • The Transactions of the Korea Information Processing Society / v.6 no.11 / pp.3131-3139 / 1999
  • Satisfying the rigid timing requirements of various real-time activities in real-time systems often requires special methods to tune the system's run-time behavior. Unbounded blocking can occur when a high-priority activity cannot preempt a low-priority activity. In such a situation, a priority inversion is said to have occurred. Priority inversion is one of the problems that may prevent threads from meeting their deadlines in real-time systems. It is difficult to remove such priority inversion problems in the kernel and, at the same time, to bound the worst-case blocking time for the threads. A thread is a piece of executable code that has access to data and a stack. In this paper, a new real-time server model that minimizes the duration of priority inversion is proposed. The proposed server model provides a framework for building a better server structure, which can not only minimize the duration of priority inversion but also reduce the deadline miss ratio of higher-priority threads.

Analysis of Kernel-Thread Web Accelerator (커널 스레드 웹 가속기의 분석)

  • Hwang June; Nahm EuiSeok; Min Byungjo; Kim Hagbae
    • 한국컴퓨터산업교육학회:학술대회논문집 / 2003.11a / pp.17-22 / 2003
  • The surge of Internet traffic creates bottlenecks. This problem can be reduced by replacing network media, routers, and switches with higher-performance equipment. However, we focus on the server's performance in processing service requests. We propose a method for improving server performance in the Linux kernel stack. The accelerator accepts requests from many clients and processes them using kernel threads rather than user threads. Doing so reduces the overhead caused by frequent system calls and by context switching between threads. Furthermore, we implement a CPN (Coloured Petri Net) model. Using the CPN model, we can analyze the characteristics of operation times as well as the reachability of the system. A benchmark of the system shows that the model is valid.

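Modeling and Performance Evaluation of a Multiprocessor Operating System Using the DEVS Formalism (DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가)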

  • 홍준성
    • Proceedings of the Korea Society for Simulation Conference / 1994.10a / pp.32-32 / 1994
  • In this example, a message-passing-based multicomputer system with a general interconnection network is considered. Since multicomputer systems have been developed with wormhole routing networks, the topology of the interconnection network is no longer a major consideration for process management and resource sharing. There is an independent operating system kernel on each node, and it communicates with other kernels using a message passing mechanism. Based on this architecture, the problem is how much performance degradation occurs when processors are shared on multicomputer systems. Processor sharing between application programs is a very important decision for system performance. In almost all cases, application programs running on massively parallel computer systems are not very user-interactive, so the main performance index is system throughput. Each application program has various communication patterns, and the sharing of processors causes serious performance degradation in the worst case, in which one processor is shared by two processes and other processes are waiting for messages from those processes. Considering this problem is therefore important, since it gives the reason whether the system should allow processor sharing or not. The input data for this simulation has many parameters: the number of threads per task, the communication patterns between threads, and data generation. Random input data also has defects. Many parallel application programs have their own specific communication patterns, with computation and communication phases, and this phase information cannot be obtained from random input data. If we obtain trace data from real applications, we can simulate the problem more realistically. On the other hand, simulation results will be wasteful unless sufficient trace data with various communication patterns is gathered. In this project, random input data are used for simulation. The only controllable data are the number of threads of each task and the mapping strategy. First, each task runs independently. After that, each task shares one or more processors with other tasks. As more processors are shared, performance degrades. From this degradation rate, we can determine the overhead of processor sharing. The process scheduling policy can affect the results of the simulation. For process scheduling, a priority queue and a FIFO queue are implemented to support round-robin scheduling and priority scheduling.

CUDA based parallel design of a shot change detection algorithm using frame segmentation and object movement

  • Kim, Seung-Hyun; Lee, Joon-Goo; Hwang, Doo-Sung
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.9-16 / 2015
  • This paper proposes the parallel design of a shot change detection algorithm using frame segmentation and moving blocks. In the proposed approach, the highly parallel processing components, such as frame histogram calculation, block histogram calculation, the Otsu threshold setting function, the frame moving operation, and block histogram comparison, are designed in parallel for NVIDIA GPUs. To minimize memory access delay and guarantee fast computation, the output of one GPU kernel becomes the input data of another kernel in a pipelined manner using the shared memory of the GPU. In addition, the optimal sizes of CUDA processing blocks and threads are estimated through prior experiments. In the experimental test of the proposed shot change detection algorithm, the detection rate of the GPU-based parallel algorithm is the same as that of the CPU-based algorithm, while the average processing time is about 6 to 8 times faster.
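
As a rough illustration of the histogram-comparison step only, the following CPU sketch flags a shot change when the histograms of two consecutive frames differ strongly. It is not the paper's algorithm: it omits frame segmentation, moving blocks, and the GPU pipeline, and it uses a fixed threshold in place of the Otsu threshold. The function names, the bin count, and the threshold value are assumptions made for the example.

```python
def histogram(frame, bins=8, max_value=256):
    """Count the grayscale values of a flat pixel list into equal-width bins."""
    counts = [0] * bins
    width = max_value // bins
    for pixel in frame:
        counts[min(pixel // width, bins - 1)] += 1
    return counts


def histogram_distance(h1, h2):
    """Normalized L1 distance between two histograms, in the range 0..1."""
    total = sum(h1) + sum(h2)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total


def detect_shot_changes(frames, threshold=0.5):
    """Return the indices i where frame i differs strongly from frame i - 1."""
    changes = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame)
        if previous is not None and histogram_distance(previous, current) > threshold:
            changes.append(index)
        previous = current
    return changes


# Two dark frames followed by two bright frames (assumed toy data).
dark = [10] * 16
bright = [240] * 16
print(detect_shot_changes([dark, dark, bright, bright]))  # prints [2]
```

Each frame's histogram depends only on that frame, which is why this kind of step lends itself to the per-frame and per-block parallelism the abstract describes.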

The Design and Implementation of C Standard Library for RTOS Q+ (실시간 운영체계 Q+를 위한 C 표준 라이브러리 설계 및 구현)

  • Kim, Do-Hyeong; Park, Seung-Min
    • The KIPS Transactions:PartA / v.8A no.1 / pp.1-8 / 2001
  • This paper describes the design and implementation of a C standard library for the real-time operating system Q+, which is being developed for Internet appliances. The C library in a real-time operating system should be defined according to the standard interface and should support the concurrent execution of threads. The implemented C standard library is reentrant and follows the POSIX.1 standard interface. The C standard library functions that are suitable for Q+ applications and commonly provided by commercial real-time operating systems are selected from among the POSIX.1 standard functions. The C standard library is implemented on the Q+ kernel and a D-TV set-top box according to an implementation sequence determined by analyzing the relations between function calls.

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol; Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • CNN (Convolutional Neural Network), which is used for image classification and speech recognition among neural networks that learn from positive data, has been continuously developed toward higher-performance structures. However, it is difficult to use in an embedded system with limited resources. We therefore use GPGPU (General-Purpose Computing on Graphics Processing Units), which applies the GPU to general-purpose operations, together with pre-learned weights, but limitations remain. Since a CNN performs simple and iterative operations, the computation speed varies greatly depending on the thread allocation and utilization method in a Single Instruction Multiple Thread (SIMT) based GPGPU. When Convolution and Pooling operations are performed with threads, some threads are left idle. We increased the operation speed by using these remaining threads for the following feature-map and kernel calculations.

Middleware to Support Real-Time in the Linux User-Space (리눅스 사용자 영역에 실시간성 제공을 위한 미들웨어)

  • Lee, Sang-Gil; Lee, Seung-Yul; Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.16 no.5 / pp.217-228 / 2016
  • Linux itself does not support real-time operation. To solve this problem, RTiK-Linux was designed to support real-time operation in the kernel space. However, since the user space does not support real-time operation, it is not easy to develop applications. In this paper, we designed and implemented RTiK-middleware to support real-time operation in the user space. After an application registers its process information and request period through the APIs, RTiK-middleware provides real-time scheduling for user space by signaling the process at the requested period. To evaluate the performance of the proposed RTiK-middleware, we measured the periods of the generated real-time threads using RDTSC instructions and verified that RTiK-middleware operates correctly within an error range of 1 ms.
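
The period check described above can be sketched as simple arithmetic on cycle-counter readings. The code below is a minimal illustration, assuming the readings have already been collected (for example with RDTSC) and that the counter frequency is known. The function names, the 1 GHz frequency, the 10 ms request period, and the sample readings are hypothetical and are not data from the paper.

```python
def periods_ms(timestamps, tsc_hz):
    """Convert consecutive cycle-counter readings into periods in milliseconds."""
    return [(b - a) * 1000.0 / tsc_hz for a, b in zip(timestamps, timestamps[1:])]


def max_period_error_ms(timestamps, tsc_hz, requested_ms):
    """Largest absolute deviation of a measured period from the requested one."""
    return max(abs(p - requested_ms) for p in periods_ms(timestamps, tsc_hz))


# Hypothetical readings from a 1 GHz counter for a thread requested every 10 ms.
readings = [0, 10_000_000, 20_200_000, 29_900_000]
error = max_period_error_ms(readings, tsc_hz=1_000_000_000, requested_ms=10.0)
print(error < 1.0)  # prints True: every period is within 1 ms of the request
```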

An Optimization Method for Hologram Generation on Multiple GPU-based Parallel Processing (다중 GPU기반 홀로그램 생성을 위한 병렬처리 성능 최적화 기법)

  • Kook, Joongjin
    • Smart Media Journal / v.8 no.2 / pp.9-15 / 2019
  • Since the computational complexity of hologram generation increases exponentially with the size of the point cloud, parallel processing using the CUDA and/or OpenCL libraries on multiple GPUs has recently become popular. The CUDA kernel for parallelization needs to be organized into threads, blocks, and grids appropriately for the number of cores and the memory size of the GPU. In addition, in multi-GPU environments, the work needs to be distributed grid by grid, block by block, or thread by thread according to the number of GPUs. To evaluate the performance of CGH (Computer Generated Hologram) generation, we compared the computational speed in CPU, single-GPU, and multi-GPU environments by gradually increasing the number of points in a point cloud from 10 to 1,000,000. We also present a memory structure design and a calculation method required in CUDA-based parallel processing to accelerate CGH generation in multi-GPU environments.
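
The sizing arithmetic behind such a distribution can be sketched on the host side. The example below assumes one thread per point, 256 threads per block, and an even contiguous split of the point cloud across GPUs. These choices and the function names are illustrative assumptions; the abstract does not describe the paper's actual partitioning scheme.

```python
def ceil_div(a, b):
    """Integer division rounded up, the usual way to size a CUDA grid."""
    return (a + b - 1) // b


def split_points(n_points, n_gpus):
    """Split n_points into n_gpus contiguous (start, count) chunks, as evenly as possible."""
    base, extra = divmod(n_points, n_gpus)
    chunks = []
    start = 0
    for gpu in range(n_gpus):
        count = base + (1 if gpu < extra else 0)  # first `extra` GPUs take one more
        chunks.append((start, count))
        start += count
    return chunks


def blocks_needed(count, threads_per_block=256):
    """Number of blocks so that every point gets its own thread (assumed mapping)."""
    return ceil_div(count, threads_per_block)


# Distribute a 1,000,000-point cloud over three GPUs (assumed configuration).
for start, count in split_points(1_000_000, 3):
    print(start, count, blocks_needed(count))
```

Rounding the block count up means the last block may contain threads with no point to process, so a real kernel would need a bounds check on the thread index.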