• Title/Abstract/Keywords: kernel threads

11 search results (processing time: 0.018 s)

Kernel Thread Scheduling in Real-Time Linux for Wearable Computers

  • Kang, Dong-Wook;Lee, Woo-Joong;Park, Chan-Ik
    • ETRI Journal / Vol. 29, No. 3 / pp. 270-280 / 2007
  • In Linux, real-time tasks are supported by separating real-time task priorities from non-real-time task priorities. However, this separation of priority ranges may not be effective when real-time tasks make system calls that are handled by kernel threads. Thus, Linux is considered a soft real-time system. Moreover, kernel threads are configured with static priorities for the sake of throughput. This static assignment causes trouble for real-time tasks that need kernel threads to handle their system calls, because kernel threads do not discriminate between real-time and non-real-time tasks. We present a dynamic kernel thread scheduling mechanism with a weighted average priority inheritance protocol (PIP), a variation of the PIP. The scheduling algorithm assigns proper priorities to kernel threads at runtime by monitoring the activities of user-level real-time tasks. Experimental results show that the algorithm can greatly reduce the unexpected execution latency of real-time tasks.
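
The idea in the abstract above can be sketched roughly as follows. This is not the authors' algorithm: the abstract does not define the weights, so the weighting scheme (a weight per requesting task), the function name `weighted_average_priority`, and the convention that a larger number means a higher priority are all assumptions made for illustration.

```python
def weighted_average_priority(pending, default_priority=0):
    """Return a priority for a kernel thread from its pending requests.

    pending: list of (task_priority, weight) pairs, one per requesting task.
    Assumption: larger numbers mean higher priority, and the weight is any
    non-negative measure of how much work the task has queued.
    """
    total_weight = sum(weight for _, weight in pending)
    if total_weight == 0:
        # No requesters: fall back to the static priority.
        return default_priority
    weighted_sum = sum(priority * weight for priority, weight in pending)
    return round(weighted_sum / total_weight)


# One real-time task (priority 90) and two non-real-time tasks (priority 10).
requests = [(90, 3), (10, 1), (10, 1)]
kthread_priority = weighted_average_priority(requests)
```

With these made-up numbers the kernel thread ends up between the two priority classes, closer to the real-time task because that task carries more weight.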

Fixed Time Synchronous IPC in Zephyr Kernel

  • 정주영;김은영;신동하
    • IEMEK Journal of Embedded Systems and Applications / Vol. 12, No. 4 / pp. 205-212 / 2017
  • The Linux Foundation recently announced Zephyr, a real-time kernel for IoT applications. The Zephyr kernel provides synchronous and asynchronous IPC for data communication between threads. Synchronous IPC is useful for programming multiple threads that need to execute synchronously, since the sender thread is blocked until the data is delivered to the receiver thread, so both threads know when the transfer has completed. In general, 'IPC execution time' is defined as the time between the sender thread sending data and the receiver thread receiving it. In a real-time kernel like Zephyr, it is especially important that the 'IPC execution time' of synchronous IPC be fixed. However, we found that the execution time of synchronous IPC in the Zephyr kernel increases in proportion to the number of threads executing in the kernel. In this paper, we propose a method to implement fixed-time synchronous IPC in the Zephyr kernel using the Direct Thread Switching (DTS) technique. With this technique, the receiver thread executes directly after the sender thread sends data, during the remaining time slice of the sender thread, so a fixed IPC execution time can be achieved even when the number of threads executing in the kernel increases. We implemented synchronous IPC using DTS in the Zephyr kernel and found that the IPC execution time is always 389 cycles, which is relatively small and fixed.
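
The contrast described in the abstract above can be illustrated with a toy model. It is not Zephyr code: the ready queue, the one-slice-per-thread rule, and the function name `ipc_delay` are hypothetical simplifications chosen only to show why the delay grows with the thread count without direct switching and stays constant with it.

```python
from collections import deque


def ipc_delay(num_other_threads, direct_switch):
    """Count how many threads run between send and receive.

    Hypothetical model: each entry in the ready queue runs for one time
    slice. Without direct switching the woken receiver is appended to the
    tail of the ready queue; with direct switching it runs immediately in
    the sender's remaining time slice.
    """
    ready = deque(f"worker{i}" for i in range(num_other_threads))
    if direct_switch:
        ready.appendleft("receiver")
    else:
        ready.append("receiver")

    slices_before_receiver = 0
    while ready.popleft() != "receiver":
        slices_before_receiver += 1
    return slices_before_receiver


queued = [ipc_delay(n, direct_switch=False) for n in (1, 4, 16)]
direct = [ipc_delay(n, direct_switch=True) for n in (1, 4, 16)]
```

In this model `queued` grows linearly with the number of other threads, while `direct` is zero for every thread count.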

A Model for Reducing Priority Inversion in Real Time Server System

  • 최대수;임종규;구용완
    • The Transactions of the Korea Information Processing Society / Vol. 6, No. 11 / pp. 3131-3139 / 1999
  • Satisfying the rigid timing requirements of various real-time activities in real-time systems often requires special methods to tune the system's run-time behavior. Unbounded blocking can occur when a high-priority activity cannot preempt a low-priority activity. In such a situation, a priority inversion is said to have occurred. Priority inversion is one of the problems that may prevent threads from meeting their deadlines in real-time systems. It is difficult to remove such priority inversion problems in the kernel while also bounding the worst-case blocking time for threads. A thread is a piece of executable code that has access to data and a stack. In this paper, a new real-time server model, which minimizes the duration of priority inversion, is proposed to reduce the priority inversion problem. The proposed server model provides a framework for building a better server structure, which can not only minimize the duration of priority inversion but also reduce the deadline miss ratio of higher-priority threads.
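
Priority inversion as defined in the abstract above can be illustrated with a small tick simulation. The scenario is a toy assumption and does not represent the paper's server model; plain priority inheritance is used here as a well-known stand-in to show how the duration of the inversion can be bounded. The function name `blocking_time` and the priority values are hypothetical.

```python
def blocking_time(critical_section, medium_work, inherit):
    """Ticks the high-priority thread waits for a lock held by a low one.

    Toy scenario (an assumption, not the paper's server model): at tick 0
    a low-priority thread holds the lock with `critical_section` ticks
    left, a medium-priority thread has `medium_work` ticks of lock-free
    work, and a high-priority thread blocks on the lock.
    """
    low_priority = 3 if inherit else 1   # inheritance boosts the holder
    medium_priority = 2
    ticks = 0
    while critical_section > 0:
        if medium_work > 0 and medium_priority > low_priority:
            medium_work -= 1             # medium preempts the lock holder
        else:
            critical_section -= 1        # holder progresses toward unlock
        ticks += 1
    return ticks


without_inheritance = blocking_time(2, 5, inherit=False)
with_inheritance = blocking_time(2, 5, inherit=True)
```

Without inheritance the high-priority thread also waits for all of the medium-priority work; with inheritance it waits only for the remaining critical section.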

Analysis of Kernel-Thread Web Accelerator

  • 황준;남의석;민병조;김학배
    • Korea Computer Industry Education Society: Conference Proceedings / Proceedings of the 4th General Conference of the Korea Computer Industry Education Society, 2003 / pp. 17-22 / 2003
  • The surge in Internet traffic creates bottlenecks. This problem can be reduced by replacing network media, routers, and switches with higher-performance equipment. However, we focus instead on the server's performance in processing service requests. We propose a method for improving server performance in the Linux kernel stack. The accelerator accepts requests from many clients and processes them using kernel threads rather than user threads. In this way, we can reduce the overhead caused by frequent system calls and the overhead of context switching between threads. Furthermore, we implement a CPN (Coloured Petri Net) model. Using the CPN model, we can analyze the characteristics of operation times as well as the reachability of the system. A benchmark of the system confirms that the model is valid.

Modeling and Performance Evaluation of Multiprocessor Operating Systems Using the DEVS Formalism

  • 홍준성
    • Korea Society for Simulation: Conference Proceedings / Korea Society for Simulation 1994 Fall Conference and General Meeting / pp. 32-32 / 1994
  • In this example, a message-passing-based multicomputer system with a general interconnection network is considered. Since multicomputer systems came to be built with wormhole-routing networks, the topology of the interconnection network has not been a major consideration for process management and resource sharing. There is an independent operating system kernel on each node, which communicates with other kernels using a message-passing mechanism. Based on this architecture, the problem is how much performance degradation occurs when processors are shared on multicomputer systems. Processor sharing between application programs is a very important decision for system performance. In most cases, application programs running on massively parallel computer systems are not very user-interactive, so the main performance index is system throughput. Each application program has its own communication patterns, and sharing processors causes serious performance degradation in the worst case, in which one processor is shared by two processes and other processes are waiting for messages from those processes. Considering this problem is therefore important, since it provides the basis for deciding whether the system should allow processor sharing. The input data for this simulation has many parameters: the number of threads per task, the communication patterns between threads, data generation, and also defects in random input data. Many parallel application programs have specific communication patterns, with distinct computation and communication phases, and this phase information cannot be obtained from random input data. If we obtain trace data from real applications, we can simulate the problem more realistically. On the other hand, simulation results will be wasteful unless sufficient trace data with various communication patterns is gathered. In this project, random input data are used for the simulation. The only controllable data are the number of threads of each task and the mapping strategy. First, each task runs independently. After that, each task shares one or more processors with other tasks. As more processors are shared, performance degrades. From this degradation rate, we can determine the overhead of processor sharing. The process scheduling policy can affect the simulation results. For process scheduling, a priority queue and a FIFO queue are implemented to support round-robin scheduling and priority scheduling.
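
The two queue types mentioned at the end of the abstract above can be sketched as follows. This is a generic illustration, not the simulator from the paper: the function names, the burst lengths, and the convention that a smaller number means a higher priority are assumptions.

```python
import heapq
from collections import deque


def round_robin(bursts, quantum):
    """Return the completion order under round-robin with a FIFO queue."""
    queue = deque(bursts.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # back to the tail
        else:
            finished.append(name)
    return finished


def priority_order(priorities):
    """Return the run order under non-preemptive priority scheduling.

    Assumption: a smaller number means a higher priority.
    """
    heap = [(priority, name) for name, priority in priorities.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


rr_order = round_robin({"a": 3, "b": 1, "c": 2}, quantum=1)
prio_order = priority_order({"a": 2, "b": 0, "c": 1})
```

The FIFO queue gives every process one quantum in turn, while the priority queue always picks the most urgent process first.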

CUDA based parallel design of a shot change detection algorithm using frame segmentation and object movement

  • Kim, Seung-Hyun;Lee, Joon-Goo;Hwang, Doo-Sung
    • Journal of the Korea Society of Computer and Information / Vol. 20, No. 7 / pp. 9-16 / 2015
  • This paper proposes a parallel design of a shot change detection algorithm using frame segmentation and moving blocks. In the proposed approach, the highly parallel processing components, such as frame histogram calculation, block histogram calculation, the Otsu threshold setting function, the frame moving operation, and block histogram comparison, are designed in parallel for NVIDIA GPUs. To minimize memory access delay and guarantee fast computation, the output of one GPU kernel becomes the input of another kernel in a pipelined manner using the GPU's shared memory. In addition, the optimal sizes of CUDA processing blocks and threads are estimated through preliminary experiments. In experimental tests of the proposed shot change detection algorithm, the detection rate of the GPU-based parallel algorithm is the same as that of the CPU-based algorithm, while the average processing time is about 6 to 8 times faster.
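
Of the components listed in the abstract above, the Otsu threshold is the easiest to show in isolation. The sketch below is a sequential CPU reference in plain Python, not the paper's CUDA design; the function name `otsu_threshold` and the 8-bin example histogram are made up for illustration.

```python
def otsu_threshold(hist):
    """Return the bin index t that maximizes between-class variance.

    Pixels in bins 0..t form one class and bins t+1.. the other.
    """
    total = sum(hist)
    weighted_total = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    count_low = 0      # pixels in the low class so far
    sum_low = 0        # sum of bin_index * count in the low class
    for t, count in enumerate(hist):
        count_low += count
        sum_low += t * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue   # one class is empty, variance is undefined
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        between_var = count_low * count_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# A made-up bimodal histogram with dark bins 0-1 and bright bins 6-7.
threshold = otsu_threshold([5, 5, 0, 0, 0, 0, 5, 5])
```

The single pass over the histogram relies on running sums, which is also the part of the computation that a parallel version would typically replace with a prefix sum.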

The Design and Implementation of C Standard Library for RTOS Q+

  • 김도형;박승민
    • The KIPS Transactions: Part A / Vol. 8A, No. 1 / pp. 1-8 / 2001
  • This paper describes the design and implementation of a C standard library for the real-time operating system Q+, which is being developed for Internet appliances. The C library in a real-time operating system should be defined according to the standard interface and support the concurrent execution of threads. The implemented C standard library is reentrant and follows the POSIX.1 standard interface. The C standard library functions that suit Q+ applications and are commonly provided by commercial real-time operating systems were selected from among the POSIX.1 standard functions. The C standard library is implemented on the Q+ kernel and a D-TV set-top box according to an implementation sequence determined by analyzing the relations among function calls.

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU

  • 김민철;이광엽
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / Vol. 7, No. 10 / pp. 935-943 / 2017
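  • Among neural networks that learn from large amounts of data, the CNN (convolutional neural network), which is used for image classification, speech recognition, and similar tasks, continues to be developed as an architecture with excellent performance. However, it is difficult to use in embedded systems with limited resources. Pre-trained weights are therefore used, but limitations remain, and the trend is to address them with GPGPU (General-Purpose computing on Graphics Processing Units), which uses the GPU for general-purpose computation. Because a CNN performs simple, repetitive operations, its computation speed on a SIMT (Single Instruction Multiple Thread) based GPGPU varies greatly with how threads are allocated and used. When convolution and pooling operations are performed with threads, some threads have to sit idle; to solve this problem, we increased the computation speed by using the leftover threads for the next feature map and kernel computation.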
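
The effect of reusing idle threads can be estimated with simple arithmetic. The sketch below is only a counting model under stated assumptions, not the paper's implementation: one thread handles one output element per pass, and the feature-map sizes, the thread count, and both function names are hypothetical.

```python
from math import ceil


def passes_per_map(work_items, threads):
    """Passes needed when each feature map is processed separately."""
    return sum(ceil(w / threads) for w in work_items)


def passes_packed(work_items, threads):
    """Passes needed when leftover threads start on the next feature map."""
    return ceil(sum(work_items) / threads)


# Hypothetical sizes: four 12x12 output feature maps, 64 threads per pass.
maps = [12 * 12] * 4
separate = passes_per_map(maps, threads=64)
packed = passes_packed(maps, threads=64)
```

Each 144-element map needs three passes on its own, with most threads idle in the third pass; packing the maps back to back removes those partially filled passes.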

Middleware to Support Real-Time in the Linux User-Space

  • 이상길;이승율;이철훈
    • The Journal of the Korea Contents Association / Vol. 16, No. 5 / pp. 217-228 / 2016
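  • Linux is a general-purpose operating system whose scheduling characteristics prevent it from providing real-time behavior; to address this, RTiK-Linux was used to support real-time behavior in kernel space. However, RTiK-Linux is at an early stage of development and does not support user space, which makes it difficult to develop applications that require real-time behavior. In this paper, we improve RTiK-Linux by designing and implementing the RTiK middleware, which provides real-time behavior in user space. After an application registers its process information and requested period, the RTiK middleware provides real-time behavior to user space through an API, using signals delivered at the requested period. To verify and evaluate the performance of the implemented RTiK middleware, we measured the period of the generated real-time threads using the RDTSC instruction and confirmed that they operate correctly within the error range at a 1 ms period in user space.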
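
The kind of period measurement described in the abstract above can be post-processed as in the following sketch. It does not use the RTiK API or read the counter itself; it only converts already-collected cycle counts into period errors. The function name, the 2 GHz counter frequency, and the sample values are assumptions for illustration.

```python
def period_errors_us(tsc_samples, tsc_hz, period_s=0.001):
    """Return each measured period's deviation from the target, in microseconds.

    tsc_samples: cycle counts read at consecutive thread activations.
    tsc_hz: counter frequency in cycles per second (assumed constant).
    """
    errors = []
    for prev, curr in zip(tsc_samples, tsc_samples[1:]):
        measured_s = (curr - prev) / tsc_hz
        errors.append((measured_s - period_s) * 1e6)
    return errors


# Made-up samples for a 2 GHz counter; a perfect 1 ms period is 2,000,000 cycles.
samples = [0, 2_000_000, 4_002_000, 6_000_000]
errors = period_errors_us(samples, tsc_hz=2_000_000_000)
```

A positive error means the activation came late relative to the 1 ms target, and a negative error means it came early.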

An Optimization Method for Hologram Generation on Multiple GPU-based Parallel Processing

  • 국중진
    • Smart Media Journal / Vol. 8, No. 2 / pp. 9-15 / 2019
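  • Because the amount of computation needed to generate a hologram grows exponentially with the size of the point cloud, parallel processing on multiple GPUs using the CUDA or OpenCL libraries has recently become common. A CUDA kernel for GPU-based parallel processing must configure threads, blocks, and grids in consideration of the number of GPU cores and the memory size, and a multi-GPU environment requires distributing work at the grid, block, or thread level according to the number of GPUs. In this paper, to evaluate CGH generation performance, we compared computation speed in CPU, single-GPU, and multi-GPU environments while gradually increasing the number of points in the point cloud from 10 to 1,000,000, and we propose a memory structure design and computation method required for CUDA-based parallel processing to accelerate CGH (Computer Generated Hologram) generation in a multi-GPU environment.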
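
The distribution of work across GPUs mentioned in the abstract above can be planned on the host side along these lines. This is a hedged sketch, not the paper's method: it assumes one thread per point, 256 threads per block, and an even split of the point cloud, and the function name `split_points` is hypothetical.

```python
from math import ceil


def split_points(num_points, num_gpus, threads_per_block=256):
    """Return one (start, count, blocks) tuple per GPU.

    Points are divided as evenly as possible; `blocks` is the grid size
    needed so that every point in the slice gets one thread.
    """
    base, extra = divmod(num_points, num_gpus)
    plan = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional point each.
        count = base + (1 if gpu < extra else 0)
        plan.append((start, count, ceil(count / threads_per_block)))
        start += count
    return plan


plan = split_points(1_000_000, num_gpus=3)
```

Each tuple gives the offset into the point cloud, the number of points for that GPU, and the number of blocks to launch there.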