• Title/Summary/Keyword: CPU Processing Time

Real time simulation using multiple DSPs for fossil power plants (병렬처리를 이용한 화력발전소의 실시간 시뮬레이션)

  • 박희준;김병국
    • 제어로봇시스템학회:학술대회논문집 / 1997.10a / pp.480-483 / 1997
  • A fossil power plant can be modeled by a large set of algebraic and differential equations. Simulating a large, complicated fossil power plant on a workstation or PC takes a long time to evaluate all of the equations. New processing systems with high computing speed are therefore needed to develop real-time simulators. The vital requirements of a real-time simulator are accuracy, computing speed, and meeting deadlines. In this paper, we present an enhanced strategy that provides high computing power through parallel processing on DSP processors connected by communication links. We designed general-purpose DSP modules and a VME interface module. Because the DSP module is general purpose, the parallel system can be expanded simply by connecting new DSP modules. We also propose methods for downloading programs and initial data to each DSP module via the VME bus and DPRAM, and processing sequences for computing and updating values between the DSP modules and the CPU30 board while the simulator is running.

Parallelized Particle Swarm Optimization with GPU for Real-Time Ballistic Target Tracking (실시간 탄도 궤적 목표물 추적을 위한 GPU 기반 병렬적 입자군집최적화 기법)

  • Yunho, Han;Heoncheol, Lee;Hyeokhoon, Gwon;Wonseok, Choi;Bora, Jeong
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.6 / pp.355-365 / 2022
  • This paper addresses the problem of tracking a high-speed ballistic target in real time. Particle filters can be used to handle the nonlinearity of the motion and measurement models of a ballistic target. However, particle filters are difficult to apply in real-time systems because they generally require much computation time. This paper proposes a particle filter accelerated on a graphics processing unit (GPU) for real-time ballistic target tracking. The real-time performance of the proposed method was tested and analyzed on a widely used embedded system. A comparison with the conventional particle filter on a central processing unit (CPU) showed that the proposed method improved real-time performance by significantly reducing computation time.

QoS-Aware Power Management of Mobile Games with High-Load Threads (CPU 부하가 큰 쓰레드를 가진 모바일 게임에서 QoS를 고려한 전력관리 기법)

  • Kim, Minsung;Kim, Jihong
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.328-333 / 2017
  • Mobile game apps, which are popular in various mobile devices, tend to be power-hungry and rapidly drain the device's battery. Since a long battery lifetime is a key design requirement of mobile devices, reducing the power consumption of mobile game apps has become an important research topic. In this paper, we investigate the power consumption characteristics of popular mobile games with multiple threads, focusing on the inter-thread. From our power measurement study of popular mobile game apps, we observed that some of these apps have abnormally high-load threads that barely affect the user's gaming experience, despite the high energy consumption. In order to reduce the wasted power from these abnormal threads, we propose a novel technique that detects such abnormal threads during run time and reduces their power consumption without degrading user experience. Our experimental results on an Android smartphone show that the proposed technique can reduce the energy consumption of mobile game apps by up to 58% without any negative impact on the user's gaming experience.

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, Jinseong;Choi, Younchang;Kim, Jaehak;Chung, Yeonwoo;Chung, Yongwha;Park, Daihee;Kim, Hakjae
    • KIPS Transactions on Computer and Communication Systems / v.6 no.10 / pp.407-414 / 2017
  • Pig monitoring applications consisting of many tasks can exploit their inherent data parallelism and be processed in parallel using performance accelerators. In this paper, we propose a method for distributing the tasks of pig monitoring applications across a heterogeneous computing platform consisting of a multicore CPU and a manycore GPU. A parallel program is written in OpenCL, and the most suitable processor for each task is then determined from the measured execution time of that task. The proposed method is simple but very effective, and can be applied to parallelize other many-task applications on a heterogeneous computing platform consisting of a CPU and a GPU. Experimental results on three different heterogeneous computing platforms show that the proposed task distribution method improves on the performance of the typical GPU-only method, in which every task is executed on the GPU, by factors of 1.5, 8.7, and 2.7, respectively.
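
The selection rule described in this abstract (pick, for each task, the processor with the shortest measured execution time) can be sketched as follows. This is a minimal illustration in plain Python rather than OpenCL; the task names and timings are hypothetical and are not taken from the paper, and tasks are assumed to run one after another.

```python
# Hypothetical measured execution times in milliseconds (not from the paper).
measured_ms = {
    "background_subtraction": {"cpu": 12.0, "gpu": 3.5},
    "labeling": {"cpu": 4.0, "gpu": 9.0},
    "tracking": {"cpu": 6.5, "gpu": 6.0},
}


def assign_tasks(times):
    """Map each task to the device with the smallest measured time."""
    return {
        task: min(per_device, key=per_device.get)
        for task, per_device in times.items()
    }


def total_time(times, assignment):
    """Sum the task times under an assignment, assuming sequential execution."""
    return sum(times[task][device] for task, device in assignment.items())


assignment = assign_tasks(measured_ms)
gpu_only = {task: "gpu" for task in measured_ms}

print(assignment)
print(total_time(measured_ms, assignment), total_time(measured_ms, gpu_only))
```

With these made-up numbers, the labeling task goes to the CPU and the other two go to the GPU, so the mixed assignment finishes sooner than running everything on the GPU.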

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.165-171 / 2019
  • Studies of computer architecture have shown that major improvements in price-to-energy performance stem from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU), an ASIC that can accelerate inference in artificial neural networks (NNs). The core of the TPU consists of a MAC matrix multiplier capable of high-speed operation and a software-managed on-chip memory. The execution model of the TPU meets the response-time requirements of artificial neural networks better than the execution models of existing CPUs and GPUs, with a small area and low power consumption even though it has many MACs and a large memory. Running the TensorFlow benchmark framework, the TPU can achieve higher performance and better power efficiency than a CPU or GPU. In this paper, we analyze the TPU, simulate OpenTPU, which is modeled in Python, and synthesize the matrix multiplication unit, which is the key hardware.
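
To make the multiply-accumulate (MAC) operation behind the matrix multiplier concrete, here is a minimal sketch in plain Python. It is not OpenTPU code and does not model the TPU's hardware organization; the function name `mac_matmul` is hypothetical, and the loops run sequentially whereas the hardware performs many MACs in parallel.

```python
def mac_matmul(a, b):
    """Multiply a (n x k) by b (k x m) using explicit multiply-accumulate steps."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # accumulator for one output element
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate step
            out[i][j] = acc
    return out


print(mac_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each output element is the result of k multiply-accumulate steps, which is the unit of work a MAC array replicates in hardware.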

The Design of Parallel Processing S/W Using CUDA for Realtime 3D Laser Ladar Imaging System (실시간 3차원 레이저 레이더 영상 생성을 위한 CUDA 기반 병렬처리 소프트웨어 설계)

  • Cho, Yong Il;Ha, Choong Lim;Yang, Ji Hyeon;Kim, Jae Hyup
    • Journal of the Korea Society of Computer and Information / v.18 no.1 / pp.1-10 / 2013
  • In this paper, we propose a software design method based on CUDA (Compute Unified Device Architecture) for a parallel CPU (central processing unit) and GPU (graphics processing unit) structure that implements real-time processing in a 3D laser radar (LADAR) imaging system. LADAR is a complex system that generates three-dimensional images from laser ranging information and requires massive processing resources in each phase. Designing and implementing a parallel structure is therefore crucial to achieving real-time processing within limited system resources. By analyzing the processing algorithm in each phase and allocating the separable workload to the CUDA GPU, we meet the required real-time processing speed and confirm a 46% increase in processing speed.

Design and Implementation of A Dual CPU Based Embedded Web Camera Streaming Server (Dual CPU 기반 임베디드 웹 카메라 스트리밍 서버의 설계 및 구현)

  • 홍진기;문종려;백승걸;정선태
    • Proceedings of the IEEK Conference / 2003.11a / pp.417-420 / 2003
  • Most embedded web camera server products currently on the market use JPEG to compress the video data continuously acquired from the cameras. However, JPEG does not compress a continuous video stream efficiently and is not appropriate for the Internet, where transmission bandwidth is not guaranteed. In our previous work, we presented the design and implementation of an embedded web camera streaming server using an MPEG4 codec. That server did not perform well, however, because a single CPU had to handle both compression and network transmission. In this paper, we present our efforts to improve on that result by using dual CPUs: a DSP is employed for data compression and a StrongARM is used for network processing. Better performance has been observed, but more time is still needed to optimize the performance.

Real-time 3D Modeling using GPU and CPU in parallel processing (GPU와 CPU의 병렬처리를 이용한 실시간 3D 모델링)

  • Baek, Woon-Hyuk;Kyoung, Dong-Wuk;Han, Eun-Jung;Yang, Jong-Yeol;Jung, Kee-Chul
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.557-561 / 2006
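  • 3D modeling technology is being actively studied in areas such as virtual reality and immersive interactive systems. Because generating 3D models in real time requires a large amount of computation, such research has relied on PC clusters that combine several PCs. A PC cluster lets several PCs work together as a single high-performance computer, but it is difficult to control the PCs efficiently and the cost is high. This paper proposes a method for generating 3D models in real time on a single computer, using parallel processing that runs multiple cores simultaneously on one PC together with parallel processing between the CPU and a GPU with high computational capability.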

Real-Time Implementation of MPEG-1 Audio decoder on ARM RISC (ARM RISC 상에서의 MPEG-1 Audio decoder의 실시간 구현)

  • 김선태
    • Proceedings of the IEEK Conference / 2000.11d / pp.119-122 / 2000
  • Recently, many complex DSP (digital signal processing) algorithms have been implemented on RISC CPUs thanks to good compilation, low power consumption, and large memory space. However, real-time implementation of multiple DSP algorithms on a RISC processor requires minimal, efficient memory usage and low CPU occupancy. In this thesis, the original floating-point code of the MPEG-1 audio decoder is converted to fixed-point code, and the time-consuming functions are then optimized into efficient assembly code that suits the RISC features. As a result, speedups of about 30 times over the floating-point code and about 3 times over the fixed-point code are achieved, and a memory saving of 3 to 4 times is obtained.
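
The floating-point to fixed-point conversion mentioned in this abstract can be illustrated with a small sketch. The abstract does not state which fixed-point format the decoder uses, so the Q15 format (15 fractional bits in a 16-bit word) and the helper names below are assumptions chosen only for illustration.

```python
Q = 15           # number of fractional bits (Q15 is an assumed format)
ONE = 1 << Q     # the value 1.0 expressed in Q15


def saturate(v):
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(-ONE, min(ONE - 1, v))


def to_fixed(x):
    """Convert a float in [-1, 1) to a Q15 integer, saturating out-of-range input."""
    return saturate(int(round(x * ONE)))


def to_float(v):
    """Convert a Q15 integer back to a float."""
    return v / ONE


def fixed_mul(a, b):
    """Multiply two Q15 values: the product is Q30, so round and shift right by 15."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)


half, quarter = to_fixed(0.5), to_fixed(0.25)
print(half, quarter, to_float(fixed_mul(half, quarter)))
```

The multiplication uses only integer operations, which is what makes this style of code attractive on a processor without a floating-point unit.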

Implementation of GPU Acceleration of Object Detection Application with Drone Video (드론 영상 대상 물체 검출 어플리케이션의 GPU가속 구현)

  • Park, Si-Hyun;Park, Chun-Su
    • Journal of the Semiconductor & Display Technology / v.20 no.3 / pp.117-119 / 2021
  • As the industry develops, the use of drones for specific mission flights is being actively studied. These drones fly a specified path and perform repetitive tasks. If the drone system can detect objects in real time, the performance of these mission flights will improve. In this paper, we implement an object detection system for drone video and add GPU acceleration to maximize the efficiency of limited device resources, using TensorFlow Lite, which enables on-device inference on a mobile device, and the Mobile SDK of DJI, a drone manufacturer. For performance comparison, the average processing time per frame was measured when object detection was performed using only the CPU and when it was performed using the CPU and GPU at the same time.
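
The metric used for the comparison, average processing time per frame, can be measured along the following lines. This is a generic sketch and not the authors' measurement code: `average_frame_time` and the stand-in detector are hypothetical, and no TensorFlow Lite or DJI SDK calls are involved.

```python
import time


def average_frame_time(process_frame, frames):
    """Return the mean wall-clock seconds spent in process_frame per frame."""
    if not frames:
        raise ValueError("no frames to measure")
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    return (time.perf_counter() - start) / len(frames)


def fake_detector(frame):
    """Stand-in for an object detector; it just sums the pixel values."""
    return sum(frame)


frames = [[i % 256 for i in range(1000)] for _ in range(50)]
print(f"{average_frame_time(fake_detector, frames) * 1e3:.3f} ms per frame")
```

Running the same harness once with a CPU-only detector and once with a CPU+GPU detector would give the two averages that the abstract compares.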