• Title/Summary/Keyword: Multi-core CPU

Search Result 76, Processing Time 0.023 seconds

An Improving Method of Android Boot Speed in Multi-core based Embedded System (멀티코어 기반의 임베디드 시스템에서 안드로이드 부팅 속도 향상 방법)

  • Choi, Jin-Yong;Lee, Jae-Heung
    • Journal of IKEEE
    • /
    • v.17 no.4
    • /
    • pp.564-569
    • /
    • 2013
  • The current embedded devices are growing rapidly in the multi-core, and these demand fast boot time. But method of previous boot uses core only one. The method includes parallel techniques and modification of CPU Frequency policy. Parallel methods, after analyzing the Android boot process with analysis tool, applied to location where a lot of CPU operation. CPU Frequency policy is modified for high performance of core. The proposed method was applied to S5PV310 dual core and Exynos4412 quad core embedded system. As a result of the experiment, we found that the proposed method makes boot time fast about 20.71% and 31.34% in dual core and quad core environment as compared with the previous method.

Design Considerations on Large-scale Parallel Finite Element Code in Shared Memory Architecture with Multi-Core CPU (멀티코어 CPU를 갖는 공유 메모리 구조의 대규모 병렬 유한요소 코드에 대한 설계 고려 사항)

  • Cho, Jeong-Rae;Cho, Keunhee
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.30 no.2
    • /
    • pp.127-135
    • /
    • 2017
  • The computing environment has changed rapidly to enable large-scale finite element models to be analyzed at the PC or workstation level, such as multi-core CPU, optimal math kernel library implementing BLAS and LAPACK, and popularization of direct sparse solvers. In this paper, the design considerations on a parallel finite element code for shared memory based multi-core CPU system are proposed; (1) the use of optimized numerical libraries, (2) the use of latest direct sparse solvers, (3) parallelism using OpenMP for computing element stiffness matrices, and (4) assembly techniques using triplets, which is a type of sparse matrix storage. In addition, the parallelization effect is examined on the time-consuming works through a large scale finite element model.

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Annual Conference of KIPS
    • /
    • 2011.04a
    • /
    • pp.250-253
    • /
    • 2011
  • Chip manufacture nowadays turned their attention from accelerating uniprocessors to integrating multiple cores on a chip. Moreover desktop graphic hardware is now starting to support general purpose computation. Desktop users are able to use multi-core CPU and GPU as a high performance computing resources these days. However exploiting parallel computing resources are still challenging because of lack of higher programming abstraction for parallel programming. The 2-dimensional discrete cosine transform (2D-DCT) algorithms are most computational intensive part of JPEG encoding. There are many fast 2D-DCT algorithms already studied. We implemented several algorithms and estimated its runtime on multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on CPU and GPU architecture. We expect parallelized DCT bring performance benefit towards its applications such as JPEG and MPEG.

VDI deployment and performance analysys for multi-core-based applications (멀티코어 기반 어플리케이션 운용을 위한 데스크탑 가상화 구성 및 성능 분석)

  • Park, Junyong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.10
    • /
    • pp.1432-1440
    • /
    • 2022
  • Recently, as Virtual Desktop Infrastructure(VDI) is widely used not only in office work environments but also in workloads that use high-spec multi-core-based applications, the requirements for real-time and stability of VDI are increasing. Accordingly, the display protocol used for remote access in VDI and performance optimization of virtual machines have also become more important. In this paper, we propose two ways to configure desktop virtualization for multi-core-based application operation. First, we propose a codec configuration of a display protocol with optimal performance in a high load situation due to multi-processing. Second, we propose a virtual CPU scheduling optimization method to reduce scheduling delay in case of CPU contention between virtual machines. As a result of the test, it was confirmed that the H.264 codec of Blast Extreme showed the best and stable frame, and the scheduling performance of the virtual CPU was improved through scheduling optimization.

The development of parallel computation method for the fire-driven-flow in the subway station (도시철도역사에서 화재유동에 대한 병렬계산방법연구)

  • Jang, Yong-Jun;Lee, Chang-Hyun;Kim, Hag-Beom;Park, Won-Hee
    • Proceedings of the KSR Conference
    • /
    • 2008.06a
    • /
    • pp.1809-1815
    • /
    • 2008
  • This experiment simulated the fire driven flow of an underground station through parallel processing method. Fire analysis program FDS(Fire Dynamics Simulation), using LES(Large Eddy Simulation), has been used and a 6-node parallel cluster, each node with 3.0Ghz_2set installed, has been used for parallel computation. Simulation model was based on the Kwangju-geumnan subway station. Underground station, and the total time for simulation was set at 600s. First, the whole underground passage was divided to 1-Mesh and 8-Mesh in order to compare the parallel computation of a single CPU and Multi-CPU. With matrix numbers($15{\times}10^6$) more than what a single CPU can handle, fire driven flow from the center of the platform and the subway itself was analyzed. As a result, there seemed to be almost no difference between the single CPU's result and the Multi-CPU's ones. $3{\times}10^6$ grid point one employed to test the computing time with 2CPU and 7CPU computation were computable two times and fire times faster than 1CPU respectively. In this study it was confirmed that CPU could be overcome by using parallel computation.

  • PDF

Modular platform techniques for multi-sensor/communication of wearable devices (웨어러블 디바이스를 위한 다중 센서/통신용 모듈형 플랫폼 기술)

  • Park, Sung Hoon;Kim, Ju Eon;Yoon, Dong-Hyun;Baek, Kwang-Hyun
    • Journal of IKEEE
    • /
    • v.21 no.3
    • /
    • pp.185-194
    • /
    • 2017
  • In this paper, a modular platform for wearable devices is proposed which can be easily assembled by exchanging functions according to various field and environment conditions. The proposed modular platform consists of a 32-bit RISC CPU, a 32-bit symmetric multi-core processor, and a 16-bit DSP. It also includes a plug & play features which can quickly respond to various environments. The sensing and communication modules are connected in the form of a chain. This work is implemented in a standard 130 nm CMOS technology and the proposed modular wearable platforms are verified with temperature and humidity sensors.

Implementation of IQ/IDCT in H.264/AVC Decoder Using Mobile Multi-Core GPGPU (모바일 멀티 코어 GP-GPU를 이용한 H.264/AVC 디코더 구현)

  • Kim, Dong-Han;Lee, Kwang-Yeob;Jeong, Jun-Mo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.10a
    • /
    • pp.321-324
    • /
    • 2010
  • There have been lots of researches on a multi-core processor. The enhancement has been performed through parallelization method. Multi-core architecture in the mobile environment has emerged. But, there is a limit to a mobile CPU's performance. GP-GPU(General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents the implementation of Inverse Quantization, Inverse DCT and Color Space Conversion module in H.264/AVC decoder using Multi-Core GP-GPU for a mobile environments. The proposed architecture improves approximately 50% of performance when it use all the features.

  • PDF

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OPenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.253-260
    • /
    • 2009
  • Many algorithms for computer vision and pattern recognition have recently been implemented on GPU (graphic processing unit) for faster computational times. However, the implementation has two problems. First, the programmer should master the fundamentals of the graphics shading languages that require the prior knowledge on computer graphics. Second, in a job that needs much cooperation between CPU and GPU, which is usual in image processing and pattern recognition contrary to the graphic area, CPU should generate raw feature data for GPU processing as much as possible to effectively utilize GPU performance. This paper proposes more quick and efficient implementation of neural networks on both GPU and multi-core CPU. We use CUDA (compute unified device architecture) that can be easily programmed due to its simple C language-like style instead of GPU to solve the first problem. Moreover, OpenMP (Open Multi-Processing) is used to concurrently process multiple data with single instruction on multi-core CPU, which results in effectively utilizing the memories of GPU. In the experiments, we implemented neural networks-based text extraction system using the proposed architecture, and the computational times showed about 15 times faster than implementation on only GPU without OpenMP.

Performance Improvement of a Real-time Traffic Identification System on a Multi-core CPU Environment (멀티 코어 환경에서 실시간 트래픽 분석 시스템 처리속도 향상)

  • Yoon, Sung-Ho;Park, Jun-Sang;Kim, Myung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.5B
    • /
    • pp.348-356
    • /
    • 2012
  • The application traffic analysis is getting more and more challenging due to the huge amount of traffic from high-speed network link and variety of applications running on wired and wireless Internet devices. Multi-level combination of various analysis methods is desired to achieve high completeness and accuracy of analysis results for a real-time analysis system, while requires much of processing burden on the contrary. This paper proposes a novel architecture for a real-time traffic analysis system which improves the processing performance on multi-core CPU environment. The main contribution of the proposed architecture is an efficient parallel processing mechanism with multiple threads of various analysis methods. The feasibility of the proposed architecture was proved by implementing and deploying it on our campus network.

Analyzing Thermal Variations on a Multi-core Processor (멀티코아 프로세서의 온도변화 분석)

  • Lee, Sang-Jeong;Yew, Pen-Chung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.6
    • /
    • pp.57-67
    • /
    • 2010
  • This paper studies thermal characteristics of a mix of CPU-intensive and memory-intensive application workloads on a multi-core processor. Especially, we focus on thermal variations during program execution because thermal variations are more critical than average temperatures and their ranges for thermal management. New metrics are proposed to quantify such thermal variations for a workload. We study the thermal variations using SPEC CPU2006 benchmarks with varying cooling conditions and frequencies on an Intel Core 2 Duo processor. The results show that applications have distinct thermal variations characteristics. Such variations are affected by cooling conditions,operating frequencies and multiprogramming workload. Also, there are distinct spatial thermal variations between cores. Our new metrics and their results from this study provide useful insight for future research on multi-core thermal management.