• Title/Summary/Keyword: multi-core CPU


Implementation of the AMBA AXI4 Bus interface for effective data transaction and optimized hardware design (효율적인 데이터 전송과 하드웨어 최적화를 위한 AMBA AXI4 BUS Interface 구현)

  • Kim, Hyeon-Wook;Kim, Geun-Jun;Jo, Gi-Ppeum;Kang, Bong-Soon
    • Journal of the Institute of Convergence Signal Processing / v.15 no.2 / pp.70-75 / 2014
  • Recently, demand for highly integrated, low-power, high-performance SoC designs has been increasing because of the multi-functionality and miniaturization of digital devices and the growing volume of service information. As systems evolve rapidly, hardware performance requirements have diversified: FPGA systems are increasingly adopted for rapid verification, and SoC systems that combine an FPGA with an ARM core for control are increasingly chosen. The AXI bus is used in various ways in such systems, but the interface is traditionally designed as an AXI slave. The slave structure has two problems: the CPU is continually involved in data transfer and cannot be used for other jobs, and transmission efficiency drops because the AXI bus stays idle for longer. In this paper, an efficient AXI master interface is proposed to solve these problems. Simulation results show that the proposed system reduces the number of clock cycles consumed by 51.99% on average and slice usage by 31%, and increases the maximum operating frequency by about 140%, to 107.84 MHz.
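
The abstract does not describe how the proposed master moves data, but any AXI4 master that transfers a block on its own, without the CPU, has to issue it as address bursts. The sketch below shows one way a hypothetical master could split a block into INCR bursts while honoring two AXI4 rules: an INCR burst carries at most 256 beats, and a burst must not cross a 4 KB address boundary. The 32-bit (4-byte) data bus, beat-aligned addresses, and the name `split_into_bursts` are assumptions for illustration, not the paper's design.

```python
BOUNDARY = 4096   # AXI bursts must not cross a 4 KB address boundary
MAX_BEATS = 256   # AXI4 INCR bursts allow AxLEN = 0..255, i.e. 1..256 beats

def split_into_bursts(addr, nbytes, beat_bytes=4):
    """Split a beat-aligned transfer into (start_address, beats) INCR bursts.

    Hypothetical helper: assumes a 4-byte data bus unless beat_bytes is given.
    """
    if addr % beat_bytes or nbytes % beat_bytes:
        raise ValueError("sketch assumes a beat-aligned address and length")
    bursts = []
    while nbytes > 0:
        to_boundary = BOUNDARY - (addr % BOUNDARY)          # bytes left before 4 KB line
        chunk = min(nbytes, to_boundary, MAX_BEATS * beat_bytes)
        beats = chunk // beat_bytes
        bursts.append((addr, beats))                        # AxLEN would be beats - 1
        addr += chunk
        nbytes -= chunk
    return bursts
```

For example, a 2 KB block starting at 0x0F00 becomes three bursts of 64, 256, and 192 beats, all issued by the master, whereas in the slave structure the CPU stays involved throughout the transfer.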

Real-Time Kernel for Linux based on ARM Processor, RTiKA (Real-Time Implant Kernel For ARMLinux) (ARM 프로세서 기반의 리눅스를 위한 실시간 확장 커널 (RTiKA, Real-Time implant Kernel for ARMLinux))

  • Lee, Seung-Yul;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.17 no.10 / pp.587-597 / 2017
  • Recently, demand for real-time performance in mobile environments has been increasing as hardware performance improves; however, general-purpose operating systems (GPOS) such as Android and Linux do not provide real-time performance. We previously developed RTiK (Real-Time implant Kernel) to address this problem, but it has the disadvantage of supporting only the x86 architecture. In this paper, we designed and implemented RTiKA (Real-Time implant Kernel for ARMLinux) to support real-time operation on ARM Linux. For real-time support we used the MCT (Multi-Core Timer) in place of the Local APIC timer, and for performance verification and evaluation we measured the periods of the generated real-time tasks. As a result, RTiKA can guarantee the operation of several real-time tasks with a 1 ms period.
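
RTiKA itself runs in the kernel and is driven by the MCT interrupt, which the abstract does not detail. As a user-space illustration of the evaluation step only, the sketch below releases a task every 1 ms using absolute deadlines (so sleep overshoot does not accumulate as drift) and then summarizes the measured periods. The function names and the worst-case-deviation metric are assumptions; a CPython loop on a general-purpose OS gives no real-time guarantee, which is exactly the gap RTiKA targets.

```python
import statistics
import time

def run_periodic(task, period_ns=1_000_000, iterations=100):
    """Call task() once per period and return the release timestamps (ns).

    Hypothetical helper: each release is scheduled from an absolute deadline,
    so a late wake-up delays one release without shifting all later ones.
    """
    stamps = []
    next_release = time.monotonic_ns()
    for _ in range(iterations):
        delay = next_release - time.monotonic_ns()
        if delay > 0:
            time.sleep(delay / 1e9)        # sleep until the absolute release time
        stamps.append(time.monotonic_ns())
        task()
        next_release += period_ns
    return stamps

def period_stats(stamps, period_ns=1_000_000):
    """Return (mean measured period, worst deviation from the nominal period) in ns."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    periods = [b - a for a, b in zip(stamps, stamps[1:])]
    worst = max(abs(p - period_ns) for p in periods)
    return statistics.mean(periods), worst
```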

The Study on the Design and Optimization of Storage for the Recording of High Speed Astronomical Data (초고속 관측 데이터 수신 및 저장을 위한 기록 시스템 설계 및 성능 최적화 연구)

  • Song, Min-Gyu;Kang, Yong-Woo;Kim, Hyo-Ryoung
    • The Journal of the Korea institute of electronic communication sciences / v.12 no.1 / pp.75-84 / 2017
  • Storage that supports high-speed recording and stable access from a network environment is becoming increasingly important. VLBI (Very Long Baseline Interferometer), a field of basic science that produces massive amounts of astronomical data, now demands higher data-writing performance, which is directly related to astronomical observation with high resolution and sensitivity. However, most existing storage follows a cloud model built for high throughput in general IT, finance, and administrative services, and is therefore not the best choice for recording large data streams. In this study, we therefore design a storage system optimized for high I/O performance and concurrency. To this end, we implement packet reading and writing modules using the libpcap and pf_ring APIs in a multi-core CPU environment, and build scalable storage based on software RAID (Redundant Array of Inexpensive Disks) to efficiently process data arriving from the external network.
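
The abstract says only that the recorder is built on software RAID; it does not give the RAID level. Assuming RAID 0 striping (a common choice when write throughput matters more than redundancy), the address mapping that lets several disks absorb one incoming stream in parallel can be sketched as follows. `raid0_map` and its chunk-size parameter are illustrative; in a Linux software RAID this mapping is performed by the md driver rather than by application code.

```python
def raid0_map(logical_block, n_disks, chunk_blocks):
    """Map a logical block to (disk index, block offset on that disk) under RAID 0.

    Hypothetical helper: consecutive chunks of chunk_blocks blocks are placed
    round-robin across the disks, so sequential writes hit every disk in turn.
    """
    chunk = logical_block // chunk_blocks      # which chunk the block belongs to
    within = logical_block % chunk_blocks      # position inside that chunk
    disk = chunk % n_disks                     # chunks rotate over the disks
    offset = (chunk // n_disks) * chunk_blocks + within
    return disk, offset
```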

A Design and Implementation of Educational Mobile Robot System including Remote Control Function (원격 제어 기능을 포함한 교육용 모바일 로봇 시스템의 설계 및 구현)

  • Chung, Joong-Soo;Jung, Kwang-Wook
    • Journal of the Korea Society of Computer and Information / v.20 no.4 / pp.33-40 / 2015
  • This paper presents the design and implementation of an educational remote-controlled robot system, including remote sensing, in an embedded environment. We introduce the design of sensing-information processing, the software design, and a template design mechanism for programming practice. The development environment uses an LPC1769 with a Cortex-M3 core as the CPU, LPCXPRESSO as the debugging environment, C as the firmware development language, and FreeRTOS as the OS. The robot system receives control commands from the server via RF communication and operates by driving its various sensors. The educational procedure starts with a robot demo operation program as hands-on practice, followed by compiling and loading the supplied basic robot operation program. The result is then verified by using the basic robot operation to reproduce the demo operation, as in the hands-on training procedure. An original protocol is designed for RF communication between the server and the robot system, and satisfactory performance is shown by analyzing the robot's sensing-data processing.
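
The abstract mentions an original RF protocol between the server and the robot but gives no frame format. Purely as a hypothetical example of what such a command frame might look like, the sketch below uses a start byte, a command byte, a length byte, the payload, and an XOR checksum. Every field, the 0x02 start marker, and the function names are assumptions; firmware on the LPC1769 would implement this in C under FreeRTOS rather than in Python.

```python
STX = 0x02  # hypothetical start-of-frame marker

def _xor(data):
    """XOR of all bytes, used here as a simple (hypothetical) frame checksum."""
    checksum = 0
    for b in data:
        checksum ^= b
    return checksum

def encode_frame(cmd, payload=b""):
    """Build STX | cmd | len | payload | checksum, checksum covering cmd..payload."""
    if not 0 <= cmd <= 0xFF or len(payload) > 0xFF:
        raise ValueError("cmd and payload length must each fit in one byte")
    body = bytes([cmd, len(payload)]) + payload
    return bytes([STX]) + body + bytes([_xor(body)])

def decode_frame(frame):
    """Return (cmd, payload), or raise ValueError for a malformed frame."""
    if len(frame) < 4 or frame[0] != STX:
        raise ValueError("missing start byte or frame too short")
    cmd, length = frame[1], frame[2]
    if len(frame) != 4 + length:
        raise ValueError("length field does not match frame size")
    if _xor(frame[1:-1]) != frame[-1]:
        raise ValueError("checksum mismatch")
    return cmd, frame[3:3 + length]
```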

The Development of a MATLAB-based Discrete Event Simulation Framework for the Engagement Simulations of the Weapon Systems (무기체계 교전 시뮬레이션을 위한 매트랩 기반 이산사건시뮬레이션 프레임워크의 개발)

  • Hwang, Kun-Chul;Lee, Min-Gyu;Kim, Jung-Hoon
    • Journal of the Korea Society for Simulation / v.21 no.2 / pp.31-39 / 2012
  • A simulation framework is a basic software tool used to develop simulation applications. This paper describes the development of a discrete event simulation framework based on the DEVS (Discrete EVent System Specification) formalism, using the MATLAB language, which is widely used in technical computing and engineering. The new framework uses MATLAB object-oriented programming to combine the convenience of the MATLAB language with the sophisticated architecture of the DEVS formalism. Hence, it provides the productivity, flexibility, and extensibility required for developing engagement simulation applications for weapon systems. Moreover, by providing batch simulation functionality based on MATLAB parallel computing technology, it increases the computation speed of a simulation application in proportion to the number of CPU cores in a multi-core processor.
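
In the DEVS formalism an atomic model is defined by a time-advance function `ta`, an internal transition, an external transition, and an output function λ that fires just before each internal transition. The sketch below expresses that interface in Python (the framework itself is MATLAB) with two hypothetical models, a periodic shooter and a passive target, plus a minimal root coordinator for a single coupling. Class and method names are assumptions, not the framework's API.

```python
import math

class Generator:
    """Hypothetical atomic model: emits a numbered shot every `period` time units."""
    def __init__(self, period):
        self.period, self.count = period, 0
    def ta(self):                     # time advance until the next internal event
        return self.period
    def output(self):                 # lambda, evaluated just before delta_int
        return self.count
    def delta_int(self):              # internal transition
        self.count += 1
    def delta_ext(self, elapsed, x):  # the generator ignores inputs
        pass

class Target:
    """Hypothetical passive atomic model: records when each shot arrives."""
    def __init__(self):
        self.clock, self.hits = 0.0, []
    def ta(self):                     # passive: no internal event scheduled
        return math.inf
    def output(self):
        return None
    def delta_int(self):
        pass
    def delta_ext(self, elapsed, x):  # external transition on receiving a shot
        self.clock += elapsed
        self.hits.append((self.clock, x))

def simulate(src, dst, end_time):
    """Minimal root coordinator for a coupled model with one coupling src -> dst.

    If both models are imminent at the same time, src goes first, playing the
    role of the classic-DEVS select function.
    """
    t_src = t_dst = 0.0               # time of each model's last event
    while True:
        next_src = t_src + src.ta()
        next_dst = t_dst + dst.ta()
        t = min(next_src, next_dst)
        if t > end_time or math.isinf(t):
            return
        if next_src <= next_dst:
            y = src.output()
            src.delta_int()
            t_src = t
            dst.delta_ext(t - t_dst, y)   # route src's output through the coupling
            t_dst = t
        else:
            dst.output()                  # dst has no outgoing coupling; output dropped
            dst.delta_int()
            t_dst = t
```

Because independent replications of a run like this share no state, they can be distributed across CPU cores, which is the kind of batch parallelism the abstract describes.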

Parallel Processing of K-means Clustering Algorithm for Unsupervised Classification of Large Satellite Imagery (대용량 위성영상의 무감독 분류를 위한 K-means 군집화 알고리즘의 병렬처리)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.3 / pp.187-194 / 2017
  • The present study introduces a method to parallelize the k-means clustering algorithm for fast unsupervised classification of large satellite imagery. A representative unsupervised classification algorithm, k-means clustering is usually applied as a preprocessing step before supervised classification, and it benefits clearly from parallel processing because of its high computational intensity and limited need for human intervention. The parallel code is developed using OpenMP-based multi-threading. The experiments use a PC with an 8-core CPU. A 7-band, 30 m resolution image from LANDSAT 8 OLI and an 8-band, 10 m resolution image from Sentinel-2A are tested. With 10 classes, parallel processing is 6 times faster than sequential processing. To check the consistency of parallel and sequential processing, the class centers, the number of pixels classified into each class, and the classified images are compared, and they are identical. This study shows that the performance of large satellite image processing can be significantly improved by parallel processing. It also shows that parallel processing is easy to implement with OpenMP-based multi-threading, but that the code must be designed carefully to avoid false sharing.
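
The parallel structure the abstract describes can be sketched as one k-means iteration whose assignment step is split into chunks, with each worker accumulating its own per-cluster sums and counts and a single reduction at the end. Keeping accumulators private to each worker is the usual way to avoid the false sharing the study warns about in OpenMP code. This is a Python sketch, not the paper's OpenMP code: CPython threads do not speed up pure-Python arithmetic, so it only illustrates the decomposition, and `kmeans_step` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def _assign_chunk(pixels, centers):
    """Label one chunk of pixels; return worker-local per-cluster sums and counts."""
    k, dims = len(centers), len(centers[0])
    sums = [[0] * dims for _ in range(k)]    # private to this worker: no shared writes
    counts = [0] * k
    labels = []
    for p in pixels:
        # nearest center by squared Euclidean distance; ties go to the lower index
        best = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        labels.append(best)
        counts[best] += 1
        for d in range(dims):
            sums[best][d] += p[d]
    return labels, sums, counts

def kmeans_step(pixels, centers, workers=4):
    """One k-means iteration with the assignment step split across worker threads."""
    k, dims = len(centers), len(centers[0])
    size = max(1, -(-len(pixels) // workers))            # ceiling division
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ch: _assign_chunk(ch, centers), chunks))
    labels, sums, counts = [], [[0] * dims for _ in range(k)], [0] * k
    for ch_labels, ch_sums, ch_counts in partials:       # single reduction step
        labels.extend(ch_labels)
        for c in range(k):
            counts[c] += ch_counts[c]
            for d in range(dims):
                sums[c][d] += ch_sums[c][d]
    # recompute centers; an empty cluster keeps its previous center
    new_centers = [[s / counts[c] for s in sums[c]] if counts[c] else list(centers[c])
                   for c in range(k)]
    return labels, new_centers
```

Running it with `workers=1` and with several workers gives identical labels and centers when the pixel values are integers, because the partial sums are then exact; with floating-point data the reduction order can change the last bits of the centers.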

Memory Management based Hybrid Transactional Memory Scheme for Efficiently Processing Transactions in Multi-core Environment (멀티코어 환경에서 효율적인 트랜잭션 처리를 위한 메모리 관리 기반 하이브리드 트랜잭셔널 메모리 기법)

  • Jang, Yeon-Woo;Kang, Moon-Hwan;Chang, Jae-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.795-798 / 2017
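  • With the recent development of multi-core processors, parallel programming has become increasingly important as a technique for using multiple cores effectively. Transactional memory is classified into HTM, STM, and HyTM according to how transactions are processed, and HyTM, which combines HTM and STM, has recently been studied actively. However, although existing HyTM schemes use a Bloom filter for concurrency control between HTM and STM, they do not resolve the false positives inherent in Bloom filters. In addition, they manage memory allocation and deallocation for transaction processing with conventional lock mechanisms. As a result, transaction-processing efficiency drops as the number of threads increases in a multi-core environment. In this paper, we propose a memory-management-based hybrid transactional memory scheme for efficient transaction processing in a multi-core environment. By providing a Bloom filter optimized for transaction processing, the proposed scheme supports consistent processing of transactions from different environments that execute concurrently in parallel. In addition, through a memory technique optimized for the CPU cache line, it allocates transactions with small memory requirements to the local cache, enabling fast transaction processing.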
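
The abstract does not describe the optimized Bloom filter, so the sketch below shows only the baseline it improves on: each transaction records the addresses it touches in a fixed-size bit signature, and two transactions may conflict if their signatures share a set bit. Membership and conflict checks can report false positives (spurious conflicts) but never false negatives, which is the weakness the paper addresses. The 1024-bit size, the three hash values derived from BLAKE2b, and the class name are assumptions.

```python
import hashlib

class BloomSignature:
    """Hypothetical read/write-set signature: a Bloom filter over addresses,
    stored as a Python int bitmask."""

    def __init__(self, m_bits=1024, k_hashes=3):
        if not 1 <= k_hashes <= 4:
            raise ValueError("this sketch derives at most 4 hash values per address")
        self.m, self.k, self.bits = m_bits, k_hashes, 0

    def _positions(self, addr):
        # one 16-byte BLAKE2b digest split into four 32-bit values, each reduced mod m
        digest = hashlib.blake2b(addr.to_bytes(8, "little"), digest_size=16).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.m
                for i in range(self.k)]

    def add(self, addr):
        for pos in self._positions(addr):
            self.bits |= 1 << pos

    def may_contain(self, addr):
        # always True for inserted addresses; may also be True for others
        return all((self.bits >> pos) & 1 for pos in self._positions(addr))

    def may_conflict(self, other):
        # overlapping bits mean the two address sets *might* intersect
        return (self.bits & other.bits) != 0
```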

The study on the Efficient methodology to apply the GPU for military information system improvement (국방정보시스템 성능향상을 위한 효율적인 GPU적용방안 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management / v.11 no.1 / pp.27-35 / 2015
  • As the number of GPU (Graphic Processor Unit) cores has increased, high-performance computing platforms based on GPUs have recently been studied actively. This trend has led to the development of GPGPU (General Purpose GPU) and the CUDA (Compute Unified Device Architecture) framework. In this paper, we explain the benefits of GPU-based systems and propose the ICIDF (Identify Compute-Intensive Data set and Function) methodology for applying GPU technology to legacy military information systems to improve their performance. To demonstrate the efficiency of this methodology, we applied it to a CPU-based AES program obtained from a website. Simply changing the data structure improved the performance of the AES program. As a result, the performance of the GPU-based AES program improved step by step, up to 10 times. Further improvement can be expected depending on the developer's skill. Heat remains a problem to be solved, although it has been greatly mitigated by advances in cooling technology.
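
ICIDF's name indicates that its first step is identifying compute-intensive functions and data sets; the abstract does not say how that is done. A minimal sketch, assuming simple wall-clock timing of candidate functions on representative input, could rank them as follows. `rank_by_cost` is hypothetical, and a real study would more likely use a profiler.

```python
import time

def rank_by_cost(candidates, repeats=3):
    """Time each zero-argument callable and return names, most costly first.

    Hypothetical helper: the best of `repeats` runs is used to reduce noise.
    """
    costs = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        costs[name] = best
    return sorted(costs, key=costs.get, reverse=True)
```

Functions at the head of the ranking would be the first candidates for porting to the GPU.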

VHDL Design for Out-of-Order Superscalar Processor of A Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 수퍼스칼라 프로세서의 VHDL 설계)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.1 / pp.99-105 / 2021
  • Today, the superscalar processor is a basic unit or essential component of multi-core processors, SoCs, and GPUs. Hence, a high-performance out-of-order superscalar processor must be adopted in these systems to maximize their performance. The superscalar processor fetches, issues, executes, and writes back multiple instructions per cycle, using reorder buffers and reservation stations to schedule instructions dynamically in a pipelined scheme. In this paper, a fully pipelined out-of-order superscalar processor with speculative execution is designed in VHDL and verified with GHDL. In simulation, a program composed of ARM instructions executes successfully.
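
The reorder buffer is what lets instructions finish out of order while still retiring in program order, and it also makes speculative results easy to discard. The sketch below is a behavioral Python model, not the paper's VHDL: reservation stations, register renaming, and operand forwarding are omitted, tags are assumed unique, and the class and method names are hypothetical.

```python
from collections import deque

class ReorderBuffer:
    """Hypothetical behavioral ROB: out-of-order completion, in-order commit."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()            # each entry: [tag, done, value], oldest first

    def dispatch(self, tag):
        if len(self.entries) >= self.size:
            return False                  # ROB full: the front end must stall
        self.entries.append([tag, False, None])
        return True

    def complete(self, tag, value):
        """Write back a result; instructions may complete in any order."""
        for entry in self.entries:
            if entry[0] == tag:
                entry[1], entry[2] = True, value
                return
        raise KeyError(tag)

    def commit(self, width):
        """Retire up to `width` finished instructions from the head, in order."""
        retired = []
        while self.entries and len(retired) < width and self.entries[0][1]:
            tag, _, value = self.entries.popleft()
            retired.append((tag, value))
        return retired

    def squash_after(self, tag):
        """Discard every entry younger than `tag` (e.g. after a branch mispredict)."""
        if all(entry[0] != tag for entry in self.entries):
            raise KeyError(tag)
        while self.entries[-1][0] != tag:
            self.entries.pop()
```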

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
  • GPUs are multi-core stream processors that can process large amounts of data at high speed with large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs for general-purpose computing has become widespread. The CUDA architecture from Nvidia is one of the efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested loop structure. To employ the CUDA programming model, we apply optimization techniques that fit our skyline algorithm to the performance restrictions of the CUDA architecture. Experimental results show that our optimization techniques improve the original skyline algorithm by 80%.
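
A skyline query returns the points not dominated by any other point. The simple nested-loop algorithm the paper starts from can be sketched as below, assuming smaller values are better in every dimension. Each outer-loop iteration is independent, which is why the algorithm maps naturally onto one GPU thread per candidate point; the CUDA-specific optimizations are not described in the abstract and are not shown. Function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one
    (assuming smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive nested-loop skyline: keep each point that no other point dominates.

    Every iteration of the outer comprehension is independent of the others,
    so it could be assigned to its own thread.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]
```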