• Title/Summary/Keyword: Processor Core

Performance Comparison between Hardware & Software Cache Partitioning Techniques (하드웨어 캐시 파티셔닝과 소프트웨어 캐시 파티셔닝의 성능 비교)

  • Park, JiWoong;Yeom, HeonYoung;Eom, Hyeonsang
    • Journal of KIISE / v.42 no.2 / pp.177-182 / 2015
  • Multi-core processors became the norm once clock speeds reached their practical limit. Today, multi-core technology is used not only in desktops, servers, and tablet PCs, but also in smartphones. In this architecture, processes inevitably interfere with one another because they share system resources. Cache partitioning addresses this problem and can be roughly divided into two types: software and hardware cache partitioning. For dynamic cache partitioning, hardware cache partitioning is superior to software cache partitioning because it requires no page copying. In this paper, we compare the effectiveness of hardware and software cache partitioning on the AMD Opteron 6282 SE, which is the only commodity processor providing hardware cache partitioning, to see whether this technique can be effectively deployed in dynamic environments.

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

  • Ryu, Yeongil;Roh, Hyun-Joon;Ryu, Eun-Seok
    • Journal of KIISE / v.43 no.9 / pp.1060-1065 / 2016
  • The need for parallel UHD video processing is growing, and computing systems with asymmetric processors such as ARM big.LITTLE are increasingly common. A parallel UHD video processing method optimized for asymmetric multicore systems is therefore needed. This paper proposes a novel HEVC tile partitioning method for parallel processing that accounts for the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of the asymmetric multicores and (2) a regression model of computational complexity per video resolution. Finally, the model (3) determines the optimal HEVC tile resolution for each core and partitions and allocates the tiles to suitable cores. The proposed method minimizes the gap in decoding time between the fastest and the slowest CPU core. Experimental results with the official 4K UHD test sequences show an average 20% improvement in decoding speedup on the ARM asymmetric multicore system.
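The idea of sizing tiles by core capability can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes decoding cost grows linearly with tile width (the paper uses a regression model instead), that tiles are vertical columns aligned to 64-pixel coding tree units, and that `core_speeds` holds hypothetical relative throughput figures for each core.

```python
def split_tile_columns(frame_width, core_speeds, unit=64):
    """Split a frame into vertical tiles with widths proportional to core speed.

    frame_width: frame width in pixels (must be a multiple of `unit`).
    core_speeds: hypothetical relative throughput of each core.
    unit: tile alignment in pixels (assumed 64-pixel coding tree units).
    """
    if frame_width % unit != 0:
        raise ValueError("frame_width must be a multiple of unit")
    total_units = frame_width // unit
    total_speed = sum(core_speeds)

    # Ideal fractional share of units for each core.
    ideal = [total_units * s / total_speed for s in core_speeds]
    units = [int(x) for x in ideal]

    # Give the leftover units to the cores with the largest fractional parts.
    leftover = total_units - sum(units)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - units[i], reverse=True)
    for i in order[:leftover]:
        units[i] += 1

    return [u * unit for u in units]


def estimated_decode_times(tile_widths, core_speeds):
    """Estimated time per core under the linear-cost assumption."""
    return [w / s for w, s in zip(tile_widths, core_speeds)]


# Two fast and two slow cores (speeds are illustrative only).
widths = split_tile_columns(3840, [2.0, 2.0, 1.0, 1.0])
print(widths)
print(estimated_decode_times(widths, [2.0, 2.0, 1.0, 1.0]))
```

With proportional widths, the estimated per-core times come out equal, which is the condition that closes the gap between the fastest and slowest core under this simplified cost model.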

A High-performance Digital Hearing Aid Processor Based on a Programmable DSP Core (Programmable DSP 코어를 사용한 고성능 디지털 보청기 프로세서)

  • 박영철;김동욱;김인영;김원기
    • Journal of Biomedical Engineering Research / v.18 no.4 / pp.467-476 / 1997
  • This paper presents the design of a digital hearing aid processor (DHAP) chip driven by a dedicated DSP core. A DHAP for hearing aid devices must meet the required size and power constraints. Furthermore, it should be able to compensate for a wide range of hearing losses and allow sufficient flexibility for algorithm development. In this paper, a programmable 16-bit fixed-point DSP core is employed for the design of the DHAP. The designed DHAP performs nonlinear loudness correction in 8 frequency bands based on audiometric measurements of impaired subjects. By employing a programmable DSP, the DHAP provides all the flexibility needed to implement audiological algorithms. In addition, the chip has a low-power design and dimensions of $5500 \times 5000\ \mu\mathrm{m}^2$, which suit wearable hearing aids.

Multiple Signature Comparison of LogTM-SE for Fast Conflict Detection (다중 시그니처 비교를 통한 트랜잭셔널 메모리의 충돌해소 정책의 성능향상)

  • Kim, Deok-Ho;Oh, Doo-Hwan;Ro, Won-W.
    • The KIPS Transactions:PartA / v.18A no.1 / pp.19-24 / 2011
  • As the era of multi-core processors has arrived, transactional memory has come to be regarded as an effective way to make multi-threaded programming easy and fast. Various hardware transactional memory systems, such as UTM, VTM, FastTM, LogTM, and LogTM-SE, have been introduced in order to implement high-performance multi-core processors. In particular, LogTM-SE has provided solid performance with an efficient memory management policy and a practical thread scheduling method through signature-based conflict detection. However, as the number of cores on a processor increases, so does the hardware complexity of signature processing. The heavy workload of signature comparison then degrades overall performance. In this paper, we propose a new multiple-signature-comparison architecture to improve conflict detection in signature-based transactional memory systems.
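Signature-based conflict detection can be illustrated with a small Bloom-filter sketch. It is a software analogy under stated assumptions, not the architecture proposed in the paper: the signature width `SIG_BITS`, the two SHA-256-derived hash positions, and the helper names are all hypothetical. A read request conflicts if the address may be in another thread's write signature; a write request conflicts if it may be in the read or write signature.

```python
import hashlib

SIG_BITS = 1024  # hypothetical signature width in bits


def _bit_positions(addr, k=2):
    """Map an address to k bit positions (hash choice is illustrative only)."""
    digest = hashlib.sha256(addr.to_bytes(8, "little")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % SIG_BITS
            for i in range(k)]


class Signature:
    """Bloom-filter summary of a transaction's read set or write set."""

    def __init__(self):
        self.bits = 0

    def insert(self, addr):
        for p in _bit_positions(addr):
            self.bits |= 1 << p

    def may_contain(self, addr):
        # False positives are possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in _bit_positions(addr))


def conflicts(addr, is_write, read_sig, write_sig):
    """Check one incoming request against one thread's signatures."""
    if write_sig.may_contain(addr):
        return True
    return is_write and read_sig.may_contain(addr)


def conflicting_threads(addr, is_write, signatures):
    """Check a request against several threads' (read, write) signature pairs."""
    return [tid for tid, (r, w) in signatures.items()
            if conflicts(addr, is_write, r, w)]


# Thread 1 has read 0x1000; thread 2 has written 0x2000.
sigs = {1: (Signature(), Signature()), 2: (Signature(), Signature())}
sigs[1][0].insert(0x1000)
sigs[2][1].insert(0x2000)
print(conflicting_threads(0x1000, True, sigs))   # a write to 0x1000 hits thread 1's read set
print(conflicting_threads(0x2000, False, sigs))  # a read of 0x2000 hits thread 2's write set
```

The loop in `conflicting_threads` makes the scaling cost visible: every request is compared against every thread's signatures, so the comparison workload grows with the number of cores.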

Design and Performance Analysis of DSP Prototype for High Data Rate Transmission of Lunar Orbiter (달 탐사선의 데이터 고속 전송을 위한 DSP 프로토타입 설계 및 성능 분석)

  • Jang, Yeon-Soo;Kim, Sang-Goo;Cho, Kyong-Kuk;Yoon, Dong-Weon
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.39 no.1 / pp.63-68 / 2011
  • Many countries around the world are carrying out lunar exploration projects, and Korea is also conducting basic research on lunar exploration. Developing the communication system is one of the most important aspects of a successful lunar mission. In this paper, as basic research toward baseband processor development, we design a DSP (Digital Signal Processor) prototype based on a requirement analysis of the communication link for lunar exploration and implement its core module in line with the international standards for deep space communications. The design is verified by comparing the bit error rate of the DSP prototype with that of a computer simulation.

A Study on Statistical Simulation of Multicore Processor Architectures (멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.259-265 / 2014
  • When trace-driven simulation is used to analyze the performance of widely used multicore processors in the initial design stage, it requires a great deal of time and disk space. In this paper, statistical simulations are performed for a high-performance multicore processor with various hardware configurations. For the experiment, SPEC2000 benchmark programs are profiled and used to synthesize new instruction traces. As a result, the performance obtained by our statistical simulation is comparable to that of trace-driven simulation, with the benefit of a tremendous reduction in simulation time.
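The profile-then-synthesize step can be sketched as below. This is a deliberately reduced illustration, assuming the profile consists only of an instruction mix; the opcode names and function names are hypothetical, and a practical statistical simulator would also need to model properties such as dependency distances, branch behavior, and cache miss rates.

```python
import random
from collections import Counter


def profile(trace):
    """Return the relative frequency of each opcode in a recorded trace."""
    counts = Counter(trace)
    total = len(trace)
    return {op: n / total for op, n in counts.items()}


def synthesize(mix, length, seed=0):
    """Draw a synthetic trace of the given length from an instruction mix."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=length)


# A tiny recorded trace (opcode names are illustrative only).
recorded = ["load", "add", "add", "store"]
mix = profile(recorded)
short_trace = synthesize(mix, 20)
print(mix)
print(short_trace)
```

The synthetic trace can be far shorter than the original while keeping the same statistical mix, which is where the savings in simulation time and disk space come from.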

OpenGL ES 2.0 based Shader Compilation Method for the Instruction-Level Parallelism (OpenGL ES 2.0 기반 셰이더 명령어 병렬 처리를 위한 컴파일 기법)

  • Kim, Jong-Ho;Kim, Tae-Young
    • Journal of Korea Game Society / v.8 no.2 / pp.69-76 / 2008
  • In this paper, we present the architecture of a graphics processor for mobile devices and its instruction format. In addition, we introduce the shader data structure for on-line and off-line compilation based on OpenGL ES 2.0 and a new optimization method based on ILP (Instruction-Level Parallelism). We show that, on a processor with the same core clock, the shader instructions produced by the proposed compilation structure and method run approximately 1.5 to 2 times faster than code based on single instructions.

Design and Analysis of MPEG-2 MP@HL Decoder in Multi-Processor Environments

  • Yoo, Seung-Hwan;Lee, Hyun-Seung;Lee, Sang-Jo;Park, Rae-Hong;Kim, Do-Hyung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2009.01a / pp.211-216 / 2009
  • As demand for high-definition television (HDTV) increases, real-time decoding of high-definition (HD) video becomes an important issue. The data size of HD video is so large that real-time processing is difficult to implement, especially in software. To implement a fast Moving Picture Experts Group-2 (MPEG-2) decoder for HDTV, we compose five scenarios that use parallel processing techniques such as data decomposition, task decomposition, and pipelining. Assuming a multi-DSP (digital signal processor) environment, we analyze each scenario in three respects: decoding speed, L1 memory size, and bandwidth. By comparing the scenarios, we identify the most suitable cases for different situations. We simulate the scenarios in dual-core and dual-CPU (central processing unit) environments using OpenMP and analyze the simulation results.

Development of Processing System of the Direct-broadcast Data from the Atmospheric Infrared Sounder (AIRS) on Aqua Satellite

  • Lee Jeongsoon;Kim Moongyu;Lee Chol;Yang Minsil;Park Jeonghyun;Park Jongseo
    • Korean Journal of Remote Sensing / v.21 no.5 / pp.371-382 / 2005
  • We present a processing system for the Atmospheric Infrared Sounder (AIRS) sounding suite onboard the Aqua satellite. With its unprecedented 2378 channels in the IR bands, AIRS aims to achieve the sounding accuracy of a radiosonde (1 K in 1-km layers for temperature and 10% in 2-km layers for humidity). The core of the processor is the International MODIS/AIRS Processing Package (IMAPP), which performs the geometric and radiometric correction needed to generate Level 1 brightness temperatures and retrieve Level 2 geophysical parameters. The processor automatically produces Level 2 geophysical parameters from the received raw data. Because we are among the first in the AIRS direct-broadcast community to process direct-broadcast data, special attention is paid to understanding and verifying the Level 2 products. The processor includes two subsystems: a near-real-time validation system, which compares the results with in-situ measurement data, and a standard digital information system, which converts the data into the GRIdded Binary II (GRIB II) standard format to promote active data exchange between meteorological societies. This processing system is intended to encourage the application of geophysical parameters observed by AIRS to research on the water cycle over the Korean peninsula.

Application-Adaptive Performance Improvement in Mobile Systems by Using Persistent Memory

  • Bahn, Hyokyung
    • International journal of advanced smart convergence / v.8 no.1 / pp.9-17 / 2019
  • In this article, we present a performance enhancement scheme for mobile applications that adopts persistent memory. The proposed scheme supports deadline guarantees for real-time applications such as video players and also provides reasonable performance for non-real-time applications. To do so, we analyze the program execution path of mobile software platforms and find two sources of unpredictable delay that make deadline guarantees for real-time applications difficult. The first is the irregular activation of garbage collection in flash storage, and the second is the blocking and time-slice-based scheduling used in mobile platforms. We resolve these two issues by adopting high-performance persistent memory as the storage for real-time applications. By keeping real-time applications and their data in persistent memory, I/O latency becomes predictable because persistent memory does not need garbage collection. We also present a new scheduler that allocates a processor core exclusively to a real-time application. Although processor cycles can be wasted while a real-time application performs I/O, we show that processor utilization is not degraded significantly because persistent memory accelerates I/O. Simulation experiments show that the proposed scheme reduces the deadline misses of real-time applications by 90% compared with the legacy I/O scheme used in mobile systems.