• Title/Summary/Keyword: Many-core Processors

Search Result 36, Processing Time 0.023 seconds

New Thermal-Aware Voltage Island Formation for 3D Many-Core Processors

  • Hong, Hyejeong;Lim, Jaeil;Lim, Hyunyul;Kang, Sungho
    • ETRI Journal
    • /
    • v.37 no.1
    • /
    • pp.118-127
    • /
    • 2015
  • The power consumption of 3D many-core processors can be reduced, and the power delivery of such processors can be improved by introducing voltage island (VI) design using on-chip voltage regulators. With the dramatic growth in the number of cores that are integrated in a processor, however, it is infeasible to adopt per-core VI design. We propose a 3D many-core processor architecture that consists of multiple voltage clusters, where each has a set of cores that share an on-chip voltage regulator. Based on the architecture, the steady state temperature is analyzed so that the thermal characteristic of each voltage cluster is known. In the voltage scaling and task scheduling stages, the thermal characteristics and communication between cores is considered. The consideration of the thermal characteristics enables the proposed VI formation to reduce the total energy consumption, peak temperature, and temperature gradients in 3D many-core processors.

Deep Packet Inspection Time-Aware Load Balancer on Many-Core Processors for Fast Intrusion Detection

  • Choi, Yoon-Ho;Park, Woojin;Choi, Seok-Hwan;Seo, Seung-Woo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.3
    • /
    • pp.169-177
    • /
    • 2016
  • To realize high-speed intrusion detection by accommodating many regular expression (regex)-based signatures and growing network link capacities, we propose the Service TimE-Aware Load-balancing (STEAL) algorithm. This work is motivated from the observation that utilization of a many-core network intrusion detection system (NIDS) is influenced by unfair computational distribution among many-core NIDS nodes. To avoid such unfair computational distribution, STEAL is designed to dynamically distribute a large volume of traffic among many-core NIDS nodes based on packet service time, which is represented by the deep packet time in many-core NIDS nodes. From experiments, we show that compared to the commonly used load-balancing algorithm based on arrival rate, STEAL increases the number of received packets (i.e., decreases the number of dropped packets) in many-core NIDS. Specifically, by integrating an open source NIDS (i.e. Bro) with STEAL, we show that even under attack-dominant traffic and with many signatures, STEAL can rapidly improve the performance of many-core NIDS to realize high-speed intrusion detection.

Design Space Exploration of Embedded Many-Core Processors for Real-Time Fire Feature Extraction (실시간 화재 특징 추출을 위한 임베디드 매니코어 프로세서의 디자인 공간 탐색)

  • Suh, Jun-Sang;Kang, Myeongsu;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.10
    • /
    • pp.1-12
    • /
    • 2013
  • This paper explores design space of many-core processors for a fire feature extraction algorithm. This paper evaluates the impact of varying the number of cores and memory sizes for the many-core processor and identifies an optimal many-core processor in terms of performance, energy efficiency, and area efficiency. In this study, we utilized 90 samples with dimensions of $256{\times}256$ (60 samples containing fire and 30 samples containing non-fire) for experiments. Experimental results using six different many-core architectures (PEs=16, 64, 256, 1,024, 4,096, and 16,384) and the feature extraction algorithm of fire indicate that the highest area efficiency and energy efficiency are achieved at PEs=1,024 and 4,096, respectively, for all fire/non-fire containing movies. In addition, all the six many-core processors satisfy the real-time requirement of 30 frames-per-second (30 fps) for the algorithm.

Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm (래스터화 알고리즘을 위한 최적의 매니코어 프로세서 구조 탐색)

  • Son, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.9 no.1
    • /
    • pp.17-24
    • /
    • 2014
  • In this paper, we implement and evaluate the performance of a vector-based rasterization algorithm for 3D graphics by using a SIMD (single instruction multiple data) many-core processor architecture. In addition, we evaluate the impact of a data-per-processing elements (DPE) ratio that is defined as the amount of data directly mapped to each processing element (PE) within many-core in terms of performance, energy efficiency, and area efficiency. For the experiment, we utilize seven different PE configurations by varying the DPE ratio (or the number PEs), which are implemented in the same 130 nm CMOS technology with a 500 MHz clock frequency. Experimental results indicate that the optimal PE configuration is achieved as the DPE ratio is in the range from 16,384 to 256 (or the number of PEs is in the range from 16 and 1,024), which meets the requirements of mobile devices in terms of the optimal performance and efficiency.

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

  • Lee, Sang-Pil;Kim, Deok-Ho;Yi, Jae-Young;Ro, Won-Woo
    • Journal of Information Processing Systems
    • /
    • v.8 no.1
    • /
    • pp.159-174
    • /
    • 2012
  • This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). The recent emergence of VLSI technology makes it feasible to fabricate multiple processing cores on a single chip and enables general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements for all-purpose computation and can be used to support a broad variety of applications, including cryptography. We have proposed an efficient implementation of the encryption/decryption operations of a block cipher algorithm, SEED, on off-the-shelf NVIDIA many-core graphics processors. In a thorough experiment, we achieved high performance that is capable of supporting a high network speed of up to 9.5 Gbps on an NVIDIA GTX285 system (which has 240 processing cores). Our implementation provides up to 4.75 times higher performance in terms of encoding and decoding throughput as compared to the Intel 8-core system.

Implementation and Performance Evaluation of Vector based Rasterization Algorithm using a Many-Core Processor (매니코어 프로세서를 이용한 벡터 기반 래스터화 알고리즘 구현 및 성능평가)

  • Shon, Dong-Koo;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.2
    • /
    • pp.87-93
    • /
    • 2013
  • In this paper, we implemented and evaluated the performance of a vector-based rasterization algorithm of 3D graphics using a SIMD-based many-core processor that consists of 4,096 processing elements. In addition, we compared the performance and efficiency of the rasterization algorithm using the many-core processor and commercial GPU (Graphics Processing Unit) system which consists of 7 GPUs and each of which have 512 cores. Experimental results showed that the SIMD-based many-core processor outperforms the commercial GPU system in terms of execution time (3.13x speedup), energy efficiency (17.5x better), and area efficiency (13.3x better). These results demonstrate that the SIMD-based many-core processor has potential as an embedded mobile processor.

A Performance Study on Many-core Processor Architectures with SPEC Benchmark Programs (SPEC 벤치마크 프로그램에 대한 매니코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.62 no.2
    • /
    • pp.252-256
    • /
    • 2013
  • In order to overcome the complexity and performance limit problems of superscalar processors, the multi-core architecture has been prevalent recently. Usually, the number of cores mostly used for the multi-core processor architecture ranges from 2 to 16. However in the near future, more than 32-cores are likely to be utilized, which is called as many-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 32 to 1024 many-core architectures extensively. For 1024-cores, the average performance scores 15.7 IPC, but the performance increase rate is saturated.

Performance Evaluation and Analysis for Discrete Wavelet Transform on Many-Core Processors (매니코어 프로세서 상에서 이산 웨이블릿 변환을 위한 성능 평가 및 분석)

  • Park, Yong-Hun;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.7 no.5
    • /
    • pp.277-284
    • /
    • 2012
  • To meet the usage of discrete wavelet transform (DWT) on potable devices, this paper implements 2-level DWT using a reference many-core processor architecture and determine the optimal many-core processor. To explore the optimal many-core processor, we evaluate the impacts of a data-per-processing element ratio that is defined as the amount of data mapped directly to each processing element (PE) on system performance, energy efficiency, and area efficiency, respectively. This paper utilized five PE configurations (PEs=16, 64, 256, 1,024, and 4,096) that were implemented in 130nm CMOS technology with a 720MHz clock frequency. Experimental results indicated that maximum energy and area efficiencies were achieved at PEs=1,024. However, the system area must be limited 140mm2 and the power should not exceed 3 watts in order to implement 2-level DWT on portable devices. When we consider these restrictions, the most reasonable energy and area efficiencies were achieved at PEs=256.

Analysis on the Performance and Temperature of the 3D Quad-core Processor according to Cache Organization (캐쉬 구성에 따른 3차원 쿼드코어 프로세서의 성능 및 온도 분석)

  • Son, Dong-Oh;Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.6
    • /
    • pp.1-11
    • /
    • 2012
  • As the process technology scales down, multi-core processors cause serious problems such as increased interconnection delay, high power consumption and thermal problems. To solve the problems in 2D multi-core processors, researchers have focused on the 3D multi-core processor architecture. Compared to the 2D multi-core processor, the 3D multi-core processor decreases interconnection delay by reducing wire length significantly, since each core on different layers is connected using vertical through-silicon via(TSV). However, the power density in the 3D multi-core processor is increased dramatically compared to that in the 2D multi-core processor, because multiple cores are stacked vertically. Unfortunately, increased power density causes thermal problems, resulting in high cooling cost, negative impact on the reliability. Therefore, temperature should be considered together with performance in designing 3D multi-core processors. In this work, we analyze the temperature of the cache in quad-core processors varying cache organization. Then, we propose the low-temperature cache organization to overcome the thermal problems. Our evaluation shows that peak temperature of the instruction cache is lower than threshold. The peak temperature of the data cache is higher than threshold when the cache is composed of many ways. According to the results, our proposed cache organization not only efficiently reduces the peak temperature but also reduces the performance degradation for 3D quad-core processors.

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.1
    • /
    • pp.1-9
    • /
    • 2011
  • Recently, as mobile multimedia devices are used more and more, the needs for high-performance and low-energy multimedia processors are increasing. Application-specific integrated circuits (ASIC) can meet the needed high performance for mobile multimedia, but they provide limited, if any, generality needed for various application requirements. DSP based systems can used for various types of applications due to their generality, but they require higher cost and energy consumption as well as less performance than ASICs. To solve this problem, this paper proposes a single instruction multiple data (SIMD) based many-core processor which supports high-performance and low-power image data processing while keeping generality. The proposed SIMD based many-core processor composed of 16 processing elements (PEs) exploits large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conversional commercial high-performance processors.