• 제목/요약/키워드: multi-core architecture

Search Result 158, Processing Time 0.02 seconds

A Performance Study of Multi-core Out-of-Order Superscalar Processor Architecture (멀티코어 비순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jong-Bok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.10
    • /
    • pp.1502-1507
    • /
    • 2012
  • In order to overcome the hardware complexity and power consumption problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further increased. In this paper, out-of-order superscalar processor is utilized for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the out-of-order superscalar cores between 2 and 16 extensively. As a result, the 16-core out-of-order superscalar processor for the window size of 16 resulted in 17.4 times speed up over the single-core out-of-order superscalar processor, and 50 times speed up over the single core RISC processor. When compared for the same number of cores on the average, the multi-core out-of-order superscalar processor performance achieved 3.2 times speed up over the multi-core RISC processor and 1.6 times speed up over the multi-core in-order superscalar processor.

Variable latency L1 data cache architecture design in multi-core processor under process variation

  • Kong, Joonho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.9
    • /
    • pp.1-10
    • /
    • 2015
  • In this paper, we propose a new variable latency L1 data cache architecture for multi-core processors. Our proposed architecture extends the traditional variable latency cache to be geared toward the multi-core processors. We added a specialized data structure for recording the latency of the L1 data cache. Depending on the added latency to the L1 data cache, the value stored to the data structure is determined. It also tracks the remaining cycles of the L1 data cache which notifies data arrival to the reservation station in the core. As in the variable latency cache of the single-core architecture, our proposed architecture flexibly extends the cache access cycles considering process variation. The proposed cache architecture can reduce yield losses incurred by L1 cache access time failures to nearly 0%. Moreover, we quantitatively evaluate performance, power, energy consumption, power-delay product, and energy-delay product when increasing the number of cache access cycles.

Energy-Efficient and High Performance CGRA-based Multi-Core Architecture

  • Kim, Yoonjin;Kim, Heesun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.3
    • /
    • pp.284-299
    • /
    • 2014
  • Coarse-grained reconfigurable architecture (CGRA)-based multi-core architecture aims at achieving high performance by kernel level parallelism (KLP). However, the existing CGRA-based multi-core architectures suffer from much energy and performance bottleneck when trying to exploit the KLP because of poor resource utilization caused by insufficient flexibility. In this work, we propose a new ring-based sharing fabric (RSF) to boost their flexibility level for the efficient resource utilization focusing on the kernel-stream type of the KLP. In addition, based on the RSF, we introduce a novel inter-CGRA reconfiguration technique for the efficient pipelining of kernel-stream on CGRA-based multi-core architectures. Experimental results show that the proposed approaches improve performance by up to 50.62 times and reduce energy by up to 50.16% when compared with the conventional CGRA-based multi-core architectures.

Efficient Fault-Recovery Technique for CGRA-based Multi-Core Architecture

  • Kim, Yoonjin;Sohn, Seungyeon
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.2
    • /
    • pp.307-311
    • /
    • 2015
  • In this paper, we propose an efficient fault-recovery technique for CGRA (Coarse-Grained Reconfigurable Architecture) based multi-core architecture. The proposed technique is intra/inter-CGRA co-reconfiguration technique based on a ring-based sharing fabric (RSF) and it enables exploiting the inherent redundancy and reconfigurability of the multi-CGRA for fault-recovery. Experimental results show that the proposed approaches achieve up to 73% fault recoverability when compared with completely connected fabric (CCF).

Implementation of a 'Rasterization based on Vector Algorithm' suited for a Multi-thread Shader architecture (Multi-Thread 쉐이더 구조에 적합한 Vector 기반의 Rasterization 알고리즘의 구현)

  • Lee, Ju-Suk;Kim, Woo-Young;Lee, Bo-Haeng;Lee, Kwang-Yeob
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.10
    • /
    • pp.46-52
    • /
    • 2009
  • A Multi-Core/Multi-Thread architecture is adopted for the Shader processor to enhance the processing performance. The Shader processor is designed to utilize its processing core IP for multiple purposes, such as Vertex-Shading, Rasterization, Pixel-Shading, etc. In this paper, we propose a 'Rasterization based on Vector Algorithm' that makes parallel pixels processing possible with Multi-Core and Multi-Thread architecture on the Shader Core. The proposed algorithm takes only 2% operation counts of the Scan-Line Algorithm and processes pixels independently.

Performance Study of Multi-core In-Order Superscalar Processor Architecture (멀티코어 순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.5
    • /
    • pp.123-128
    • /
    • 2012
  • In order to overcome the hardware complexity and performance limit problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further enhanced. In this paper, in-order superscalar processor is utilized as the core for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the number of superscalar cores between 2 and 16 and the window size of 4 to 16 extensively. As a result, the 16-core superscalar processor for the window size of 16 results in 8.4 times speed up over the single core superscalar processor. When compared with the same number of cores, the multi-core superscalar processor performance doubles that of the multi-core RISC processor.

PERFORMANCE OF A KNIGHT TOUR PARALLEL ALGORITHM ON MULTI-CORE SYSTEM USING OPENMP

  • VIJAYAKUMAR SANGAMESVARAPPA;VIDYAATHULASIRAMAN
    • Journal of applied mathematics & informatics
    • /
    • v.41 no.6
    • /
    • pp.1317-1326
    • /
    • 2023
  • Today's computers, desktops and laptops were build with multi-core architecture. Developing and running serial programs in this multi-core architecture fritters away the resources and time. Parallel programming is the only solution for proper utilization of resources available in the modern computers. The major challenge in the multi-core environment is the designing of parallel algorithm and performance analysis. This paper describes the design and performance analysis of parallel algorithm by taking the Knight Tour problem as an example using OpenMP interface. Comparison has been made with performance of serial and parallel algorithm. The comparison shows that the proposed parallel algorithm achieves good performance compared to serial algorithm.

Performance Analysis of Shared Buffer Router Architecture for Low Power Applications

  • Deivakani, M.;Shanthi, D.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.6
    • /
    • pp.736-744
    • /
    • 2016
  • Network on chip (NoC) is an emerging technology in the field of multi core interconnection architecture. The routers plays an essential components of Network on chip and responsible for packet delivery by selecting shortest path between source and destination. State-of-the-art NoC designs used routing table to find the shortest path and supports four ports for packet transfer, which consume high power consumption and degrades the system performance. In this paper, the multi port multi core router architecture is proposed to reduce the power consumption and increasing the throughput of the system. The shared buffer is employed between the multi ports of the router architecture. The performance of the proposed router is analyzed in terms of power and current consumption with conventional methods. The proposed system uses Modelsim software for simulation purposes and Xilinx Project Navigator for synthesis purposes. The proposed architecture consumes 31 mW on CPLD XC2C64A processor.

A Performance Study on Many-core Processor Architectures with SPEC Benchmark Programs (SPEC 벤치마크 프로그램에 대한 매니코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.62 no.2
    • /
    • pp.252-256
    • /
    • 2013
  • In order to overcome the complexity and performance limit problems of superscalar processors, the multi-core architecture has been prevalent recently. Usually, the number of cores mostly used for the multi-core processor architecture ranges from 2 to 16. However in the near future, more than 32-cores are likely to be utilized, which is called as many-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 32 to 1024 many-core architectures extensively. For 1024-cores, the average performance scores 15.7 IPC, but the performance increase rate is saturated.

Dynamic Redundancy-based Fault-Recovery Scheme for Reliable CGRA-based Multi-Core Architecture

  • Kim, Yoonjin;Sohn, Seungyeon
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.6
    • /
    • pp.615-628
    • /
    • 2015
  • CGRA (Coarse-Grained Reconfigurable Architecture) based multi-core architecture can be considered as a suitable solution for the fault-tolerant computing. However, there have been a few research projects based on fault-tolerant CGRA without exploiting the strengths of CGRA as well as their works are limited to single CGRA. Therefore, in this paper, we propose two approaches to enable exploiting the inherent redundancy and reconfigurability of the multi-CGRA for fault-recovery. One is a resilient inter-CGRA fabric that is ring-based sharing fabric (RSF) with minimal interconnection overhead. Another is a novel intra/inter-CGRA reconfiguration technique on RSF for maximizing utilization of the resources when faults occur. Experimental results show that the proposed approaches achieve up to 94% faulty recoverability with reducing area/delay/power by up to 15%/28.6%/31% when compared with completely connected fabric (CCF).