• Title/Summary/Keyword: superscalar architecture

Search Result 39, Processing Time 0.032 seconds

A Performance Study of Multi-core Out-of-Order Superscalar Processor Architecture (멀티코어 비순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jong-Bok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.10
    • /
    • pp.1502-1507
    • /
    • 2012
  • In order to overcome the hardware complexity and power consumption problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further increased. In this paper, out-of-order superscalar processor is utilized for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the out-of-order superscalar cores between 2 and 16 extensively. As a result, the 16-core out-of-order superscalar processor for the window size of 16 resulted in 17.4 times speed up over the single-core out-of-order superscalar processor, and 50 times speed up over the single core RISC processor. When compared for the same number of cores on the average, the multi-core out-of-order superscalar processor performance achieved 3.2 times speed up over the multi-core RISC processor and 1.6 times speed up over the multi-core in-order superscalar processor.

Performance Study of Multi-core In-Order Superscalar Processor Architecture (멀티코어 순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.5
    • /
    • pp.123-128
    • /
    • 2012
  • In order to overcome the hardware complexity and performance limit problems, recently the multi-core architecture has been prevalent. For hardware simplicity, usually RISC processor is adopted as the unit core processor. However, if the performance of unit core processor is enhanced, the overall performance of the multi-core processor architecture can be further enhanced. In this paper, in-order superscalar processor is utilized as the core for the multi-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the number of superscalar cores between 2 and 16 and the window size of 4 to 16 extensively. As a result, the 16-core superscalar processor for the window size of 16 results in 8.4 times speed up over the single core superscalar processor. When compared with the same number of cores, the multi-core superscalar processor performance doubles that of the multi-core RISC processor.

A Design of a High Performance Stream Processor without Superscalar Architecture (슈퍼스칼라 구조를 갖지 않는 고성능 Stream Processor 설계)

  • Lee, Kwan-Ho;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.21 no.1
    • /
    • pp.77-80
    • /
    • 2017
  • In this paper, we proposed a way to improve performance of GP-GPU by deletion of superscalar issue from its original form. At first, we simplified the structure of stream processor in order to eliminate superscalar issue. Under this condition, preservation of hardware size and increasing of thread number were followed by functional improvement of GP-GPU. As the number of thread was getting larger, we proposed the new model of warp scheduler which adjusts the group of thread. This superscalar issue-deleted warp scheduler transferred the instructions to warp which was activated by Round Robin Scheduling. Performance comparison was conducted by Gaussian filtering and the results indicated that our newly designed GP-GPU showing 7.89 times better in its performance than original one.

Design of a Load/store Unit for ARM-SMI Microprocessors (ARM-SMI용 Load/store Unit(LSU) 설계)

  • 김재억;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1387-1390
    • /
    • 2003
  • The superscalar architecture shows limit in performance improvement recently. While, SMT(Simultaneous Multi-Threading) architecture is receiving remark. The purpose of SMT architecture is to improve the performance of superscalar microprocessors by executing multi threads at the same time. In this paper, a load/store unit(LSU) suitable for ARM-compatible SMT microprocessors is presented. This LSU supports load instructions and store instructions of ARM ISA. This LSU keeps away the degradation of SMT by cache miss.

  • PDF

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution (다중블럭을 실행하는 멀티코어 비순차 수퍼스칼라 프로세서의 성능 분석)

  • Lee, Jong Bok
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.2
    • /
    • pp.198-205
    • /
    • 2013
  • In this paper, the performance of multicore processor architecture is analyzed which utilizes out-of-order superscalar processor core using multiple basic block execution. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the out-of-order superscalar processor with the window size from 32 to 64 and the number of cores between 1 and 16, exploiting multiple basic block execution from 1 to 4 extensively. As a result, the multicore out-of-order superscalar processor with 4 basic block execution achieves 22.0 % average performance increase over the same architecture with the single basic block execution.

The Analytic Performance Model of the Superscalar Processor Using Multiple Branch Prediction (독립시행의 정리를 이용하는 수퍼스칼라 프로세서의 다중 분기 예측 성능 모델)

  • 이종복
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.1009-1012
    • /
    • 1999
  • An analytical performance model that can predict the performance of a superscalar processor employing multiple branch prediction is introduced. The model is based on the conditional independence probability and the basic block size of instructions, with the degree of multiple branch prediction, the fetch rate, and the window size of a superscalar architecture. Trace driven simulation is performed for the subset of SPEC integer benchmarks, and the measured IPCs are compared with the results derived from the model. As the result, our analytic model could predict the performance of the superscalar processor using multiple branch prediction within 6.6 percent on the average.

  • PDF

A Theoretical Superscalar Microprocessor Performance Model with Limited Functional Units Using Instruction Dependencies (한정된 연산유닛에서 명령어 종속성을 이용하는 수퍼스칼라 프로세서의 이론적 성능 모델)

  • Lee, Jong-Bok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.59 no.2
    • /
    • pp.423-428
    • /
    • 2010
  • In the initial design phase of superscalar microprocessors, a performance model is necessary. A theoretic performance model is very useful since performance for various architecture parameters can be obtained by simply computing equations, without repeating simulations, Previous studies established theoretic performance models using the relation between the instruction window size and the issue width, with the penalties due to branch mispredictions and cache misses. However, the study was intended for unlimited number of functional units, which is insufficient for the real case application. This paper proposes a superscalar microprocessor theoretical performance model which also works for the limited functional units. To enhance the accuracy of our limited functional unit model, instruction dependency rates are employed. By using trace-driven data of SPEC 2000 integer programs as input, this paper shows that the theoretically computed performance of superscalar microprocessor with limited number of functional units is quite similar to the measured performance.

Statistical Simulation for Superscalar DSP Processors (수퍼스칼라 디지털 신호처리 프로세서에 대한 통계적 모의실험)

  • Lee, Jong-Bok
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.1217-1220
    • /
    • 2005
  • In this paper, statistical simulation is applied to a superscalar digital signal processor architecture using DSP kernel and DSP application benchmarks. As a result, the performance of a digital signal processor with several microarchitecture configurations can be estimated with the relative error of 3.7 ${\backslash}%$ on the average.

  • PDF

A Design of Superscalar Digital Signal Processor (다중 명령어 처리 DSP 설계)

  • Park, Sung-Wook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.3
    • /
    • pp.323-328
    • /
    • 2008
  • This paper presents a Digital Signal Processor achieving high through-put for both decision intensive and computation intensive tasks. The proposed processor employees a multiplier, two ALU and load/store. Unit as operational units. Those four units are controlled and works parallel by superscalar control scheme, which is different from prior DSP architecture. The performance evaluation was done by implementing AC-3 decoding algorithm and 37.8% improvement was achieved. This study is valuable especially for the consumer electronics applications, which require very low cost.

Embedded Multithreading Processor Architecture for Personal Information Devices (개인용 정보 단말장치를 위한 내장형 멀티스레딩 프로세서 구조)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.9
    • /
    • pp.7-13
    • /
    • 2010
  • In this paper, we proposed a processor architecture that is suitable for next generation embedded applications, especially for personal information devices such as smart phones, tablet PC. Latest high performance embedded processors are developed to achieve high clock speed. Because increasing performance makes design more difficult and induces large overhead, architectural evolution in embedded processor field is necessary. Among more enhanced processor types, out-of-order superscalar cannot be a candidate for embedded applications due to its excessive complexity and relatively low performance gain compared to its overhead. Therefore, new architecture with moderate complexity must be designed. In this paper, we developed a low-cost SMT architecture model and compared its performance to other architectures including scalar, superscalar and multiprocessor. Because current personal information devices have a tendency to execute multiple tasks simultaneously, SMT or CMP can be a good choice. And our simulation result shows that the efficiency of SMT is the best among the architectures considered.