• Title/Summary/Keyword: Branch Prediction

Search Result 166, Processing Time 0.031 seconds

Design of an Instruction Fetch Unit for RAPTOR, a On-Chip Multiprocessor (RAPTOR의 명령어 페치 유닛 설계)

  • 이성권;오형철이상원한우종
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.767-770
    • /
    • 1998
  • This paper introduces an instruction fetch unit which is designed for RAPTOR, an on-chip multiprocessor. In order to reduce control hazards, the proposed fetch unit supports a hybrid branch prediction scheme which consists of a static scheme and the 2bC branch prediction scheme. The fetch unit also utilizes the branch folding technique with two instruction buffers to avoid the branch penalty caused by imspredictions. Instructions are predecoded in the fetch unit to achieve extra performance gain.

  • PDF

A Combined BTB Architecture for effective branch prediction (효율적인 분기 예측을 위한 공유 구조의 BTB)

  • Lee Yong-hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.7
    • /
    • pp.1497-1501
    • /
    • 2005
  • Branch instructions which make the sequential instruction flow changed cause pipeline stalls in microprocessor. The pipeline hazard due to branch instructions are the most serious problem that degrades the performance of microprocessors. Branch target buffer predicts whether a branch will be taken or not and supplies the address of the next instruction on the basis of that prediction. If the hanch target buffer predicts correctly, the instruction flow will not be stalled. This leads to the better performance of microprocessor. In this paper, the architecture of a ta8 memory that branch target buffer and TLB can share is presented. Because the two tag memories used for branch target buffer and TLB each is replaced by single combined tag memory, we can expect the smaller chip size and the faster prediction. This shared tag architecture is more advantageous for the microprocessors that uses more bits of address and exploits much more instruction level parallelism.

Branch Predictor Design and Its Performance Evaluation for A High Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 분기예측기의 설계 및 성능평가)

  • Lee, Sang-Hyuk;Kim, Il-Kwan;Choi, Lynn
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.129-132
    • /
    • 2002
  • AE64000 is the 64-bit high-performance microprocessor that ADC Co. Ltd. is developing for an embedded environment. It has a 5-stage pipeline and uses Havard architecture with a separated instruction and data caches. It also provides SIMD-like DSP and FP operation by enabling the 8/16/32/64-bit MAC operation on 64-bit registers. AE64000 processor implements the EISC ISA and uses the instruction folding mechanism (Instruction Folding Unit) that effectively deals with LERI instruction in EISC ISA. But this unit makes branch prediction behavior difficult. In this paper, we designs a branch predictor optimized for AE64000 Pipeline and develops a AES4000 simulator that has cycle-level precision to validate the performance of the designed branch predictor. We makes TAC(Target address cache) and BPT(branch prediction table) seperated for effective branch prediction and uses the BPT(removed indexed) that has no address tags.

  • PDF

Performance Improvement and Power Consumption Reduction of an Embedded RISC Core

  • Jung, Hong-Kyun;Jin, Xianzhe;Ryoo, Kwang-Ki
    • Journal of information and communication convergence engineering
    • /
    • v.10 no.1
    • /
    • pp.78-84
    • /
    • 2012
  • This paper presents a branch prediction algorithm and a 4-way set-associative cache for performance improvement of an embedded RISC core and a clock-gating algorithm with observability don’t care (ODC) operation to reduce the power consumption of the core. The branch prediction algorithm has a structure using a branch target buffer (BTB) and 4-way set associative cache that has a lower miss rate than a direct-mapped cache. Pseudo-least recently used (LRU) policy is used for reducing the number of LRU bits. The clock-gating algorithm reduces dynamic power consumption. As a result of estimation of the performance and the dynamic power, the performance of the OpenRISC core applied to the proposed architecture is improved about 29% and the dynamic power of the core with the Chartered 0.18 ${\mu}m$ technology library is reduced by 16%.

Analytical Models of Instruction Fetch on Superscalar Processors

  • Kim, Sun-Mo;Jung, Jin-Ha;Park, Sang-Bang
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.619-622
    • /
    • 2000
  • This research presents an analytical model to predict the instruction fetch rate on superscalar Processors. The proposed model is also able to analyze the performance relationship between cache miss and branch prediction miss. The proposed model takes into account various kind of architectural parameters such as branch instruction probability, cache miss rate, branch prediction miss rate, and etc.. To prove the correctness of the proposed model, we performed extensive simulations and compared the results with those of the analytical models. Simulation results showed that the pro-posed model can estimate the instruction fetch rate accurately within 10% error in most cases. The model is also able to show the effects of the cache miss and branch prediction miss on the performance of instruction fetch rate, which can provide an valuable information in designing a balanced system.

  • PDF

Exploring Branch Target Buffer Architecture on Intel Processors with Performance Monitor Counter (Performance Monitor Counter를 이용한 Intel Processor의 Branch Target Buffer 구조 탐구)

  • Jeong, Juhye;Kim, Han-Yee;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.24-27
    • /
    • 2019
  • Meltdown, Spectre 등 하드웨어의 취약점을 이용하는 side-channel 공격이 주목을 받으면서 주요 microarchitecture 구조에 대한 철저한 이해의 필요성이 커지고 있다. 현대 마이크로프로세서에서 branch prediction이 갖는 중요성에도 불구하고 세부적인 사항은 거의 알려지지 않았으며 잠재적 공격에 대비하기 위해서는 반드시 현재 드러난 정보 이상의 detail을 탐구하기 위한 시도가 필요하다. 본 연구에서는 Performance Monitor Counter를 이용해 branch 명령어를 포함한 프로그램이 실행되는 동안 Branch Prediction Unit에 의한 misprediction 이벤트가 발생하는 횟수를 체크하여 인텔 하스웰, 스카이레이크에서 사용되는 branch target buffer의 구조를 파악하기 위한 실험을 수행하였다. 연구를 통해 해당 프로세서의 BTB의 size, number of way를 추정할 수 있었다.

Development of an integrated machine learning model for rheological behaviours and compressive strength prediction of self-compacting concrete incorporating environmental-friendly materials

  • Pouryan Hadi;KhodaBandehLou Ashkan;Hamidi Peyman;Ashrafzadeh Fedra
    • Structural Engineering and Mechanics
    • /
    • v.86 no.2
    • /
    • pp.181-195
    • /
    • 2023
  • To predict the rheological behaviours along with the compressive strength of self-compacting concrete that incorporates environmentally friendly ingredients as cement substitutes, a comparative evaluation of machine learning methods is conducted. To model four parameters, slump flow diameter, L-box ratio, V-funnel time, as well as compressive strength at 28 days-a complete mix design dataset from available pieces of literature is gathered and used to construct the suggested machine learning standards, SVM, MARS, and Mp5-MT. Six input variables-the amount of binder, the percentage of SCMs, the proportion of water to the binder, the amount of fine and coarse aggregates, and the amount of superplasticizer are grouped in a particular pattern. For optimizing the hyper-parameters of the MARS model with the lowest possible prediction error, a gravitational search algorithm (GSA) is required. In terms of the correlation coefficient for modelling slump flow diameter, L-box ratio, V-funnel duration, and compressive strength, the prediction results showed that MARS combined with GSA could improve the accuracy of the solo MARS model with 1.35%, 11.1%, 2.3%, as well as 1.07%. By contrast, Mp5-MT often demonstrates greater identification capability and more accurate prediction in comparison to MARS-GSA, and it may be regarded as an efficient approach to forecasting the rheological behaviors and compressive strength of SCC in infrastructure practice.

Simple Recovery Mechanism for Branch Misprediction in Global-History-Based Branch Predictors Allowing the Speculative Update of Branch History (분기 히스토리의 모험적 갱신을 허용하는 전역 히스토리 기반 분기예측기에서 분기예측실패를 위한 간단한 복구 메커니즘)

  • Ko, Kwang-Hyun;Cho, Young-Il
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.6
    • /
    • pp.306-313
    • /
    • 2005
  • Conditional branch prediction is an important technique for improving processor performance. Branch mispredictions, however, waste a large number of cycles, inhibit out-of-order execution, and waste electric power on mis-speculated instructions. Hence, the branch predictor with higher accuracy is necessary for good processor performance. In global-history-based predictors like gshare and GAg, many mispredictions come from commit update of the history. Some works on this subject have discussed the need for speculative update of the history and recovery mechanisms for branch mispredictions. In this paper, we present a simple mechanism for recovering the branch history after a misprediction. The proposed mechanism adds an age_counter to the original predictor and doubles the size of the branch history register. The age_counter counts the number of outstanding branches and uses it to recover the branch history register. Simulation results on the Simplescalar 3.0/PISA tool set and the SPECINTgS benchmarks show that gshare and GAg with the proposed recovery mechanism improved the average prediction accuracy by 2.14$\%$ and 9.21$\%$, respectively and the average IPC by 8.75$\%$ and 18.08$\%$, respectively over the original predictor.

An Improved Dynamic Branch Predictor by Selective Access of a Specific Element in 4-Way Cache (4-Way 캐쉬의 선택된 Element를 이용한 향상된 동적 분기 예측기 구현)

  • Hwang, In-Sung;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.12
    • /
    • pp.1094-1101
    • /
    • 2013
  • This paper proposes an improved branch predictor that reduces the number execution cycles of applications by selectively accessing a specific element in 4-way associative cache. When a branch instruction is fetched, the proposed branch predictor acquires a branch target address from the selected element in the cache by referring to MRU buffer. Branch prediction rate and application execution speed are considerably improved by increasing the number of BTAC entries in restricted power condition, when compared with that of previous branch predictor which accesses all elements. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on the core simulator. Experimental results show that number of execution cycles decreases by an average of 10.1%, while power consumption increases an average of 7.4%, when compared to that of a core without a dynamic branch predictor. Execution cycles are reduced by 4.1% in comparison with a core which employs previous dynamic branch predictor.