Search | Korea Science

A Branch Target Buffer Using Shared Tag Memory with TLB (TLB 태그 공유 구조의 분기 타겟 버퍼)

Lee, Yong-Hwan
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- v.9 no.2
- /
- pp.899-902
- /
- 2005
Pipeline hazard due to branch instructions is the major factor of the degradation on the performance of microprocessors. Branch target buffer predicts whether a branch will be taken or not and supplies the address of the next instruction on the basis of that prediction. If the branch target buffer predicts correctly, the instruction flow will not be stalled. This leads to the better performance of microprocessor. In this paper, the architecture of a tag memory that branch target buffer and TLB can share is presented. Because the two tag memories used for branch target buffer and TLB each is replaced by single shared tag memory, we can expect the smaller ship size and the faster prediction. This hared tag architecture is more advantageous for the microprocessors that uses more bits of address and exploits much more instruction level parallelism.
PDF

A Combined BTB Architecture for effective branch prediction (효율적인 분기 예측을 위한 공유 구조의 BTB)

Lee Yong-hwan
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.9 no.7
- /
- pp.1497-1501
- /
- 2005
Branch instructions which make the sequential instruction flow changed cause pipeline stalls in microprocessor. The pipeline hazard due to branch instructions are the most serious problem that degrades the performance of microprocessors. Branch target buffer predicts whether a branch will be taken or not and supplies the address of the next instruction on the basis of that prediction. If the hanch target buffer predicts correctly, the instruction flow will not be stalled. This leads to the better performance of microprocessor. In this paper, the architecture of a ta8 memory that branch target buffer and TLB can share is presented. Because the two tag memories used for branch target buffer and TLB each is replaced by single combined tag memory, we can expect the smaller chip size and the faster prediction. This shared tag architecture is more advantageous for the microprocessors that uses more bits of address and exploits much more instruction level parallelism.
PDF KSCI

Exploring Branch Target Buffer Architecture on Intel Processors with Performance Monitor Counter (Performance Monitor Counter를 이용한 Intel Processor의 Branch Target Buffer 구조 탐구)

Jeong, Juhye;Kim, Han-Yee;Suh, Taeweon
- Proceedings of the Korea Information Processing Society Conference
- /
- 2019.10a
- /
- pp.24-27
- /
- 2019
Meltdown, Spectre 등 하드웨어의 취약점을 이용하는 side-channel 공격이 주목을 받으면서 주요 microarchitecture 구조에 대한 철저한 이해의 필요성이 커지고 있다. 현대 마이크로프로세서에서 branch prediction이 갖는 중요성에도 불구하고 세부적인 사항은 거의 알려지지 않았으며 잠재적 공격에 대비하기 위해서는 반드시 현재 드러난 정보 이상의 detail을 탐구하기 위한 시도가 필요하다. 본 연구에서는 Performance Monitor Counter를 이용해 branch 명령어를 포함한 프로그램이 실행되는 동안 Branch Prediction Unit에 의한 misprediction 이벤트가 발생하는 횟수를 체크하여 인텔 하스웰, 스카이레이크에서 사용되는 branch target buffer의 구조를 파악하기 위한 실험을 수행하였다. 연구를 통해 해당 프로세서의 BTB의 size, number of way를 추정할 수 있었다.
https://doi.org/10.3745/PKIPS.y2019m10a.24 인용 PDF

Accurate Prediction of Polymorphic Indirect Branch Target (간접 분기의 타형태 타겟 주소의 정확한 예측)

백경호;김은성
- Journal of the Institute of Electronics Engineers of Korea CI
- /
- v.41 no.6
- /
- pp.1-11
- /
- 2004
Modern processors achieve high performance exploiting avaliable Instruction Level Parallelism(ILP) by using speculative technique such as branch prediction. Traditionally, branch direction can be predicted at very high accuracy by 2-level predictor, and branch target address is predicted by Branch Target Buffer(BTB). Except for indirect branch, each of the branch has the unique target, so its prediction is very accurate via BTB. But because indirect branch has dynamically polymorphic target, indirect branch target prediction is very difficult. In general, the technique of branch direction prediction is applied to indirect branch target prediction, and much better accuracy than traditional BTB is obtained for indirect branch. We present a new indirect branch target prediction scheme which combines a indirect branch instruction with its data dependent register of the instruction executed earlier than the branch. The result of SPEC benchmark simulation which are obtained on SimpleScalar simulator shows that the proposed predictor obtains the most perfect prediction accuracy than any other existing scheme.
PDF KSCI

Design of a G-Share Branch Predictor for EISC Processor

Kim, InSik;Jun, JaeYung;Na, Yeoul;Kim, Seon Wook
- IEIE Transactions on Smart Processing and Computing
- /
- v.4 no.5
- /
- pp.366-370
- /
- 2015
This paper proposes a method for improving a branch predictor for the extendable instruction set computer (EISC) processor. The original EISC branch predictor has several shortcomings: a small branch target buffer, absence of a global history, a one-bit local branch history, and unsupported prediction of branches following LERI, which is a special instruction to extend an immediate value. We adopt a G-share branch predictor and eliminate the existing shortcomings. We verified the new branch predictor on a field-programmable gate array with the Dhrystone benchmark. The newly proposed EISC branch predictor also accomplishes higher branch prediction accuracy than a conventional branch predictor.
https://doi.org/10.5573/IEIESPC.2015.4.5.366 인용 PDF KSCI

A Design of Multimedia Application SoC based with Processor using BTB (BTB를 이용한 프로세서 기반 멀티미디어 응용 SoC 설계)

Jung, Younjin;Lee, Byungyup;Ryoo, Kwangki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2009.10a
- /
- pp.397-400
- /
- 2009
This paper describes ASIC design of Multimedia application SoC platform based RISC processor with BTB(Branch Target Buffer). For performance enhancement of platform, we use a simple branch prediction scheme, BTB structure, that stores a target address for branch instruction to remove pipeline harzard. Also, the platform includes a number of peripheral such as VGA controller, AC97 controller, UART controller, SRAM interface and Debug interface. The platform is designed and verified on a Xilinx VERTEX-4 FPGA using a number of test programs for functional tests and timing constraints. Finally, the platform is implemented into a single ASIC chip which can be operated at 100MHz clock frequency using the Chartered 0.18um process. As a result of performance estimation, the proposed platform shows about 5~9% performance improvement in comparison with the previous SoC Platform.
PDF

Performance and Power Consumption Improvement of Embedded RISC Core (임베디드 RISC 코어의 성능 및 전력 개선)

Jung, Hong-Kyun;Ryoo, Kwang-Ki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.14 no.2
- /
- pp.453-461
- /
- 2010
This paper presents a branch prediction algorithm and a 4-way set-associative cache for performance improvement of embedded RISC core and a clock-gating algorithm using ODC (Observability Don't Care) operation to improve the power consumption of the core. The branch prediction algorithm has a structure using BTB(Branch Target Buffer) and 4-way set associative cache has lower miss rate than direct-mapped cache. Pseudo-LRU Policy, which is one of the Line Replacement Policies, is used for decreasing the number of bits that store LRU value. The clock gating algorithm reduces dynamic power consumption. As a result of estimation of performance and dynamic power, the performance of the OpenRISC core applied the proposed architecture is improved about 29% and dynamic power of the core using Chartered $0.18{\mu}m$ technology library is reduced by 16%.
https://doi.org/10.6109/jkiice.2010.14.2.453 인용 PDF KSCI

Effective Branch Prediction Schemes in AE32000 (AE32000에서의 효율적인 분기 예측 기법)

정주영;김현규;오형철
- Proceedings of the Korean Information Science Society Conference
- /
- 2001.10c
- /
- pp.25-27
- /
- 2001
본 논문에서는 AE32000 프로세서에 적응 가능한 효율적인 분기 예측 기법에 관하여 연구하였다. 실험결과, 내장형 응용분야에서의 비용 효율성이란 측면에, AE32000 프로세서에서는 1비트의 분기 예측기와 한 개의 엔트리를 갖는 BTB(Branch Target Buffer)를 사용하는 것이 가장 적합함을 관찰하였다. 또한, 분기 목적 주소에서 나타나는 LERI 명령을 폴딩하여 분기 손실을 줄이는 방안은, BTB와 LERI 폴딩 유닛을 사용하는 설계에서, 가져오는 성능 향상이 미미함을 확인하였다.
PDF

An Improved Dynamic Branch Predictor by Selective Access of a Specific Element in 4-Way Cache (4-Way 캐쉬의 선택된 Element를 이용한 향상된 동적 분기 예측기 구현)

Hwang, In-Sung;Hwang, Sun-Young
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.38A no.12
- /
- pp.1094-1101
- /
- 2013
This paper proposes an improved branch predictor that reduces the number execution cycles of applications by selectively accessing a specific element in 4-way associative cache. When a branch instruction is fetched, the proposed branch predictor acquires a branch target address from the selected element in the cache by referring to MRU buffer. Branch prediction rate and application execution speed are considerably improved by increasing the number of BTAC entries in restricted power condition, when compared with that of previous branch predictor which accesses all elements. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on the core simulator. Experimental results show that number of execution cycles decreases by an average of 10.1%, while power consumption increases an average of 7.4%, when compared to that of a core without a dynamic branch predictor. Execution cycles are reduced by 4.1% in comparison with a core which employs previous dynamic branch predictor.
https://doi.org/10.7840/kics.2013.38A.12.1094 인용 PDF KSCI

Performance Improvement and Power Consumption Reduction of an Embedded RISC Core

Jung, Hong-Kyun;Jin, Xianzhe;Ryoo, Kwang-Ki
- Journal of information and communication convergence engineering
- /
- v.10 no.1
- /
- pp.78-84
- /
- 2012
This paper presents a branch prediction algorithm and a 4-way set-associative cache for performance improvement of an embedded RISC core and a clock-gating algorithm with observability don’t care (ODC) operation to reduce the power consumption of the core. The branch prediction algorithm has a structure using a branch target buffer (BTB) and 4-way set associative cache that has a lower miss rate than a direct-mapped cache. Pseudo-least recently used (LRU) policy is used for reducing the number of LRU bits. The clock-gating algorithm reduces dynamic power consumption. As a result of estimation of the performance and the dynamic power, the performance of the OpenRISC core applied to the proposed architecture is improved about 29% and the dynamic power of the core with the Chartered 0.18 ${\mu}m$ technology library is reduced by 16%.
https://doi.org/10.6109/jicce.2012.10.1.078 인용 PDF KSCI

Search Result 15, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)