Search | Korea Science

A Prefetch Architecture with Efficient Branch Prediction for a 64-bit 4-way Superscalar Microprocessor (64비트 4-way 수퍼스칼라 마이크로프로세서의 효율적인 분기 예측을 수행하는 프리페치 구조)

문상국;문병인;이용환;이용석
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.25 no.11B
- /
- pp.1939-1947
- /
- 2000
본 논문에서는 명령어의 효율적인 페치를 위해 분기 타겟 주소 전체를 사용하지 않고 캐쉬 메모리(cache memory) 내의 적은 비트 수로 인덱싱 하여 한 클럭 사이클 안에 최대 4개의 명령어를 다음 파이프라인으로 보내줄 수 있는 방법을 제시한다. 본 프리페치 유닛은 크게 나누어 3개의 영역으로 나눌 수 있는데, 분기에 관련하여 미리 부분적으로 명령어를 디코드 하는 프리디코드(predecode) 블록, 타겟 주소(NTA : Next Target Address) 테이블 영역을 추가시킨 명령어 캐쉬(instruction cache) 블록, 전체 유닛을 제어하고 가상 주소를 관리하는 프리페치(prefetch) 블록으로 나누어진다. 사용된 명령어들은 SPARC(Scalable Processor ARChitecture) V9에 기준 하였고 구현은 Verilog-HDL(Hardwave Description Language)을 사용하여 기능 수준으로 기술되고 검증되었다. 구현된 프리페치 유닛은 명령어 흐름에 분기가 존재하더라도 단일 사이클 안에 4개까지의 명령어들을 정확한 예측 하에 다음 파이프라인으로 보내줄 수 있다. 또한 NTA를 사용한 방법은 같은 수의 레지스터 비트를 사용하였을 때 BTB(Branch Target Buffer)를 사용하는 방법과 비교하여 2배정도 많은 개수의 분기 명령 주소를 저장할 수 있는 장점이 있다.
PDF

A Branch Misprediction Recovery Mechanism using Control Independence (제어 독립성을 이용한 분기 예상 실패 복구 메커니즘)

윤성룡;신영호;박홍준;조영일
- Proceedings of the Korean Information Science Society Conference
- /
- 2000.10c
- /
- pp.636-638
- /
- 2000
제어 독립성(Control Independence)은 슈퍼스칼라 프로세서에서 명령어 수준 병렬성(Instruction-Level Parallelism)을 향상시키기 위한 중요한 요소로 작용하고 있다. 분기 예상기법(Branch Prediction Mechanism)에서 잘못 예상될 경우에는 예상한 분기 방향의 명령어들을 제거하고 올바른 분기 방향의 명령어들을 다시 반입하여 수행해야 한다. 본 논문에서는 컴파일 시 프로파일링을 통한 정적인 방법과 프로그램상의 제어 흐름을 통해 동적으로 제어 독립적인 명령어를 탐지함으로써 분기 명령어의 잘못된 예상으로 인해 제거되는 명령어를 효과적으로 감소시켜 프로세서의 성능을 향상시키는 메커니즘을 제안한다. SPECint95 벤치마크 프로그램에 대해 기존의 방법과 본 논문에서 제안한 방법 사이의 사이클 당 수행된 명령어 수를 분석한 결과, 4-width 프로세서에서 4%~6%, 8-width 프로세서에서 11%~18%, 16-width 프로세서에서 15%~17%의 성능 향상을 보이고 있다.
PDF

A Code Optimization Algorithm of RISC Pipelined Architecture (RISC 파이프라인 아키텍춰의 코드 최적화 알고리듬)

김은성;임인칠
- Journal of the Korean Institute of Telematics and Electronics
- /
- v.25 no.8
- /
- pp.937-949
- /
- 1988
This paper proposes a code optimization algorithm for dealing with hazards which are occurred in pipelined architecture due to resource dependence between executed instructions. This algorithm solves timing hazard which results from resource conflict between concurrently executing instructions, and sequencing hazard due to the delay time for branch target decision by reconstructing of instruction sequence without pipeline interlock. The reconstructed codes can be generated efficiently by considering timing hazard and sequencing hazard simultaneously. And dynamic execution time of program is improved by considering structral hazard which can be existed when pipeline is controlled dynamically.
PDF

Analysis on the Thermal Efficiency of Branch Prediction Techniques in 3D Multicore Processors (3차원 구조 멀티코어 프로세서의 분기 예측 기법에 관한 온도 효율성 분석)

Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
- The KIPS Transactions:PartA
- /
- v.19A no.2
- /
- pp.77-84
- /
- 2012
Speculative execution for improving instruction-level parallelism is widely used in high-performance processors. In the speculative execution technique, the most important factor is the accuracy of branch predictor. Unfortunately, complex branch predictors for improving the accuracy can cause serious thermal problems in 3D multicore processors. Thermal problems have negative impact on the processor performance. This paper analyzes two methods to solve the thermal problems in the branch predictor of 3D multi-core processors. First method is dynamic thermal management which turns off the execution of the branch predictor when the temperature of the branch predictor exceeds the threshold. Second method is thermal-aware branch predictor placement policy by considering each layer's temperature in 3D multi-core processors. According to our evaluation, the branch predictor placement policy shows that average temperature is $87.69^{\circ}C$, and average maximum temperature gradient is $11.17^{\circ}C$. And, dynamic thermal management shows that average temperature is $89.64^{\circ}C$ and average maximum temperature gradient is $17.62^{\circ}C$. Proposed branch predictor placement policy has superior thermal efficiency than the dynamic thermal management. In the perspective of performance, the proposed branch predictor placement policy degrades the performance by 3.61%, while the dynamic thermal management degrades the performance by 27.66%.
https://doi.org/10.3745/KIPSTA.2012.19A.2.077 인용 PDF KSCI

Indirect Branch Target Address Verification for Defense against Return-Oriented Programming Attacks (Return-Oriented Programming 공격 방어를 위한 간접 분기 목적 주소 검증 기법)

Park, Soohyun;Kim, Sunil
- KIPS Transactions on Computer and Communication Systems
- /
- v.2 no.5
- /
- pp.217-222
- /
- 2013
Return-Oriented Programming(ROP) is an advanced code-reuse attack like a return-to-libc attack. ROP attacks combine gadgets in program code area and make functions like a Turing-complete language. Some of previous defense methods against ROP attacks show high performance overhead because of dynamic execution flow analysis and can defend against only certain types of ROP attacks. In this paper, we propose Indirect Branch Target Address Verification (IBTAV). IBTAV detects ROP attacks by checking if target addresses of indirect branches are valid. IBTAV can defends against almost all ROP attacks because it verifies a target address of every indirect branch instruction. Since IBTAV does not require dynamic execution flow analysis, the performance overhead of IBTAV is relatively low. Our evaluation of IBTAV on SPEC CPU 2006 shows less than 15% performance overhead.
https://doi.org/10.3745/KTCCS.2013.2.5.217 인용 PDF KSCI

Dynamic Per-Branch History Length Fitting for High-Performance Processor (고성능 프로세서를 위한 분기 명령어의 동적 History 길이 조절 기법)

Kwak, Jong-Wook;Jhang, Seong-Tae;Jhon, Chu-Shik
- Journal of the Institute of Electronics Engineers of Korea CI
- /
- v.44 no.2 s.314
- /
- pp.1-10
- /
- 2007
Branch prediction accuracy is critical for the overall system performance. Branch miss-prediction penalty is the one of the significant performance limiters for improving processor performance, as the pipeline deepens and the instruction issued per cycle increases. In this paper, we propose "Dynamic Per-Branch History Length Fitting Method" by tracking the data dependencies among the register writing instructions. The proposed solution first identifies the key branches, and then it selectively uses the histories of the key branches. To support this mechanism, we provide a history length adjustment algorithm and a required hardware module. As the result of simulation, the proposed mechanism outperforms the previous fixed static method, up to 5.96% in prediction accuracy. Furthermore, our method introduces the performance improvement, compared to the profiled results which are generally considered as the optimal ones.
PDF KSCI

A Design of Multimedia Application SoC based with Processor using BTB (BTB를 이용한 프로세서 기반 멀티미디어 응용 SoC 설계)

Jung, Younjin;Lee, Byungyup;Ryoo, Kwangki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2009.10a
- /
- pp.397-400
- /
- 2009
This paper describes ASIC design of Multimedia application SoC platform based RISC processor with BTB(Branch Target Buffer). For performance enhancement of platform, we use a simple branch prediction scheme, BTB structure, that stores a target address for branch instruction to remove pipeline harzard. Also, the platform includes a number of peripheral such as VGA controller, AC97 controller, UART controller, SRAM interface and Debug interface. The platform is designed and verified on a Xilinx VERTEX-4 FPGA using a number of test programs for functional tests and timing constraints. Finally, the platform is implemented into a single ASIC chip which can be operated at 100MHz clock frequency using the Chartered 0.18um process. As a result of performance estimation, the proposed platform shows about 5~9% performance improvement in comparison with the previous SoC Platform.
PDF

Process Algebraic Approach to Timing Analysis of Superscalar Processor Programs (프로세스 대수에 기반을 둔 수퍼스칼라 프로세서 프로그램의 시간 분석)

Yoo, Hee-Jun;Lee, Ki-Huen;Choi, Jin-Young
- Journal of KIISE:Computer Systems and Theory
- /
- v.27 no.2
- /
- pp.200-208
- /
- 2000
Multi-ports register could shared several instructions at the same time in read operation. We address a formal methods for describing timing analysis and resource restriction in pipeline super scalar process that having multi-Port registers. First, we specify in-order pipeline instructions, and then, extend timing analysis in out-of-order super-scalar. In this case, we find instruction pairs in any cycle which can execute same time, We use ACSR(Algebra of Communicating Shared Resources), a branch of formal methods based on process algebra, for instruction specification and modelling.
PDF

Conditional Branch Optimization in the Compilers for Superscalar Processors (수퍼스칼라 프로세서를 위한 컴파일러에서 조건부 분기의 최적화)

Kim, Myung-Ho;Choi, Wan
- The Transactions of the Korea Information Processing Society
- /
- v.2 no.2
- /
- pp.264-276
- /
- 1995
In this paper, a technique for eliminating conditional branches in the compilers for superscalar processors is presented. The technique consists of three major steps. The first step transforms conditional branches into equivalent expressions using algebraic laws. The second step searches all possible instruction sequences for those expressions using GSO of Granlund/Kenner. Finally an optimal sequence that has the least dynamic count for the target superscalar processor is selected from the GSO output. Experiment result shows that for each conditional branch is the input program matched by one of the optimization patterns, the proposed technique outperforms more than 25% speedup of execution time over the original code when the GNU C compiler and the SuperSPARC processor are used.
PDF

A Meta-data Generation Technique for Efficient and Secure Code Reuse Attack Detection with a Consideration on Two Types of Instruction Set (안전하고 효율적인 Code Reuse Attack 탐지를 위한 ARM 프로세서의 두 가지 명령어 세트를 고려한 Meta-data 생성 기술)

Heo, Ingeo;Han, Sangjun;Lee, Jinyong;Paek, Yunheung
- Annual Conference of KIPS
- /
- 2014.11a
- /
- pp.443-446
- /
- 2014
Code reuse attack (CRA)는 기존의 코드 내에서 필요한 코드 조각들 (gadgets)을 모아 indirect branch 명령어들로 잇는 방식으로 공격자가 원하는 악성 프로그램을 구성할 수 있는 강력한 공격 방법이다. 공격자는 자신의 코드를 대상 시스템에 심는 대신 기존의 코드를 이용하기 때문에, 대부분의 범용 운영체제 (OS)가 강제하는 W^X protection 을 무력화할 수 있다. 이러한 CRA 에 대응하기 위하여 다수의 연구들에서 branch 의 trace 를 분석하여 CRA 고유의 특성을 찾아내는 Signature 기반 탐지 기술을 제안하였다. 본 논문에서는 ARM 프로세서 상에서의 CRA 를 대응하기 위한 Signature 기반 탐지 기술을 효율적으로 도울 수 있는 binary 분석 및 meta-data 생성 기술을 제안한다. 특히, 본 논문은 우리의 이전 논문에서 고려 되지 못했던 ARM 의 두 가지 명령어 세트의 특성을 고려하여, 공격자가 어느 명령어 세트를 이용하여 CRA 를 시도하더라도 막아낼 수 있도록 meta-data 를 두 가지 mode 에 대해서 생성하였다. 실험 결과, meta-data 는 본래 바이너리 코드 대비 20.8% 정도의 크기 증가를 일으키는 것으로 나타났다.
https://doi.org/10.3745/PKIPS.y2014m11a.443 인용 PDF

Search Result 73, Processing Time 0.038 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)