Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

Oh, Jaeg-Eun;Hwang, Seok-Joong;Nguyen, Huong Giang;Kim, A-Reum;Kim, Seon-Wook;Kim, Chul-Woo;Kim, Jong-Kook;

ETRI Journal

Volume 30, Issue 4 / Pages 576-586 / 2008 / pISSN 1225-6463 / eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Oh, Jaeg-Eun (School of Electrical Engineering, Korea University) ;
Hwang, Seok-Joong (School of Electrical Engineering, Korea University) ;
Nguyen, Huong Giang (School of Electrical Engineering, Korea University) ;
Kim, A-Reum (School of Electrical Engineering, Korea University) ;
Kim, Seon-Wook (School of Electrical Engineering, Korea University) ;
Kim, Chul-Woo (School of Electrical Engineering, Korea University) ;
Kim, Jong-Kook (School of Electrical Engineering, Korea University)

Received : 2007.12.12
Accepted : 2008.06.30
Published : 2008.08.30

Abstract

In most parallel loops of embedded applications, every iteration executes exactly the same sequence of instructions on different data. This observation motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and simultaneous multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler.
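The lockstep idea described above can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the authors' hardware design: the four-instruction toy ISA, the `Lane` class, and the functions `run_lockstep` and `run_loop` are all hypothetical names introduced for illustration, and the fetch counter only shows how front-end work is shared; it is not a performance model of the MLEP.

```python
from dataclasses import dataclass, field

# Hypothetical toy ISA for one loop iteration computing c[i] = a[i] + b[i].
# This is an illustrative assumption, not the AE32000C instruction set.
PROGRAM = [
    ("load", "r1", "a"),        # r1 = a[i]
    ("load", "r2", "b"),        # r2 = b[i]
    ("add", "r3", "r1", "r2"),  # r3 = r1 + r2
    ("store", "r3", "c"),       # c[i] = r3
]


@dataclass
class Lane:
    """Per-thread state, standing in for the duplicated back-end units."""
    index: int                              # loop iteration handled by this lane
    regs: dict = field(default_factory=dict)


def run_lockstep(program, memory, first_iter, ways):
    """Run `ways` consecutive iterations in lockstep; return the fetch count."""
    lanes = [Lane(index=first_iter + k) for k in range(ways)]
    fetches = 0
    for instr in program:
        fetches += 1            # shared fetch: one fetch serves every lane
        op, *args = instr       # shared decode: decoded once per instruction
        for lane in lanes:      # each lane applies the same op to its own data
            if op == "load":
                dst, array = args
                lane.regs[dst] = memory[array][lane.index]
            elif op == "add":
                dst, lhs, rhs = args
                lane.regs[dst] = lane.regs[lhs] + lane.regs[rhs]
            elif op == "store":
                src, array = args
                memory[array][lane.index] = lane.regs[src]
            else:
                raise ValueError(f"unknown opcode: {op}")
    return fetches


def run_loop(memory, n, ways):
    """Execute n iterations in groups of up to `ways` lanes; return total fetches."""
    total_fetches = 0
    for start in range(0, n, ways):
        group = min(ways, n - start)  # the last group may be partially filled
        total_fetches += run_lockstep(PROGRAM, memory, start, group)
    return total_fetches
```

In this toy model, calling `run_loop` on eight iterations with `ways=4` performs the four-instruction body twice at the front end, whereas `ways=1` fetches and decodes the body once per iteration; the results written to `c` are the same in both cases.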

Keywords

ILP; TLP; SMT; CMP; MLEP
