Code Size Reduction and Execution Performance Improvement with Instruction Set Architecture Design Based on Non-homogeneous Register Partition

Design of Non-homogeneous Register Partitioning and Instruction Architecture for Code Size Reduction and Performance Improvement

  • Published: 1999.12.01

Abstract

Embedded processors often support two instruction sets: a standard instruction set and a compressed instruction set. A compressed instruction set reduces code size, but it can increase the instruction count and, consequently, the execution time. To reduce code size without a significant increase in execution time, this paper proposes a new compressed instruction set architecture called TOE (Two Operations Execution). The TOE instruction format includes a parallel bit that indicates whether an instruction can be executed simultaneously with the next instruction. To make room for the parallel bit, the TOE format shrinks the destination register field, which limits the number of registers an instruction can access. To overcome this limited register accessibility, TOE adopts a non-homogeneous register partition, in which the registers are divided into multiple subsets, each accessed by a different group of instructions. With non-homogeneous registers, each instruction can access only a limited number of registers, but the program as a whole can access all available registers. With an efficient non-homogeneous register allocator, all registers can be used in a balanced manner, so the increase in code size due to register spills is negligible. Experimental results show that more than 30% of TOE instructions can be executed in parallel without a significant increase in code size compared to the existing Thumb instruction set.
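To make the format trade-off concrete, the sketch below models a 16-bit instruction word in which one bit of the destination register field is given up for a parallel bit. The layout is loosely modeled on Thumb's three-register add/subtract format, which uses three 3-bit register fields; every field position, opcode value, and the two-subset partition (group A writes r0-r3, group B writes r4-r7) are hypothetical assumptions for illustration, not the encoding defined in the paper. The opcode selects which register subset the 2-bit destination field indexes, and `issue_groups` pairs an instruction whose parallel bit is set with the one that follows it.

```python
from dataclasses import dataclass

# HYPOTHETICAL 16-bit layout, for illustration only (not the paper's encoding):
#   bit 15     : P  - parallel bit, 1 = may issue together with the next instruction
#   bits 14..8 : 7-bit opcode
#   bits 7..5  : Rn - first source register, any of r0-r7
#   bits 4..2  : Rs - second source register, any of r0-r7
#   bits 1..0  : Rd - only 2 bits, so it indexes a 4-register subset

# HYPOTHETICAL non-homogeneous partition: each opcode group writes its own subset.
DEST_SUBSETS = {"A": (0, 1, 2, 3), "B": (4, 5, 6, 7)}
OPCODE_GROUP = {0x0C: "A", 0x0D: "B"}  # hypothetical: an ADD in group A, a SUB in group B


@dataclass(frozen=True)
class Insn:
    parallel: bool
    opcode: int
    rd: int
    rn: int
    rs: int


def encode(insn: Insn) -> int:
    subset = DEST_SUBSETS[OPCODE_GROUP[insn.opcode]]
    if insn.rd not in subset:
        raise ValueError(f"r{insn.rd} is not writable by opcode {insn.opcode:#x}")
    if not (0 <= insn.rn < 8 and 0 <= insn.rs < 8):
        raise ValueError("source registers must be r0-r7")
    rd_field = subset.index(insn.rd)  # 2-bit position inside the subset
    return (int(insn.parallel) << 15) | (insn.opcode << 8) | (insn.rn << 5) | (insn.rs << 2) | rd_field


def decode(word: int) -> Insn:
    opcode = (word >> 8) & 0x7F
    subset = DEST_SUBSETS[OPCODE_GROUP[opcode]]  # the opcode selects the register subset
    return Insn(
        parallel=bool((word >> 15) & 1),
        opcode=opcode,
        rd=subset[word & 0x3],
        rn=(word >> 5) & 0x7,
        rs=(word >> 2) & 0x7,
    )


def issue_groups(words: list[int]) -> list[list[Insn]]:
    """Pair each instruction whose P bit is set with the instruction after it.

    At most two instructions issue together; a P bit on the second one is ignored.
    Assumes the compiler only sets P for independent pairs (no hazard check here).
    """
    groups, i = [], 0
    while i < len(words):
        first = decode(words[i])
        if first.parallel and i + 1 < len(words):
            groups.append([first, decode(words[i + 1])])
            i += 2
        else:
            groups.append([first])
            i += 1
    return groups


add = Insn(parallel=True, opcode=0x0C, rd=2, rn=5, rs=6)   # ADD r2, r5, r6, paired with next
sub = Insn(parallel=False, opcode=0x0D, rd=7, rn=0, rs=1)  # SUB r7, r0, r1 (independent of ADD)
print(hex(encode(add)))                                     # 0x8cba
print(len(issue_groups([encode(add), encode(sub)])))        # 1 issue slot for 2 instructions
```

Because each instruction only names 4 of the 8 destination registers, an instruction such as `ADD r5, ...` is simply not encodable in this sketch; the partition shifts that restriction onto the register allocator instead of the hardware.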
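With a non-homogeneous partition, a value can only live in a register that its defining instruction is able to write, so register allocation becomes graph coloring in which each node has its own allowed set of colors. The sketch below is a generic greedy heuristic (most-constrained values first, least-used legal register preferred), chosen for illustration; it is not the allocator described in the paper, and the two-subset partition is a hypothetical assumption. The usage lines show both sides of the trade-off: values defined by different instruction groups never compete, so all eight registers can be used, but five simultaneously live values defined by group-A instructions force a spill even though r4-r7 are idle.

```python
# HYPOTHETICAL partition: what a 2-bit destination field per instruction group allows.
SUBSETS = {"A": frozenset({0, 1, 2, 3}), "B": frozenset({4, 5, 6, 7})}


def allocate(values: dict[str, str], interferes: set[frozenset[str]]):
    """Greedy coloring under non-homogeneous register constraints.

    values:     value name -> group of the instruction that defines it ("A" or "B")
    interferes: pairs of values that are live at the same time
    Returns (assignment, spilled): value -> register number, and values left unassigned.
    """
    neighbors = {v: set() for v in values}
    for pair in interferes:
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Most constrained first: many neighbours competing for a small subset.
    order = sorted(values, key=lambda v: len(neighbors[v]) - len(SUBSETS[values[v]]), reverse=True)

    assignment, spilled = {}, []
    use_count = {r: 0 for subset in SUBSETS.values() for r in subset}
    for v in order:
        taken = {assignment[n] for n in neighbors[v] if n in assignment}
        free = SUBSETS[values[v]] - taken  # only registers this value's definer can write
        if not free:
            spilled.append(v)  # no legal register left: the value must be spilled
            continue
        # Spread values over the subset: pick the least-used free register.
        reg = min(free, key=lambda r: (use_count[r], r))
        assignment[v] = reg
        use_count[reg] += 1
    return assignment, spilled


def all_pairs(names):
    """Interference set in which every value is live at the same time as every other."""
    return {frozenset((x, y)) for x in names for y in names if x < y}


# Four group-A and four group-B values, all live at once: every register is used, no spill.
balanced = {"a": "A", "b": "A", "c": "A", "d": "A", "e": "B", "f": "B", "g": "B", "h": "B"}
print(allocate(balanced, all_pairs(balanced)))

# Five group-A values live at once: one spill, although r4-r7 are free.
crowded = {n: "A" for n in "abcde"}
print(allocate(crowded, all_pairs(crowded))[1])  # ['e']
```

A real allocator would also weigh spill cost and could influence which instruction group defines a value; the point of the sketch is only the per-value register constraint that makes balanced use of the subsets matter.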
