Enhancing Instruction Queue Efficiency with Return Address Stack in Shallow-Pipelined EISC Architecture

복귀주소 스택을 활용한 얕은 파이프라인 EISC 아키텍처의 명령어 큐 효율성 향상연구

  • Received : 2014.12.10
  • Accepted : 2015.03.09
  • Published : 2015.03.29

Abstract

In the EISC processor, the Instruction Queue (IQ) supporting LERI folding and loop buffering occupies roughly 20% of real estate, and its efficient utilization is a key for performance. This paper presents an architectural enhancement for the IQ utilization with return address stack (RAS) in the EISC processor. The proposed architecture eliminates the RAS corruption from the wrong-path, taking advantage of shallow pipeline. In experiments, a 4-entry RAS reduces the number of IQ flushes by up to 58.90% over baseline, and an 8-entry RAS by up to 61.28%. The experiments show up to 3.47% performance improvement with 8-entry RAS and up to 3.15% performance improvement with 4-entry RAS.

EISC 프로세서에서 LERI 폴딩과 루프 버퍼링을 지원하는 명령어 큐는 하드웨어적으로 20%를 차지하며, 그 효율성은 성능에 직결된다. 본 연구에서는 EISC 프로세서의 명령어 큐 아키텍처 효율성 향상을 복귀주소 스택(RAS)을 통해 실현하였다. 구현한 아키텍처는 EISC의 얕은 파이프라인 구조의 이점을 활용하여 잘못된 명령어 수행으로 인한 RAS Corruption 문제를 제거하였다. 실험에서, 4개 엔트리의 RAS는 명령어 큐의 플러시를 기존보다 최대 58.90% 줄였고, 8개 엔트리의 RAS는 이를 최대 61.28% 줄였다. 또한 실험 결과 8개 엔트리의 RAS는 3.47%의 성능향상을 보여주었고, 4개 엔트리의 RAS는 3.15%의 성능향상을 보여주었다.

Keywords

References

  1. Thomadakis, M. E. (2011). The architecture of the Nehalem processor and Nehalem-EP smp platforms. Resource, 3, 2.
  2. Seznec, A., & Michaud, P. (1999). De-aliased hybrid branch predictors.
  3. Yeh, T. Y., & Patt, Y. N. (1993). A comparison of dynamic branch predictors that use two levels of branch history. ACM SIGARCH Computer Architecture News, 21(2), 257-266. https://doi.org/10.1145/173682.165161
  4. McFarling, S. (1993). Combining branch predictors (Vol. 49). Technical Report TN-36, Digital Western Research Laboratory.
  5. Papermaster, M., Dinkjian, R., Jayfiield, M., Lenk, P., Ciarfella, B., O'Conell, F., & DuPont, R. (1998). POWER3: Next generation 64-bit PowerPC processor design. IBM White Paper, October.
  6. McFarling, S. (1993). Combining branch predictors (Vol. 49). Technical Report TN-36, Digital Western Research Laboratory.
  7. ARM. $ARM1156T2-S^{TM}$ Revision: r0p4 Technical Reference Manual.
  8. Hennessy JL, Patterson DA. (2002) Computer architecture: a quantitative approach: Morgan Kaufmann
  9. Lee, H., Beckett, P., & Appelbe, B. (2001, January). High-performance extendable instruction set computing. In Australian Computer Science Communications (Vol. 23, No. 4, pp. 89-94). IEEE Computer Society.
  10. Parikh, D., Skadron, K., Zhang, Y., & Stan, M. (2004). Power-aware branch prediction: Characterization and design. Computers, IEEE Transactions on, 53(2), 168-186. https://doi.org/10.1109/TC.2004.1261827
  11. Das, B., Bhattacharya, G., Maity, I., & Sikdar, B. K. (2011, December). Impact of Inaccurate Design of Branch Predictors on Processors' Power Consumption. In Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth International Conference on (pp. 335-342). IEEE.
  12. Webb, C. F. (1988). Subroutine call/return stack. IBM Technical Disclosure Bulletin, 30(11), 221-225.
  13. Kaeli, D. R., & Emma, P. G. (1991, April). Branch history table prediction of moving target branches due to subroutine returns. In ACM SIGARCH Computer Architecture News (Vol. 19, No. 3, pp. 34-42). ACM. https://doi.org/10.1145/115953.115957
  14. Jourdan, S., Stark, J., Hsing, T. H., & Patt, Y. N. (1997). Recovery requirements of branch prediction storage structures in the presence of mispredicted-path execution. International Journal of Parallel Programming, 25(5), 363-383. https://doi.org/10.1007/BF02699883
  15. Skadron, K., Ahuja, P. S., Martonosi, M., & Clark, D. W. (1998, November). Improving prediction for procedure returns with return-address-stack repair mechanisms. In Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture (pp. 259-271). IEEE Computer Society Press.
  16. Wang, G., Hu, X., Zhu, Y., & Zhang, Y. (2012, June). Self-Aligning Return Address Stack. In Networking, Architecture and Storage (NAS), 2012 IEEE 7th International Conference on (pp. 278-282). IEEE.
  17. Vandierendonck, H., & Seznec, A. (2008). Speculative return address stack management revisited. ACM Transactions on Architecture and Code Optimization (TACO), 5(3), 15.
  18. Kim, H. G., Jung, D. Y., Jung, H. S., Choi, Y. M., Han, J. S., Min, B. G., & Oh, H. C. (2003). AE32000B: a fully synthesizable 32-bit embedded microprocessor core. ETRI journal, 25(5), 337-344. https://doi.org/10.4218/etrij.03.0303.0008
  19. Intel 64 and IA-32 Architectures Optimization Reference Manual. http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf.
  20. Xilinx. ISE Simulator (ISim). http://www.xilinx.com/tools/isim.htm.
  21. Weicker, R. P. (1984). Dhrystone: a synthetic systems programming benchmark. Communications of the ACM, 27(10), 1013-1030. https://doi.org/10.1145/358274.358283