An Improved Dynamic Branch Predictor by Selective Access of a Specific Element in 4-Way Cache

Implementation of an Improved Dynamic Branch Predictor Using a Selected Element of a 4-Way Cache

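  • 황인성 (CAD & ES Laboratory, Department of Electronic Engineering, Sogang University)
  • 황선영 (CAD & ES Laboratory, Department of Electronic Engineering, Sogang University)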
  • Received : 2013.11.21
  • Accepted : 2013.12.09
  • Published : 2013.12.31

Abstract

This paper proposes an improved dynamic branch predictor that reduces the number of execution cycles of applications by selectively accessing a specific element of a 4-way set-associative cache. When a branch instruction is fetched, the proposed predictor consults the MRU buffer and reads the branch target address from only the selected element of the cache. Because a lookup accesses one element rather than all four, the number of branch target address cache (BTAC) entries can be increased under a fixed power budget, which considerably improves the branch prediction rate and application execution speed relative to the previous branch predictor, which accesses all elements. The effectiveness of the proposed predictor is verified by executing benchmark applications on the core simulator. Experimental results show that the number of execution cycles decreases by an average of 10.1%, while power consumption increases by an average of 7.4%, compared with a core without a dynamic branch predictor. Compared with a core that employs the previous dynamic branch predictor, execution cycles are reduced by an average of 4.1%.
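
The lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MruBtac`, the set count, the index/tag split, the 4-byte instruction alignment, and the replacement choice on a miss are all assumptions made here for illustration, since the abstract specifies only that the MRU buffer selects which element of the 4-way cache is read on a fetch.

```python
from dataclasses import dataclass
from typing import Optional

WAYS = 4  # 4-way set-associative, as stated in the abstract


@dataclass
class Entry:
    valid: bool = False
    tag: int = 0
    target: int = 0


class MruBtac:
    """Hypothetical BTAC that reads only the MRU-selected way on a lookup."""

    def __init__(self, num_sets: int = 64):  # set count is an assumption
        self.num_sets = num_sets
        self.sets = [[Entry() for _ in range(WAYS)] for _ in range(num_sets)]
        self.mru = [0] * num_sets  # MRU buffer: one way index per set
        self.way_reads = 0         # number of ways read, a rough proxy for lookup power

    def _index_tag(self, pc: int):
        word = pc >> 2  # assumes 4-byte aligned instructions
        return word % self.num_sets, word // self.num_sets

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None if the selected way misses."""
        idx, tag = self._index_tag(pc)
        way = self.mru[idx]        # consult the MRU buffer first
        self.way_reads += 1        # one way is read, not all four
        entry = self.sets[idx][way]
        if entry.valid and entry.tag == tag:
            return entry.target
        return None

    def update(self, pc: int, target: int) -> None:
        """Record a resolved taken branch and mark its way as MRU."""
        idx, tag = self._index_tag(pc)
        ways = self.sets[idx]
        for w, entry in enumerate(ways):
            if entry.valid and entry.tag == tag:
                entry.target = target
                self.mru[idx] = w
                return
        # Miss: use the first invalid way, else the way after the MRU one.
        # This replacement choice is an assumption, not taken from the paper.
        victim = next((w for w, e in enumerate(ways) if not e.valid),
                      (self.mru[idx] + 1) % WAYS)
        ways[victim] = Entry(True, tag, target)
        self.mru[idx] = victim
```

In this sketch, two branches that map to the same set can evict each other's MRU selection: a branch whose entry is still stored in a non-MRU way is reported as a miss until `update` selects its way again. The single-way read is what leaves power headroom for more entries, which is the trade-off the abstract describes.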

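This paper proposes an improved dynamic branch predictor that reduces application execution cycles by using only the selected element of a 4-way cache. When a branch instruction is fetched, the proposed predictor refers to the MRU buffer and obtains the target address from the selected element of the 4-way cache. Compared with the existing dynamic branch predictor, which accesses every element, it can therefore hold more BTAC entries under a limited power budget, which considerably improves the branch prediction success rate and application execution speed. The efficiency of the proposed predictor is verified by running benchmark applications on a core generated by the SMDL system. In the experiments, the generated core requires 10.1% fewer application execution cycles on average than a core without a dynamic branch predictor, while application power consumption increases by 7.4%. Compared with a core using the existing dynamic branch predictor, execution cycles are reduced by 4.1% on average.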
