Design and Performance Evaluation of Expansion Buffer Cache

  • 홍원기 (School of Information and Communication Engineering, Daegu University)
  • Published: 2004.12.01

Abstract

The VLIW processor is considered well suited to embedded systems because its simple hardware structure delivers high performance at low power consumption. However, VLIW processors often suffer from high memory access latency caused by the variable length of I-packets, each of which consists of independent instructions to be issued in parallel. Because I-packet lengths vary, some I-packets, called straddle I-packets, are placed across two cache blocks, so fetching them requires two cache accesses. This paper proposes an expansion buffer cache that improves both the instruction fetch bandwidth and the power consumption of the I-cache at moderate hardware cost. Alongside the main cache, the expansion buffer cache has a small expansion buffer that holds a fraction of a straddle I-packet, reducing the additional cache accesses that straddle I-packets cause. By greatly reducing these accesses, the expansion buffer cache achieves a $5\sim9\%$ improvement over conventional I-caches in the $\text{Delay}\cdot\text{Power}\cdot\text{Area}$ metric.
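The straddle problem described above can be illustrated with a minimal Python sketch. It lays variable-length I-packets back to back in memory and counts I-cache accesses with and without an expansion buffer. The function name `count_accesses`, the unit of "instruction slots", and the idealized rule that the buffer always holds the overflowing part of a straddle packet when that part fits are assumptions made for illustration; the abstract does not specify the buffer's organization or replacement behavior.

```python
from typing import List


def count_accesses(packet_lengths: List[int], block_size: int,
                   expansion_slots: int = 0) -> int:
    """Count main-cache accesses needed to fetch packets stored contiguously.

    packet_lengths  -- length of each I-packet, in instruction slots
    block_size      -- slots per cache block
    expansion_slots -- hypothetical expansion buffer capacity per block;
                       0 models a conventional I-cache
    """
    accesses = 0
    addr = 0  # slot address of the next packet
    for length in packet_lengths:
        if not 1 <= length <= block_size:
            raise ValueError("packet length must be in 1..block_size")
        end = addr + length  # exclusive end address
        first_block = addr // block_size
        last_block = (end - 1) // block_size
        if first_block == last_block:
            accesses += 1  # packet lies within one block
        else:
            # Straddle packet: `overflow` slots spill into the next block.
            overflow = end - last_block * block_size
            # Idealized assumption: if the spilled part fits in the
            # expansion buffer, one access suffices; otherwise two.
            accesses += 1 if overflow <= expansion_slots else 2
        addr = end
    return accesses


packets = [3, 4, 3, 6, 8]  # illustrative lengths, not data from the paper
conventional = count_accesses(packets, block_size=8)
with_buffer = count_accesses(packets, block_size=8, expansion_slots=2)
print(conventional, with_buffer)
```

In this layout the third packet occupies slots 7 to 9 and so crosses the boundary between the first and second 8-slot blocks. The conventional model charges it two accesses, while the buffered model charges one because only two slots spill over.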

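VLIW processors are regarded as a processor architecture well suited to embedded systems, since their simple hardware structure provides low power consumption and high performance. However, VLIW processors suffer from increased memory access latency because the length of an instruction packet, the set of instructions that can be executed simultaneously, is not fixed. Variable-length instruction packets cause some packets to span two cache blocks (straddle instruction packets), and reading such a packet requires two cache accesses. This paper proposes an expansion buffer cache that not only raises the instruction fetch bandwidth but also lowers the power consumption of the instruction cache. Together with the main cache, the expansion buffer cache has a small expansion buffer for storing part of a straddle instruction packet, and it reduces the additional cache accesses caused by straddle instruction packets. Experimental results confirm that, by reducing the cache accesses caused by straddle instruction packets, the expansion buffer cache yields an improvement of about $5\sim9\%$ in performance$\cdot$power$\cdot$cost over conventional instruction caches.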
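As a rough illustration of how a combined delay, power and area figure of merit can be compared between two cache designs, the sketch below computes the product and its relative reduction. The helper names `dpa` and `dpa_improvement`, the definition of improvement as relative reduction of the product, and the normalized input values are all assumptions for illustration; they are not measurements or definitions taken from the paper.

```python
def dpa(delay: float, power: float, area: float) -> float:
    """Delay x Power x Area product; lower is better."""
    return delay * power * area


def dpa_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of the product versus the baseline (assumed definition)."""
    return 1.0 - proposed / baseline


# Hypothetical values normalized to a conventional I-cache:
# slightly lower delay and power, slightly larger area.
base = dpa(1.0, 1.0, 1.0)
prop = dpa(0.93, 0.96, 1.04)
print(f"{dpa_improvement(base, prop):.1%}")
```

A product metric of this kind lets a small area increase, such as that of an added buffer, be offset by gains in fetch delay and access power.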
