Dual Cache Architecture for Low Cost and High Performance

  • Received : 2030.11.05
  • Published : 2003.10.31

Abstract

We present a high-performance cache structure with a hardware prefetching mechanism that improves the exploitation of both spatial and temporal locality. Temporal locality is exploited by monitoring block activity in the spatial buffer and selectively moving small blocks into the direct-mapped cache. Spatial locality is enhanced by prefetching a neighboring block whenever a spatial buffer hit occurs. The prefetch operation is highly accurate: over 90% of all prefetches generated are for blocks that are subsequently accessed. Our results show that the structure allows the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance.
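
The interaction between the two structures can be illustrated with a small behavioral sketch. It is not the authors' implementation. The abstract does not give the block sizes, the buffer capacity, the FIFO replacement policy, or the rule for when a small block moves to the direct-mapped cache, so all of these are assumptions chosen for illustration. Here a small block is moved when its large block is evicted from the spatial buffer, and only if it was referenced while resident. The names DualCacheSketch, SMALL_BLOCK, and LARGE_BLOCK are hypothetical.

```python
from collections import OrderedDict

SMALL_BLOCK = 8    # bytes; assumed block size of the direct-mapped cache
LARGE_BLOCK = 32   # bytes; assumed block size of the spatial buffer


class DualCacheSketch:
    """Toy model: a direct-mapped cache of small blocks plus a spatial buffer."""

    def __init__(self, dm_sets=4, sb_entries=2):
        self.dm = [None] * dm_sets    # one small-block number per set
        self.sb = OrderedDict()       # large-block number -> referenced small blocks
        self.sb_entries = sb_entries
        self.prefetches = 0

    def _fill(self, large):
        """Place a large block in the spatial buffer (FIFO eviction, assumed)."""
        if len(self.sb) >= self.sb_entries:
            _, touched = self.sb.popitem(last=False)
            # Selective move: only small blocks that were referenced while
            # in the spatial buffer are copied to the direct-mapped cache.
            for small in touched:
                self.dm[small % len(self.dm)] = small
        self.sb[large] = set()

    def access(self, addr):
        small = addr // SMALL_BLOCK
        large = addr // LARGE_BLOCK
        if self.dm[small % len(self.dm)] == small:
            return "dm_hit"
        if large in self.sb:
            self.sb[large].add(small)          # record activity
            if large + 1 not in self.sb:       # prefetch the neighboring block
                self._fill(large + 1)
                self.prefetches += 1
            return "sb_hit"
        self._fill(large)                      # demand fetch on a miss
        self.sb[large].add(small)
        return "miss"
```

For example, with the default sizes, accessing addresses 0, 8, 32, and 0 in turn first misses, then hits the spatial buffer twice (each hit prefetching the next large block), and finally hits the direct-mapped cache, because the second prefetch evicted the first large block and moved its referenced small blocks.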
