Data Cache System based on the Selective Bank Algorithm for Embedded System

A Data Cache System Using a Selective Bank Algorithm for Embedded Systems

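  • 정보성 (Department of Control and Instrumentation Engineering, Gyeongsang National University) ;
  • 이정훈 (School of Electrical and Electronic Engineering, Gyeongsang National University)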
  • Published : 2009.04.30

Abstract

One of the most effective ways to improve cache performance is to exploit both the temporal and the spatial locality inherent in a program's execution characteristics. In this paper we present a high-performance, low-power cache structure with a bank selection mechanism that enhances the exploitation of spatial and temporal locality. The proposed cache system consists of two parts: a main direct-mapped cache with a small block size, and a fully associative buffer with a large block size that is a multiple of the small block size. In particular, the main direct-mapped cache is organized as two banks for low power consumption, and it stores small blocks selected from the fully associative buffer by the proposed bank selection algorithm. By using the bank selection algorithm and three state bits, we selectively extend the lifetime of small blocks with high temporal locality by storing them in the main direct-mapped cache. This approach reduces conflict misses and cache pollution at the same time. According to the simulation results for MiBench applications, the average miss ratio improves by about 23% and 32% compared with victim and STAS caches of the same size, respectively. The average memory access time is reduced by about 14% and 18% compared with the victim and STAS caches, respectively. The energy consumption of the proposed cache is also around 10% lower than that of the other cache systems examined.
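The organization described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' design: the abstract does not give the bank selection rule or the meaning of the three state bits, so the model below selects the bank with the lowest bit of the small-block number and approximates "high temporal locality" as "referenced at least twice while resident in the buffer". All sizes and names (`SMALL`, `LARGE`, `SelectiveBankCache`, and so on) are hypothetical.

```python
from collections import OrderedDict

# Hypothetical parameters, chosen only to keep the example small.
SMALL = 8            # bytes per small block (main cache)
LARGE = 32           # bytes per large block (buffer), a multiple of SMALL
BANKS = 2            # the main direct-mapped cache is split into two banks
SETS_PER_BANK = 4    # lines per bank
BUFFER_ENTRIES = 2   # large blocks held by the fully associative buffer


class SelectiveBankCache:
    def __init__(self):
        # banks[b][i] is the small-block number stored there, or None.
        self.banks = [[None] * SETS_PER_BANK for _ in range(BANKS)]
        # Maps large-block number -> {small-block number: reference count},
        # kept in LRU order (oldest entry first).
        self.buffer = OrderedDict()

    def _slot(self, small):
        # ASSUMPTION: the lowest bit of the small-block number picks the bank.
        return small % BANKS, (small // BANKS) % SETS_PER_BANK

    def _promote(self, counts):
        # ASSUMPTION: a reference count of 2 or more stands in for the
        # paper's state bits as the sign of high temporal locality.
        for small, count in counts.items():
            if count >= 2:
                bank, index = self._slot(small)
                self.banks[bank][index] = small

    def access(self, addr):
        small, large = addr // SMALL, addr // LARGE
        bank, index = self._slot(small)
        if self.banks[bank][index] == small:
            return "main hit"
        if large in self.buffer:
            counts = self.buffer[large]
            counts[small] = counts.get(small, 0) + 1
            self.buffer.move_to_end(large)      # mark as most recently used
            return "buffer hit"
        if len(self.buffer) >= BUFFER_ENTRIES:
            _, evicted = self.buffer.popitem(last=False)   # evict the LRU block
            self._promote(evicted)
        self.buffer[large] = {small: 1}         # fetch the whole large block
        return "miss"


cache = SelectiveBankCache()
trace = [0, 4, 8, 32, 64, 0, 8]
results = [cache.access(a) for a in trace]
print(results)
```

In this trace the small block at address 0 is referenced twice before its large block is evicted, so it is copied into the main cache and later hits there, while the small block at address 8 was referenced only once and misses again. That is the selective behavior the abstract attributes to the proposed scheme, reproduced here with a stand-in policy.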

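The most effective way to improve cache performance is to exploit the temporal locality and spatial locality inherent in a program's execution characteristics. This paper proposes a high-performance, low-power cache structure with a bank selection mechanism for exploiting the temporal and spatial locality suited to those execution characteristics. The proposed cache system consists of two caches with different block sizes and different associativities: a direct-mapped main cache that supports a small block size, and a fully associative buffer that supports large blocks. In particular, the main cache is organized as two banks for low power, and a small block selected from the fully associative buffer is stored in a bank of the main cache by the proposed bank selection algorithm. By using the proposed bank selection algorithm and three state bits, data with high temporal locality is selectively stored in the main cache, which yields high performance. The proposed algorithm also effectively reduces conflict misses and cache pollution. According to the simulation results for the MiBench applications, the average miss ratio is reduced by 23% relative to the Victim cache and by 32% relative to the STAS cache. The average memory access time is reduced by 14% relative to the Victim cache and by 18% relative to the STAS cache. In terms of energy consumption, the proposed cache system also achieves a reduction of about 10% compared with the Victim and STAS caches.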
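The abstract reports a larger reduction in miss ratio than in average memory access time. A minimal sketch using the standard textbook relation (average access time = hit time + miss ratio × miss penalty) shows why that is expected. The cycle counts and miss ratios below are hypothetical illustration values, not figures from the paper, and the function names are made up for this example.

```python
def average_access_time(hit_time, miss_ratio, miss_penalty):
    """Textbook average memory access time, in cycles."""
    return hit_time + miss_ratio * miss_penalty


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline


# Hypothetical values: 1-cycle hit, 20-cycle miss penalty.
baseline_amat = average_access_time(1.0, 0.05, 20.0)
proposed_amat = average_access_time(1.0, 0.04, 20.0)

miss_gain = relative_reduction(0.05, 0.04)
amat_gain = relative_reduction(baseline_amat, proposed_amat)
print(f"miss ratio reduced by {miss_gain:.0%}, access time by {amat_gain:.0%}")
```

Because the hit time is paid on every access and does not shrink, a given relative reduction in miss ratio always translates into a smaller relative reduction in average access time.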
