Keeping-ownership Cache Replacement Policies for Remote Access Caches of NUMA System

Ownership-Based Remote Cache Replacement Policies for NUMA Systems

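  • 신숭현 (School of Electrical Engineering and Computer Science, Seoul National University)
  • 곽종욱 (Electrical Engineering and Computer Science, Seoul National University)
  • 장성태 (Computer Science, College of Information Engineering, University of Suwon)
  • 전주식 (School of Electrical Engineering and Computer Science, Seoul National University)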
  • Published : 2004.08.01

Abstract

NUMA systems place a remote access cache (RAC) in each local node to reduce the overhead of repeated remote memory accesses. The RAC reduces memory latency and network traffic, which improves the performance of the multiprocessor system. Several cache replacement policies have been proposed in recent years, including policies aimed specifically at multiprocessor systems. In this paper, we propose a cache replacement policy based on cache line coherence information. Under this policy, a cache line that does not hold ownership is replaced before a cache line that does. In this way, the overhead of transferring ownership is avoided and memory latency decreases. We also propose the "Keeping-Ownership replacement policy with MRU (KOM)" and the "Keeping-Ownership replacement policy with Reference Bit (KORB)" to reduce the penalty that frequent replacement imposes on cache lines lacking ownership. We compare and analyze these policies against LRU and Pseudo LRU (PLRU). Simulation shows that KOM outperforms PLRU by 25% and KORB outperforms PLRU by 13%. Although the hardware cost of KOM is very small, its performance is nearly equal to that of LRU.

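NUMA systems place a remote cache inside each local node to avoid the overhead of repeatedly accessing remote memory. Unless this remote cache reduces the access latency to remote memory and the amount of traffic on the network, the performance of the multiprocessor system clearly degrades. Among the performance factors related to the memory system, cache replacement policies have been studied continuously, including replacement policies for multiprocessor systems. This paper proposes a replacement policy based on the sharing state of the cache. Cache lines without ownership are replaced first, which avoids the overhead of transferring ownership and reduces memory latency. In addition, so that cache lines without ownership are not penalized excessively, we propose the "Keeping-Ownership replacement policy with MRU (KOM)" and the "Keeping-Ownership replacement policy with Reference Bit (KORB)" and compare them with LRU and Pseudo LRU (PLRU). KOM and KORB improve execution time over PLRU by 25% and 13%, respectively. In particular, KOM shows performance close to that of LRU even though its hardware complexity is markedly lower.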
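
The victim-selection idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact mechanics of KOM, so everything below is an assumption made for illustration: the names `Line`, `KeepOwnershipSet`, `choose_victim`, and `access` are hypothetical, ownership is modeled as a single boolean per line, and the priority order (empty way, then a non-owned way other than the MRU way, then any way other than the MRU way) is one plausible reading of "keep ownership, protect the most recently used line", not the authors' specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    tag: int
    owned: bool  # assumption: True when this node holds ownership of the line


class KeepOwnershipSet:
    """One set of a remote access cache (hypothetical KOM-like sketch).

    Only the index of the most recently used (MRU) way is tracked,
    which is far less state than full LRU ordering.
    """

    def __init__(self, ways: int) -> None:
        self.lines: List[Optional[Line]] = [None] * ways
        self.mru: Optional[int] = None

    def choose_victim(self) -> int:
        # 1. Use an empty way if one exists.
        for i, line in enumerate(self.lines):
            if line is None:
                return i
        # 2. Otherwise consider every way except the MRU way.
        others = [i for i in range(len(self.lines)) if i != self.mru]
        if not others:
            return self.mru  # single-way set: nothing else to evict
        # 3. Prefer a line without ownership, so no ownership transfer is needed.
        for i in others:
            if not self.lines[i].owned:
                return i
        # 4. All non-MRU lines hold ownership: evict the first of them.
        return others[0]

    def access(self, tag: int, owned: bool = False) -> bool:
        """Return True on a hit, False on a miss (the line is then installed)."""
        for i, line in enumerate(self.lines):
            if line is not None and line.tag == tag:
                line.owned = line.owned or owned
                self.mru = i
                return True
        victim = self.choose_victim()
        self.lines[victim] = Line(tag, owned)
        self.mru = victim
        return False


if __name__ == "__main__":
    s = KeepOwnershipSet(4)
    for tag, owned in [(10, True), (11, False), (12, True), (13, False)]:
        s.access(tag, owned)
    s.access(14)  # evicts a non-owned, non-MRU line; owned lines 10 and 12 stay
    print(sorted(line.tag for line in s.lines))
```

In this sketch the owned lines survive a stream of misses as long as some non-owned, non-MRU line is available, while the MRU way shields the most recently touched line from immediate eviction. A reference-bit variant in the spirit of KORB would replace the single MRU index with one bit per way; that variant is not shown because the abstract gives no further detail.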
