A Dual Slotted Ring Organization for Reducing Memory Access Latency in Distributed Shared Memory System

  • Min, Jun-Sik (Informatization Support Program Center) ;
  • Chang, Tae-Mu (Dept. of Computer and Multimedia Engineering, Dongguk University)
  • Published : 2001.12.01

Abstract

Advances in circuit and integration technology continue to increase processor speed. One of the main challenges posed by these advances is using such powerful processors effectively in shared-memory multiprocessor systems. We believe that the interconnection problem remains unsolved even for small-scale shared-memory multiprocessors, since the speed of shared buses is unlikely to keep up with the bandwidth requirements of new, more powerful processors. In the past few years, point-to-point unidirectional connections have emerged as a very promising interconnection technology. The single slotted ring is the simplest form of point-to-point interconnection. Its main limitation is that memory access latency increases linearly with the number of processors in the ring. For this reason, we propose the dual slotted ring as an alternative to the single slotted ring for cache-based multiprocessor systems. In this paper, we analyze the proposed dual slotted ring architecture, which uses a new snooping protocol, and compare its performance with that of the single slotted ring using an analytical model and simulation.
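The linear growth in latency on a single unidirectional ring can be illustrated with a small sketch. This is not the model used in the paper: it assumes that every link traversal costs a fixed number of cycles, that a snooped request must travel once around the whole ring, and that slot waiting time and memory service time are ignored. The function names and the `hop_cycles` parameter are hypothetical. The dual slotted ring is deliberately not modeled, since the abstract does not describe its topology.

```python
def ring_round_trip_cycles(num_nodes: int, hop_cycles: int = 1) -> int:
    """Cycles for one message to travel once around a unidirectional ring.

    Assumption (not from the paper): each of the num_nodes links costs a
    fixed hop_cycles, and queueing for a free slot is ignored.
    """
    if num_nodes < 1 or hop_cycles < 1:
        raise ValueError("num_nodes and hop_cycles must be positive")
    return num_nodes * hop_cycles


def mean_one_way_hops(num_nodes: int) -> float:
    """Average hop count to a destination chosen uniformly among the other nodes.

    On a unidirectional ring the possible distances are 1 .. num_nodes - 1,
    whose mean is num_nodes / 2.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    distances = range(1, num_nodes)
    return sum(distances) / len(distances)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, ring_round_trip_cycles(n), mean_one_way_hops(n))
```

Under these assumptions, doubling the number of processors doubles both the round-trip time and the average one-way distance, which is the scaling limitation the abstract attributes to the single slotted ring.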
