• Title/Summary/Keyword: Cache Coherence


An Efficient Cache Coherence Protocol for Multi-Core Processors with Ring Interconnects (링 연결구조 기반의 멀티코어 프로세서를 위한 캐시 일관성 유지 기법)

  • Park, Jin-Young; Choi, Lynn
    • Journal of KIISE: Computing Practices and Letters / v.14 no.8 / pp.768-772 / 2008
  • Today's microprocessors typically include several processing cores to reduce energy consumption without sacrificing performance. This paper shows that a data transfer ordering mechanism can serve as an efficient cache coherence solution on a unidirectional ring interconnect. RING-DATA ORDER combines the simplicity of GREEDY-ORDER with the performance of RING-ORDER, and it is readily applicable to multicore processors with a unidirectional ring interconnect.

Reduction of Read Access Latency by Invalid Hint in Directory-Based Cache Coherence Scheme (디렉토리를 이용한 캐쉬 일관성 유지 기법에서 무효화 힌트를 이용한 읽기 접근 시간 감소)

  • Oh, Seung-Taek; Rhee, Yun-Seok; Maeng, Seung-Ryoul; Lee, Joon-Won
    • Journal of KIISE: Computer Systems and Theory / v.27 no.4 / pp.408-415 / 2000
  • Large-scale shared-memory multiprocessors suffer from long access latency to shared memory. Part of this latency stems from a feature of directory-based cache coherence schemes: a shared memory access must be serviced at the home node of the memory block. This home visit results in a traversal of three or more hops for a memory read access, and the traversal grows much longer as the system scales up. In this paper, we propose a new cache coherence scheme that reduces read access latency by exploiting invalid hints. An invalid hint for a cache block records which node previously invalidated that block. A read request can therefore be sent directly to, and serviced by, that node (called the owner) without the help of the home node. Execution-driven simulation is employed to evaluate the performance of the proposed scheme. The simulation results show that both read access latency and execution time are reduced.


Cache Replacement and Coherence Policies Depending on Data Significance in Mobile Computing Environments (모바일 컴퓨팅 환경에서 데이터의 중요도에 기반한 캐시 교체와 일관성 유지)

  • Kim, Sam-Geun; Kim, Hyung-Ho; Ahn, Jae-Geun
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.2A / pp.149-159 / 2011
  • Mobile computing environments are rapidly becoming common. This trend underscores the need to access database systems on fixed networks from mobile platforms over wireless networks. However, applying database access methods designed for traditional computing environments to mobile computing environments is inappropriate because of the inherent restrictions of mobile environments. This paper proposes a new agent-based mobile database access model and two functions that calculate data significance scores, which are used to choose suitable data items for cache replacement and coherence policies. These functions jointly reflect access term, access frequency and tendency, update frequency and tendency, and data item size distribution. In simulation experiments, our policies outperform the LRU, LIX, and SAIU policies in terms of reduced access latency, improved cache byte hit ratio, and reduced cache byte pollution ratio.

Dynamic Limited Directory Scheme for Distributed Shared Memory Systems (분산공유 메모리 시스템을 위한 동적 제한 디렉터리 기법)

  • Lee, Dong-Gwang; Gwon, Hyeok-Seong; Choe, Seong-Min; An, Byeong-Cheol
    • The Transactions of the Korea Information Processing Society / v.6 no.4 / pp.1098-1105 / 1999
  • Caches in distributed shared memory systems enhance performance by reducing memory access latency and communication overhead, but they introduce the cache coherence problem. This paper proposes a new directory protocol that solves the cache coherence problem and improves system performance in distributed shared memory systems. To maintain the coherence of shared data, processors within a limited distance are tracked with a bit-vector, as in the full directory scheme, which reduces communication overhead. Processors beyond that distance store pointers in a directory pool. Since the bit-vector and the directory pool remove unnecessary cache invalidations, the proposed scheme reduces communication traffic and improves system performance. The dynamic limited directory scheme reduces communication traffic by up to 66 percent compared with the limited directory scheme and the number of directory accesses by up to 27 percent compared with the dynamic pointer allocation scheme.

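The split between a bit-vector for nearby sharers and pointers for distant ones can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DirectoryEntry`, the `near_range` parameter, the use of node-ID difference as the distance metric, and the per-entry set standing in for the shared directory pool are all assumptions made for illustration.

```python
class DirectoryEntry:
    """Sharer tracking for one memory block (illustrative sketch only)."""

    def __init__(self, home, near_range):
        self.home = home                # ID of the block's home node
        self.near_range = near_range    # assumed distance limit, in node IDs
        self.near_bits = 0              # bit i stands for node (home - near_range + i)
        self.far_pointers = set()       # stand-in for pointers taken from a directory pool

    def add_sharer(self, node):
        # Assumption: distance is the absolute difference of node IDs.
        if abs(node - self.home) <= self.near_range:
            self.near_bits |= 1 << (node - (self.home - self.near_range))
        else:
            self.far_pointers.add(node)

    def sharers(self):
        base = self.home - self.near_range
        width = 2 * self.near_range + 1
        near = {base + i for i in range(width) if (self.near_bits >> i) & 1}
        return near | self.far_pointers

    def invalidate_all(self, writer):
        """Return the nodes to invalidate on a write, leaving only the writer."""
        targets = self.sharers() - {writer}
        self.near_bits = 0
        self.far_pointers.clear()
        self.add_sharer(writer)
        return sorted(targets)


entry = DirectoryEntry(home=8, near_range=2)
for node in (7, 9, 15):
    entry.add_sharer(node)
targets = entry.invalidate_all(writer=9)
```

Because both structures record exact sharers, a write sends invalidations only to nodes that actually hold the block, which is the property the abstract credits for the traffic reduction.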

Directory Cache Coherence Scheme using the Number-Balanced Binary Tree (수 평형 이진트리를 이용한 디렉토리 캐쉬 일관성 유지 기법)

  • Seo, Dae-Wha
    • The Transactions of the Korea Information Processing Society / v.4 no.3 / pp.821-830 / 1997
  • The directory-based cache coherence scheme is an attractive approach to solving the cache coherence problem in a large-scale shared-memory multiprocessor. However, existing directory-based schemes have problems such as enormous storage overhead for the directory, long invalidation latency, heavy network congestion, and low scalability. To resolve these problems, we propose a new directory-based cache coherence scheme that is suitable for building scalable shared-memory multiprocessors. In this scheme, each directory entry for a given memory block is a number-balanced binary tree (NBBT) structure. The NBBT has several properties that help maintain the directory efficiently for cache consistency: its shape is unique, its maximum depth is $[\log_2 n]$, and it has the minimum number of leaf nodes among binary trees with n nodes. Therefore, this scheme can reduce the storage overhead, the network traffic, and the invalidation latency, and can ensure high scalability in large-scale shared-memory multiprocessors.


Performance Analysis of Futurebus+ based Multiprocessor Systems with MESI Cache Coherence Protocol (MESI 캐쉬 코히어런스 프로토콜을 사용하는 Futurebus+ 기반 멀티프로세서 시스템의 성능 평가)

  • 고석범; 강인곤; 박성우; 김영천
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.12 / pp.1815-1827 / 1993
  • In this paper, we evaluate the performance of a Futurebus+ based multiprocessor system with the MESI cache coherence protocol for four bus transaction types. The graphical symbols and compiler of SLAM II are used for modeling and simulation. The steady-state probability of each MESI protocol state is computed with a Markov chain, and these probabilities are used as input values for an accurate simulation. Processor utilization, memory utilization, bus utilization, and the waiting time for bus arbitration are measured as functions of the number of processors, the cache hit ratio, the probability of internal operation, and the bus bandwidth.

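The steady-state step described in this abstract can be sketched as follows, assuming a four-state Markov chain over the MESI states. The transition probabilities in `P` are made-up illustrative values, not figures from the paper, and the function name `stationary` is hypothetical; the paper's actual model may differ.

```python
# Hypothetical per-step transition probabilities between MESI states for one
# cache line. The values are illustrative only; each row sums to 1.
STATES = ["M", "E", "S", "I"]
P = [
    [0.80, 0.00, 0.10, 0.10],  # from Modified
    [0.20, 0.60, 0.10, 0.10],  # from Exclusive
    [0.10, 0.00, 0.70, 0.20],  # from Shared
    [0.10, 0.20, 0.30, 0.40],  # from Invalid
]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate the distribution pi satisfying pi = pi P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


pi = stationary(P)
for state, prob in zip(STATES, pi):
    print(f"{state}: {prob:.4f}")
```

The resulting vector gives the long-run fraction of time a line spends in each state, which is the kind of quantity the abstract says is fed into the simulation as input.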

Eager Data Transfer Mechanism for Reducing Communication Latency in User-Level Network Protocols

  • Won, Chul-Ho; Lee, Ben; Park, Kyoung; Kim, Myung-Joon
    • Journal of Information Processing Systems / v.4 no.4 / pp.133-144 / 2008
  • Clusters have become a popular alternative for building high-performance parallel computing systems. Today's high-performance system area network (SAN) protocols such as VIA and IBA significantly reduce user-to-user communication latency by implementing protocol stacks outside the operating system kernel. However, emerging parallel applications require a further significant improvement in communication latency. Since the time required to transfer data between host memory and the network interface (NI) makes up a large portion of overall communication latency, reducing data transfer time is crucial for achieving low-latency communication. In this paper, the Eager Data Transfer (EDT) mechanism is proposed to reduce the time for data transfers between the host and the network interface. EDT employs cache coherence interface hardware to transfer data directly between the host and the NI. An EDT-based network interface was modeled and simulated on Linux/SimOS, a Linux-based complete system simulation environment. Our simulation results show that the EDT approach significantly reduces data transfer time compared to DMA-based approaches. The EDT-based NI attains a 17% to 38% reduction in user-to-user message time compared to cache-coherent DMA-based NIs for a range of message sizes (64 bytes to 4 Kbytes) in a SAN environment.

Dynamic Directory Table: On-Demand Allocation of Directory Entries for Active Shared Cache Blocks (동적 디렉터리 테이블 : 공유 캐시 블록의 디렉터리 엔트리 동적 할당)

  • Bae, Han Jun; Choi, Lynn
    • Journal of KIISE / v.44 no.12 / pp.1245-1251 / 2017
  • In this study, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, which substantially reduces the number of directory entries. Even for shared blocks, we allocate a directory entry dynamically only when the block is actively shared, further reducing the number of directory entries at runtime. For this, we propose a new directory architecture called the dynamic directory table (DDT), which is implemented as a cache of active directory entries. Through detailed simulation on the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 17.84% of the directory area across a variety of workloads. This is achieved by the faster access and high hit rates of the small directory. In addition, we demonstrate that even smaller DDTs can deliver comparable or higher performance than recent directory optimization schemes such as SPACE and DGD with considerably less area.
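The on-demand allocation idea can be illustrated with a minimal sketch, under several assumptions that do not come from the abstract: the class name `DynamicDirectoryTable`, the `private_owner` map used to detect the second sharer, and LRU replacement for the table are all hypothetical choices.

```python
from collections import OrderedDict


class DynamicDirectoryTable:
    """A small cache of directory entries for actively shared blocks (sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # block -> set of sharer cores, in LRU order
        self.private_owner = {}       # hypothetical: block -> the only core using it

    def access(self, block, core):
        """Record an access; return the block evicted from the table, if any."""
        if block in self.entries:
            # Already shared: update the sharer set and refresh LRU position.
            self.entries[block].add(core)
            self.entries.move_to_end(block)
            return None
        owner = self.private_owner.get(block)
        if owner is None or owner == core:
            # Block is still private: no directory entry is allocated.
            self.private_owner[block] = core
            return None
        # A second core touched the block: allocate an entry on demand.
        del self.private_owner[block]
        self.entries[block] = {owner, core}
        if len(self.entries) > self.capacity:
            victim, _ = self.entries.popitem(last=False)  # assumed LRU eviction
            return victim
        return None


ddt = DynamicDirectoryTable(capacity=1)
ddt.access(0xA, core=0)            # private access, no entry
ddt.access(0xA, core=1)            # now shared, entry allocated
ddt.access(0xB, core=2)            # private access, no entry
evicted = ddt.access(0xB, core=3)  # shared; table is full, so 0xA is evicted
```

A real design would also have to handle the evicted block's sharers (for example, by invalidating them); the sketch only reports which block lost its entry.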

A Remote Cache Coherence Protocol for Single Shared Memory in Multiprocessor System (단일 공유 메모리를 가지는 다중 프로세서 시스템의 원격 캐시 일관성 유지 프로토콜)

  • Kim, Seong-Woon; Kim, Bo-Gwan
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.6 / pp.19-28 / 2005
  • A multiprocessor architecture is an effective way to improve computer system performance. CC-NUMA, which provides a single shared memory space over physically distributed memories, is widely used in multiprocessor computer systems. A CC-NUMA system has a full-mapped directory for the shared memory and uses a remote cache memory for fast memory access. In this paper, we propose a processing node architecture for a CC-NUMA system and a cache coherency protocol for the physically distributed but logically shared system. We present implementation results for a system that adopts the cache coherency protocol.