• Title/Summary/Keyword: Shared Caches

Search Result 16, Processing Time 0.03 seconds

Static Timing Analysis of Shared Caches for Multicore Processors

  • Zhang, Wei;Yan, Jun
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.4
    • /
    • pp.267-278
    • /
    • 2012
  • The state-of-the-art techniques in multicore timing analysis are limited to analyze multicores with shared instruction caches only. This paper proposes a uniform framework to analyze the worst-case performance for both shared instruction caches and data caches in a multicore platform. Our approach is based on a new concept called address flow graph, which can be used to model both instruction and data accesses for timing analysis. Our experiments, as a proof-of-concept study, indicate that the proposed approach can accurately compute the worst-case performance for real-time threads running on a dual-core processor with a shared L2 cache (either to store instructions or data).

Bounding Worst-Case Performance for Multi-Core Processors with Shared L2 Instruction Caches

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.1
    • /
    • pp.1-18
    • /
    • 2011
  • As the first step toward real-time multi-core computing, this paper presents a novel approach to bounding the worst-case performance for threads running on multi-core processors with shared L2 instruction caches. The idea of our approach is to compute the worst-case instruction access interferences between different threads based on the program control flow information of each thread, which can be statically analyzed. Our experiments indicate that the proposed approach can reasonably estimate the worst-case shared L2 instruction cache misses by considering the inter-thread instruction conflicts. Also, the worst-case execution time (WCET) of applications running on multi-core processors estimated by our approach is much better than the estimation by simply assuming all L2 instruction accesses are misses.

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.1
    • /
    • pp.12-25
    • /
    • 2012
  • For real-time systems it is important to obtain the accurate worst-case execution time (WCET). Furthermore, how to improve the WCET of applications that run on multicore processors is both significant and challenging as the WCET can be largely affected by the possible inter-core interferences in shared resources such as the shared L2 cache. In order to solve this problem, we propose an innovative approach that adopts a code positioning method to reduce the inter-core L2 cache interferences between the different real-time threads that adaptively run in a multi-core processor by using different strategies. The worst-case-oriented strategy is designed to decrease the worst-case WCET among these threads to as low as possible. The other two strategies aim at reducing the WCET of each thread to almost equal percentage or amount. Our experiments indicate that the proposed multicore-aware code positioning approaches, not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency and fairness for threads that run on multicore platforms.

Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.67-80
    • /
    • 2013
  • The worst-case execution time (WCET) of each real-time task in multicore processors with shared caches can be significantly affected by inter-thread cache interferences. The worst-case inter-thread cache interferences are dependent on how tasks are scheduled to run on different cores. Therefore, there is a circular dependence between real-time task scheduling, the worst-case inter-thread cache interferences, and WCET in multicore processors, which is not the case for single-core processors. To address this challenging problem, we present an offline real-time scheduling approach for multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model. Our evaluation indicates that the enhanced scheduling approach is more likely to generate feasible and safe schedules with stricter timing constraints in multicore real-time systems.

Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.4
    • /
    • pp.285-299
    • /
    • 2013
  • To enable hard real-time systems to take advantage of multicore processors, it is crucial to obtain the worst-case execution time (WCET) for programs running on multicore processors. However, this is challenging and complicated due to the inter-thread interferences from the shared resources in a multicore processor. Recent research used the combined cache conflict graph (CCCG) to model and compute the worst-case inter-thread interferences on a shared L2 cache in a multicore processor, which is called the CCCG-based approach in this paper. Although it can compute the WCET safely and accurately, its computational complexity is exponential and prohibitive for a large number of cores. In this paper, we propose three counter-based approaches to significantly reduce the complexity of the multicore WCET analysis, while achieving absolute safety with tightness close to the CCCG-based approach. The basic counter-based approach simply counts the worst-case number of cache line blocks mapped to a cache set of a shared L2 cache from all the concurrent threads, and compares it with the associativity of the cache set to compute the worst-case cache behavior. The enhanced counter-based approach uses techniques to enhance the accuracy of calculating the counters. The hybrid counter-based approach combines the enhanced counter-based approach and the CCCG-based approach to further improve the tightness of analysis without significantly increasing the complexity. Our experiments on a 4-core processor indicate that the enhanced counter-based approach overestimates the WCET by 14% on average compared to the CCCG-based approach, while its averaged running time is less than 1/380 that of the CCCG-based approach. The hybrid approach reduces the overestimation to only 2.65%, while its running time is less than 1/150 that of the CCCG-based approach on average.

Optimizing Shared Memory Accesses for GPGPU Computations (GPGPU를 위한 공유 메모리 최적화)

  • Tran, Nhat-Phuong;Lee, Myungho;Hong, Sugwon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.197-199
    • /
    • 2012
  • Recently, a lot of general-purpose application programs in addition to graphic applications have been parallelized for boosting their performance using Graphic Processing Unit (GPU)'s excellent floating-point performance. In order to maximize the application performance on GPUs, optimizing the memory hierarchy and the on-chip caches such as the shared memory is essential. In this paper, we propose techniques to optimize the shared memory, and verify its effectiveness using a pattern matching application program.

A Client-based distributed web caching system (클라이언트 기반 분산 웹캐싱 시스템)

  • Park, Jong-Ho;Yoo, Sung-Goo;Chong, Kil-To
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.829-830
    • /
    • 2006
  • A distributed web caching system can transmit information to a user quickly and stably, avoiding a congested internet network by storing and later supplying requested content to a cache that is distributed and shared like a proxy server. This paper proposes a client-based distributed web caching system that assigns an object and controls the load using a user's direct connection to shared caches, without the aid of additional domain name system (DNS) requests. The proposed system simplifies information transmission by reducing both DNS queries and delay time.

  • PDF

Dynamic Limited Directory Scheme for Distributed Shared Memory Systems (분산공유 메모리 시스템을 위한 동적 제한 디렉터리 기법)

  • Lee, Dong-Gwang;Gwon, Hyeok-Seong;Choe, Seong-Min;An, Byeong-Cheol
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.4
    • /
    • pp.1098-1105
    • /
    • 1999
  • The caches in distributed shared memory systems enhance the performance by reducing memory access latency and communication overhead, but they must solve the cache coherence problem. This paper proposes a new directory protocol to solve the cache coherence problem and to improve the system performance in distributed shared memory systems. To maintain the cache coherence of shared data, processors within a limited distance reduce the communication overhead by using a bit-vector like the full directory scheme. Processors over a limited distance store pointers in a directory pool. Since the bit-vector and the directory pool remove the unnecessary cache invalidations, the proposed scheme reduces the communication traffic and improves the system performance. The dynamic limited directory scheme reduces the communication traffic up to 66 percents compared with the limited directory scheme and the number of directory access up to 27 percents compared with the dynamic pointer allocation scheme.

  • PDF

Performance Evaluation of Disk Replacement Algorithms in a Shared Cluster (공유 디스크 클러스터에서 버퍼 고체 알고리즘의 성능 평가)

  • Cho, Haeng-Rae
    • Journal of KIISE:Databases
    • /
    • v.35 no.6
    • /
    • pp.469-480
    • /
    • 2008
  • A shared disk (SD) cluster couples multiple nodes for high performance transaction processing, and all the coupled nodes share a common database at the disk level. To reduce the number of disk accesses, each node caches database pages in its memory buffer. Since a particular page may be cached simultaneously in different nodes, cache consistency should be maintained to ensure that nodes can always access the most recent version of database pages. Most cache consistency schemes proposed in the SD cluster adopted LRU as a buffer replacement algorithm. In this paper, we first present four buffer replacement algorithms that consider the characteristics of the SD cluster. Then we compare the performance of the buffer replacement algorithms. We perform the experiments on a variety of cluster configurations and database workloads. The experiment results show that the proposed algorithms achieve performance improvement up to 5 times of LRU algorithm.

Formal Verification of RACE Protocol Using VIS (VIS를 이용한 RACE 포로토콜의 정형검증)

  • Um, Hyun-Sun;Choi, JIn-Young;Han, Woo-Jong;Ki, An-Do;Shim, Kyu-Hyun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.7
    • /
    • pp.2219-2228
    • /
    • 2000
  • Caches in a multiprocessing environment introduce the cache coherence problem. When multiple processors maintain locally cached copies of a unique shared-memory location, any local modification of the location can result in a globally inconsistent view of memory. Cache coherence protocols are important to operate a shared-memory multiprocessor system with efficiency and correctness. Since random testing and simulations are not enough to validate correctness of protocols, it is necessary to develop efficient and reliable verification methods. In this appear we present our experience in using VIS (Verification Interacting with Synthesis), a tool of formal method, to analyze a number of property of a cache coherence protocol, RACE (Remote Access Cache coherent Enforcement).

  • PDF