• Title/Abstract/Keywords: Shared Caches

Static Timing Analysis of Shared Caches for Multicore Processors

  • Zhang, Wei;Yan, Jun
    • Journal of Computing Science and Engineering / Vol. 6 No. 4 / pp.267-278 / 2012
  • State-of-the-art techniques in multicore timing analysis are limited to multicores with shared instruction caches only. This paper proposes a uniform framework for analyzing the worst-case performance of both shared instruction caches and shared data caches in a multicore platform. Our approach is based on a new concept called the address flow graph, which can model both instruction and data accesses for timing analysis. Our experiments, as a proof-of-concept study, indicate that the proposed approach can accurately compute the worst-case performance of real-time threads running on a dual-core processor with a shared L2 cache (used to store either instructions or data).
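
The entry above does not define the address flow graph formally, so the sketch below is only one plausible reading of the idea, with an assumed cache geometry and a made-up access trace: nodes are the memory blocks a thread touches, annotated with the L2 set they map to, and edges follow program order, so instruction and data accesses are modeled uniformly.

```python
# Minimal sketch (not the authors' construction): nodes = memory blocks touched
# by a thread, each annotated with the shared-L2 set it maps to; edges follow
# program order, so instruction fetches and data accesses are handled alike.

LINE_SIZE = 64          # assumed cache-line size in bytes
NUM_SETS = 256          # assumed number of shared-L2 sets

def block(addr):
    """Memory block (cache line) containing a byte address."""
    return addr // LINE_SIZE

def cache_set(addr):
    """L2 set index for a byte address (simple modulo mapping)."""
    return block(addr) % NUM_SETS

def build_address_flow_graph(access_trace):
    """Build nodes/edges from an ordered trace of (kind, address) accesses,
    where kind is 'I' for instruction fetches and 'D' for data accesses."""
    nodes, edges = [], []
    for i, (kind, addr) in enumerate(access_trace):
        nodes.append({"kind": kind, "block": block(addr), "set": cache_set(addr)})
        if i > 0:
            edges.append((i - 1, i))    # program-order edge
    return nodes, edges

# Toy usage: a few instruction fetches interleaved with data accesses.
trace = [("I", 0x1000), ("D", 0x8000), ("I", 0x1040), ("D", 0x8040)]
nodes, edges = build_address_flow_graph(trace)
for n in nodes:
    print(n)
```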

Bounding Worst-Case Performance for Multi-Core Processors with Shared L2 Instruction Caches

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering / Vol. 5 No. 1 / pp.1-18 / 2011
  • As a first step toward real-time multi-core computing, this paper presents a novel approach to bounding the worst-case performance of threads running on multi-core processors with shared L2 instruction caches. The idea is to compute the worst-case instruction access interferences between different threads based on the program control flow information of each thread, which can be analyzed statically. Our experiments indicate that the proposed approach can reasonably estimate the worst-case shared L2 instruction cache misses by considering inter-thread instruction conflicts. Moreover, the worst-case execution time (WCET) of applications running on multi-core processors estimated by our approach is much tighter than the estimate obtained by simply assuming that all L2 instruction accesses are misses.
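
To make the final claim concrete, here is a back-of-the-envelope comparison with assumed numbers (the latencies, access count, and miss bound below are illustrative, not figures from the paper): bounding misses via inter-thread conflict analysis gives a much tighter WCET estimate than assuming every L2 access misses.

```python
# Illustrative arithmetic only, with a deliberately simplified cost model.

L2_HIT, L2_MISS = 10, 100          # assumed latencies in cycles
accesses = 10_000                  # L2 instruction accesses on the worst-case path
conflict_aware_misses = 1_200      # misses bounded via inter-thread conflict analysis

wcet_all_miss = accesses * L2_MISS
wcet_conflict_aware = (conflict_aware_misses * L2_MISS
                       + (accesses - conflict_aware_misses) * L2_HIT)

print(wcet_all_miss, wcet_conflict_aware)   # 1,000,000 vs. 208,000 cycles
```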

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / Vol. 6 No. 1 / pp.12-25 / 2012
  • For real-time systems it is important to obtain an accurate worst-case execution time (WCET). Furthermore, improving the WCET of applications running on multicore processors is both significant and challenging, as the WCET can be strongly affected by possible inter-core interferences in shared resources such as the shared L2 cache. To address this problem, we propose a code positioning approach that reduces the inter-core L2 cache interferences between real-time threads running on a multicore processor, using several different strategies. The worst-case-oriented strategy is designed to make the largest WCET among these threads as low as possible. The other two strategies aim to reduce the WCET of each thread by an almost equal percentage or amount. Our experiments indicate that the proposed multicore-aware code positioning approaches not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency and fairness for threads running on multicore platforms.
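
The positioning strategies themselves are not detailed in the abstract; the following is a deliberately simplified greedy sketch of the general idea, under assumed parameters: slide one thread's code region by whole cache lines so that its shared-L2 set footprint overlaps the co-running thread's footprint as little as possible.

```python
# Greedy placement sketch (assumptions, not the paper's algorithm).

NUM_SETS = 64            # assumed number of shared-L2 sets

def footprint(start_line, num_lines):
    """Set indices occupied by a contiguous code region of num_lines lines."""
    return {(start_line + i) % NUM_SETS for i in range(num_lines)}

def best_offset(fixed_sets, num_lines):
    """Pick the line offset that minimizes overlap with the fixed thread."""
    return min(range(NUM_SETS),
               key=lambda off: len(footprint(off, num_lines) & fixed_sets))

thread_a = footprint(0, 40)              # thread A occupies sets 0..39
off = best_offset(thread_a, 20)          # place thread B's 20 lines elsewhere
print(off, len(footprint(off, 20) & thread_a))   # offset 40, overlap 0
```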

Multicore Real-Time Scheduling to Reduce Inter-Thread Cache Interferences

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / Vol. 7 No. 1 / pp.67-80 / 2013
  • The worst-case execution time (WCET) of each real-time task in multicore processors with shared caches can be significantly affected by inter-thread cache interferences. The worst-case inter-thread cache interferences in turn depend on how tasks are scheduled to run on different cores. Therefore, there is a circular dependence between real-time task scheduling, the worst-case inter-thread cache interferences, and the WCET in multicore processors, which is not the case for single-core processors. To address this challenging problem, we present an offline real-time scheduling approach for multicore processors that takes the worst-case inter-thread interferences on shared L2 caches into account. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and the WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to cyclic multicore scheduling approaches in our theoretical model. Our evaluation indicates that the enhanced scheduling approach is more likely to generate feasible and safe schedules under stricter timing constraints in multicore real-time systems.
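
The abstract describes a greedy heuristic without giving its details, so the sketch below only conveys the flavor under strong simplifying assumptions (two cores, one task per core per slot, and a precomputed pairwise interference matrix): repeatedly co-schedule the pair of remaining tasks with the least mutual shared-L2 interference.

```python
# Greedy pairing sketch -- not the paper's heuristic, just the flavor of
# minimizing worst-case shared-L2 interference during offline scheduling.

import itertools

def greedy_pairs(tasks, interference):
    """Repeatedly co-schedule the pair of remaining tasks with the least
    mutual interference; interference[frozenset({a, b})] is symmetric."""
    remaining, schedule = set(tasks), []
    while len(remaining) > 1:
        a, b = min(itertools.combinations(sorted(remaining), 2),
                   key=lambda p: interference[frozenset(p)])
        schedule.append((a, b))
        remaining -= {a, b}
    if remaining:
        schedule.append((remaining.pop(), None))   # runs alone in its slot
    return schedule

tasks = ["t1", "t2", "t3", "t4"]
interference = {frozenset(p): w for p, w in
                [(("t1", "t2"), 9), (("t1", "t3"), 2), (("t1", "t4"), 7),
                 (("t2", "t3"), 8), (("t2", "t4"), 1), (("t3", "t4"), 6)]}
print(greedy_pairs(tasks, interference))   # [('t2', 't4'), ('t1', 't3')]
```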

Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / Vol. 7 No. 4 / pp.285-299 / 2013
  • To enable hard real-time systems to take advantage of multicore processors, it is crucial to obtain the worst-case execution time (WCET) of programs running on multicore processors. However, this is challenging and complicated due to the inter-thread interferences from the shared resources in a multicore processor. Recent research used the combined cache conflict graph (CCCG) to model and compute the worst-case inter-thread interferences on a shared L2 cache in a multicore processor, which is called the CCCG-based approach in this paper. Although it can compute the WCET safely and accurately, its computational complexity is exponential and prohibitive for a large number of cores. In this paper, we propose three counter-based approaches that significantly reduce the complexity of multicore WCET analysis while remaining absolutely safe, with tightness close to that of the CCCG-based approach. The basic counter-based approach simply counts the worst-case number of cache line blocks mapped to a cache set of the shared L2 cache from all the concurrent threads, and compares it with the associativity of the cache set to compute the worst-case cache behavior. The enhanced counter-based approach uses additional techniques to improve the accuracy of the counters. The hybrid counter-based approach combines the enhanced counter-based approach and the CCCG-based approach to further improve the tightness of the analysis without significantly increasing its complexity. Our experiments on a 4-core processor indicate that the enhanced counter-based approach overestimates the WCET by 14% on average compared to the CCCG-based approach, while its average running time is less than 1/380 that of the CCCG-based approach. The hybrid approach reduces the overestimation to only 2.65%, while its average running time is less than 1/150 that of the CCCG-based approach.
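
The basic counter-based check is described concretely enough to sketch; the cache geometry and the per-thread footprints below are assumptions. Per shared-L2 set, the worst-case number of distinct line blocks mapped there by all concurrent threads is counted and compared with the associativity.

```python
# Sketch of the basic counter-based check as read from the abstract.

NUM_SETS, ASSOC = 512, 8           # assumed shared-L2 geometry

def per_set_counters(threads_blocks):
    """threads_blocks: list (one entry per thread) of sets of line-block ids."""
    counters = [0] * NUM_SETS
    for blocks in threads_blocks:
        for b in blocks:
            counters[b % NUM_SETS] += 1
    return counters

def classify(counters):
    """Sets whose worst-case block count fits within the associativity may keep
    their single-thread classification; the rest are treated as always-miss."""
    return {s: ("fits" if c <= ASSOC else "always-miss")
            for s, c in enumerate(counters) if c > 0}

threads = [{0, 512, 1024, 3}, {512 * k for k in range(10)}]   # toy footprints
print(classify(per_set_counters(threads)))   # {0: 'always-miss', 3: 'fits'}
```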

Optimizing Shared Memory Accesses for GPGPU Computations

  • 쟌 느앗 프엉;이명호;홍석원
    • Proceedings of the Korea Information Processing Society Conference / 2012 Fall Conference / pp.197-199 / 2012
  • Recently, the outstanding floating-point performance of GPUs has been exploited to parallelize and optimize a wide range of applications beyond graphics. To maximize GPU performance, it is essential to optimize the use of the memory hierarchy and of on-chip memory, including shared memory. This paper proposes techniques for optimizing the use of shared memory and verifies their effectiveness by applying them to a pattern matching application.
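
The specific optimizations are not spelled out in this summary, so the snippet below is only a generic illustration of one classic shared-memory optimization on GPUs, padding a tile so that column accesses by a warp spread across banks; the bank count and tile size are assumptions.

```python
# Illustrative bank-conflict arithmetic (not the paper's technique).

NUM_BANKS = 32           # banks on recent NVIDIA GPUs (4-byte words)
TILE = 32                # 32x32 tile of 4-byte elements

def banks_for_column(width, col=0, threads=32):
    """Bank touched by each of `threads` consecutive threads reading one column
    of a row-major tile with the given allocated row width (in words)."""
    return [(row * width + col) % NUM_BANKS for row in range(threads)]

print(len(set(banks_for_column(TILE))))       # 1  -> 32-way bank conflict
print(len(set(banks_for_column(TILE + 1))))   # 32 -> conflict-free after padding
```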

A Client-Based Distributed Web Caching System

  • 박종호;유성구;정길도
    • Proceedings of the IEEK Conference / IEEK 2006 Summer Conference / pp.829-830 / 2006
  • A distributed web caching system can deliver content to users quickly and reliably while avoiding congested parts of the Internet, by storing requested content in, and later serving it from, caches that are distributed and shared in the manner of proxy servers. This paper proposes a client-based distributed web caching system that assigns objects and balances load through the user's direct connection to the shared caches, without the aid of additional domain name system (DNS) requests. The proposed system simplifies content delivery by reducing both DNS queries and delay time.
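
As a minimal sketch of the general idea (the host names and hash choice are illustrative, not the paper's protocol): the client itself hashes the requested URL to select one of the shared caches, so object placement and load distribution require no extra DNS lookups.

```python
# Client-side cache selection sketch; hosts below are hypothetical.

import hashlib

CACHES = ["cache-a.example", "cache-b.example", "cache-c.example"]

def pick_cache(url, caches=CACHES):
    """Deterministically map a URL to a shared cache on the client side."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return caches[int(digest, 16) % len(caches)]

print(pick_cache("http://example.com/index.html"))
print(pick_cache("http://example.com/logo.png"))
```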

Dynamic Limited Directory Scheme for Distributed Shared Memory Systems

  • 이동광;권혁성;최성민;안병철
    • The Transactions of the Korea Information Processing Society / Vol. 6 No. 4 / pp.1098-1105 / 1999
  • In distributed shared memory (DSM) systems, caches can improve performance by reducing memory access latency and communication overhead, but the cache coherence problem must be solved. This paper proposes a new directory protocol that solves the cache coherence problem and improves performance in DSM systems. To maintain cache coherence, processors within a certain distance are tracked with a bit vector, as in the full directory scheme, which reduces communication overhead. Processors beyond that distance are recorded as pointers in a directory pool. Using the bit vector together with the directory pool prevents unnecessary cache invalidations and thus improves system performance. The proposed scheme reduces traffic by up to 66% compared to the limited directory scheme and reduces the number of directory accesses by up to 27% compared to the dynamically allocated directory scheme.
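
A small sketch of the hybrid bookkeeping described above, with assumed parameters: sharers within a fixed distance of the home node are tracked in a bit vector, as in the full directory scheme, while more distant sharers are recorded as pointers in a directory pool.

```python
# Directory-entry sketch; threshold, topology, and naming are assumptions.

NEAR_DISTANCE = 4        # assumed distance threshold (hops from the home node)

class DirectoryEntry:
    def __init__(self, home, distance):
        self.home = home
        self.distance = distance        # distance(home, p) -> hops
        self.near_bits = 0              # bit vector for nearby sharers
        self.far_pointers = set()       # directory-pool pointers for distant sharers

    def add_sharer(self, proc):
        if self.distance(self.home, proc) <= NEAR_DISTANCE:
            self.near_bits |= (1 << proc)
        else:
            self.far_pointers.add(proc)

    def sharers(self, num_procs):
        near = [p for p in range(num_procs) if self.near_bits & (1 << p)]
        return near + sorted(self.far_pointers)

# Toy usage: a ring of 16 processors, hop count as the distance metric.
ring_dist = lambda a, b: min((a - b) % 16, (b - a) % 16)
entry = DirectoryEntry(home=0, distance=ring_dist)
for p in (1, 3, 9, 12):
    entry.add_sharer(p)
print(entry.sharers(16))     # [1, 3, 12, 9]  (12 is 4 hops away -> bit vector)
```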

Performance Evaluation of Buffer Replacement Algorithms in a Shared Disk Cluster

  • 조행래
    • Journal of KIISE: Databases / Vol. 35 No. 6 / pp.469-480 / 2008
  • A shared disk (SD) cluster couples multiple processing nodes for online transaction processing, and all the nodes share the database at the disk level. To avoid frequent disk accesses, each node caches recently accessed pages in its memory buffer. Since the same page may be cached simultaneously in the memory buffers of several nodes, the coherence of the cached pages must be maintained so that every node accesses the most recent version. Most cache coherence schemes previously proposed for SD clusters assume LRU as the buffer replacement algorithm. In contrast, this paper proposes four buffer replacement algorithms that take the characteristics of the SD cluster into account and evaluates their performance. Experiments were carried out while varying the cluster configuration and the database workload, and the proposed algorithms improved performance by up to five times compared to LRU.
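
The abstract does not name the four algorithms, so the snippet below is purely hypothetical: one way a replacement policy might exploit SD-cluster characteristics is to prefer, among the coldest pages, a victim that another node also caches, since it can be refetched from a peer buffer more cheaply than from disk.

```python
# Hypothetical "SD-cluster-aware" twist on LRU; not one of the paper's algorithms.

def pick_victim(lru_list, cached_elsewhere, window=4):
    """lru_list: page ids ordered from least to most recently used.
    cached_elsewhere: set of page ids known to be buffered at other nodes."""
    candidates = lru_list[:window]                 # coldest few pages
    for page in candidates:
        if page in cached_elsewhere:
            return page                            # cheap to recover from a peer
    return candidates[0]                           # fall back to plain LRU

print(pick_victim([10, 11, 12, 13, 14], cached_elsewhere={12, 14}))   # 12
```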

Formal Verification of the RACE Protocol Using VIS

  • 엄현선;최진영;한우종;기안도;심규현
    • The Transactions of the Korea Information Processing Society / Vol. 7 No. 7 / pp.2219-2228 / 2000
  • In a multiprocessor system where copies of data are distributed and shared among the local caches attached to each processor, data consistency must be maintained. The cache coherence protocol is therefore critical to the correct and efficient operation of a shared-memory multiprocessor system. As systems grow more complex, the random testing and simulation currently in use are not sufficient to confirm the correctness of such a protocol, so a more efficient and reliable verification method is needed. This paper verifies several properties of the RACE (Remote Access Cache coherent Enforcement) protocol, a cache coherence protocol developed at ETRI, using VIS (Verification Interacting with Synthesis), one of the tools used in formal methods.
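
For flavor only: the toy model below is neither the RACE protocol nor VIS's symbolic approach; it just enumerates the reachable states of a simplified MSI-like protocol and checks the classic coherence invariant that at most one cache holds a line in the Modified state.

```python
# Toy explicit-state check of a coherence invariant (illustrative assumptions only).
# Each cache is in state "I" (Invalid), "S" (Shared), or "M" (Modified).

def reachable(transition, start):
    """Enumerate all global states reachable from `start`."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for nxt in transition(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def transition(state):
    """Writes invalidate all other copies; a read makes cache i Shared and is
    allowed only when no other cache holds the line Modified."""
    succs = []
    for i in range(len(state)):
        # cache i writes: it becomes M, every other cache is invalidated
        succs.append(tuple("M" if j == i else "I" for j in range(len(state))))
        # cache i reads: allowed only if no other cache is M
        if all(s != "M" for j, s in enumerate(state) if j != i):
            succs.append(tuple("S" if j == i else s for j, s in enumerate(state)))
    return succs

states = reachable(transition, start=("I", "I", "I"))
assert all(sum(s == "M" for s in st) <= 1 for st in states)
print(len(states), "reachable states, invariant holds")
```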
