• Title/Summary/Keyword: Cache System

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

  • Han Sang Yoon; Kwak Jong Wook; Jhang Seong Tae; Jhon Chu Shik
    • Journal of KIISE: Computer Systems and Theory / v.32 no.11_12 / pp.541-556 / 2005
  • Memory access latency has long been a primary factor of performance degradation in both single-processor and multi-processor systems. Remote memory accesses incur far more overhead than local memory accesses, especially in distributed shared-memory systems. To address this problem, multi-level cache architectures that add a remote cache to the multi-processor system have been proposed. In this paper, we propose a new cache replacement policy that improves the performance of a multi-processor system with a remote cache. If the multi-level cache maintains the multi-level inclusion (MLI) property and uses the LRU (Least Recently Used) replacement policy, the LRU information of the higher-level cache (a processor cache) can differ from that of the lower-level cache (a remote cache). In this situation, replacing a remote cache line can force the eviction of a processor cache line that the processor is still using, which is a major source of performance degradation for the whole system. To alleviate this disadvantage of LRU replacement, the new policy analyzes each node's remote memory access pattern and uses this information to reduce the number of invalidations of useful cache lines in the higher-level cache. On the SPLASH-2 benchmarks, the new remote cache replacement policy improves performance by up to 3.5% and by 2.5% on average, compared to the conventional LRU replacement policy.
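
The hazard described above can be made concrete with a small model. The sketch below is an illustrative simplification, not the authors' exact algorithm: an LRU remote cache whose victim selection prefers lines that the higher-level processor cache (assumed here to be a plain set of resident addresses) does not hold, so that enforcing multi-level inclusion evicts fewer lines the processor is actively using.

```python
# A locality-aware variant of LRU replacement for the remote cache
# (illustrative simplification, not the paper's exact algorithm).
from collections import OrderedDict

class LocalityAwareRemoteCache:
    def __init__(self, capacity, processor_cache):
        self.capacity = capacity
        self.lines = OrderedDict()               # address -> data, LRU first
        self.processor_cache = processor_cache   # set of addresses in the L1/L2

    def access(self, addr, data=None):
        if addr in self.lines:
            self.lines.move_to_end(addr)         # refresh LRU position on a hit
            return self.lines[addr]
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[addr] = data
        return data

    def _evict(self):
        # Prefer the least-recently-used line that the processor cache does
        # NOT hold; fall back to plain LRU if every line is held above.
        for addr in self.lines:                  # iterates LRU-first
            if addr not in self.processor_cache:
                victim = addr
                break
        else:
            victim = next(iter(self.lines))
        del self.lines[victim]
        # Multi-level inclusion: the victim must also leave the processor cache.
        self.processor_cache.discard(victim)
```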

CacheSCDefender: VMM-based Comprehensive Framework against Cache-based Side-channel Attacks

  • Yang, Chao; Guo, Yunfei; Hu, Hongchao; Liu, Wenyan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.6098-6122 / 2018
  • Cache-based side-channel attacks have attracted increasing attention with the development of cloud computing technologies. However, current host-based mitigation methods either offer poor compatibility with existing cloud infrastructure or turn out to be too application-specific. Moreover, they defend blindly, without any knowledge of ongoing attacks. In this work, we present CacheSCDefender, a Virtual Machine Monitor (VMM)-based framework that provides a comprehensive defense against all levels of cache attacks. In designing CacheSCDefender, we make three key contributions: (1) an attack-aware framework combining our novel dynamic remapping with traditional cache cleansing, which provides a comprehensive defense against all three cases of cache attacks that we identify in this paper; (2) a new defense method called dynamic remapping, a developed version of random permutation that is able to deal with two of the cases of cache attacks; and (3) a formalization and quantification of the security improvement and performance overhead of our defense, which is also applicable to other defense methods. We show that CacheSCDefender is practical for deployment in a normal virtualized environment while providing a favorable security guarantee for virtual machines.
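
As a rough illustration of the dynamic-remapping idea, the sketch below permutes the address-to-set mapping with a secret random permutation and refreshes it on demand. The class and method names are invented, and a real design would also flush or migrate remapped lines; this is not CacheSCDefender's actual implementation.

```python
# Dynamic remapping sketched as a refreshable random permutation of set
# indices; names and structure are invented for illustration.
import random

class RemappedIndexing:
    def __init__(self, num_sets, line_size=64, seed=None):
        self.num_sets = num_sets
        self.line_size = line_size
        self.rng = random.Random(seed)
        self._new_permutation()

    def _new_permutation(self):
        self.perm = list(range(self.num_sets))
        self.rng.shuffle(self.perm)

    def set_index(self, addr):
        # The fixed address-to-set mapping that PRIME+PROBE-style attacks
        # rely on is hidden behind the secret permutation.
        return self.perm[(addr // self.line_size) % self.num_sets]

    def remap(self):
        # Invoked by the VMM when monitoring flags a suspected attack;
        # a real design would also flush or migrate the remapped lines.
        self._new_permutation()
```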

Exploiting Memory Sequence Analysis to Defense Wear-out Attack for Non-Volatile Memory (동작 분석을 통한 비휘발성 메모리에 대한 Wear-out 공격 방지 기법)

  • Choi, Juhee
    • Journal of the Semiconductor & Display Technology / v.21 no.4 / pp.86-91 / 2022
  • Cache bypassing is a scheme that prevents unnecessary cache blocks from occupying cache capacity, thereby avoiding cache contamination. It has been introduced to alleviate problems in non-volatile memory (NVM)-based memory systems. However, prior works did not consider wear-out attacks: malicious writes to a small area of an NVM can cause system failure because of the NVM's limited write endurance. This paper proposes a novel scheme that prolongs lifetime by offering higher resistance to wear-out attacks. First, the memory reference pattern of each cache block is identified by a modified reuse-distance calculation. If a cache block is determined to be the target of an attack, it is forwarded to the higher-level cache or main memory without updating the NVM-based cache. Experimental results show that write endurance improves by 14% on average and by 36% at maximum.
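
A minimal sketch of the bypassing decision is given below. The thresholds, and the use of the access-time gap as a stand-in for the paper's modified reuse-distance calculation, are assumptions for illustration; `next_level` is a hypothetical interface to the next memory level.

```python
# Wear-out-aware bypassing: blocks whose write pattern looks like a
# targeted attack are forwarded past the NVM cache. Thresholds and the
# access-gap heuristic are invented stand-ins for the paper's scheme.
from collections import defaultdict

WRITE_THRESHOLD = 100   # writes to one block before it becomes suspect (assumed)
GAP_THRESHOLD = 4       # "hammer-like" access-gap bound (assumed)

class WearAwareNVMCache:
    def __init__(self):
        self.cache = {}                    # NVM-backed cache contents
        self.write_count = defaultdict(int)
        self.last_write = {}
        self.now = 0

    def write(self, addr, data, next_level):
        self.now += 1
        gap = self.now - self.last_write.get(addr, -10**9)
        self.last_write[addr] = self.now
        self.write_count[addr] += 1
        if self.write_count[addr] > WRITE_THRESHOLD and gap <= GAP_THRESHOLD:
            next_level.write(addr, data)   # bypass: the NVM cache is not updated
            return "bypassed"
        self.cache[addr] = data            # normal cached write
        return "cached"
```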

Design of Central Directory Unit for Cache Coherence of Multiprocessor based on Intel486 Microprocessor (Intel486 병렬시스템의 Cache Coherence를 위한 Central Directory Unit의 설계)

  • You, Jun-Bok; Chung, Tae-Sang
    • Proceedings of the KIEE Conference / 2001.07d / pp.2684-2686 / 2001
  • To utilize caches in a multiprocessor system, the cache coherence problem must be handled, and the central directory scheme is one of the hardware-assisted solutions. The goal of this paper is not only to propose the special methods needed to apply a central directory scheme to a specific multiprocessor system based on Intel486 microprocessors, but also to design a central directory unit for cache coherence in the target system. The problems of arbitrating requests from several processors, storing the cache information, and generating control signals for cache-line fills and snoop cycles are solved.
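
The bookkeeping a central directory performs can be sketched generically as below: one entry per memory block records the sharers, so invalidations are targeted rather than broadcast. This is a textbook full-map directory, not the Intel486-specific unit designed in the paper.

```python
# Generic full-map directory bookkeeping: the directory records sharers per
# block and returns the coherence actions the bus logic must perform.
from dataclasses import dataclass, field

@dataclass
class DirectoryEntry:
    sharers: set = field(default_factory=set)   # processor IDs holding a copy
    dirty: bool = False                         # one sharer holds modified data

class CentralDirectory:
    def __init__(self):
        self.entries = {}                       # block address -> DirectoryEntry

    def read_miss(self, addr, cpu):
        e = self.entries.setdefault(addr, DirectoryEntry())
        actions = []
        if e.dirty:
            owner = next(iter(e.sharers))
            actions.append(("writeback", owner, addr))   # fetch the dirty copy
            e.dirty = False
        e.sharers.add(cpu)
        return actions

    def write_miss(self, addr, cpu):
        e = self.entries.setdefault(addr, DirectoryEntry())
        actions = [("invalidate", other, addr) for other in e.sharers - {cpu}]
        e.sharers = {cpu}                       # the writer becomes sole owner
        e.dirty = True
        return actions
```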

Designing a low-power L1 cache system using aggressive data of frequent reference patterns

  • Jung, Bo-Sung; Lee, Jung-Hoon
    • Journal of the Korea Society of Computer and Information / v.27 no.7 / pp.9-16 / 2022
  • With the advent of the 4th industrial revolution, IoT (Internet of Things) systems are advancing rapidly, and various high-performance, large-capacity applications are emerging. Computing systems running such applications therefore need low-power, high-performance memory. In this paper, we propose an effective structure for the L1 cache, which consumes the most energy in a computing system. The proposed cache system consists of two main parts: the L1 main cache and a buffer cache. The main cache has two banks, each organized as a 2-way set-associative cache. On an L1 cache hit, the data is copied into the buffer cache according to the proposed algorithm. Simulations show that the proposed L1 cache system improves the energy-delay product by about 65% compared to a conventional 4-way set-associative cache.
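
A minimal behavioral sketch of the two-part structure is shown below. The sizes, the line size, and the promote-every-hit buffer-fill rule are placeholders; the paper's actual copy algorithm is not reproduced here.

```python
# Behavioral sketch: a banked 2-way set-associative main cache plus a small
# fully-associative buffer cache filled on main-cache hits. Sizes and the
# promote-every-hit rule are placeholders, not the paper's algorithm.
from collections import OrderedDict

LINE = 32   # line size in bytes (assumed)

class BufferedL1:
    def __init__(self, banks=2, sets_per_bank=64, ways=2, buffer_lines=8):
        self.banks = [[OrderedDict() for _ in range(sets_per_bank)]
                      for _ in range(banks)]
        self.sets_per_bank = sets_per_bank
        self.ways = ways
        self.buffer = OrderedDict()              # checked first, cheap to access
        self.buffer_lines = buffer_lines

    def access(self, addr):
        tag = addr // LINE
        if tag in self.buffer:                   # buffer hit: the banks stay idle
            self.buffer.move_to_end(tag)
            return "buffer hit"
        bank = tag % len(self.banks)
        cset = self.banks[bank][(tag // len(self.banks)) % self.sets_per_bank]
        if tag in cset:
            cset.move_to_end(tag)
            self._fill_buffer(tag)               # copy the hit line into the buffer
            return "main hit"
        if len(cset) >= self.ways:
            cset.popitem(last=False)             # evict the LRU way within the set
        cset[tag] = True
        return "miss"

    def _fill_buffer(self, tag):
        if len(self.buffer) >= self.buffer_lines:
            self.buffer.popitem(last=False)
        self.buffer[tag] = True
```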

DOC: A Distributed Object Caching System for Information Infrastructure (분산 환경에서의 객체 캐슁)

  • 이태희; 심준호; 이상구
    • Proceedings of the CALSEC Conference / 2003.09a / pp.249-254 / 2003
  • Object caching is a desirable feature for improving both the scalability and the performance of distributed application systems for an information infrastructure, the information management system that leverages the power of network computing. However, to obtain these benefits, we claim that the following problems should be considered when designing any object cache system: cache server placement, cache replacement, and cache synchronization. We are developing DOC, a Distributed Object Cache, as part of building our information infrastructure. In this paper, we show how these problems are inter-related and highlight how we handle the cache server placement problem.
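
The abstract does not describe DOC's placement algorithm, so purely as a generic illustration of the cache-server placement problem, the sketch below uses a greedy heuristic that repeatedly places the server yielding the largest drop in request-weighted distance. All inputs and names are hypothetical.

```python
# Greedy cache-server placement: repeatedly place the server that cuts the
# request-weighted client-to-nearest-cache distance the most. A generic
# illustration only; DOC's actual algorithm is not given in the abstract.
def greedy_placement(nodes, dist, clients, k):
    """nodes: candidate sites; dist[a][b]: network distance;
    clients: list of (node, request_rate); k: servers to place."""
    placed = []
    for _ in range(k):
        best, best_cost = None, float("inf")
        for cand in nodes:
            if cand in placed:
                continue
            trial = placed + [cand]
            cost = sum(rate * min(dist[c][s] for s in trial)
                       for c, rate in clients)
            if cost < best_cost:
                best, best_cost = cand, cost
        placed.append(best)
    return placed
```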

An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor (Memory Latency Penalty를 개선한 SIMT 기반 Stream Processor의 Memory Operation System Architecture 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE / v.18 no.3 / pp.392-397 / 2014
  • In this paper, we propose a memory operation system architecture that reduces the memory latency penalty in an SIMT-based stream processor. The proposed architecture applies a non-blocking cache to reduce the cache-miss penalty incurred by a blocking cache. We verified that the proposed memory operation architecture improves the performance of the stream processor by comparing the processing performance of various algorithms, and we measured how the improvement varies with the proportion of memory instructions in each algorithm. As a result, we confirmed that the performance of the stream processor improves by at least 8.2% and by up to 46.5%.
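
The benefit of a non-blocking cache can be illustrated with a toy cycle model: Miss Status Holding Registers (MSHRs) let later accesses proceed while earlier misses are outstanding, whereas a blocking cache stalls for the full miss latency. The latencies, MSHR count, and trace below are invented for illustration and are unrelated to the paper's measurements.

```python
# Toy comparison of blocking vs. non-blocking (MSHR-based) cache stalls.
MISS_LATENCY, MSHR_COUNT = 100, 4            # invented numbers

def stall_cycles(trace, blocking):
    cache, mshrs, stalls = set(), {}, 0
    for cycle, addr in enumerate(trace):
        # Retire outstanding misses whose fill has completed.
        for a, done in list(mshrs.items()):
            if cycle >= done:
                cache.add(a)
                del mshrs[a]
        if addr in cache or addr in mshrs:   # hit, or miss already in flight
            continue
        if blocking:
            stalls += MISS_LATENCY           # the pipeline waits for the fill
            cache.add(addr)
        elif len(mshrs) < MSHR_COUNT:
            mshrs[addr] = cycle + MISS_LATENCY   # overlap miss with later work
        else:
            stalls += 1                      # structural stall: MSHRs are full
    return stalls

trace = [i % 16 for i in range(64)]          # toy access stream
print(stall_cycles(trace, True), stall_cycles(trace, False))
```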

A Study on Direct Cache-to-Cache Transfer for Hybrid Cache Architecture to Reduce Write Operations (쓰기 횟수 감소를 위한 하이브리드 캐시 구조에서의 캐시간 직접 전송 기법에 대한 연구)

  • Choi, Juhee
    • Journal of the Semiconductor & Display Technology / v.23 no.1 / pp.65-70 / 2024
  • Direct cache-to-cache transfer has been studied to reduce the latency and bandwidth consumption associated with shared data in multiprocessor systems. Although these studies produced meaningful results, they assume that the caches consist of SRAM. If the system instead employs non-volatile memory, one of the most important considerations is reducing the number of write operations. This paper proposes a hybrid write-avoidance cache coherence protocol designed for a hybrid cache architecture. A new state is added to finely control what is stored in the non-volatile memory area, and experimental results show that the number of writes is reduced by about 36% compared to existing schemes.
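
A minimal sketch of the write-avoidance idea appears below: blocks supplied by a peer cache can be installed in the SRAM ways, or tracked in a bookkeeping-only state instead of being written into the NVM ways. The state names are invented; the paper defines its own additional coherence state.

```python
# Write-avoidance sketch for a hybrid SRAM/NVM cache; states are invented.
from enum import Enum

class State(Enum):
    SRAM_VALID = 1   # data held in the SRAM ways
    NVM_VALID = 2    # data held in the NVM ways (writes are costly)
    TAG_ONLY = 3     # hypothetical state: tag tracked, data re-fetched from a peer

class HybridCache:
    def __init__(self):
        self.blocks = {}       # addr -> (State, data or peer id)
        self.nvm_writes = 0

    def install_from_peer(self, addr, data, peer, sram_full):
        """Fill supplied by direct cache-to-cache transfer."""
        if not sram_full:
            self.blocks[addr] = (State.SRAM_VALID, data)
        else:
            # Avoid an NVM write: remember who owns the data instead of
            # writing the data itself into the NVM ways.
            self.blocks[addr] = (State.TAG_ONLY, peer)

    def install_from_memory(self, addr, data, sram_full):
        """Fill from main memory; may have to pay for an NVM write."""
        if sram_full:
            self.nvm_writes += 1
            self.blocks[addr] = (State.NVM_VALID, data)
        else:
            self.blocks[addr] = (State.SRAM_VALID, data)
```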

Design and Implementation of an SCI-Based Network Cache Coherent NUMA System for High-Performance PC Clustering (고성능 PC 클러스터 링을 위한 SCI 기반 Network Cache Coherent NUMA 시스템의 설계 및 구현)

  • Oh Soo-Cheol; Chung Sang-Hwa
    • Journal of KIISE: Computer Systems and Theory / v.31 no.12 / pp.716-725 / 2004
  • Minimizing network access time is extremely important when constructing a high-performance PC cluster system. In PC cluster systems, network access time can be reduced by maintaining a network cache in each cluster node. This paper presents a Network Cache Coherent NUMA (NCC-NUMA) system that utilizes a network cache by locating shared memory on the PCI bus, and describes the NCC-NUMA card that is the core module of the system. The NCC-NUMA card plugs directly into the PCI slot of each node and contains the shared memory, the network cache, a shared-memory control module, and a network control module. The network cache is maintained for the shared memory on the PCI bus of the cluster nodes, and the coherence mechanism between the network cache and the shared memory is based on the IEEE SCI standard. In SPLASH-2 benchmark experiments, the NCC-NUMA system showed improvements of 56% compared with an SCI-based cluster without a network cache.
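
The read path the NCC-NUMA card enables can be sketched as below: a remote shared-memory read is served from the node-local network cache when possible, avoiding an SCI transaction to the home node. The latency figures are illustrative assumptions, not measurements from the paper.

```python
# Read path through the node-local network cache; latency numbers are
# illustrative assumptions, not measurements from the paper.
LOCAL_CACHE_NS, SCI_REMOTE_NS = 100, 2000

class NccNumaNode:
    def __init__(self):
        self.network_cache = {}      # remote shared address -> cached data

    def read_remote(self, addr, home_memory):
        if addr in self.network_cache:         # hit: no SCI transaction needed
            return self.network_cache[addr], LOCAL_CACHE_NS
        data = home_memory[addr]               # SCI fetch from the home node
        self.network_cache[addr] = data        # fill for future reads
        return data, SCI_REMOTE_NS

    def invalidate(self, addr):
        # SCI coherence: cached copies are invalidated when the home node's
        # shared memory is written.
        self.network_cache.pop(addr, None)
```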

Preventing Fast Wear-out of Flash Cache with An Admission Control Policy

  • Lee, Eunji; Bahn, Hyokyung
    • JSTS: Journal of Semiconductor Technology and Science / v.15 no.5 / pp.546-553 / 2015
  • Recently, flash cache has been widely adopted as a performance accelerator for legacy storage systems. Unlike other cache media, flash cache must be managed carefully because of its peculiar characteristics, such as long write latency and limited P/E cycles. In particular, we make two notable observations that can be exploited in managing flash cache. First, a serious wear-out problem arises when the working set of a system exceeds the capacity of the flash cache, due to excessively frequent cache replacement. Second, more than 50% of data sees no hit in the flash cache, as it is a second-level cache. Based on these observations, we propose a cache admission control policy that does not cache data on its first access, inserting it into the cache only if a second access occurs within a certain time window. This filters out data that is disruptive to the flash cache in terms of endurance and performance. With this policy, we prolong the lifetime of the flash cache by 2.3 times without any performance degradation.
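
A minimal sketch of the second-access admission policy follows. The window length is an assumed parameter; the paper's policy is evaluated on real storage workloads, which this toy model does not reproduce.

```python
# Second-access admission: a block enters the flash cache only when it is
# re-referenced within a window, filtering one-touch data that would wear
# the flash for nothing. The window length is an assumed parameter.
WINDOW = 1000   # admission window, in accesses (assumed)

class FlashCacheAdmission:
    def __init__(self):
        self.cache = set()
        self.first_seen = {}     # addr -> time of first (bypassed) access
        self.now = 0

    def access(self, addr):
        self.now += 1
        if addr in self.cache:
            return "hit"
        seen = self.first_seen.get(addr)
        if seen is not None and self.now - seen <= WINDOW:
            self.cache.add(addr)             # second access in window: admit
            return "admitted"
        self.first_seen[addr] = self.now     # first touch: bypass the flash
        return "bypassed"
```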