• Title/Summary/Keyword: filter cache

Search results: 23

Reuse Information based Thrashing Resistant Cache Management Scheme

  • Sim, Gyu Yeon;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information / v.22 no.3 / pp.9-16 / 2017
  • In recent computing systems, the LRU replacement policy has been widely used because it is simple to implement and applicable to most programs. However, if the working set of a program is larger than the actual cache size, the LRU policy can suffer from thrashing, where cache blocks are constantly replaced without ever being re-referenced. This paper proposes a new cache management scheme to solve the thrashing problem in the second-level cache. The proposed scheme measures per-set reuse frequency using an EAF structure, which records the addresses of recently evicted blocks, to find thrashing sets. When a cache miss occurs, the scheme tests whether the address of the missed block is stored in that structure. If it is, a recently evicted block is being re-requested, so the reuse frequency of the set is predicted to be high and the corresponding counter of the set is incremented. When the counter value exceeds a threshold, the set is assumed to show high reuse frequency. The proposed scheme assigns sets with high reuse frequency to an additional small cache so that their blocks stay in the cache longer. Our experimental results show that the proposed scheme improves IPC by 3.81% on average.
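
A rough sketch of the per-set reuse test described in this abstract is given below. It is only an illustrative model, not the authors' implementation: the set count, EAF size, threshold, and function names are assumptions. On every miss the evicted-address list of the set is probed; a match means a recently evicted block is being re-requested, so the set's reuse counter is incremented, and once the counter exceeds the threshold the set is treated as thrashing and directed to the small auxiliary cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS        1024   /* assumed L2 set count                    */
#define EAF_ENTRIES     8      /* evicted addresses remembered per set    */
#define REUSE_THRESHOLD 4      /* assumed promotion threshold             */

/* Per-set bookkeeping: recently evicted block addresses plus a reuse counter.
 * Block addresses are assumed non-zero in this sketch. */
struct set_state {
    uint64_t evicted[EAF_ENTRIES]; /* small EAF: last evicted block addresses */
    int      next;                 /* round-robin insertion position          */
    int      reuse_count;          /* misses that hit a recently evicted block */
};

static struct set_state sets[NUM_SETS];

/* Record the address of a block evicted from the given set. */
void record_eviction(int set, uint64_t block_addr)
{
    sets[set].evicted[sets[set].next] = block_addr;
    sets[set].next = (sets[set].next + 1) % EAF_ENTRIES;
}

/* On a cache miss, check whether the missed block was recently evicted.
 * If so, the set shows reuse; bump its counter.  Returns true when the
 * set should be treated as thrashing and served by the auxiliary cache. */
bool on_miss(int set, uint64_t block_addr)
{
    struct set_state *s = &sets[set];
    for (int i = 0; i < EAF_ENTRIES; i++) {
        if (s->evicted[i] == block_addr) {
            s->reuse_count++;
            break;
        }
    }
    return s->reuse_count > REUSE_THRESHOLD;
}
```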

Low Power Architecture of FIR Filter for 2D Image Filter (2D Image Filter에 적합한 저전력 FIR Filter의 구현)

  • Han, Chang-Yeong;Park, Hyeong-Jun;Kim, Lee-Seop
    • Journal of the Institute of Electronics Engineers of Korea SD / v.38 no.9 / pp.663-670 / 2001
  • This paper proposes a new power reduction method for 2D FIR (Finite Impulse Response) filters. We exploit the spatial redundancy of image data to reduce the power dissipated in the multiplications of FIR filters. Since the higher bits of input pixels rarely change, the redundant multiplication of the higher bits is avoided by separating each multiplication into a higher part and a lower part. The products of the higher bits are stored in memory cells that act as a cache, so that they can be reused when a cache hit occurs. Using the proposed Separated Multiplication Technique (SMT), the power of the 2D FIR filter module is reduced by about 15%.
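
The separated multiplication idea can be sketched as below. This is a hypothetical simplification (the 4/4-bit split, struct layout, and function names are assumptions): the product of a filter coefficient with the upper bits of a pixel is cached per tap and reused while those upper bits stay the same, so only the low-part multiplication is recomputed for each new pixel.

```c
#include <stdint.h>

/* Sketch of a separated multiplication: coeff * pixel is split into
 * coeff * (high nibble << 4) + coeff * low nibble.  The high-part product
 * is cached per tap and only recomputed when the upper bits change, which
 * is rare for spatially redundant image data. */
struct high_mult_cache {
    uint8_t  cached_high;   /* upper 4 bits of the last pixel seen   */
    int32_t  cached_prod;   /* coeff * (cached_high << 4)            */
    int      valid;
};

int32_t sep_multiply(int16_t coeff, uint8_t pixel, struct high_mult_cache *c)
{
    uint8_t high = pixel >> 4;        /* upper bits: rarely change   */
    uint8_t low  = pixel & 0x0F;      /* lower bits: change often    */

    if (!c->valid || c->cached_high != high) {   /* "cache miss"     */
        c->cached_high = high;
        c->cached_prod = (int32_t)coeff * (high << 4);
        c->valid = 1;
    }
    /* Reuse the cached high-part product; only the low part is new. */
    return c->cached_prod + (int32_t)coeff * low;
}
```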

Analysis on the Effectiveness of the Filter Buffer for Low Power NAND Flash Memory (저전력 NAND 플래시 메모리를 위한 필터 버퍼의 효율성 분석)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • IEMEK Journal of Embedded Systems and Applications / v.7 no.4 / pp.201-207 / 2012
  • NAND flash memory is currently widely used in consumer storage devices due to its non-volatility, stability, economical feasibility, low power usage, durability, and high density. However, high-capacity NAND flash memory suffers from high power consumption and low performance. In conventional memory research, a hierarchical filter mechanism can achieve an effective performance improvement in terms of power consumption. To find the best filter structure for NAND flash memory, we compare a direct-mapped filter, a victim filter, a fully associative filter, and a 4-way set-associative filter in the performance analysis. According to the simulation results, the fully associative filter buffer with a 128-byte fetching size obtains the best performance among the compared filter structures and reduces the energy-delay product (EDP) by about 93% compared to conventional NAND flash memory.
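
For illustration only, the sketch below shows the lookup path of a fully associative filter buffer of the kind the paper finds most effective; the entry count, replacement policy, and names are assumptions, not the evaluated configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE   128   /* fetching size reported to work best (bytes) */
#define NUM_LINES   8     /* assumed number of filter buffer entries     */

/* Sketch of a fully associative filter buffer placed in front of NAND
 * flash: every entry is searched on each access, and a hit avoids a
 * flash read.  Replacement (e.g. LRU) is only hinted at here. */
struct filter_line {
    bool     valid;
    uint64_t tag;                 /* line-aligned flash address */
    uint8_t  data[LINE_SIZE];
    uint32_t last_used;           /* recency stamp for LRU replacement */
};

static struct filter_line buffer[NUM_LINES];
static uint32_t clock_tick;

bool filter_buffer_lookup(uint64_t addr, struct filter_line **hit)
{
    uint64_t tag = addr / LINE_SIZE;
    clock_tick++;
    for (int i = 0; i < NUM_LINES; i++) {
        if (buffer[i].valid && buffer[i].tag == tag) {
            buffer[i].last_used = clock_tick;   /* update recency */
            *hit = &buffer[i];
            return true;                        /* served without a flash read */
        }
    }
    return false;   /* miss: caller fetches a 128-byte line from flash */
}
```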

Performance of the Finite Difference Method Using Cache and Shared Memory for Massively Parallel Systems (대규모 병렬 시스템에서 캐시와 공유메모리를 이용한 유한 차분법 성능)

  • Kim, Hyun Kyu;Lee, Hyo Jong
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.4 / pp.108-116 / 2013
  • Many algorithms have been introduced to improve performance by using massively parallel systems, which consist of several hundred processors. A typical example is a GPU, a many-processor system that provides shared memory. For image filtering algorithms, which reference neighboring points, shared memory helps improve performance because adjacent pixels are accessed frequently. However, using shared memory requires rewriting existing code and consequently makes it more complex. Recent GPUs provide both L1 and L2 caches along with shared memory, and since the L1 cache resides in the same area as the shared memory, a comparable performance improvement can be expected from the cache. In this paper, the performance of cache-based and shared-memory implementations is compared. In conclusion, the cache-based algorithm performs very similarly to the shared-memory one, while the code complexity introduced by shared memory is avoided.
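
The neighbor-access pattern that both shared memory and the L1 cache exploit can be seen in a plain, CPU-side version of a 5-point finite difference step. This is only a minimal sketch of the access pattern, not the authors' GPU code.

```c
/* Minimal 5-point finite difference step over a 2D grid.  Each interior
 * point reads its four neighbors, so adjacent iterations (or adjacent GPU
 * threads) touch overlapping data.  On a GPU this overlap can be staged
 * in shared memory by hand, or simply served by the L1 cache, which the
 * paper reports performs similarly. */
void fd_step(const float *in, float *out, int nx, int ny, float c)
{
    for (int y = 1; y < ny - 1; y++) {
        for (int x = 1; x < nx - 1; x++) {
            int i = y * nx + x;
            out[i] = in[i] + c * (in[i - 1] + in[i + 1]      /* left, right */
                                + in[i - nx] + in[i + nx]    /* up, down    */
                                - 4.0f * in[i]);
        }
    }
}
```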

An Active Prefetch Filtering Schemes using Exclusive Prefetch Cache (선인출 전용 캐시를 이용한 적극적 선인출 필터링 기법)

  • Chon Young-Suk;Kim Suk-il;Jeon Joong-nam
    • The KIPS Transactions:PartA / v.12A no.1 s.91 / pp.41-52 / 2005
  • Memory reference instructions that miss in the cache are a critical factor limiting the processing power of a processor. Cache prefetching is an effective way to reduce the latency of memory accesses, but excessively aggressive prefetching leads to cache pollution and ultimately cancels out the advantage of prefetching. In this study, an active prefetch filtering scheme is introduced which dynamically decides whether to start a prefetch by consulting a filtering table, reducing the cache pollution caused by unnecessary prefetches. For precise filtering, an evicted-address referencing scheme is proposed in which the filter directly compares the current prefetch address with previously unnecessary prefetch addresses stored in the filtering table. Moreover, a small exclusive prefetch cache is introduced to increase the number of evictions of unnecessarily prefetched addresses and thereby enhance the accuracy of dynamic filtering. The exclusive prefetch cache also prevents useful demand data from being pushed out by prefetched data, while the evicted-address direct referencing scheme enables the prefetch cache to keep most of the useful prefetch data within its small size. Experimental results on commonly used general and multimedia benchmarks show that the average cache miss ratio is decreased by 13.3% thanks to the enhanced filtering accuracy compared with conventional schemes.
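
The filtering decision described here can be illustrated with a small sketch. It is a simplified model, not the paper's hardware design: the table size, direct-mapped layout, and function names are assumptions. Addresses that were prefetched into the exclusive prefetch cache but evicted without ever being used are recorded, and a later prefetch to the same address is suppressed.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILTER_ENTRIES 256   /* assumed filter table size */

/* Sketch of evicted-address based prefetch filtering: addresses that were
 * prefetched but evicted from the exclusive prefetch cache without ever
 * being referenced are remembered here; a new prefetch to such an address
 * is suppressed. */
static uint64_t useless_prefetch[FILTER_ENTRIES];
static bool     entry_valid[FILTER_ENTRIES];

/* Called when the prefetch cache evicts a block that was never referenced. */
void note_useless_prefetch(uint64_t block_addr)
{
    unsigned idx = block_addr % FILTER_ENTRIES;
    useless_prefetch[idx] = block_addr;
    entry_valid[idx] = true;
}

/* Called before issuing a prefetch: allow it only if this address has not
 * recently been prefetched in vain. */
bool should_prefetch(uint64_t block_addr)
{
    unsigned idx = block_addr % FILTER_ENTRIES;
    return !(entry_valid[idx] && useless_prefetch[idx] == block_addr);
}
```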

Dynamic Prefetch Filtering Schemes to enhance Utilization of Data Cache (데이타 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

  • Chon, Young-Suk;Kim, Suk-Il;Jeon, Joong-Nam
    • Journal of KIISE:Computer Systems and Theory / v.35 no.1 / pp.30-43 / 2008
  • Memory reference instructions such as loads and stores are critical factors that limit the processing power of a processor. Prefetching is an effective way to reduce the latency caused by memory accesses, but excessively aggressive prefetching leads to cache pollution that cancels out its advantage. In this study, four filtering schemes that dynamically decide whether to begin a prefetch by consulting a filtering table are compared and evaluated with respect to cache pollution. First, a bi-state scheme is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses an N:1 mapping, but each 1-bit entry distinguishes two states. A complete-state scheme is introduced as a reference for the comparative study. A block address lookup scheme (BAL), the main idea of this paper, exhibits the most exact filtering performance: its table has the same length as the bi-state scheme, each entry carries the same fields as the complete-state scheme, and a recently evicted but never-referenced data block address is mapped 1:1 onto an entry of the filter table. Experimental results on commonly used general benchmarks and multimedia programs show that the average cache miss ratio is decreased by 10.5% for the block address lookup scheme (BAL) compared with the conventional dynamic filter scheme (2-bitSC).
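
The difference between the hashed bi-state scheme and the block address lookup (BAL) scheme can be sketched as follows; the table size, mapping, and names are assumptions for illustration, and table updates are omitted. The key point is that BAL stores the full block address, so a prefetch is only suppressed on an exact match, avoiding the aliasing of the N:1 bi-state table.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 256   /* same table length for both schemes (assumed) */

/* Bi-state scheme: N:1 hashed mapping onto 1-bit entries, so unrelated
 * block addresses can alias and wrongly suppress or allow a prefetch. */
static bool bistate_bit[TABLE_SIZE];

bool bistate_allows(uint64_t block_addr)
{
    return !bistate_bit[block_addr % TABLE_SIZE];   /* aliasing possible */
}

/* Block address lookup (BAL) scheme: each entry also keeps the full block
 * address, so a filtering decision is only made on an exact match. */
static struct { bool valid; uint64_t addr; } bal_entry[TABLE_SIZE];

bool bal_allows(uint64_t block_addr)
{
    unsigned idx = block_addr % TABLE_SIZE;
    /* Suppress the prefetch only when this exact address was recorded. */
    return !(bal_entry[idx].valid && bal_entry[idx].addr == block_addr);
}
```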

Efficient Hybrid Transactional Memory Scheme using Near-optimal Retry Computation and Sophisticated Memory Management in Multi-core Environment

  • Jang, Yeon-Woo;Kang, Moon-Hwan;Chang, Jae-Woo
    • Journal of Information Processing Systems / v.14 no.2 / pp.499-509 / 2018
  • Recently, hybrid transactional memory (HyTM) has gained much interest from researchers because it combines the advantages of hardware transactional memory (HTM) and software transactional memory (STM). To provide concurrency control of transactions, the existing HyTM-based studies use a Bloom filter, but they fail to overcome its typical false positive errors. In addition, the existing studies use a global lock, and the efficiency of global lock-based memory allocation is significantly low in multi-core environments. In this paper, we propose an efficient hybrid transactional memory scheme using near-optimal retry computation and sophisticated memory management in order to process transactions efficiently in a multi-core environment. First, we propose a near-optimal retry computation algorithm that provides an efficient HTM configuration using machine learning algorithms, according to the characteristics of a given workload. Second, we provide efficient concurrency control for transactions in different environments by using a sophisticated Bloom filter. Third, we propose a memory management scheme optimized for the CPU cache line in order to provide fast transaction processing. Finally, our performance evaluation on the Stanford transactional applications for multi-processing (STAMP) benchmarks shows that our HyTM scheme achieves up to 2.5 times better performance than state-of-the-art algorithms.
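
A minimal sketch of a Bloom filter used as a transactional read/write-set signature is shown below to make the false positive issue concrete; the filter size and hash functions are assumptions, not the scheme proposed in the paper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024   /* assumed filter size in bits */

/* Sketch of a Bloom filter for transactional read/write sets: addresses
 * touched by a transaction are hashed into a bit vector, and a conflict
 * check asks whether another transaction's address MIGHT be in the set.
 * Hash collisions produce the false positives the abstract refers to. */
struct bloom {
    uint8_t bits[BLOOM_BITS / 8];
};

static unsigned hash1(uint64_t a) { return (unsigned)(a * 2654435761u) % BLOOM_BITS; }
static unsigned hash2(uint64_t a) { return (unsigned)((a >> 7) * 40503u + 1) % BLOOM_BITS; }

static void set_bit(struct bloom *b, unsigned i) { b->bits[i / 8] |= (uint8_t)(1u << (i % 8)); }
static bool get_bit(const struct bloom *b, unsigned i) { return (b->bits[i / 8] >> (i % 8)) & 1u; }

void bloom_init(struct bloom *b) { memset(b, 0, sizeof *b); }

void bloom_add(struct bloom *b, uint64_t addr)
{
    set_bit(b, hash1(addr));
    set_bit(b, hash2(addr));
}

/* Returns true if addr MAY be in the set (possible false positive),
 * false only if it is definitely not in the set. */
bool bloom_maybe_contains(const struct bloom *b, uint64_t addr)
{
    return get_bit(b, hash1(addr)) && get_bit(b, hash2(addr));
}
```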

Design of a High-Speed RFID Filtering Engine and Cache Based Improvement (고속 RFID 필터링 엔진의 설계와 캐쉬 기반 성능 향상)

  • Park Hyun-Sung;Kim Jong-Deok
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.5A / pp.517-525 / 2006
  • In this paper, we present a high-speed RFID data filtering engine designed to carry out filtering under the conditions of massive data and massive filters. We found that high-speed RFID data filtering is very similar to the high-speed packet classification used in high-speed routers and firewall systems, and our filtering engine is therefore built on existing packet classification algorithms, Bit Parallelism and the Aggregated Bit Vector (ABV). We also found strong temporal relations and redundancy in RFID data filtering operations, and we incorporated two kinds of caches, a tag cache and a filter cache, to exploit this characteristic and improve the efficiency of the filtering engine. The performance of the proposed engine has been examined by implementing and testing a prototype system. Compared to the basic sequential filter comparison approach, our engine shows much better performance, and the gap widens as the number of filters increases.
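
The tag cache idea can be illustrated with a small sketch, under the assumption of a direct-mapped cache of filtering decisions keyed by tag ID; match_filters() stands in for the full Bit Parallelism / ABV matching pass, and everything here is illustrative rather than the implemented prototype.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_CACHE_SIZE 64   /* assumed number of cached tag decisions */

/* Sketch of the tag cache idea: RFID readers tend to report the same tags
 * repeatedly, so the filtering result for a tag ID can be cached and the
 * full filter matching pass skipped on a hit. */
struct tag_cache_entry {
    bool     valid;
    uint64_t tag_id;
    bool     passes;   /* cached result of the full filter match */
};

static struct tag_cache_entry tag_cache[TAG_CACHE_SIZE];

/* Full filter matching against the (possibly large) filter set; stands in
 * for the Bit Parallelism / ABV based engine described in the paper. */
extern bool match_filters(uint64_t tag_id);

bool filter_tag(uint64_t tag_id)
{
    struct tag_cache_entry *e = &tag_cache[tag_id % TAG_CACHE_SIZE];
    if (e->valid && e->tag_id == tag_id)
        return e->passes;                   /* cache hit: skip full matching */

    bool result = match_filters(tag_id);    /* cache miss: run the engine    */
    e->valid = true;
    e->tag_id = tag_id;
    e->passes = result;
    return result;
}
```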

A Dynamic Prefetch Filtering Schemes to Enhance Usefulness Of Cache Memory (캐시 메모리의 유용성을 높이는 동적 선인출 필터링 기법)

  • Chon Young-Suk;Lee Byung-Kwon;Lee Chun-Hee;Kim Suk-Il;Jeon Joong-Nam
    • The KIPS Transactions:PartA / v.13A no.2 s.99 / pp.123-136 / 2006
  • Prefetching is an effective way to reduce the latency caused by memory accesses. However, excessively aggressive prefetching not only leads to cache pollution, cancelling out the benefits of prefetching, but also increases bus traffic, degrading overall performance. In this thesis, a prefetch filtering scheme is proposed which dynamically decides whether to start a prefetch by consulting a filtering table, in order to reduce the cache pollution caused by unnecessary prefetches. First, a prefetch hashing table 1-bit SC filtering scheme (PHT1bSC) is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses an N:1 mapping, but each 1-bit entry distinguishes two states. A complete block address table filtering scheme (CBAT) is introduced as a reference for the comparative study. A prefetch block address lookup table scheme (PBALT) is proposed as the main idea of this paper and exhibits the most exact filtering performance: its table has the same length as the PHT1bSC scheme, each entry carries the same fields as the CBAT scheme, and a recently evicted but never-referenced data block address is mapped 1:1 onto an entry of the filter table. Simulations were run on commonly used prefetch schemes with general benchmarks and multimedia programs while varying cache parameters. The PBALT scheme improves performance by up to 22% compared with no filtering, and its enhanced filtering accuracy decreases the cache miss ratio by 7.9% compared with the conventional PHT2bSC. The MADT of the proposed PBALT scheme is also decreased by 6.1% compared with conventional schemes, reducing total execution time.

A Design of ATM Firewall Switch using Cell Screening (셀 스크리닝 방식에 기반한 ATM Firewall Switch의 설계)

  • Hong, Seung-Seon;Jeong, Tae-Myeong;Park, Mi-Ryong;Lee, Jong-Hyeop
    • The KIPS Transactions:PartC / v.8C no.4 / pp.389-396 / 2001
  • The conventional router-based packet screening approach requires a SAR (Segmentation And Reassembly) process to apply packet-level screening on ATM networks, which degrades the cell processing speed of high-speed ATM switches. This paper proposes an ATM Firewall Switch with a parallel processing architecture based on cell screening. The proposed Enhanced ATM Firewall Switch performs screening by inspecting only the first and second cells of a packet that has been segmented into cells, so screening can be carried out at cell granularity, and the introduction of a policy cache improves the cell screening speed. In addition, an independent User Cells Filter functional block is designed so that cell screening can be performed in a parallel processing structure, minimizing cell delay.
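
The cell screening flow can be sketched as follows; the policy cache size, the VPI/VCI keying, and check_policy() are assumptions used only to illustrate how a decision made on a packet's leading cells can be applied to the rest of its cells without reassembly.

```c
#include <stdbool.h>
#include <stdint.h>

#define POLICY_CACHE_SIZE 128   /* assumed number of cached decisions */

/* Sketch of cell-level screening: the screening decision is made from the
 * leading cells of a segmented packet (which carry the header fields) and
 * then applied to the remaining cells of the same virtual connection, so
 * no reassembly (SAR) is needed. */
struct policy_entry {
    bool     valid;
    uint32_t vpi_vci;   /* virtual connection identifier */
    bool     permit;    /* decision taken on the packet's leading cells */
};

static struct policy_entry policy_cache[POLICY_CACHE_SIZE];

/* Full rule lookup on the header fields carried in the leading cells. */
extern bool check_policy(const uint8_t *cell_payload);

bool screen_cell(uint32_t vpi_vci, bool is_first_cell, const uint8_t *payload)
{
    struct policy_entry *e = &policy_cache[vpi_vci % POLICY_CACHE_SIZE];

    if (is_first_cell) {                     /* decide once per packet */
        e->valid = true;
        e->vpi_vci = vpi_vci;
        e->permit = check_policy(payload);
        return e->permit;
    }
    if (e->valid && e->vpi_vci == vpi_vci)   /* later cells follow the cached decision */
        return e->permit;
    return false;                            /* unknown connection: drop (assumed default) */
}
```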
