Search | Korea Science

An Efficient Instruction Prefetching Scheme Based on the Page Access Information (페이지 접근 정보에 기반한 효율적인 명령어 캐쉬 선인출 기법)

Shin Soong-Hyun;Kim Cheol-Hong;Jhon Chu-Shik
- Journal of KIISE:Computer Systems and Theory, v.33 no.5, pp.306-315, 2006
In general, the hit ratio of the first-level cache is one of the most important factors in determining the performance of a computer system, and prefetching from lower-level memory is one of the most useful techniques for improving it. In this paper, we propose a prefetch on continuous same page access (CSPA) scheme, which improves the prefetch efficiency of the instruction cache and reduces prefetch cost at the same time. The CSPA scheme tracks the page addresses of executed instructions to count how many times the same memory page is accessed consecutively. To increase prefetch efficiency, it initiates a prefetch only if the number of consecutive accesses to the same page exceeds a threshold value. An L1 cache block is generally smaller than an L2 cache block, so one L2 cache block contains several L1 cache blocks. To reduce unnecessary L2 cache accesses caused by prefetching, the CSPA scheme enables a prefetch only when the missed L1 block and the L1 block to be prefetched lie in the same L2 cache block, which reduces prefetch cost. According to our simulations, the proposed prefetching scheme improves performance by up to 6.7%.
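
The two conditions described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the page and block sizes, the threshold, and the choice of the next sequential L1 block as the prefetch candidate are all assumptions made for illustration.

```python
class CSPAPrefetcher:
    """Illustrative sketch of prefetch on continuous same page access (CSPA).

    All parameter values are hypothetical; the abstract does not give them.
    """

    def __init__(self, page_size=4096, l1_block=32, l2_block=128, threshold=4):
        self.page_size = page_size
        self.l1_block = l1_block
        self.l2_block = l2_block
        self.threshold = threshold
        self.last_page = None  # page of the most recent instruction fetch
        self.count = 0         # consecutive accesses to last_page

    def access(self, addr):
        """Record an instruction fetch and update the same-page counter."""
        page = addr // self.page_size
        if page == self.last_page:
            self.count += 1
        else:
            self.last_page = page
            self.count = 1

    def prefetch_target(self, miss_addr):
        """Return the L1 block address to prefetch on an L1 miss, or None."""
        # Condition 1: the same page must have been accessed more than
        # `threshold` times in a row.
        if self.count <= self.threshold:
            return None
        # Assumption: the candidate is the next sequential L1 block.
        candidate = (miss_addr // self.l1_block + 1) * self.l1_block
        # Condition 2: the candidate must lie in the same L2 block as the miss.
        if candidate // self.l2_block != miss_addr // self.l2_block:
            return None
        return candidate
```

With these assumed sizes, a miss in the last L1 block of an L2 block never triggers a prefetch, because the next L1 block belongs to a different L2 block.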

Dynamic Prefetch Filtering Schemes to Enhance Utilization of Data Cache (데이터 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

전영숙;이병권;김석일;전중남
- Proceedings of the Korean Information Science Society Conference, 2004.10a, pp.562-564, 2004
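Cache prefetching is an effective way to reduce the latency of memory references. However, overly aggressive prefetching causes cache pollution, which offsets the benefits of prefetching. To reduce cache pollution, this study compares and evaluates four filtering methods that dynamically consult a filter table to decide whether to execute a prefetch instruction. We propose an ideal filtering structure as a reference for the comparative study, and a bi-state structure that mitigates the lock problem of previous work. We also propose a block-address lookup method for finer-grained filtering. In experiments on commonly used general benchmark programs and multimedia benchmark programs, the cache miss ratio decreased by an average of 5.6% for the bi-state structure and by 7.9% for the block-address lookup structure.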

PMS: Probability-based Multi Successor Prefetch Algorithm for Software Streaming Services of Mobile Embedded Devices (PMS: 모바일 임베디드 시스템의 소프트웨어 스트리밍 서비스를 위한 확률 기반 다중 접근 블록 선인출 알고리즘)

Lee, Young-Jae;Park, Seon-Yeong;Pak, Eun-Jj;Lee, Dae-Woo;Jung, Wook;Kim, Jin-Soo
- Journal of KIISE:Computer Systems and Theory, v.34 no.5_6, pp.238-248, 2007
As demand grows for running various PC software on mobile embedded devices with limited storage, software streaming services are needed. However, launching software on such devices takes too long because it is transferred over wireless networks. Prefetch algorithms are needed to address this problem. We examined the 'Last Successor (LS)' algorithm and a PPM-based prefetch algorithm as prefetch algorithms for software streaming services. We present the 'Probability-based Multi Successor (PMS)' algorithm, devised by analyzing evaluations of the previous algorithms and the characteristics of software streaming services. While LS keeps one successor per block, PMS keeps N successors per block, chosen by the probability calculated by the PPM-based prefetch algorithm. The hit rate of PMS is similar to that of the PPM-based prefetch algorithm, and its space overhead is similar to that of LS. PMS therefore achieves good memory efficiency when applied to software streaming services.
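
The contrast between one successor per block (LS) and N successors per block (PMS) can be sketched as follows. This is a simplified illustration, not the published algorithm: the function names are hypothetical, and the probability is approximated here by first-order transition counts rather than by a full PPM model.

```python
from collections import Counter, defaultdict


def train_last_successor(trace):
    """LS: remember only the most recently observed successor of each block."""
    successor = {}
    for cur, nxt in zip(trace, trace[1:]):
        successor[cur] = nxt
    return successor


def train_multi_successor(trace, n=2):
    """Keep up to n successors per block, ranked by observed frequency.

    Assumption: transition frequency stands in for the PPM-derived
    probability mentioned in the abstract.
    """
    counts = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    return {blk: [b for b, _ in c.most_common(n)] for blk, c in counts.items()}


# Hypothetical block access trace.
trace = ["A", "B", "A", "B", "A", "C", "A"]
ls = train_last_successor(trace)
pms = train_multi_successor(trace, n=2)
```

In this trace, block A is followed by B twice and by C once. The last-successor table keeps only C, the most recent successor, while the multi-successor table keeps both B and C as prefetch candidates.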

A Block Structured Multimedia Data Prefetching (블록 구조형 멀티미디어 데이터의 선인출)

Kim Suk-Ju;Lee Byung-Kwon;Kim Suk-Il
- The Journal of Korean Institute of Communications and Information Sciences, v.29 no.1A, pp.53-64, 2004
Media data that a multimedia application processes as a stream exhibits strong spatial locality but weak temporal locality. In this paper, we propose a dynamic prefetching method that makes the most of the memory reference regularities inherent in such multimedia data. In particular, the proposed method substantially reduces prefetching errors compared with existing prefetching methods for application programs that divide an array into small sub-blocks and process it one sub-block at a time. We evaluated the proposed method on various MediaBench benchmarks. The results confirm that it produces fewer prefetching errors than existing linear prefetching methods.

A Cache Controller to Maximize Effectiveness of Hierarchical Memory Architecture (계층적 메모리 구조의 효과를 극대화하는 캐시 제어기)

Uh Bong Yong;Ju Young Kwan;Cheon Joong Nam;Kim Suk Il
- Journal of KIISE:Computer Systems and Theory, v.32 no.11_12, pp.608-616, 2005
This paper proposes a cache architecture that triggers a prefetch on a level 1 cache miss, whereas existing structures prefetch only on a level 2 cache miss. In the proposed architecture, a level 1 cache miss selects a demand-fetch block and a prefetch block from the level 2 cache and stores them in the level 1 cache and the prefetch cache, respectively. In an experimental analysis using 11 benchmark programs, the hierarchical cache architecture that employs both a level 1 cache prefetcher and a level 2 cache prefetcher achieved up to 19% higher performance than a cache architecture that employs only a level 2 cache prefetcher.

A Multimedia Data Prefetching Based on 2 Dimensional Block Structure (이차원 블록 구조에 근거한 선인출 기법)

Kim, Seok-Ju
- Journal of Korea Multimedia Society, v.7 no.8, pp.1086-1096, 2004
For a multimedia application that handles streaming data, the cache loses efficiency because the data has weak temporal locality: much of the data brought into the cache is replaced without being accessed again. However, there is a good chance that such multimedia data still has a dominant locality pattern. This paper proposes a method that exploits the memory reference regularity inherent even in multimedia data with weak temporal locality. The method, which builds on dynamic regular-stride reference prefetching, can identify a 2-dimensional array format (block pattern). It is named the block-reference-prediction technique (BRPT) because it identifies a block pattern and determines the address to prefetch according to the rules of the block format. BRPT significantly reduced memory reference time for applications with abundant block patterns, although the new rule makes the prefetching system more complex.

Data Prefetching Effect of the Stride Merging-Arrays Method (스트라이드 배열 병합 방법의 데이터 선인출 효과)

Jeong, In-Beom;Lee, Jun-Won
- Journal of KIISE:Computer Systems and Theory, v.26 no.11, pp.1429-1436, 1999
A cache memory is composed of cache lines with multiple words to obtain a data prefetching effect. However, if the prefetched data are not used, cache space is wasted and the cache miss rate increases. The data merging-arrays method is used to reduce cache conflict misses, but the existing merging-arrays method prefetches useless data into cache lines. This paper proposes a stride merging-arrays method that improves on this behavior. Simulation results show that, when a cache line is composed of multiple words, the stride merging-arrays method improves cache performance not only by reducing cache conflict misses but also by increasing useful data prefetching. This improved cache performance also yields more scalable performance for parallel applications as the number of processors increases.

A Dynamic Prefetch Filtering Schemes to Enhance Usefulness Of Cache Memory (캐시 메모리의 유용성을 높이는 동적 선인출 필터링 기법)

Chon Young-Suk;Lee Byung-Kwon;Lee Chun-Hee;Kim Suk-Il;Jeon Joong-Nam
- The KIPS Transactions:PartA, v.13A no.2 s.99, pp.123-136, 2006
The prefetching technique is an effective way to reduce the latency caused by memory access. However, excessively aggressive prefetching not only causes cache pollution, which cancels out the benefits of prefetching, but also increases bus traffic, degrading overall performance. This paper proposes prefetch filtering schemes that dynamically decide whether to issue a prefetch by consulting a filtering table, in order to reduce the cache pollution caused by unnecessary prefetches. First, a prefetch hashing table 1bitSC filtering scheme (PHT1bSC) is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses N:1 mapping, but each entry holds a 1-bit value with two states. Second, a complete block address table filtering scheme (CBAT) is introduced as a reference for the comparative study. Third, a prefetch block address lookup table scheme (PBALT), the main idea of this paper, is proposed; it exhibits the most accurate filtering performance. Its table has the same length as that of the PHT1bSC scheme and each entry has the same fields as in the CBAT scheme, and the address of a recent, never-referenced data block maps 1:1 to an entry of the filter table. The schemes were simulated with commonly used prefetch schemes on general benchmarks and multimedia programs while varying cache parameters. Compared with no filtering, the PBALT scheme showed an improvement of up to 22%, and thanks to its higher filtering accuracy it reduced the cache miss ratio by 7.9% compared with the conventional PHT2bSC. The MADT of the proposed PBALT scheme decreased by 6.1% compared with conventional schemes, reducing the total execution time.
https://doi.org/10.3745/KIPSTA.2006.13A.2.123

A Cache Managing Strategy for Fast Media Data Access (미디어 데이터의 빠른 참조를 위한 캐시 운영 전략)

Moon, Hyun-Ju;Kim, Suk-il
- The KIPS Transactions:PartA, v.11A no.1, pp.11-20, 2004
Multimedia data processed as a stream has high spatial locality and low temporal locality. This paper proposes a dynamic data prefetching scheme that fully exploits the regularity between consecutively referenced memory addresses. Compared with existing data prefetching schemes, the proposed scheme reduces data prefetching errors when an application divides an array into smaller blocks and processes them block by block. Experimental results on various media benchmark programs show that the proposed scheme predicts memory addresses more accurately and performs better than existing prefetching schemes.
https://doi.org/10.3745/KIPSTA.2004.11A.1.011

Dynamic Prefetch Filtering Schemes to enhance Utilization of Data Cache (데이타 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

Chon, Young-Suk;Kim, Suk-Il;Jeon, Joong-Nam
- Journal of KIISE:Computer Systems and Theory, v.35 no.1, pp.30-43, 2008
Memory reference instructions such as loads and stores are critical factors that limit the processing power of a processor. Prefetching is an effective way to reduce the latency caused by memory access. However, excessively aggressive prefetching causes cache pollution, which cancels out the advantage of prefetching. This study compares and evaluates four filtering schemes that dynamically decide whether to issue a prefetch by consulting a filtering table, in order to reduce cache pollution. First, a bi-states scheme is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses N:1 mapping, but each entry holds a 1-bit value with two states. A complete state scheme is introduced as a reference for the comparative study. A block address lookup scheme, the main idea of this paper, is proposed and exhibits the most accurate filtering performance. Its table has the same length as that of the bi-states scheme and each entry has the same fields as in the complete state scheme, and the address of a recent, never-referenced data block maps 1:1 to an entry of the filter table. Experimental results on commonly used general benchmarks and multimedia programs show that the block address lookup scheme (BAL) reduces the average cache miss ratio by 10.5% compared with the conventional dynamic filter scheme (2-bitSC).
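
For orientation, the following is a minimal sketch of what a conventional table-based filter with 2-bit saturating counters and N:1 mapping might look like. It is not taken from the paper: the class name, the table size, the modulo hash, the initial counter value, and the update rule (increment when a prefetched block was referenced before eviction, decrement otherwise) are all assumptions.

```python
class TwoBitFilter:
    """Illustrative prefetch filter: N:1 hashed table of 2-bit saturating counters."""

    def __init__(self, entries=1024, block_size=32, initial=2):
        self.entries = entries
        self.block_size = block_size
        self.table = [initial] * entries  # each counter is in the range 0..3

    def _index(self, addr):
        # N:1 mapping: many block addresses share one table entry.
        return (addr // self.block_size) % self.entries

    def allow(self, addr):
        """Permit the prefetch only if the counter is in its upper half."""
        return self.table[self._index(addr)] >= 2

    def update(self, addr, was_referenced):
        """Train the counter when a prefetched block leaves the cache."""
        i = self._index(addr)
        if was_referenced:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Under this assumed update rule, an entry whose counter has dropped below 2 blocks every prefetch that hashes to it, so no new outcome ever arrives to raise it again. This may be the kind of behavior the abstract calls the lock problem, although the abstract does not define the term.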
