• Title/Summary/Keyword: cache memory (캐시메모리)

Design of an Asynchronous Data Cache with FIFO Buffer for Write Back Mode (Write Back 모드용 FIFO 버퍼 기능을 갖는 비동기식 데이터 캐시)

  • Park, Jong-Min;Kim, Seok-Man;Oh, Myeong-Hoon;Cho, Kyoung-Rok
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.6
    • /
    • pp.72-79
    • /
    • 2010
  • In this paper, we propose a data cache architecture with a write buffer for a 32-bit asynchronous embedded processor. The data cache consists of a CAM and data memory. It accelerates the data upload cycle between the processor and main memory, which improves processor performance. The proposed data cache has 8 KB of cache memory and uses 4-way set-associative mapping with a line size of 4 words (16 bytes) and a pseudo-LRU replacement algorithm. A dirty register and a write buffer implement the cache's write policy. The designed data cache was synthesized to a gate-level design using a $0.13\,\mu\mathrm{m}$ process. Its average hit rate is 94%, and system performance improved by 46.53%. The proposed data cache with a write buffer is well suited to a 32-bit asynchronous processor.
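The geometry stated in this abstract (8 KB, 4-way, 16-byte lines, 32-bit addresses) fixes the address split, which the sketch below works out. The abstract does not say which pseudo-LRU variant the design uses, so the `TreePLRU4` class is an assumption: it shows the common 3-bit tree scheme for a 4-way set, not the authors' circuit. All names are illustrative.

```python
ADDRESS_BITS = 32
CACHE_BYTES = 8 * 1024
WAYS = 4
LINE_BYTES = 16  # 4 words of 4 bytes

NUM_LINES = CACHE_BYTES // LINE_BYTES                # 512 lines
NUM_SETS = NUM_LINES // WAYS                         # 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1            # 4 bits
INDEX_BITS = NUM_SETS.bit_length() - 1               # 7 bits
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # 21 bits


def split_address(addr):
    """Split a 32-bit byte address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set (assumed variant, 3 bits per set)."""

    def __init__(self):
        # bits[0] selects the half to evict from (0: ways 0-1, 1: ways 2-3).
        # bits[1] selects the victim among ways 0/1, bits[2] among ways 2/3.
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Record an access by pointing every bit on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 - way
        else:
            self.bits[0] = 0
            self.bits[2] = 3 - way

    def victim(self):
        """Follow the bits to the way that should be replaced next."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

Touching ways 0, 1, 2, 3 in order leaves way 0 as the victim, the same choice true LRU would make. Touching way 0 again then selects way 2 rather than way 1, which shows where the 3-bit approximation departs from exact LRU.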

Dynamic Prefetch Filtering Schemes to Enhance Utilization of Data Cache (데이터 캐시의 활용도를 높이는 동적 선인출 필터링 기법)

  • 전영숙;이병권;김석일;전중남
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.562-564
    • /
    • 2004
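  • Cache prefetching is an effective way to reduce the latency of memory references. However, overly aggressive prefetching causes cache pollution, which offsets the benefit of prefetching. To reduce cache pollution, this study compares and evaluates four filtering methods that dynamically consult a filter table to decide whether a prefetch instruction should be executed. We propose an ideal filtering structure as a basis for comparison, and a binary-state structure that mitigates the lock-up phenomenon seen in previous work. We also propose a block-address lookup scheme for finer-grained filtering. In experiments on widely used general benchmark programs and multimedia benchmark programs, the cache miss rate decreased by an average of 5.6% with the binary-state structure and by 7.9% with the block-address lookup structure.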
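The idea of consulting a filter table before issuing a prefetch can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the four schemes the paper evaluates: the table size, the one-bit entry, the indexing by block address, and the rule that a demand miss re-enables prefetching (one hypothetical way to avoid an entry staying locked in the "deny" state) are all invented for the example.

```python
class PrefetchFilter:
    """One-bit-per-entry filter table indexed by block address (hypothetical)."""

    def __init__(self, entries=1024, block_bytes=32):
        self.entries = entries
        self.block_bytes = block_bytes
        # True means "prefetching this block has been useful, allow it".
        self.allow = [True] * entries

    def _slot(self, addr):
        # Different blocks can share a slot; the table is untagged.
        return (addr // self.block_bytes) % self.entries

    def should_prefetch(self, addr):
        """Consult the table before issuing a prefetch for `addr`."""
        return self.allow[self._slot(addr)]

    def record_eviction(self, addr, was_used):
        """Feedback when a prefetched block leaves the cache.

        A block evicted without being referenced polluted the cache,
        so later prefetches of it are filtered out.
        """
        self.allow[self._slot(addr)] = was_used

    def record_demand_miss(self, addr):
        """Assumed unlock rule: a demand miss on a filtered block suggests the
        prefetch would have helped, so allow it again."""
        self.allow[self._slot(addr)] = True
```

Without some rule like `record_demand_miss`, a denied entry would never receive eviction feedback again, because no prefetch for it is ever issued.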

A Enhanced Set-Associative Page Cache Scheme using Pollute Buffer (오염 버퍼를 적용한 집합 연상 페이지 캐시 기법)

  • An, Deukhyeon;Kim, Jeehong;Eom, Young Ik
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.241-242
    • /
    • 2012
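  • I/O workloads that generate heavy data traffic cause many disk accesses and a large amount of data processing, which degrades computing performance. To address this, a page cache is used as a buffer between memory and disk. However, because the page cache uses LRU, it yields little performance benefit when large amounts of data are accessed only once and never reused. This paper proposes placing a pollute buffer in a set-associative page cache to minimize the situation in which pages that are never reused merely enlarge the page cache, thereby improving I/O performance.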
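The abstract does not describe how the pollute buffer is organized, so the following is only a sketch of the general idea under assumptions: first-touch pages wait in a small FIFO buffer and enter the main LRU cache only after a second access. The class name, the FIFO choice, and the promotion rule are hypothetical, and set associativity is omitted for brevity.

```python
from collections import OrderedDict


class PageCacheWithPolluteBuffer:
    """Main LRU page cache guarded by a small FIFO pollute buffer (assumed design)."""

    def __init__(self, main_pages, pollute_pages):
        self.main = OrderedDict()     # LRU order, oldest entry first
        self.pollute = OrderedDict()  # FIFO order, oldest entry first
        self.main_pages = main_pages
        self.pollute_pages = pollute_pages

    def access(self, page):
        """Return True on a hit in either structure, False on a miss."""
        if page in self.main:
            self.main.move_to_end(page)  # refresh recency
            return True
        if page in self.pollute:
            # Second access: the page has shown reuse, so promote it.
            del self.pollute[page]
            self.main[page] = True
            if len(self.main) > self.main_pages:
                self.main.popitem(last=False)  # evict least recently used
            return True
        # First access: hold the page in the pollute buffer only.
        self.pollute[page] = True
        if len(self.pollute) > self.pollute_pages:
            self.pollute.popitem(last=False)  # drop oldest one-time page
        return False
```

With this arrangement, a long sequential scan of pages that are read once cycles through the pollute buffer and never displaces pages in the main cache that have already been reused.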

A Multimedia Data Prefetching Based on 2 Dimensional Block Structure (이차원 블록 구조에 근거한 선인출 기법)

  • Kim, Seok-Ju
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.8
    • /
    • pp.1086-1096
    • /
    • 2004
  • For a multimedia application that handles streaming data, the cache loses efficiency because the data has weak temporal locality: much of the data brought into the cache is replaced without being accessed again. However, such multimedia data is still likely to exhibit strong locality of its own. This paper proposes a method that exploits the regularity of memory references that is typically inherent even in multimedia data with weak temporal locality. The method builds on dynamic regular-stride reference prefetching and can identify references that follow a 2-dimensional array format (block pattern). It is named the block-reference-prediction technique (BRPT) because it identifies a block pattern and determines the address to prefetch according to the rules of the block format. BRPT significantly reduced memory reference time for applications with abundant block patterns, although the new rule makes the prefetching system more complex.
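A plain regular-stride predictor, the baseline that this abstract builds on, can be sketched as below. This is not BRPT: it has no notion of a 2-dimensional block, and the class and its single-entry state are assumptions made for illustration. It predicts the next address only after seeing the same non-zero stride twice in a row.

```python
class StridePredictor:
    """Single-stream regular-stride predictor (illustrative baseline, not BRPT)."""

    def __init__(self):
        self.last_addr = None
        self.stride = None

    def observe(self, addr):
        """Feed one referenced address; return the predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            new_stride = addr - self.last_addr
            # Predict only when the stride repeats and is non-zero.
            if new_stride == self.stride and new_stride != 0:
                prediction = addr + new_stride
            self.stride = new_stride
        self.last_addr = addr
        return prediction


def run(addresses):
    """Return the prediction made after each address in the sequence."""
    predictor = StridePredictor()
    return [predictor.observe(a) for a in addresses]
```

Walking down one column of an image with a row pitch of 640 bytes, `run([0, 640, 1280, 1920])` returns `[None, None, 1920, 2560]`. Walking row by row through a 4-byte-wide block, `run([0, 1, 2, 3, 640, 641])` returns `[None, None, 3, 4, None, None]`: the predictor guesses 4 when the next reference is actually 640, and has to relearn the stride on every row. That misprediction at each row boundary is the kind of block pattern a block-aware rule is meant to capture.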

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

  • Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
    • The KIPS Transactions:PartA
    • /
    • v.13A no.4 s.101
    • /
    • pp.305-316
    • /
    • 2006
  • Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high triangle-processing performance, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. To achieve these two goals, we present effective memory organizations, including a new pixel cache architecture. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.

Compact Field Remapping for Dynamically Allocated Structures (동적으로 할당된 구조체를 위한 압축된 필드 재배치)

  • Kim, Jeong-Eun;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.1003-1012
    • /
    • 2005
  • The most significant difference between embedded systems and general-purpose systems is that embedded systems may use only limited resources, including battery and memory. In particular, the number of applications that deal with multimedia data is increasing. In such data-intensive systems, memory access delay is one of the major bottlenecks that hurt system performance. As a result, many researchers have investigated techniques to reduce the cost of memory access. Most programs exhibit locality in their memory references. Temporal locality means that a resource accessed at one point will be used again in the near future. Spatial locality means that a resource is more likely to be used if resources near it were just accessed. The latest embedded processors usually adopt cache memory to exploit these two types of locality. Processors access cache memory faster than off-chip memory, which reduces latency. In this paper, we propose an enhanced dynamic allocation technique for structure-type data that eliminates unused memory space and reduces both the cache miss rate and the application execution time. The proposed approach aggregates fields from multiple dynamically allocated records and remaps them consecutively in memory. Experiments on the Olden benchmarks show an average drop of 13.9% in the L1 cache miss rate and 15.9% in the L2 cache miss rate compared with previously proposed techniques. Execution time is also reduced by 10.9% on average compared with the previous work.
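Why placing the same field from many records consecutively helps the cache can be shown with a small count of the cache lines touched. The record size, field size, field offset, line size, and base address of 0 below are made-up numbers for illustration, not values from the paper, and the sketch models only the layout effect, not the authors' allocator.

```python
def lines_touched(addresses, line_bytes=64):
    """Count the distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_bytes for addr in addresses})


NUM_RECORDS = 1024
RECORD_BYTES = 32   # hypothetical size of one structure
FIELD_BYTES = 4     # hypothetical size of the field being traversed
FIELD_OFFSET = 8    # hypothetical offset of that field inside the record

# Original layout: each record is stored whole, so reading one field from
# every record strides through memory in steps of RECORD_BYTES.
original = [i * RECORD_BYTES + FIELD_OFFSET for i in range(NUM_RECORDS)]

# Remapped layout: the same field from all records is stored back to back.
remapped = [i * FIELD_BYTES for i in range(NUM_RECORDS)]

original_lines = lines_touched(original)  # 1024 * 32 / 64 = 512
remapped_lines = lines_touched(remapped)  # 1024 * 4 / 64 = 64
```

Under these assumed sizes, a traversal that reads one field per record touches 512 cache lines in the original layout and 64 after remapping, so each fetched line carries 16 useful values instead of 2.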

A Performance Improvement Scheme for a Wireless Internet Proxy Server Cluster (무선 인터넷 프록시 서버 클러스터 성능 개선)

  • Kwak, Hu-Keun;Chung, Kyu-Sik
    • Journal of KIISE:Information Networking
    • /
    • v.32 no.3
    • /
    • pp.415-426
    • /
    • 2005
  • Wireless internet, which has become a hot social issue, has limitations that distinguish it from wired internet: low bandwidth, frequent disconnection, and user terminals with low computing power and small screens. It also has open technical issues in user mobility, network protocols, security, and so on. A wireless internet server must be scalable to handle heavy traffic from a rapidly growing user base. In this paper, wireless internet proxy server clusters are used because their caching, distillation, and clustering functions help overcome the above limitations and needs. TranSend was proposed as a clustering-based wireless internet proxy server, but it has two disadvantages: 1) scalability is difficult to achieve because there is no systematic way to scale it, and 2) its structure is complex because of the inefficient communication structure among modules. In our former research, we proposed the All-in-one structure, which scales in a systematic way but also has disadvantages: 1) data sharing among cache servers is not allowed, and 2) its communication structure among modules is complex. In this paper, we propose an improved scheme that has an efficient communication structure among modules and allows data to be shared among cache servers. We performed experiments using 16 PCs, and the results show performance improvements of 54.86% and 4.70% for the proposed system over TranSend and the All-in-one system, respectively. Because data is shared among cache servers, the proposed scheme keeps the total cache memory size fixed regardless of the number of cache servers. In contrast, in All-in-one the total cache memory size increases in proportion to the number of cache servers, since each cache server must keep all cache data.

Policy for Selective Flushing of Smartphone Buffer Cache using Persistent Memory (영속 메모리를 이용한 스마트폰 버퍼 캐시의 선별적 플러시 정책)

  • Lim, Soojung;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.1
    • /
    • pp.71-76
    • /
    • 2022
  • The buffer cache bridges the performance gap between memory and storage, but in smartphones its effectiveness is limited by the periodic flushes performed to prevent data loss. This paper shows that a selective flushing technique with a small persistent memory can significantly reduce the flushing overhead of the smartphone buffer cache. The technique is motivated by our I/O analysis of smartphone applications, which shows that a small set of hot data accounts for most file writes, while a large proportion of file data is written only once. The proposed selective flushing policy flushes frequently updated data to persistent memory and flushes only single-write data to storage. This removes the storage write traffic caused by frequently updated data and also improves the space efficiency of persistent memory. Simulations with I/O traces of popular smartphone applications show that the proposed policy reduces write traffic to storage by 24.8% on average and by up to 37.8%.
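The routing decision described in this abstract can be sketched as a toy simulation. The abstract does not give the classification criterion, so the write-count threshold, the function name, and the trace format here are assumptions; the sketch also leaves out how data in persistent memory eventually reaches storage.

```python
from collections import Counter


def selective_flush(periods, hot_threshold=2):
    """Route each dirty block at every periodic flush (assumed policy).

    `periods` is a list of lists; each inner list holds the block ids
    written during one flush interval. A block whose cumulative write
    count has reached `hot_threshold` is flushed to persistent memory,
    otherwise it is flushed to storage.
    Returns (storage_writes, persistent_memory_writes).
    """
    write_counts = Counter()
    storage_writes = 0
    pm_writes = 0
    for written in periods:
        write_counts.update(written)
        for block in set(written):  # each dirty block is flushed once per period
            if write_counts[block] >= hot_threshold:
                pm_writes += 1
            else:
                storage_writes += 1
    return storage_writes, pm_writes


def baseline_flush(periods):
    """Storage writes when every dirty block is flushed to storage."""
    return sum(len(set(written)) for written in periods)
```

For the trace `[["a", "b"], ["a", "c"], ["a", "d"]]`, `baseline_flush` gives 6 storage writes, while `selective_flush` gives `(4, 2)`: block `a` is treated as hot from its second write onward and goes to persistent memory, and the single-write blocks still go to storage.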

A Buffer Replacement Policy using Hot Page Management Scheme for Improving Performance of Flash Memory (플래시 메모리 성능향상을 위한 핫 페이지 관리 기법을 이용한 버퍼교체 정책)

  • Daeyoung Kim;Junghan Kim;Hyun-jin Cho;Young Ik Eom
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.860-863
    • /
    • 2008
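  • Flash memory is one of the portable storage devices widely used in daily life. It offers fast I/O, low power consumption, silent operation, and small size, but it cannot be overwritten in place, and its erase operation is very slow compared with reads and writes. To compensate, a buffer cache is placed between the host and the flash memory, and the performance of the flash memory device depends heavily on the replacement policy used in the buffer cache. This paper proposes HPLRU, which addresses the shortcomings of block-level LRU. HPLRU collects hot pages, that is, pages that have been referenced frequently in the recent past, into a separately managed list, which raises the page hit rate and mitigates the eviction of hot pages caused by other pages. The algorithm performs well on random data patterns and substantially reduces the number of writes.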

An Optimal Technic to Utilize Resource on Extended Web Cache Server (확장된 웹 캐시 서버에서 자원이용률 최적화 기법)

  • 김원기;김두상;김성락;구용완
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10e
    • /
    • pp.184-186
    • /
    • 2002
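  • The resource utilization of a large-scale web cache server depends mainly on network and disk I/O latency, and its workload pattern fluctuates sharply between peak hours of heavy network use and quiet hours such as the early morning. To provide the best service with limited resources, this study therefore aims to maximize resource utilization by lowering resource usage during peak periods and shifting that workload to off-peak periods. To this end, a cache compression technique is applied to disk I/O work during off-peak periods [...] the future-prediction technique showed that the actual working set at any given point was small, and that accurate prediction of page reuse patterns would yield a high hit rate in a cache the size of physical memory.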
