• Title/Summary/Keyword: 캐시메모리

Search Result 242, Processing Time 0.03 seconds

A Study on Tree Transformation from Linked List Tree to Array Tree (연결 리스트 트리의 배열 트리 변환에 관한 연구)

  • Shin, Dongyoung;Park, Joonseok
    • Annual Conference of KIPS
    • /
    • 2011.11a
    • /
    • pp.133-136
    • /
    • 2011
  • 트리의 검색은 어플리케이션에서 흔하게 사용되는 연산중 하나이다. 하지만 대부분의 트리는 연결 리스트 기반으로 생성되며 연결 리스트 트리 구조는 데이터의 지역성을 가지기 힘들기 때문에 트리구조의 검색을 동반한 응용은 캐시메모리 사용효율의 제약으로 인해 성능상의 문제점이 존재한다. 본 논문에서는 연결 리스트 트리를 배열 기반의 트리로 변형하여 트리 검색 시 성능을 향상시킬 수 있는 방법을 제시한다.

Indexing Scheme based on the Cache & Main Memory for RFID tag Tracing (CSTmr-tree) (RFID 태그 추적을 위한 캐시 & 메인 메모리 기반의 색인 기법(CSTmr-tree))

  • Hong, Jin-Suk;Youn, Sung-Dae
    • Annual Conference of KIPS
    • /
    • 2007.05a
    • /
    • pp.24-27
    • /
    • 2007
  • 주기억 색인 기법인 Tmr-트리가 R-트리에 비해서 삽입시간이 오래 걸린다는 단점이 있다. 본 논문은 L2 캐시를 최대한 활용하여 기존 Tmr-트리의 장점을 가지는 새로운 CSTmr-트리(Cache Sensitive Tmr-트리)구조를 제안하고, 이 구조에 삽입, 삭제 등의 알고리즘을 제안하였다. 제안한 구조와 알고리즘을 다른 인덱스 구조와 비교하여 CSTmr-트리의 우수성을 보인다.

Bank Level Simulator to Analysis Memory System (메모리 시스템 구조 분석을 위한 시뮬레이터)

  • Kang, Dongwoo;Choi, Jongmoo
    • Annual Conference of KIPS
    • /
    • 2014.04a
    • /
    • pp.40-42
    • /
    • 2014
  • 최근의 컴퓨터 시스템은 멀티 코어를 기반으로 병렬성 향상을 추구 하고 있지만 코어의 개수가 증가함에 따라 메모리가 새로운 병목 지점으로 지적되고 있다. 메모리 시스템은 가상 메모리, 물리 메모리, 뱅크 메모리 3계층으로 나눌 수 있으며, 각 계층은 상호연관 관계가 있어서 분석하기에 어려움이 있다. 본 논문에서는 이를 위해 계층 구조를 지원하는 시뮬레이터를 제안한다. 제안하는 시뮬레이터는 총 5개의 구성 요소로 이루어져 있으며, CPU 개수, 캐시 정책, 뱅크 개수등 다양한 설정을 지원한다. 또한 시뮬레이터를 통하여 운영체제 수준의 물리 메모리 관리자가 메모리 접근 지연에 영향이 있음을 보인다.

Image Cache for FPGA-based Real-time Image Warping (FPGA 기반 실시간 영상 워핑을 위한 영상 캐시)

  • Choi, Yong Joon;Ryoo, Jung Rae
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.6
    • /
    • pp.91-100
    • /
    • 2016
  • In FPGA-based real-time image warping systems, image caches are utilized for fast readout of image pixel data and reduction of memory access rate. However, a cache algorithm for a general computer system is not suitable for real-time performance because of time delays from cache misses and on-line computation complexity. In this paper, a simple image cache algorithm is presented for a FPGA-based real-time image warping system. Considering that pixel data access sequence is determined from the 2D coordinate transformation and repeated identically at every image frame, a cache load sequence is off-line programmed to guarantee no cache miss condition, and reduced on-line computation results in a simple cache controller. An overall system structure using a FPGA is presented, and experimental results are provided to show accuracy and validity of the proposed cache algorithm.

A Multi-Level Flash Translation Layer for Large Capacity Solid State Drives

  • Kim, Yong-Seok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.2
    • /
    • pp.11-18
    • /
    • 2021
  • The flash translation layer(FTL) of SSD maps the logical page number requested from the host to the actual recorded flash memory page number. It is very important to reduce the amount of RAM used to manage the mapping information. In the existing demand-based FTLs, two-level method is applied in which mapping information is also recorded in flash memory pages and only their addresses are managed as a table in RAM. As the capacities of SSDs are growing to tens of terabytes, the amount of RAM for mapping table becomes too large. In this paper, ML-FTL was proposed as a method of managing mapping information in three levels to reduce the amount of RAM required drastically. From an evaluation, the increase in overhead was minimal compared to the conventional two-level method by properly utilizing cache.

Using Cache Access History for Reducing False Conflicts in Signature-Based Eager Hardware Transactional Memory (시그니처 기반 이거 하드웨어 트랜잭셔널 메모리에서의 캐시 접근 이력을 이용한 거짓 충돌 감소)

  • Kang, Jinku;Lee, Inhwan
    • Journal of KIISE
    • /
    • v.42 no.4
    • /
    • pp.442-450
    • /
    • 2015
  • This paper proposes a method for reducing false conflicts in signature-based eager hardware transactional memory (HTM). The method tracks the information on all cache blocks that are accessed by a transaction. If the information provides evidence that there are no conflicts for a given transactional request from another core, the method prevents the occurrence of a false conflict by forcing the HTM to ignore the decision based on the signature. The method is very effective in reducing false conflicts and the associated unnecessary transaction stalls and aborts, and can be used to improve the performance of the multicore processor that implements the signature-based eager HTM. When running the STAMP benchmark on a 16-core processor that implements the LogTM-SE, the increased speed (decrease in execution time) achieved with the use of the method is 20.6% on average.

The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

  • Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.9
    • /
    • pp.507-512
    • /
    • 2014
  • Recently, Distributed computing processing begins using both CPU(Central processing unit) and GPU(Graphic processing unit) to improve the performance to overcome darksilicon problem which cannot use all of the transistors because of the electric power limitation. There is an integrated graphics processor that CPU and GPU share memory and Last level cache(LLC). But, There is no LLC access rules between CPU and GPU, so if GPU and CPU processes run together at the same time, performance of both processes gets worse because of the contention on the LLC. This Paper gives evidence to prove the need of the Cache Partitioning and is mentioned about the cache partitioning design using page coloring to allocate the L3 Cache space only for the GPU process to guarantee GPU process performance.

A Buffer Cache Replacement Algorithm for Considering both Hybrid Main Memory and Storage (하이브리드 메인 메모리와 스토리지의 특성을 고려한 버퍼 캐시 교체 정책)

  • Kang, Dong Hyun;Eom, Young Ik
    • Journal of KIISE
    • /
    • v.42 no.8
    • /
    • pp.947-953
    • /
    • 2015
  • PRAM is being considered as a potential successor to DRAM because of its characteristics such as byte-addressability, non-volatility, and high density. To gain its benefits, buffer cache replacement algorithm based on PRAM has been actively studied. However, most of the previous studies on buffer cache replacement algorithm limitedly exploit the byte-level performance of PRAM by focusing its limited lifetime and slower access latency compared to DRAM. In this paper, we propose a novel buffer cache replacement algorithm that fully considers the byte-level performance of PRAM and the performance of secondary storage. To take advantage of small size write on PRAM, proposed scheme keeps pages, which are frequently accessed with a small size write, on PRAM and allows the selective page migration from DRAM to PRAM. As a result, our scheme significantly reduces the number of PRAM writes. Our experimental results indicate for real workloads that our scheme reduces the number of PRAM writes by up to 92% and improves its performance by up to 62% compared to CLOCK.

Proposal of 3D Graphic Processor Using Multi-Access Memory System (Multi-Access Memory System을 이용한 3D 그래픽 프로세서 제안)

  • Lee, S-Ra-El;Kim, Jae-Hee;Ko, Kyung-Sik;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.4
    • /
    • pp.119-128
    • /
    • 2019
  • Due to the nature of the 3D graphics processor system, many mathematical calculations are required and parallel processing research using GPU (Graphics Processing Unit) is being performed for high-speed processing. In this paper, we propose a 3D graphics processor using MAMS, a parallel processor that does not use cache memory, to solve the GPU problem of increasing bandwidth caused by cache memory miss and the problem that 3D shader processing speed is not constant. The 3D graphics processor using MAMS proposed in this paper designed Vertex shader, Pixel shader, Tiling and Rasterizing structure using DirectX command analysis, the FPGA(Xilinx Virtex6@100MHz) board for MAMS was constructed and designed using Verilog. We compared the processing time of the developed FPGA (100Mhz) and nVidia GeForce GTX 660 (980Mhz), the processing time using GTX 660 was not constant and suing MAMS was constant.

Design of the Compression Algorithm for in-Memory Data of the Virtual Memory (가상 메모리 압축을 위한 CAMD 알고리즘 설계)

  • Jang, Seung-Ju
    • The KIPS Transactions:PartA
    • /
    • v.11A no.3
    • /
    • pp.157-162
    • /
    • 2004
  • This paper suggests the CAMD(Compression Algorithm for in-Memory Data) algorithm that is not moved the pages into the swap space by assigning the compressed cache area in the main memory. The CAMD algorithm that supports the virtual memory system takes high memory usability and performance benefit by reducing the page fault. The memory data is not general data. It is extraordinary data format. In general it consists of specific form of data. Therefore. the CAMD algorithm can compress this data efficiently.