• Title/Summary/Keyword: L3 cache

Search Result 8, Processing Time 0.024 seconds

The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

  • Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.507-512 / 2014
  • Recently, distributed computing has begun to use both the CPU (central processing unit) and the GPU (graphics processing unit) to improve performance and to overcome the dark silicon problem, in which not all transistors can be used at once because of power limitations. In an integrated graphics processor, the CPU and GPU share memory and the last-level cache (LLC). However, there are no LLC access rules between the CPU and GPU, so when GPU and CPU processes run at the same time, the performance of both degrades because of contention on the LLC. This paper presents evidence for the need for cache partitioning and describes a cache partitioning design that uses page coloring to allocate L3 cache space exclusively to the GPU process in order to guarantee its performance.
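
The page-coloring idea mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's design: it assumes a physically indexed cache with plain modulo set indexing, and the cache size, associativity, page size, and function names are all hypothetical. Real processors may hash addresses across cache slices, which this sketch ignores.

```python
PAGE_BYTES = 4096  # assumed page size


def num_colors(cache_bytes, ways, page_bytes=PAGE_BYTES):
    """Number of page colors: how many distinct pages fit in one cache way."""
    return cache_bytes // (ways * page_bytes)


def page_color(phys_addr, cache_bytes, ways, page_bytes=PAGE_BYTES):
    """Color of the physical page containing phys_addr."""
    frame_number = phys_addr // page_bytes
    return frame_number % num_colors(cache_bytes, ways, page_bytes)


def split_frames(frame_numbers, gpu_colors, cache_bytes, ways):
    """Split physical frames into a GPU pool and a CPU pool by color.

    Frames whose color is in gpu_colors map to cache sets reserved for
    the GPU process; all other frames remain available to CPU processes.
    """
    colors = num_colors(cache_bytes, ways)
    gpu_pool, cpu_pool = [], []
    for frame in frame_numbers:
        (gpu_pool if frame % colors in gpu_colors else cpu_pool).append(frame)
    return gpu_pool, cpu_pool


if __name__ == "__main__":
    cache_bytes, ways = 8 * 1024 * 1024, 16  # hypothetical 8 MiB, 16-way L3
    gpu_pool, cpu_pool = split_frames(range(256), set(range(32)), cache_bytes, ways)
    print(num_colors(cache_bytes, ways), len(gpu_pool), len(cpu_pool))
```

Because pages of different colors never map to the same cache sets under this indexing assumption, giving the GPU process only frames from its reserved colors keeps CPU processes from evicting its cache lines.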

Proxy Cache Replacement Policy reflecting Network Transmission Costs in Web and Multimedia Environments (웹과 멀티미디어 요청이 혼재한 환경에서 네트워크 전송 비용을 고려한 프락시 캐시 교체 정책)

  • 서진모;강지숙;남동훈;박승규
    • Proceedings of the Korean Information Science Society Conference / 2002.04a / pp.10-12 / 2002
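  • With growing user demand and the development of Internet applications, the number of large media objects is increasing rapidly. Network transmission cost is therefore an important factor that must be taken into account. This paper analyzes existing proxy cache replacement policies and proposes the G-N and L-N policies, which improve on them. These algorithms extend the GDSF and LFU-DA policies adopted by the proxy cache software 'Squid' by adding network transmission cost. Simulations comparing them with the existing algorithms show that the average response time can be reduced by more than 10%, while the additional cost (processing overhead) does not increase significantly.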
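
As a rough illustration of the kind of policy this abstract extends, the sketch below implements plain GDSF, which ranks each object by K = L + F × C / S (inflation value L, access frequency F, cost C, size S). The abstract does not give the G-N formula, so feeding a per-object transfer cost in as C is only an assumption about how network cost might enter; the class and parameter names are hypothetical.

```python
class GDSFCache:
    """Minimal GDSF cache: evict the object with the lowest priority K."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # L: priority of the most recently evicted object
        self.entries = {}     # key -> dict(freq, size, cost, priority)

    def _priority(self, freq, cost, size):
        return self.inflation + freq * cost / size

    def access(self, key, size, cost):
        """Return True on a hit, False on a miss (inserting if it fits)."""
        entry = self.entries.get(key)
        if entry is not None:
            entry["freq"] += 1
            entry["priority"] = self._priority(entry["freq"], entry["cost"], entry["size"])
            return True
        if size > self.capacity:
            return False  # object can never fit; do not cache it
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k]["priority"])
            self.inflation = self.entries[victim]["priority"]
            self.used -= self.entries[victim]["size"]
            del self.entries[victim]
        self.entries[key] = {
            "freq": 1,
            "size": size,
            "cost": cost,  # assumed here to be a measured network transfer cost
            "priority": self._priority(1, cost, size),
        }
        self.used += size
        return False


if __name__ == "__main__":
    cache = GDSFCache(capacity_bytes=100)
    cache.access("a", size=50, cost=1.0)   # cheap to refetch
    cache.access("b", size=50, cost=10.0)  # expensive to refetch
    cache.access("c", size=50, cost=1.0)   # forces one eviction
    print(sorted(cache.entries))
```

With equal sizes and frequencies, the object that is cheapest to refetch is evicted first, which is the behavior a cost-aware policy aims for.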

Cache Management using a Adaptive Parity Group Configuration in RAID 5 Controller (적응형 패리티 그룹 구성을 이용한 RAID 5 제어기에서의 캐시 운영)

  • Huh, Jung-Ho;Song, Ja-Young;Chang, Tae-Mu
    • The KIPS Transactions:PartA / v.10A no.2 / pp.83-92 / 2003
  • RAID 5 is a widely used technique for constructing disk systems with high reliability and performance. This paper proposes the APGOC (Adaptive Parity Group On Cache) cache organization to solve the "small write" problem of RAID 5, especially in OLTP (On-Line Transaction Processing) environments. In our approach, when a user process requests a file from the kernel, information on its read/write characteristics is added to the file data structure of the file system. With this information, the data and parity caches can be managed interchangeably through parity fetching. This enhances cache utilization and improves disk request response time. The method is analyzed and evaluated by simulation. Compared with previous work, we observed a performance improvement of about 6 to 13%.
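
For readers unfamiliar with the RAID 5 "small write" problem, the sketch below shows the standard read-modify-write parity update that makes small writes expensive. It illustrates only the background problem, not APGOC itself, and the function names are hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(blocks):
    """Parity of a stripe: XOR of all of its data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity after overwriting a single data block.

    Needs the old data and old parity to be read first, so one logical
    write costs two reads plus two writes unless they are already cached.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = full_parity(stripe)
    new_block = b"\xaa\xbb"
    parity = small_write_parity(stripe[1], new_block, parity)
    stripe[1] = new_block
    print(parity == full_parity(stripe))
```

Keeping the old data and parity blocks in cache removes the extra reads from this update path, which is why cache organization matters for small writes.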

A Study on Design and Cache Replacement Policy for Cascaded Cache Based on Non-Volatile Memories (비휘발성 메모리 시스템을 위한 저전력 연쇄 캐시 구조 및 최적화된 캐시 교체 정책에 대한 연구)

  • Juhee Choi
    • Journal of the Semiconductor & Display Technology / v.22 no.3 / pp.106-111 / 2023
  • The importance of load-to-use latency has been highlighted as state-of-the-art computing cores adopt deep pipelines and high clock frequencies. The cascaded cache was recently proposed to reduce the access cycle of the L1 cache by utilizing differences in latencies among banks of the cache structure. However, this study assumes the cache is comprised of SRAM, making it unsuitable for direct application to non-volatile memory-based systems. This paper proposes a novel mechanism and structure for lowering dynamic energy consumption. It inserts monitoring logic to keep track of swap operations and write counts. If the ratio of swap operations to total write counts surpasses a set threshold, the cache controller skips the swap of cache blocks, which leads to reducing write operations. To validate this approach, experiments are conducted on the non-volatile memory-based cascaded cache. The results show a reduction in write operations by an average of 16.7% with a negligible increase in latencies.
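
The threshold rule described in this abstract might be sketched as below. This is a loose software model of the monitoring logic under stated assumptions: the threshold value, the class and method names, and the choice to compare with a strict "greater than" are all hypothetical, and the abstract does not say how swap-induced writes are counted.

```python
class SwapMonitor:
    """Tracks swap and write counts and decides whether a swap may proceed."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed tunable ratio, e.g. 0.5
        self.swaps = 0
        self.writes = 0

    def record_write(self):
        """Count one write to the cache."""
        self.writes += 1

    def allow_swap(self):
        """Return False (skip the swap) once swaps/writes exceeds the threshold."""
        if self.writes > 0 and self.swaps / self.writes > self.threshold:
            return False
        self.swaps += 1
        return True


if __name__ == "__main__":
    monitor = SwapMonitor(threshold=0.5)
    for _ in range(4):
        monitor.record_write()
    print([monitor.allow_swap() for _ in range(4)])
```

Skipping a swap avoids the block writes it would cause, which is the source of the write reduction the abstract reports.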

Way-set Associative Management for Low Power Hybrid L2 Cache Memory (고성능 저전력 하이브리드 L2 캐시 메모리를 위한 연관사상 집합 관리)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • IEMEK Journal of Embedded Systems and Applications / v.13 no.3 / pp.125-131 / 2018
  • STT-RAM is attracting attention as a next-generation non-volatile memory for replacing cache memory, offering low leakage energy, high integration density, and memory access performance similar to SRAM. However, like other non-volatile memories, it has a problem with write operations. Hybrid cache memory combining SRAM and STT-RAM is attracting attention as a cache structure with low power consumption. Although STT-RAM reduces leakage energy consumption, approaches to reducing dynamic energy are still lacking. In this paper, we propose an energy management method for a hybrid L2 cache of SRAM and STT-RAM, consisting of a way-selection approach and a memory selection method for write/read operations. According to the simulation results, the proposed hybrid cache memory reduced average energy consumption by 40% on SPEC CPU 2006 compared with an SRAM cache memory.

Accelerating Medical Image Processing on Integrated GPU Using OpenCL (OpenCL을 이용한 내장형 GPU에서의 의학영상처리 가속화)

  • Kim, Beom-Jun;Shin, Byeong-seok
    • Journal of the Korea Computer Graphics Society / v.23 no.2 / pp.1-10 / 2017
  • A variety of filters are applied to improve the quality of noisy, low-resolution medical images. This is necessary to reduce the radiation dose to the patient and to improve the utilization of the conventional spherical imaging equipment. Conventionally, filtering is performed on the CPU of a PC. However, it is difficult to produce results in real time when applying various calculations and filters to high-resolution images of the human body using only the CPU of a PC used in a hospital. In this paper, we analyze the structure and performance of the Intel GPU integrated into the CPU and propose a method for performing image filtering with OpenCL parallel processing. By applying complex, computationally expensive filters to medical images, high-quality images can be generated in real time.

An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU (Coloring이 적용된 Gauss-Seidel 해법을 통한 CPU와 GPU의 연산 효율에 관한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.41 no.2 / pp.117-124 / 2017
  • The performance of the colored Gauss-Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU performed well for small problems, but its performance deteriorated for large problems in which the total memory required for the computation exceeded the cache memory. In contrast, the GPU performed better as the mesh size increased because of the latency hiding technique. Further, GPU computation with the colored Gauss-Seidel solver was approximately 7 times faster than on a single CPU. Furthermore, the colored Gauss-Seidel solver was approximately twice as fast as the Jacobi solver when parallel computing was conducted on the GPU.
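
A colored Gauss-Seidel sweep can be illustrated with the common two-color (red-black) ordering on a 2D finite-difference grid. The sketch below is a serial, pure-Python illustration for the steady-state heat conduction (Laplace) equation with fixed boundary values; the grid size, boundary values, and function name are assumptions, and the paper's own discretization and coloring may differ.

```python
def red_black_gauss_seidel(u, sweeps):
    """Relax the interior of grid u in place using red-black ordering.

    Points with (i + j) even are updated first, then points with (i + j)
    odd. Within one color no point depends on another point of the same
    color, so each half-sweep could be executed fully in parallel.
    """
    rows, cols = len(u), len(u[0])
    for _ in range(sweeps):
        for color in (0, 1):
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 == color:
                        u[i][j] = 0.25 * (
                            u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                        )
    return u


if __name__ == "__main__":
    n = 5
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [1.0] * n  # hot top edge; the other edges are held at 0
    red_black_gauss_seidel(grid, sweeps=200)
    print(round(grid[2][2], 6))
```

The independence of same-colored points is what makes the method suitable for GPUs, whereas the standard lexicographic Gauss-Seidel ordering has a sequential dependency between neighboring updates.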

Performance Enhancement of Handover in mSCTP using Pre-acquisition RA in WLAN (WLAN에서 RA 선수신을 이용한 mSCTP 핸드오버 성능 향상)

  • Choi, Soon-Won;Kim, Kwang-Ryoul;Min, Sung-Gi
    • Journal of KIISE:Information Networking / v.33 no.2 / pp.156-164 / 2006
  • The SCTP (Stream Control Transmission Protocol) implementation with the DAR (Dynamic Address Reconfiguration) extension, called mSCTP (Mobile SCTP), was recently proposed for mobility support in the transport layer. mSCTP does not provide the short handover latency required by real-time applications, and it has no specific handover decision mechanism. In this paper, we propose fast handover schemes for mobile nodes moving into a different subnet, using pre-acquisition of RAs (Router Advertisements) and an L3 trigger to improve handover performance. Furthermore, we introduce three specific methods, namely RA cache, FMIPv6 (Fast Handovers for Mobile IPv6), and dual interface, and describe how the proposed scheme interoperates with the handover process in each case. Finally, we present two experimental results, for mSCTP and for mSCTP using FMIPv6, on Linux platforms. The results show that handover performance improves when the time needed to receive an RA, which accounts for most of the total handover latency, is reduced.