• Title/Summary/Keyword: memory access pattern

Search Result 53, Processing Time 0.023 seconds

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.4
    • /
    • pp.391-406
    • /
    • 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that can exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses present in serial loop nest to underlying data-parallel architectures based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.

(PMU (Performance Monitoring Unit)-Based Dynamic XIP(eXecute In Place) Technique for Embedded Systems) (내장형 시스템을 위한 PMU (Performance Monitoring Unit) 기반 동적 XIP (eXecute In Place) 기법)

  • Kim, Dohun;Park, Chanik
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.3 no.3
    • /
    • pp.158-166
    • /
    • 2008
  • These days, mobile embedded systems adopt flash memory capable of XIP feature since they can reduce memory usage, power consumption, and software load time. XIP provides direct access to ROM and flash memory for processors. However, using XIP incurs unnecessary degradation of applications' performance because direct access to ROM and flash memory shows more delay than that to main memory. In this paper, we propose a memory management framework, dynamic XIP, which can resolve the performance degradation of using XIP. Using a constrained RAM cache, dynamic XIP can dynamically change XIP region according to page access pattern to reduce performance degradation in execution time or energy consumption resulting from native XIP problem. The proposed framework consists of a page profiler gathering applications' memory access pattern using PMU and an XIP manager deciding that a page is accessed whether in main memory or in flash memory. The proposed framework is implemented and evaluated in Linux kernel. Our evaluation shows that our framework can reduce execution time at most 25% and energy consumption at most 22% compared with using XIP-only case adopted in general mobile embedded systems. Moreover, the evaluation shows that in execution time and energy consumption, our modified LRU algorithm with code page filters can reduce more than at most 90% and 80% respectively compared with applying just existing LRU algorithm to dynamic XIP.

  • PDF

A Technique for Improving the Performance of Cache Memories

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.3
    • /
    • pp.104-108
    • /
    • 2021
  • In order to improve performance in IoT, edge computing system, a memory is usually configured in a hierarchical structure. Based on the distance from CPU, the access speed slows down in the order of registers, cache memory, main memory, and storage. Similar to the change in performance, energy consumption also increases as the distance from the CPU increases. Therefore, it is important to develop a technique that places frequently used data to the upper memory as much as possible to improve performance and energy consumption. However, the technique should solve the problem of cache performance degradation caused by lack of spatial locality that occurs when the data access stride is large. This study proposes a technique to selectively place data with large data access stride to a software-controlled cache. By using the proposed technique, data spatial locality can be improved by reducing the data access interval, and consequently, the cache performance can be improved.

Memory Access Behavior of Embedded Java Virtual Machine in Energy Viewpoint (에너지 관점에서 임베디드 자바가상기계의 메모리 접근 형태)

  • Yang Heejae
    • The KIPS Transactions:PartA
    • /
    • v.12A no.3 s.93
    • /
    • pp.223-228
    • /
    • 2005
  • Several researchers have pointed out that the energy consumption in memory takes a dominant fraction on the energy budget of a whole embedded system. This applies to the embedded Java virtual machine tn, and to develop a more energy-efficient JVM it is absolutely necessary to optimize the energy usage in Jana memory. In this paper we have analyzed the logical memory access pattern in JVM as it executes numerous number of bytecode instructions while running a Java program. The access pattern gives us an insight how to design and select a suitable memory technology for Java memory. We present the memory access pattern for the three logical data spaces of JVM: heap, operand stack, and local variable array. The result saws that operand stack is accessed most frequently and uniformly, whereas heap used least frequently and non-uniformly among the three. Both heap and local variable array are accessed mostly in read-only fashion, but no remarkable difference is found between read and write operations for operand stack usage.

Prefetching System based on File Access Pattern Applicable to Multimedia Prefetching Scheme (멀티미디어 선반입에 적용 가능한 파일 액세스 패턴 기반의 선반입 시스템)

  • 황보준형;서대화
    • Journal of Korea Multimedia Society
    • /
    • v.4 no.6
    • /
    • pp.489-499
    • /
    • 2001
  • This paper presents the SIC(Size-Interval-Count) prefetching system that can record the file access patterns of applications within a relatively small space of memory based on the repetitiveness of the file access patterns. The SICPS(SIC Prefetching System) is based on knowledge-based prefetching methods which includes high correctness in predicting future accesses of applications. The proposed system then uses the recorded file access patterns, referred to as "SIC access pattern information", to correctly predict the future accesses of the applications. The proposed prefetching system improved the response time by about 40% compared to the general file system and showed remarkable memory efficiency compared to the previously knowledge-based prefetching methods.

  • PDF

File Access Pattern Collection Scheme based on Repetitiveness (반복성을 고려한 파일 액세스 패턴 수집 기법)

  • Hwnag-Bo, Jun-Hyoung;Seok, Seong-U;Seo, Dae-Hwa
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.12
    • /
    • pp.674-684
    • /
    • 2001
  • This paper presents the SIC(Size-Interval-Count) prefetching scheme that can record the file access patterns of applications within a relatively small space of memory based on the repetitiveness of the file access patterns. Several knowledge-based prefetching methods were recently introduced, which includes high correctness in predicting future accesses of applications. They records the access patterns of applications and uses recorded access pattern information to predict which blocks will be requested next. Yet, these methods require to much memory space. Accordingly, the proposed method then uses the recorded file access patterns, referred to as "SIC access pattern information", to correctly predict the future accesses of the applications. The proposed prefetching method improved the response time by about 40% compared to the general file system and showed remarkable memory efficiency compared to the previously knowledge-based prefetching methods.

  • PDF

Development of Memory Controller for Punctuality Guarantee from Memory-Free Inspection Equipment using DDR2 SDRAM (DDR2 SDRAM을 이용한 비메모리 검사장비에서 정시성을 보장하기 위한 메모리 컨트롤러 개발)

  • Jeon, Min-Ho;Shin, Hyun-Jun;Jeong, Seung-Heui;Oh, Chang-Heon
    • Journal of Advanced Navigation Technology
    • /
    • v.15 no.6
    • /
    • pp.1104-1110
    • /
    • 2011
  • The conventional semiconductor equipment has adopted SRAM module as the test pattern memory, which has a simple design and does not require refreshing. However, SRAM has its disadvantages as it takes up more space as its capacity becomes larger, making it difficult to meet the requirements of large memories and compact size. if DRAM is adopted as the semiconductor inspection equipment, it takes up less space and costs less than SRAM. However, DRAM is also disadvantageous because it requires the memory cell refresh, which is not suitable for the semiconductor examination equipments that require correct timing. Therefore, In this paper, we will proposed an algorithm for punctuality guarantee of memory-free inspection equipment using DDR2 SDRAM. And we will Developed memory controller using punctuality guarantee algorithm. As the results, show that when we adopt the DDR2 SDRAM, we can get the benefits of saving 13.5 times and 5.3 times in cost and space, respectively, compared to the SRAM.

Memory Access Reduction Scheme for H.264/AVC Decoder Motion Compensation (H.264/AVC 디코더의 움직임 보상을 위한 메모리 접근 감소 기법)

  • Park, Kyoung-Oh;Hong, You-Pyo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.4C
    • /
    • pp.349-354
    • /
    • 2009
  • In this paper, a new motion compensation scheme to reduce external memory access frequency which is one of the major bottlenecks for real-time decoding is proposed. Most H.264/AVC decoders store reference pictures in external memories due to the large size and reference blocks are read into the decoder core as needed during decoding. If the reference data access is done for each reference block in decoding sequence, the memory bandwidth can be unacceptable for real-time decoding. This paper presents a memory access scheme for motion compensation to read as many reference data as possible with reduced memory access frequency by analyzing reference data access pattern for each macroblock. Experimental results show that the proposed motion compensation scheme leads to approximately 30% improvement in memory bandwidth requirement.

Research on the Main Memory Access Count According to the On-Chip Memory Size of an Artificial Neural Network (인공 신경망 가속기 온칩 메모리 크기에 따른 주메모리 접근 횟수 추정에 대한 연구)

  • Cho, Seok-Jae;Park, Sungkyung;Park, Chester Sungchung
    • Journal of IKEEE
    • /
    • v.25 no.1
    • /
    • pp.180-192
    • /
    • 2021
  • One widely used algorithm for image recognition and pattern detection is the convolution neural network (CNN). To efficiently handle convolution operations, which account for the majority of computations in the CNN, we use hardware accelerators to improve the performance of CNN applications. In using these hardware accelerators, the CNN fetches data from the off-chip DRAM, as the massive computational volume of data makes it difficult to derive performance improvements only from memory inside the hardware accelerator. In other words, data communication between off-chip DRAM and memory inside the accelerator has a significant impact on the performance of CNN applications. In this paper, a simulator for the CNN is developed to analyze the main memory or DRAM access with respect to the size of the on-chip memory or global buffer inside the CNN accelerator. For AlexNet, one of the CNN architectures, when simulated with increasing the size of the global buffer, we found that the global buffer of size larger than 100kB has 0.8x as low a DRAM access count as the global buffer of size smaller than 100kB.

Improving Flash Translation Layer for Hybrid Flash-Disk Storage through Sequential Pattern Mining based 2-Level Prefetching Technique (하이브리드 플래시-디스크 저장장치용 Flash Translation Layer의 성능 개선을 위한 순차패턴 마이닝 기반 2단계 프리패칭 기법)

  • Chang, Jae-Young;Yoon, Un-Keum;Kim, Han-Joon
    • The Journal of Society for e-Business Studies
    • /
    • v.15 no.4
    • /
    • pp.101-121
    • /
    • 2010
  • This paper presents an intelligent prefetching technique that significantly improves performance of hybrid fash-disk storage, a combination of flash memory and hard disk. Since flash memory embedded in a hybrid device is much faster than hard disk in terms of I/O operations, it can be utilized as a 'cache' space to improve system performance. The basic strategy for prefetching is to utilize sequential pattern mining, with which we can extract the access patterns of objects from historical access sequences. We use two techniques for enhancing the performance of hybrid storage with prefetching. One of them is to modify a FAST algorithm for mapping the flash memory. The other is to extend the unit of prefetching to a block level as well as a file level for effectively utilizing flash memory space. For evaluating the proposed technique, we perform the experiments using the synthetic data and real UCC data, and prove the usability of our technique.