• Title/Summary/Keyword: Access to Memory

Search Result 957, Processing Time 0.024 seconds

Regular File Access of Embedded System Using Flash Memory as a Storage (플래시 메모리를 저장매체로 사용하는 임베디드 시스템에서의 정규파일 접근)

  • 이은주;박현주
    • Journal of Information Technology Applications and Management
    • /
    • v.11 no.1
    • /
    • pp.189-200
    • /
    • 2004
  • Recently Flash Memory which is small and low-powered is widely used as a storage of embedded system, because an embedded system requests portability and a fast response. To resolve a difference of access time between a storage and RAM, Linux is using disk caching which copies a part of file on disk into RAM. It is not also an exception on embedded system. A READ access-time of flash memory is similar to RAMs. So, when a process on an embedded system reads data, it is similar to the time to access cached data in RAM and to access directly data on a flash memory. On the embedded system using limited memory, using a disk cache is that wastes much time and memory spaces to manage it and can not reflects the characteristic of a flash memory. This paper proposes the regular file access of limited using a page cache in the file system based on a flash memory and reflects the characteristic of a flash memory. The proposed algorithm minimizes power consumption because access numbers of the RAM are reduced and doesn't waste a memory space because it accesses directly to a flash memory Therefore, the performance improvement of the system applying the proposed algorithm is expected.

  • PDF

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.4
    • /
    • pp.391-406
    • /
    • 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that can exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses present in serial loop nest to underlying data-parallel architectures based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.

Distributed memory access architecture and control for fully disaggregated datacenter network

  • Kyeong-Eun Han;Ji Wook Youn;Jongtae Song;Dae-Ub Kim;Joon Ki Lee
    • ETRI Journal
    • /
    • v.44 no.6
    • /
    • pp.1020-1033
    • /
    • 2022
  • In this paper, we propose novel disaggregated memory module (dMM) architecture and memory access control schemes to solve the collision and contention problems of memory disaggregation, reducing the average memory access time to less than 1 ㎲. In the schemes, the distributed scheduler in each dMM determines the order of memory read/write access based on delay-sensitive priority requests in the disaggregated memory access frame (dMAF). We used the memory-intensive first (MIF) algorithm and priority-based MIF (p-MIF) algorithm that prioritize delay-sensitive and/or memory-intensive (MI) traffic over CPU-intensive (CI) traffic. We evaluated the performance of the proposed schemes through simulation using OPNET and hardware implementation. Our results showed that when the offered load was below 0.7 and the payload of dMAF was 256 bytes, the average round trip time (RTT) was the lowest, ~0.676 ㎲. The dMM scheduling algorithms, MIF and p-MIF, achieved delay less than 1 ㎲ for all MI traffic with less than 10% of transmission overhead.

A 3D Memory System Allowing Multi-Access (다중접근을 허용하는 3차원 메모리 시스템)

  • 이형
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.9
    • /
    • pp.457-464
    • /
    • 2005
  • In this paper a 3D memory system that allows 17 access types at an arbitrary position is introduced. The proposed memory system is based on two main functions: memory module assignment function and address assignment function. Based on them, the memory system supports 17 access types: 13 Lines, 3 Rectangles, and 1 Hexahedron. That is, the memory system allows simultaneous access to multiple data in any access types at an arbitrary position with a constant interval. In order to allow 17 access types the memory system consists of memory module selection circuitry, data routing circuitry for READ/WRITE, and address calculation/routing circuitry In the point of view of a developer and a programmer, the memory system proposed in this paper supports easy hardware extension according to the applications and both of them to deal with it as a logical three-dimensional away In addition, multiple data in various across types can be simultaneously accessed with a constant interval. Therefore, the memory system is suitable for building systems related to ,3D applications (e.g. volume rendering and volume clipping) and a frame buffer for multi-resolution.

A Technique for Improving the Performance of Cache Memories

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.3
    • /
    • pp.104-108
    • /
    • 2021
  • In order to improve performance in IoT, edge computing system, a memory is usually configured in a hierarchical structure. Based on the distance from CPU, the access speed slows down in the order of registers, cache memory, main memory, and storage. Similar to the change in performance, energy consumption also increases as the distance from the CPU increases. Therefore, it is important to develop a technique that places frequently used data to the upper memory as much as possible to improve performance and energy consumption. However, the technique should solve the problem of cache performance degradation caused by lack of spatial locality that occurs when the data access stride is large. This study proposes a technique to selectively place data with large data access stride to a software-controlled cache. By using the proposed technique, data spatial locality can be improved by reducing the data access interval, and consequently, the cache performance can be improved.

Considering Read and Write Characteristics of Page Access Separately for Efficient Memory Management

  • Hyokyung Bahn
    • International journal of advanced smart convergence
    • /
    • v.12 no.1
    • /
    • pp.70-75
    • /
    • 2023
  • With the recent proliferation of memory-intensive workloads such as deep learning, analyzing memory access characteristics for efficient memory management is becoming increasingly important. Since read and write operations in memory access have different characteristics, an efficient memory management policy should take into accountthe characteristics of thesetwo operationsseparately. Although some previous studies have considered the different characteristics of reads and writes, they require a modified hardware architecture supporting read bits and write bits. Unlike previous approaches, we propose a software-based management policy under the existing memory architecture for considering read/write characteristics. The proposed policy logically partitions memory space into the read/write area and the write area by making use of reference bits and dirty bits provided in modern paging systems. Simulation experiments with memory access traces show that our approach performs better than the CLOCK algorithm by 23% on average, and the effect is similar to the previous policy with hardware support.

SDRAM Fast Accession By DMA (Direct Memory Access) (DMA(Direct Memory Access)을 이용한 SDRAM의 고속 인터페이스)

  • Kim, Jin-Wan;Cho, Hyun-Mook
    • Journal of IKEEE
    • /
    • v.10 no.1 s.18
    • /
    • pp.22-29
    • /
    • 2006
  • In this paper, we present the efficient way of SDRAM accessing through the DMA(Direct Memory Access) when a microprocessor and peripheral blocks are sharing a SDRAM. The microprocessor is able to access a memory through the AMBA which is the system bus provided by ARM Corporation and DMAs are able to access a memory through their own bus. Peripheral block's reading and writing on the SDRAM memory are realized by the intermediate DMA in order to minimize times of access and addressing the memory. While the microprocessor doesn‘t access to the SDRAM aproaching other registers or occurring a hit signal for fetching program or data, the DMAs may read/write the data in the SDRAM without an interference of the AMBA. This way increases the efficient of the system and performance is more by 16.8%.

  • PDF

Design of Asynchronous Nonvolatile Memory Module using Self-diagnosis Function (자기진단 기능을 이용한 비동기용 불휘발성 메모리 모듈의 설계)

  • Shin, Woohyeon;Yang, Oh;Yeon, Jun Sang
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.1
    • /
    • pp.85-90
    • /
    • 2022
  • In this paper, an asynchronous nonvolatile memory module using a self-diagnosis function was designed. For the system to work, a lot of data must be input/output, and memory that can be stored is required. The volatile memory is fast, but data is erased without power, and the nonvolatile memory is slow, but data can be stored semi-permanently without power. The non-volatile static random-access memory is designed to solve these memory problems. However, the non-volatile static random-access memory is weak external noise or electrical shock, data can be some error. To solve these data errors, self-diagnosis algorithms were applied to non-volatile static random-access memory using error correction code, cyclic redundancy check 32 and data check sum to increase the reliability and accuracy of data retention. In addition, the possibility of application to an asynchronous non-volatile storage system requiring reliability was suggested.

(PMU (Performance Monitoring Unit)-Based Dynamic XIP(eXecute In Place) Technique for Embedded Systems) (내장형 시스템을 위한 PMU (Performance Monitoring Unit) 기반 동적 XIP (eXecute In Place) 기법)

  • Kim, Dohun;Park, Chanik
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.3 no.3
    • /
    • pp.158-166
    • /
    • 2008
  • These days, mobile embedded systems adopt flash memory capable of XIP feature since they can reduce memory usage, power consumption, and software load time. XIP provides direct access to ROM and flash memory for processors. However, using XIP incurs unnecessary degradation of applications' performance because direct access to ROM and flash memory shows more delay than that to main memory. In this paper, we propose a memory management framework, dynamic XIP, which can resolve the performance degradation of using XIP. Using a constrained RAM cache, dynamic XIP can dynamically change XIP region according to page access pattern to reduce performance degradation in execution time or energy consumption resulting from native XIP problem. The proposed framework consists of a page profiler gathering applications' memory access pattern using PMU and an XIP manager deciding that a page is accessed whether in main memory or in flash memory. The proposed framework is implemented and evaluated in Linux kernel. Our evaluation shows that our framework can reduce execution time at most 25% and energy consumption at most 22% compared with using XIP-only case adopted in general mobile embedded systems. Moreover, the evaluation shows that in execution time and energy consumption, our modified LRU algorithm with code page filters can reduce more than at most 90% and 80% respectively compared with applying just existing LRU algorithm to dynamic XIP.

  • PDF

Design to Chip with Multi-Access Memory System and Parallel Processor for 16 Processing Elements of Image Processing Purpose (영상처리용 16개의 처리기를 위한 다중접근기억장치 및 병렬처리기의 칩 설계)

  • Lim, Jae-Ho;Park, Seong-Mi;Park, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.11
    • /
    • pp.1401-1408
    • /
    • 2011
  • This dissertation present a chip with Multi-Access Memory System(MAMS) and parallel processor for 16 Processing Elements of image processing purpose. MAMS is a kind of parallel access memory system and can simultaneously access to random pixel datas with eight types. It is possible to set a interval about pixel datas to access, too. The parallel processor built-in MAMS actually has been realized in 2003 but its performance fell short of a real time process for high-definition images. I designed a improved parallel processing system by means of addition and expansion of Memory Modules and Processing Elements of previous one. It is feasible to perform a Morphological Closing at the speed of 3 times of the previous one and 6 times of serial system.