• Title/Summary/Keyword: memory access

Regular File Access of Embedded System Using Flash Memory as a Storage (플래시 메모리를 저장매체로 사용하는 임베디드 시스템에서의 정규파일 접근)

  • 이은주;박현주
    • Journal of Information Technology Applications and Management / v.11 no.1 / pp.189-200 / 2004
  • Flash memory, which is small and consumes little power, is now widely used as storage in embedded systems, because such systems require portability and fast response. To bridge the gap in access time between storage and RAM, Linux uses disk caching, which copies part of a file on disk into RAM, and embedded systems are no exception. The read access time of flash memory, however, is similar to that of RAM, so when a process on an embedded system reads data, accessing cached data in RAM takes about as long as accessing the data directly on flash memory. On an embedded system with limited memory, a disk cache therefore wastes time and memory on cache management and does not reflect the characteristics of flash memory. This paper proposes a regular-file access method for flash-based file systems that limits use of the page cache and reflects the characteristics of flash memory. The proposed algorithm minimizes power consumption because it reduces the number of RAM accesses, and it does not waste memory because it accesses flash memory directly. A system applying the proposed algorithm is therefore expected to show improved performance.

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science / v.14 no.4 / pp.391-406 / 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of the memory accesses present in serial loop nests to the underlying data-parallel architectures, based on a comprehensive static memory access pattern analysis. To that end, we present a simple yet powerful mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry-standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.

Distributed memory access architecture and control for fully disaggregated datacenter network

  • Kyeong-Eun Han;Ji Wook Youn;Jongtae Song;Dae-Ub Kim;Joon Ki Lee
    • ETRI Journal / v.44 no.6 / pp.1020-1033 / 2022
  • In this paper, we propose a novel disaggregated memory module (dMM) architecture and memory access control schemes that solve the collision and contention problems of memory disaggregation, reducing the average memory access time to less than 1 μs. In these schemes, the distributed scheduler in each dMM determines the order of memory read/write accesses based on delay-sensitive priority requests in the disaggregated memory access frame (dMAF). We used the memory-intensive first (MIF) algorithm and the priority-based MIF (p-MIF) algorithm, which prioritize delay-sensitive and/or memory-intensive (MI) traffic over CPU-intensive (CI) traffic. We evaluated the performance of the proposed schemes through simulation using OPNET and through hardware implementation. Our results showed that when the offered load was below 0.7 and the dMAF payload was 256 bytes, the average round-trip time (RTT) was the lowest, ~0.676 μs. The dMM scheduling algorithms, MIF and p-MIF, achieved a delay of less than 1 μs for all MI traffic with less than 10% transmission overhead.
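
The idea of serving memory-intensive (MI) requests ahead of CPU-intensive (CI) ones can be illustrated with a generic strict-priority queue. This is only a minimal sketch under stated assumptions: it is not the MIF or p-MIF algorithm from the paper, and the class encoding (`MI`, `CI`), the `PriorityScheduler` class, and its method names are hypothetical.

```python
import heapq
from itertools import count

# Assumed encoding: a lower value is served first.
MI, CI = 0, 1


class PriorityScheduler:
    """Strict-priority queue: MI requests first, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter, keeps ties in FIFO order

    def submit(self, traffic_class, request):
        # The heap orders entries by (class, arrival number).
        heapq.heappush(self._heap, (traffic_class, next(self._seq), request))

    def next_request(self):
        # Return the next request to serve, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


scheduler = PriorityScheduler()
for cls, name in [(CI, "c1"), (MI, "m1"), (CI, "c2"), (MI, "m2")]:
    scheduler.submit(cls, name)

served = [scheduler.next_request() for _ in range(4)]
print(served)
```

Strict priority can starve CI traffic under sustained MI load; a real scheduler would need some bound on waiting time, which this sketch does not attempt.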

Considering Read and Write Characteristics of Page Access Separately for Efficient Memory Management

  • Hyokyung Bahn
    • International journal of advanced smart convergence / v.12 no.1 / pp.70-75 / 2023
  • With the recent proliferation of memory-intensive workloads such as deep learning, analyzing memory access characteristics for efficient memory management is becoming increasingly important. Since read and write operations in memory access have different characteristics, an efficient memory management policy should take into account the characteristics of these two operations separately. Although some previous studies have considered the different characteristics of reads and writes, they require a modified hardware architecture supporting read bits and write bits. Unlike previous approaches, we propose a software-based management policy that considers read/write characteristics under the existing memory architecture. The proposed policy logically partitions memory space into the read/write area and the write area by making use of the reference bits and dirty bits provided in modern paging systems. Simulation experiments with memory access traces show that our approach performs better than the CLOCK algorithm by 23% on average, and that its effect is similar to that of the previous policy with hardware support.
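
For context, the CLOCK baseline mentioned in this abstract can be sketched as a small page-fault simulator. This is the textbook second-chance variant, assuming the reference bit is set when a page is loaded; it does not reproduce the proposed read/write-aware policy, and the function name `clock_faults` is hypothetical.

```python
def clock_faults(trace, num_frames):
    """Count page faults for the classic CLOCK (second-chance) policy."""
    frames = [None] * num_frames  # page held by each frame
    ref = [False] * num_frames    # reference bit of each frame
    where = {}                    # page -> frame index
    hand = 0                      # clock hand position
    faults = 0

    for page in trace:
        if page in where:
            # Hit: the paging hardware would set the reference bit.
            ref[where[page]] = True
            continue

        faults += 1
        # Sweep the hand, giving referenced pages a second chance.
        while frames[hand] is not None and ref[hand]:
            ref[hand] = False
            hand = (hand + 1) % num_frames

        victim = frames[hand]
        if victim is not None:
            del where[victim]

        frames[hand] = page
        ref[hand] = True          # assumption: bit is set on load
        where[page] = hand
        hand = (hand + 1) % num_frames

    return faults


print(clock_faults([1, 2, 3, 1, 4, 1, 2], num_frames=3))
```

A policy like the one described would additionally consult the dirty bit to separate written pages from read-only ones; that partitioning logic is not specified in the abstract and is left out here.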

A 3D Memory System Allowing Multi-Access (다중접근을 허용하는 3차원 메모리 시스템)

  • 이형
    • Journal of KIISE:Computer Systems and Theory / v.32 no.9 / pp.457-464 / 2005
  • This paper introduces a 3D memory system that allows 17 access types at an arbitrary position. The proposed memory system is based on two main functions: a memory module assignment function and an address assignment function. Based on them, the memory system supports 17 access types: 13 Lines, 3 Rectangles, and 1 Hexahedron. That is, the memory system allows simultaneous access to multiple data in any access type at an arbitrary position with a constant interval. To support the 17 access types, the memory system consists of memory module selection circuitry, data routing circuitry for READ/WRITE, and address calculation/routing circuitry. From the point of view of a developer and a programmer, the proposed memory system supports easy hardware extension according to the application and allows both of them to treat it as a logical three-dimensional array. In addition, multiple data in various access types can be accessed simultaneously with a constant interval. Therefore, the memory system is suitable for building systems related to 3D applications (e.g., volume rendering and volume clipping) and a frame buffer for multi-resolution.

A Technique for Improving the Performance of Cache Memories

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication / v.13 no.3 / pp.104-108 / 2021
  • To improve performance in IoT and edge computing systems, memory is usually configured in a hierarchical structure. Access speed decreases with distance from the CPU, in the order of registers, cache memory, main memory, and storage. Energy consumption likewise increases as the distance from the CPU increases. Therefore, it is important to develop a technique that places frequently used data in the upper levels of the memory hierarchy as much as possible to improve performance and energy consumption. However, such a technique must solve the problem of cache performance degradation caused by the lack of spatial locality that occurs when the data access stride is large. This study proposes a technique that selectively places data with a large access stride in a software-controlled cache. With the proposed technique, spatial locality can be improved by reducing the data access interval, and consequently cache performance can be improved.
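
The effect of stride on spatial locality can be shown with a small calculation: the larger the stride, the fewer accesses share a cache line. This is a minimal sketch that assumes a 64-byte cache line and accesses starting at address 0; the function names are hypothetical and the paper's placement technique is not modelled.

```python
def distinct_cache_lines(num_accesses, stride_bytes, line_bytes=64):
    """Count the distinct cache lines touched by a strided access stream."""
    lines = {(i * stride_bytes) // line_bytes for i in range(num_accesses)}
    return len(lines)


def accesses_per_line(num_accesses, stride_bytes, line_bytes=64):
    """Average number of accesses served by each touched cache line."""
    touched = distinct_cache_lines(num_accesses, stride_bytes, line_bytes)
    return num_accesses / touched


# 1024 accesses with a 4-byte stride reuse each 64-byte line 16 times;
# with a 256-byte stride every access lands on a different line.
for stride in (4, 64, 256):
    print(stride, accesses_per_line(1024, stride))
```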

SDRAM Fast Accession By DMA (Direct Memory Access) (DMA(Direct Memory Access)을 이용한 SDRAM의 고속 인터페이스)

  • Kim, Jin-Wan;Cho, Hyun-Mook
    • Journal of IKEEE / v.10 no.1 s.18 / pp.22-29 / 2006
  • In this paper, we present an efficient way of accessing SDRAM through DMA (Direct Memory Access) when a microprocessor and peripheral blocks share an SDRAM. The microprocessor accesses memory through the AMBA, the system bus provided by ARM Corporation, while the DMAs access memory through their own bus. Peripheral blocks read from and write to the SDRAM through an intermediate DMA in order to minimize the number of memory accesses and addressing operations. While the microprocessor is not accessing the SDRAM, because it is accessing other registers or because a hit signal occurs when it fetches program or data, the DMAs can read and write data in the SDRAM without interference from the AMBA. This approach increases the efficiency of the system and improves performance by 16.8%.

Performance Improvement Method of Multi-Port Memory Controller Using An Effective Multi-Channel Direct memory Access Management (효과적인 다채널 직접 메모리 접근 관리를 통한 멀티포트 메모리 컨트롤러의 성능 향상 방법)

  • Chun, Ik-Jae;Lyuh, Chun-Gi;Roh, Tae Moon;Lee, Moon-Sik
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.4 / pp.33-41 / 2014
  • This paper presents an effective memory access method for a high-speed data transfer on mobile systems using a direct memory access controller that considers the characteristics of a multi-port memory controller. The direct memory access controller has an integrated channel management function to control multiple direct memory access channels. The channels are physically separated and operate independently from each other. Experimental results show that the proposed direct memory access method improves the transfer performance by up to 72% and 69% on read and write transfer cycles, respectively. The total number of transfer cycles of the proposed method is 63% less than in a commercial method under 4-channel access.

Technology of MRAM (Magneto-resistive Random Access Memory) Using MTJ(Magnetic Tunnel Junction) Cell

  • Park, Wanjun;Song, I-Hun;Park, Sangjin;Kim, Teawan
    • JSTS:Journal of Semiconductor Technology and Science / v.2 no.3 / pp.197-204 / 2002
  • DRAM, SRAM, and FLASH memory are the three major memory devices currently used in most electronic applications. However, they have very distinct attributes, so each can be used only for a limited range of applications. MRAM (Magneto-resistive Random Access Memory) is a promising candidate for a universal memory that meets all application needs with non-volatility, fast operational speed, and low power consumption. The simplest MRAM cell architecture is an MTJ (Magnetic Tunnel Junction), as the data storage part, in series with a MOS transistor, as the data selection part. To be a commercially competitive memory device, scalability is an important factor as well. This paper tests the actual electrical parameters and the scaling factors that limit MRAM technology as a semiconductor-based memory device, through an actual integration of the MRAM core cell. Electrical tuning of the MOS/MTJ and control of resistance are important factors for data sensing, and control of magnetic switching is important for data writing.

Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for efficient use of shared memory between the CPU and GPU. The system, called Fusion Architecture, ensures consistency of the shared memory and minimizes the cache misses that frequently occur on Heterogeneous System Architecture or Unified Virtual Memory based systems. It also maximizes performance for memory-intensive jobs through efficient allocation of GPU cores. To compare architectures under various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. The results show that the proposed Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.