• Title/Summary/Keyword: memory latency

Search Result 361, Processing Time 0.026 seconds

Design of a Scalable Systolic Synchronous Memory

  • Jeong, Gab-Joong;Kwon, Kyoung-Hwan;Lee, Moon-Key
    • Journal of Electrical Engineering and information Science
    • /
    • v.2 no.4
    • /
    • pp.8-13
    • /
    • 1997
  • This paper describes a scalable systolic synchronous memory for digital signal processing and packet switching. The systolic synchronous memory consists of the 2-D array of small memory blocks which are fully pipelined and communicated in three directions with adjacent blocks. The maximum delay of a small memory block becomes the operation speed of the chip. The array configuration is scalable for the entire memory size requested by an application. it has the initial latency of N+3 cycles with NxN array configuration. We designed an experimental 200 MHz 4Kb static RAM chip with the 4x4 array configuration of 256 SRAM blocks. It was fabricated is 0.8$\mu\textrm{m}$ twin-well single-poly double-metal CMOS technology.

  • PDF

Trends of the CCIX Interconnect and Memory Expansion Technology (CCIX 연결망과 메모리 확장기술 동향)

  • Kim, S.Y.;Ahn, H.Y.;Jun, S.I.;Park, Y.M.;Han, W.J.
    • Electronics and Telecommunications Trends
    • /
    • v.37 no.1
    • /
    • pp.42-52
    • /
    • 2022
  • With the advent of the big data era, the memory capacity required for computing systems is rapidly increasing, especially in High Performance Computing systems. However, the number of DRAMs that can be used in a computing node is limited by the structural limitations of the hardware (for example, CPU specifications). Memory expansion technology has attracted attention as a means of overcoming this limitation. This technology expands the memory capacity by leveraging the external memory connected to the host system through hardware interface such as PCIe and CCIX. In this paper, we present an overview and describe the development trends of the memory expansion technology. We also provide detailed descriptions and use cases of the CCIX that provides higher bandwidth and lower latency than cases of the PCIe.

Effect of Environmental Factors on Depressive-like Behavior and Memory Function in Adolescent Rats

  • Song, Min Kyung;Lee, Jae-Min;Kim, Yoon Ju;Lee, Joo Hee;Kim, Youn-Jung
    • Journal of Korean Biological Nursing Science
    • /
    • v.19 no.4
    • /
    • pp.276-283
    • /
    • 2017
  • Purpose: The aim of this study was to identify the effects of environmental factors on depressive-like behavior and memory function during adolescence. We performed behavior tests in adolescent rats exposed to environmental enrichment, handling, and social deprivation for eight weeks. Methods: Wistar rats were randomly assigned to control, environmental enrichment, handling, and social deprivation groups at the age of four weeks. Results: In the forced swim test, the immobility time in the environmental enrichment group was decreased than that in the control group (p=.038), while the immobility time in the social deprivation group was increased than that in the control group (p=.035), the environmental enrichment group (p<.001), and the handling group (p=.001). In the Morris water maze test, the social deprivation group had an increased latency time than the control group (p=.013) and the environmental enrichment group (p=.001). In the passive avoidance test, the environmental enrichment group had an increased latency time than the control group (p=.005). However, the social deprivation group had reduced latency time than the socially housed groups (control: p=.030; environmental enrichment: p<.001; handling: p<.001). Conclusion: These findings suggest that environmental factors play an important role in emotion and memory function during adolescence.

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

  • Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.541-556
    • /
    • 2005
  • The memory access latency of the system has been a primary factor of performance degradation in single-processor system and multi-processor system. The remote memory access latency takes a lot of overhead over the local memory access latency especially in the distributed shared-memory system. To resolve this problem, the multi-level cache architecture that contains a remote cache in the multi-processor system has been proposed. In this paper, we propose a new cache replacement policy that improves the performance of the multi-processor system with the remote cache. If the multi-level cache keeps the multi-level inclusion(MLI) property and uses the LRU(Least Recently Used) cache replacement policy, the LRU information of the higher-level cache(a processor cache) would be different with that of the lower-level cache(a remote cache). In this situation, the replacement of a remote cache line can induce the exchange of a processor cache line that is used by the processor. It is a main factor of performance degradation in a whole system. To alleviate this disadvantage of the LRU replacement polity, the new policy analyses tht processor's remote memory access pattern of each node and uses this information to reduce the number of invalidations of the useful cache line in the higher-level cache. The new replacement policy of the remote cache can improve the performance by $3.5\%$ in maximum and $2.5\%$ in average on SPLASH-2 benchmarks, compared to the general LRU cache replacement policy.

Technology Trends in CXL Memory and Utilization Software (CXL 메모리 및 활용 소프트웨어 기술 동향 )

  • H.Y. Ahn;S.Y. Kim;Y.M. Park;W.J. Han
    • Electronics and Telecommunications Trends
    • /
    • v.39 no.1
    • /
    • pp.62-73
    • /
    • 2024
  • Artificial intelligence relies on data-driven analysis, and the data processing performance strongly depends on factors such as memory capacity, bandwidth, and latency. Fast and large-capacity memory can be achieved by composing numerous high-performance memory units connected via high-performance interconnects, such as Compute Express Link (CXL). CXL is designed to enable efficient communication between central processing units, memory, accelerators, storage, and other computing resources. By adopting CXL, a composable computing architecture can be implemented, enabling flexible server resource configuration using a pool of computing resources. Thus, manufacturers are actively developing hardware and software solutions to support CXL. We present a survey of the latest software for CXL memory utilization and the most recent CXL memory emulation software. The former supports efficient use of CXL memory, and the latter offers a development environment that allows developers to optimize their software for the hardware architecture before commercial release of CXL memory devices. Furthermore, we review key technologies for improving the performance of both the CXL memory pool and CXL-based composable computing architecture along with various use cases.

Modeling of TLB Miss Rate and Page Fault Rate for Memory Management in Fast Storage Environments (고속 스토리지 환경의 메모리 관리를 위한 TLB 미스율 및 페이지 폴트율 모델링)

  • Park, Yunjoo;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.1
    • /
    • pp.65-70
    • /
    • 2022
  • As fast storage has become popular, the memory management system designed for hard disks needs to be reconsidered. In this paper, we observe that memory access latency is sensitive to the page size when fast storage is adopted. We find the reason from the TLB miss rate, which has the increased impact on the memory access latency in comparison with the page fault rate, and there is trade-off between the TLB miss rate and the page fault rate as the page size is varied. To handle such situations, we model the page fault rate and the TLB miss rate accurately as a function of the page size. Specifically, we show that the power fit and the exponential fit with two terms are appropriate for fitting the TLB miss rate and the page fault rate, respectively. We validate the effectiveness of our model by comparing the estimated values from the model and real values.

Divided Disk Cache and SSD FTL for Improving Performance in Storage

  • Park, Jung Kyu;Lee, Jun-yong;Noh, Sam H.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.17 no.1
    • /
    • pp.15-22
    • /
    • 2017
  • Although there are many efficient techniques to minimize the speed gap between processor and the memory, it remains a bottleneck for various commercial implementations. Since secondary memory technologies are much slower than main memory, it is challenging to match memory speed to the processor. Usually, hard disk drives include semiconductor caches to improve their performance. A hit in the disk cache eliminates the mechanical seek time and rotational latency. To further improve performance a divided disk cache, subdivided between metadata and data, has been proposed previously. We propose a new algorithm to apply the SSD that is flash memory-based solid state drive by applying FTL. First, this paper evaluates the performance of such a disk cache via simulations using DiskSim. Then, we perform an experiment to evaluate the performance of the proposed algorithm.

A study of workload consolidation considering NUMA affinity (NUMA affinity를 고려한 Workload Consolidation 연구)

  • Seo, Dongyou;Kim, Shin-gye;Choi, Chanho;Eom, Hyeonsang;Yeom, Heon Y.
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.204-206
    • /
    • 2012
  • SMP(Symmetric Multi-Processing)는 Shared memory bus 를 사용함으로써 scalability 가 제한적이었다. 이런 SMP의 scalability 제한을 극복하기 위해 제안 된 것이 NUMA(Non Uniform Memory Access)이다. NUMA는 memory bus 를 CPU 별 local 하게 가지고 있어 자신이 가지는 memory 영역에 대해서는 다른 영역을 접근하는 것 보다 더 빠른 latency 를 가지는 구조이다. Local 한 memory 영역의 존재는 scalability를 높여 주었지만 서버 가상화 환경에서 VM을 동적으로 scheduling 을 하였을 때 VM의 page 가 실행되는 core 의 local 한 메모리 영역에 존재하지 않게 되면 remote access로 인해 local access보다 성능이 떨어진다. 이 논문에서는 서버 가상화 환경에서 최신 architecture인 AMD bulldozer에서 NUMA affinity가 위반되었을 때 발생하는 성능 저하와 어떤 상황에서 이런 NUMA affinity가 위반되어도 성능저하가 없는지 연구하였다.

Locally weighted linear regression prefetching method for hybrid memory system (하이브리드 메모리 시스템의 지역 가중 선형회귀 프리페치 방법)

  • Tang, Qian;Kim, Jeong-Geun;Kim, Shin-Dug
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.12-15
    • /
    • 2020
  • Data access characteristics can directly affect the efficiency of the system execution. This research is to design an accurate predictor by using historical memory access information, where highly accessible data can be migrated from low-speed storage (SSD/HHD) to high-speed memory (Memory/CPU Cache) in advance, thereby reducing data access latency and further improving overall performance. For this goal, we design a locally weighted linear regression prefetch scheme to cope with irregular access patterns in large graph processing applications for a DARM-PCM hybrid memory structure. By analyzing the testing result, the appropriate structural parameters can be selected, which greatly improves the cache prefetching performance, resulting in overall performance improvement.

Implementation of Light-weight I/O Stack for NVMe-over-Fabrics

  • Ahn, Sungyong
    • International journal of advanced smart convergence
    • /
    • v.9 no.3
    • /
    • pp.253-259
    • /
    • 2020
  • Most of today's large-scale cloud systems and enterprise data centers are distributing resources to improve scalability and resource utilization. NVMe-over-Fabric protocol allows submitting NVMe commands to a remote NVMe SSD through RDMA (Remote Direct Memory Access) network. It is attracting attention recently because it is possible to construct a disaggregation storage system with low latency through the protocol. However, the current I/O stack of NVMe-over-Fabric has an inefficient structure for maintaining compatibility with the traditional I/O stack. Therefore, in this paper, we propose a new mechanism to reduce I/O latency and CPU overhead by modifying I/O path of NVMe-over-Fabric to pass through legacy block layer. According to the performance evaluation results, the proposed mechanism is able to reduce the I/O latency and CPU overhead by up to 22% and 24% compared to the existing NVMe-over-Fabrics protocol, respectively.