• Title/Summary/Keyword: Computing-In-Memory

Search Result 764, Processing Time 0.027 seconds

Hybrid in-memory storage for cloud infrastructure

  • Kim, Dae Won;Kim, Sun Wook;Oh, Soo Cheol
    • Journal of Internet Computing and Services
    • /
    • v.22 no.5
    • /
    • pp.57-67
    • /
    • 2021
  • Modern cloud computing is rapidly changing from traditional hypervisor-based virtual machines to container-based cloud-native environments. Due to limitations in I/O performance required for both virtual machines and containers, the use of high-speed storage (SSD, NVMe, etc.) is increasing, and in-memory computing using main memory is also emerging. Running a virtual environment on main memory gives better performance compared to other storage arrays. However, RAM used as main memory is expensive and due to its volatile characteristics, data is lost when the system goes down. Therefore, additional work is required to run the virtual environment in main memory. In this paper, we propose a hybrid in-memory storage that combines a block storage such as a high-speed SSD with main memory to safely operate virtual machines and containers on main memory. In addition, the proposed storage showed 6 times faster write speed and 42 times faster read operation compared to regular disks for virtual machines, and showed the average 12% improvement of container's performance tests.

Gen-Z memory pool system implementation and performance measurement

  • Kwon, Won-ok;Sok, Song-Woo;Park, Chan-ho;Oh, Myeong-Hoon;Hong, Seokbin
    • ETRI Journal
    • /
    • v.44 no.3
    • /
    • pp.450-461
    • /
    • 2022
  • The Gen-Z protocol is a memory semantic protocol between the memory and CPU used in computer architectures with large memory pools. This study presents the implementation of the Gen-Z hardware system configured using Gen-Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen-Z memory pool with an optimized character, a block device driver, and a file system for the Gen-Z hardware was designed. The Gen-Z IP was targeted to the FPGA, and a 512 GB Gen-Z memory pool was configured on an ×86 server. In the experiments, the latency and throughput of the Gen-Z memory were measured and compared with those of the local memory, SATA SSD, and NVMe using character or block device interfaces. The Gen-Z hardware exhibited superior throughput and latency performance compared with SATA SSD and NVMe at block sizes under 4 kB. The MySQL and File IO benchmark of Gen-Z showed good write performance in all block sizes and threads. Besides, it showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem.

Technology Trends in CXL Memory and Utilization Software (CXL 메모리 및 활용 소프트웨어 기술 동향 )

  • H.Y. Ahn;S.Y. Kim;Y.M. Park;W.J. Han
    • Electronics and Telecommunications Trends
    • /
    • v.39 no.1
    • /
    • pp.62-73
    • /
    • 2024
  • Artificial intelligence relies on data-driven analysis, and the data processing performance strongly depends on factors such as memory capacity, bandwidth, and latency. Fast and large-capacity memory can be achieved by composing numerous high-performance memory units connected via high-performance interconnects, such as Compute Express Link (CXL). CXL is designed to enable efficient communication between central processing units, memory, accelerators, storage, and other computing resources. By adopting CXL, a composable computing architecture can be implemented, enabling flexible server resource configuration using a pool of computing resources. Thus, manufacturers are actively developing hardware and software solutions to support CXL. We present a survey of the latest software for CXL memory utilization and the most recent CXL memory emulation software. The former supports efficient use of CXL memory, and the latter offers a development environment that allows developers to optimize their software for the hardware architecture before commercial release of CXL memory devices. Furthermore, we review key technologies for improving the performance of both the CXL memory pool and CXL-based composable computing architecture along with various use cases.

QPlayer: Lightweight, scalable, and fast quantum simulator

  • Ki-Sung Jin;Gyu-Il Cha
    • ETRI Journal
    • /
    • v.45 no.2
    • /
    • pp.304-317
    • /
    • 2023
  • With the rapid evolution of quantum computing, digital quantum simulations are essential for quantum algorithm verification, quantum error analysis, and new quantum applications. However, the exponential increase in memory overhead and operation time is challenging issues that have not been solved for years. We propose a novel approach that provides more qubits and faster quantum operations with smaller memory than before. Our method selectively tracks realized quantum states using a reduced quantum state representation scheme instead of loading the entire quantum states into memory. This method dramatically reduces memory space ensuring fast quantum computations without compromising the global quantum states. Furthermore, our empirical evaluation reveals that our proposed idea outperforms traditional methods for various algorithms. We verified that the Grover algorithm supports up to 55 qubits and the surface code algorithm supports up to 85 qubits in 512 GB memory on a single computational node, which is against the previous studies that support only between 35 qubits and 49 qubits.

Analysis Algorithm for Memory BISR as Imagination Zone (가상 구역에 따른 메모리 자가 치유에 대한 분석 알고리즘)

  • Park, Jae-Heung;Shim, Eun-Sung;Chang, Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.12
    • /
    • pp.73-79
    • /
    • 2009
  • With the advance of VLSI technology, the capacity and density of memories are rapidly growing. In this paper we proposed MRI (Memory built-in self Repair Imagination zone) as reallocation algorithm. All faulty cells of embedded memory are reallocated into the row and column spare memory. This work implements reallocation algorithm and BISR to verify its design.

Trends in Compute Express Link(CXL) Technology (CXL 인터커넥트 기술 연구개발 동향)

  • S.Y. Kim;H.Y. Ahn;Y.M. Park;W.J. Han
    • Electronics and Telecommunications Trends
    • /
    • v.38 no.5
    • /
    • pp.23-33
    • /
    • 2023
  • With the widespread demand from data-intensive tasks such as machine learning and large-scale databases, the amount of data processed in modern computing systems is increasing exponentially. Such data-intensive tasks require large amounts of memory to rapidly process and analyze massive data. However, existing computing system architectures face challenges when building large-scale memory owing to various structural issues such as CPU specifications. Moreover, large-scale memory may cause problems including memory overprovisioning. The Compute Express Link (CXL) allows computing nodes to use large amounts of memory while mitigating related problems. Hence, CXL is attracting great attention in industry and academia. We describe the overarching concepts underlying CXL and explore recent research trends in this technology.

Trends of the CCIX Interconnect and Memory Expansion Technology (CCIX 연결망과 메모리 확장기술 동향)

  • Kim, S.Y.;Ahn, H.Y.;Jun, S.I.;Park, Y.M.;Han, W.J.
    • Electronics and Telecommunications Trends
    • /
    • v.37 no.1
    • /
    • pp.42-52
    • /
    • 2022
  • With the advent of the big data era, the memory capacity required for computing systems is rapidly increasing, especially in High Performance Computing systems. However, the number of DRAMs that can be used in a computing node is limited by the structural limitations of the hardware (for example, CPU specifications). Memory expansion technology has attracted attention as a means of overcoming this limitation. This technology expands the memory capacity by leveraging the external memory connected to the host system through hardware interface such as PCIe and CCIX. In this paper, we present an overview and describe the development trends of the memory expansion technology. We also provide detailed descriptions and use cases of the CCIX that provides higher bandwidth and lower latency than cases of the PCIe.

The Development on Embedded Memory BIST IP Automatic Generation System for the Dual-Port of SRAM (SRAM 이중-포트를 위한 내장된 메모리 BIST IP 자동생성 시스템 개발)

  • Shim Eun-Sung;Lee Jung-Min;Lee Chan-Young;Chang Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.2 s.332
    • /
    • pp.57-64
    • /
    • 2005
  • In this paper, we develop the common CAD tool that creates the automatically BIST IP by user settings for the convenient test of embedded memory. Previous tools have defect that when memory model is changed, BIST IP must re-designed depending on memory model because existing tools is limited the widely used algorithms. We develop the tool that is created automatic BIST IP. It applies the algorithm according to the memory model which user requests We usually use the multi-port asynchronous SRAM needless to refresh as the embedded memory. However, This work researches on the dual-port SRAM.

Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL (OpenCL을 활용한 이기종 파이프라인 컴퓨팅 기반 Spark 프레임워크)

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.2
    • /
    • pp.270-276
    • /
    • 2018
  • Apache Spark is one of the high performance in-memory computing frameworks for big-data processing. Recently, to improve the performance, general-purpose computing on graphics processing unit(GPGPU) is adapted to Apache Spark framework. Previous Spark-GPGPU frameworks focus on overcoming the difficulty of an implementation resulting from the difference between the computation environment of GPGPU and Spark framework. In this paper, we propose a Spark framework based on a heterogenous pipeline computing with OpenCL to further improve the performance. The proposed framework overlaps the Java-to-Native memory copies of CPU with CPU-GPU communications(DMA) and GPU kernel computations to hide the CPU idle time. Also, CPU-GPU communication buffers are implemented with switching dual buffers, which reduce the mapped memory region resulting in decreasing memory mapping overhead. Experimental results showed that the proposed Spark framework based on a heterogenous pipeline computing with OpenCL had up to 2.13 times faster than the previous Spark framework using OpenCL.

SRAM-Based Area-Efficient Computing-in-Memory for AI Edge Devices (AI 엣지 디바이스를 위한 SRAM 기반 면적 효율적인 컴퓨팅 인 메모리)

  • Hyun-Ki Hong;Sung-Hun Jo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.5
    • /
    • pp.1051-1058
    • /
    • 2024
  • In semiconductors for AI, Computing in Memory (CIM) integrates computation and memory to minimize data movement and reduce processing bottlenecks, thereby improving performance. In AI tasks that handle large amounts of data, CIM is gaining attention as a key technology that optimizes the performance of AI systems by improving power efficiency and enabling faster computation. In this paper, a new CIM architecture for AI semiconductors is proposed. The proposed architecture can perform MAC operations by controlling the width of the transistor and the pulse width of the control signal, and can be implemented in a smaller area than the existing architecture.