• Title/Summary/Keyword: Computing-In-Memory

Search Result 764, Processing Time 0.028 seconds

Design and Implementation of Memory-Centric Computing System for Big Data Analysis

  • Jung, Byung-Kwon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.7
    • /
    • pp.1-7
    • /
    • 2022
  • Recently, as the use of applications such as big data programs and machine learning programs that are driven while generating large amounts of data in the program itself becomes common, the existing main memory alone lacks memory, making it difficult to execute the program quickly. In particular, the need to derive results more quickly has emerged in a situation where it is necessary to analyze whether the entire sequence is genetically altered due to the outbreak of the coronavirus. As a result of measuring performance by applying large-capacity data to a computing system equipped with a self-developed memory pool MOCA host adapter instead of processing large-capacity data from an existing SSD, performance improved by 16% compared to the existing SSD system. In addition, in various other benchmark tests, IO performance was 92.8%, 80.6%, and 32.8% faster than SSD in computing systems equipped with memory pool MOCA host adapters such as SortSampleBam, ApplyBQSR, and GatherBamFiles by task of workflow. When analyzing large amounts of data, such as electrical dielectric pipeline analysis, it is judged that the measurement delay occurring at runtime can be reduced in the computing system equipped with the memory pool MOCA host adapter developed in this research.

Memory BIST Circuit Generator System Design Based on Fault Model (고장 모델 기반 메모리 BIST 회로 생성 시스템 설계)

  • Lee Jeong-Min;Shim Eun-Sung;Chang Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.2 s.332
    • /
    • pp.49-56
    • /
    • 2005
  • In this paper, we propose a memory BIST Circuit Creation System which creates BIST circuit based on user defined fault model and generates the optimized march test algorithm. Traditional tools have some limit that regenerates BIST circuit after changing the memory type or test algorithm. However, this proposed creation system can automatically generate memory BIST circuit which is suitable in the various memory type and apply algorithm which is required by user. And it gets more efficient through optimizing algorithms for fault models which is selected randomly according to proposed nile. In addition, it support various address width and data and consider interface of IEEE 1149.1 circuit.

Trends in High Speed Fabric-Interconnect-Based Memory Centric Computing Architecture (고속 패브릭 연결망 기반 메모리 중심 컴퓨팅 기술 동향)

  • S.-J. Cha;S.-W. Sok;H.J. Kwon;Y.W. Kim;J. Kim;H.Y. Kim;K.-W. Koh;K.-H. Kim
    • Electronics and Telecommunications Trends
    • /
    • v.39 no.5
    • /
    • pp.98-107
    • /
    • 2024
  • Applications such as artificial intelligence continue to grow in complexity and scale. Thus, the demand for scalable computing is increasing for achieving faster data processing and improved efficiency. This requirement has led to the development of memory-centric computing and high-speed fabric interconnection technologies. Memory-centric computing reduces the latency and enhances the system performance by shifting the focus from the central processing unit to the memory, whereas high-speed fabric interconnects enable efficient data transfer across various computing resources. Technologies such as Gen-Z, OpenCAPI, and CCIX have been integrated into the CXL (Compute Express Link) standard since 2019 to improve communication and cache coherence. Ethernet-based interconnects such as RoCE, InfiniBand, and OmniXtend also play a crucial role in providing high-speed data transfer and low latency. We explore the latest trends and prospects of these technologies, highlighting their benefits and applications.

Parallel Computing Environment for R with on Supercomputer Systems (빅데이터 분석을 위한 슈퍼컴퓨터 환경에서 R의 병렬처리)

  • Lee, Sang Yeol;Won, Joong Ho
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.39 no.4
    • /
    • pp.19-31
    • /
    • 2014
  • We study parallel processing techniques for the R programming language of high performance computing technology. In this study, we used massively parallel computing system which has 25,408 cpu cores. We conducted a performance evaluation of a distributed memory system using MPI and of a the shared memory system using OpenMP. Our findings are summarized as follows. First, For some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed memory system gets faster as the number of nodes increases while shared memory system is limited in the improvement of performance, due to the limit of the number of cpus in a single system.

Development of the Efficient Compressed Data Management System for Embedded DBMS (모바일 DBMS를 위한 효율적인 압축 데이터 관리 시스템의 개발)

  • Shin, Young-Jae;Hwang, Jin-Ho;Kim, Hak-Soo;Lee, Seung-Mi;Son, Jin-Hyun
    • The KIPS Transactions:PartD
    • /
    • v.15D no.5
    • /
    • pp.589-598
    • /
    • 2008
  • Recently, Mobile Computing Devices are used generally. And Information which is processed by Mobile computing devices is increasing. Because information is digitalizing. So Mobile computing Devices demand an Embedded DBMS for efficient management of information. Moreover Mobile computing Devices demand an efficient storage management in NAND-type flash memory because the NAND-type flash memory is using generally in Mobile computing devices and the NAND-type flash memory is more expensive than the magnetic disks. So that in this paper, we present an efficient Compressed Data Management System for the embedded DBMS that is used in flash memory. This proposed system improve the space utilization and extend a lifetime of a flash memory because it decreases the size of data.

Analysis of the GPGPU Performance for Various Combinations of Workloads Executed Concurrently (동시에 실행되는 워크로드 조합에 따른 GPGPU 성능 분석)

  • Kim, Dongwhan;Eom, Hyeonsang
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.165-170
    • /
    • 2017
  • Many studies have utilized GPGPU (General-Purpose Graphic Processing Unit) and its high computing power to compute complex tasks. The characteristics of GPGPU programs necessitate the operations of memory copy between the host and device. A high latency period can affect the performance of the program. Thus, it is required to significantly improve the performance of GPGPU programs by optimizations. By executing multiple GPGPU programs simultaneously, the latency hiding effect of memory copy is achieved by overlapping the memory copy and computing operations in GPGPU. This paper presents the results of analyzing the latency hiding effect for memory copy operations. Furthermore, we propose a performance anticipation model and an algorithm for the limitations of using pinned memory, and show that the use of the proposed algorithm results in a 41% performance increase.

Scratchpad Memory Architectures and Allocation Algorithms for Hard Real-Time Multicore Processors

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.9 no.2
    • /
    • pp.51-72
    • /
    • 2015
  • Time predictability is crucial in hard real-time and safety-critical systems. Cache memories, while useful for improving the average-case memory performance, are not time predictable, especially when they are shared in multicore processors. To achieve time predictability while minimizing the impact on performance, this paper explores several time-predictable scratch-pad memory (SPM) based architectures for multicore processors. To support these architectures, we propose the dynamic memory objects allocation based partition, the static allocation based partition, and the static allocation based priority L2 SPM strategy to retain the characteristic of time predictability while attempting to maximize the performance and energy efficiency. The SPM based multicore architectural design and the related allocation methods thus form a comprehensive solution to hard real-time multicore based computing. Our experimental results indicate the strengths and weaknesses of each proposed architecture and the allocation method, which offers interesting on-chip memory design options to enable multicore platforms for hard real-time systems.

Bandwidth-aware Memory Placement on Hybrid Memories targeting High Performance Computing Systems

  • Lee, Jongmin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Modern computers provide tremendous computing capability and a large memory system. Hybrid memories consist of next generation memory devices and are adopted in high performance systems. However, the increased complexity of the microprocessor makes it difficult to operate the system effectively. In this paper, we propose a simple data migration method called Bandwidth-aware Data Migration (BDM) to efficiently use memory systems for high performance processors with hybrid memory. BDM monitors the status of applications running on the system using hardware performance monitoring tools and migrates the appropriate pages of selected applications to High Bandwidth Memory (HBM). BDM selects applications whose bandwidth usages are high and also evenly distributed among the threads. Experimental results show that BDM improves execution time by an average of 20% over baseline execution.

Computationally Efficient Instance Memory Monitoring Scheme for a Security-Enhanced Cloud Platform (클라우드 보안성 강화를 위한 연산 효율적인 인스턴스 메모리 모니터링 기술)

  • Choi, Sang-Hoon;Park, Ki-Woong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.27 no.4
    • /
    • pp.775-783
    • /
    • 2017
  • As interest in cloud computing grows, the number of users using cloud computing services is increasing. However, cloud computing technology has been steadily challenged by security concerns. Therefore, various security breaches are springing up to enhance the system security for cloud services users. In particular, research on detection of malicious VM (Virtual Machine) is actively underway through the introspecting virtual machines on the cloud platform. However, memory analysis technology is not used as a monitoring tool in the environments where multiple virtual machines are run on a single server platform due to obstructive monitoring overhead. As a remedy to the challenging issue, we proposes a computationally efficient instance memory introspection scheme to minimize the overhead that occurs in memory dump and monitor it through a partial memory monitoring based on the well-defined kernel memory map library.

Design of In-Memory Computing Adder Using Low-Power 8+T SRAM (저 전력 8+T SRAM을 이용한 인 메모리 컴퓨팅 가산기 설계)

  • Chang-Ki Hong;Jeong-Beom Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.2
    • /
    • pp.291-298
    • /
    • 2023
  • SRAM-based in-memory computing is one of the technologies to solve the bottleneck of von Neumann architecture. In order to achieve SRAM-based in-memory computing, it is essential to design efficient SRAM bit-cell. In this paper, we propose a low-power differential sensing 8+T SRAM bit-cell which reduces power consumption and improves circuit performance. The proposed 8+T SRAM bit-cell is applied to ripple carry adder which performs SRAM read and bitwise operations simultaneously and executes each logic operation in parallel. Compared to the previous work, the designed 8+T SRAM-based ripple carry adder is reduced power consumption by 11.53%, but increased propagation delay time by 6.36%. Also, this adder is reduced power-delay-product (PDP) by 5.90% and increased energy-delay- product (EDP) by 0.08%. The proposed circuit was designed using TSMC 65nm CMOS process, and its feasibility was verified through SPECTRE simulation.