• Title/Summary/Keyword: in-memory computing

MATE: Memory- and Retraining-Free Error Correction for Convolutional Neural Network Weights

  • Jang, Myeungjae;Hong, Jeongkyu
    • Journal of information and communication convergence engineering / v.19 no.1 / pp.22-28 / 2021
  • Convolutional neural networks (CNNs) are among the most frequently used artificial intelligence techniques. Among CNN-based applications, small, timing-sensitive applications have emerged that must be reliable to prevent severe accidents. However, because such small, timing-sensitive systems lack sufficient system resources, they do not have proper error protection schemes. In this paper, we propose MATE, a low-cost CNN weight error correction technique. Based on the observation that not all mantissa bits are closely related to accuracy, MATE replaces some mantissa bits in the weight with error correction codes. MATE can therefore provide strong data protection without requiring additional memory space or modifying the memory architecture. The experimental results demonstrate that MATE retains nearly the same accuracy as the ideal error-free case on erroneous DRAM and retains approximately 60% accuracy even at extremely high bit error rates.
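The idea of trading low-order mantissa bits for check bits can be illustrated with a small sketch. This is not the encoding used by MATE, which the abstract does not specify. It assumes a 32-bit float weight, drops the 6 lowest mantissa bits, and protects the remaining 26 bits with a standard Hamming(31,26) single-error-correcting code, so the result still fits in one 32-bit word. The function names and the bit layout are hypothetical.

```python
import struct

# Codeword positions 1..31: powers of two hold parity, the other 26 hold data.
PARITY_POS = [1, 2, 4, 8, 16]
DATA_POS = [p for p in range(1, 32) if p & (p - 1)]  # 26 non-power-of-two positions


def _syndrome(code: int) -> int:
    """XOR of the positions (1..31) of all set bits; 0 for a valid codeword."""
    s = 0
    for p in range(1, 32):
        if (code >> p) & 1:
            s ^= p
    return s


def encode_weight(w: float) -> int:
    """Pack a float32 weight into a 32-bit word that carries its own check bits."""
    bits = struct.unpack("<I", struct.pack("<f", w))[0]
    data = bits >> 6  # keep sign, exponent, and the top 17 mantissa bits
    code = 0
    for i, p in enumerate(DATA_POS):
        if (data >> i) & 1:
            code |= 1 << p
    s = _syndrome(code)
    for p in PARITY_POS:  # set parity bits so the total syndrome becomes 0
        if s & p:
            code |= 1 << p
    return code  # bit 0 is unused in this sketch


def decode_weight(code: int) -> float:
    """Correct up to one flipped bit, then rebuild the (truncated) float32."""
    s = _syndrome(code)
    if s:
        code ^= 1 << s  # the syndrome is the position of the flipped bit
    data = 0
    for i, p in enumerate(DATA_POS):
        if (code >> p) & 1:
            data |= 1 << i
    return struct.unpack("<f", struct.pack("<I", data << 6))[0]
```

The cost in this sketch is a relative precision loss below 2^-17 per weight, which reflects the abstract's observation that the lowest mantissa bits contribute little to accuracy. Unlike an in-place scheme, the sketch interleaves data and parity bits, so the stored word is not directly usable as a float without decoding.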

Analyzing the Overhead of the Memory Mapped File I/O for In-Memory File Systems (메모리 파일시스템에서 메모리 매핑을 이용한 파일 입출력의 오버헤드 분석)

  • Choi, Jungsik;Han, Hwansoo
    • KIISE Transactions on Computing Practices / v.22 no.10 / pp.497-503 / 2016
  • Emerging next-generation storage technologies such as non-volatile memory will help eliminate almost all of the storage latency that has plagued previous storage devices. In conventional storage systems, the latency of slow storage devices dominates access latency; hence, software efficiency is not critical. With low-latency storage, software costs can quickly dominate memory latency. Hence, researchers have proposed memory-mapped file I/O to avoid the software overhead. Mapping a file into the user memory space enables users to access the file directly, which bypasses the complicated I/O stack, minimizes the number of user/kernel mode switches, and avoids data copies between kernel and user areas. Despite these benefits, the overhead of memory-mapped file I/O still needs to be addressed, because the existing mechanism was designed for slow block devices. In this paper, we identify the overheads of memory-mapped file I/O via experiments.
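As a minimal illustration of memory-mapped file I/O (not the experimental setup of the paper), the sketch below maps a temporary file with Python's standard `mmap` module and modifies it through ordinary slice assignment instead of `read`/`write` calls. The helper name `mmap_roundtrip` is hypothetical, and the payload is assumed to be non-empty because an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(payload: bytes) -> bytes:
    """Write payload to a temp file, upper-case its first byte via mmap,
    and return the file contents as read back through the normal I/O path."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Map the whole file; loads and stores now go straight to the mapping.
        with mmap.mmap(fd, len(payload)) as m:
            m[0:1] = m[0:1].upper()  # in-place update, no explicit write() call
            m.flush()                # push the dirty page back to the file
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        os.close(fd)
        os.remove(path)
```

For example, `mmap_roundtrip(b"hello")` returns `b"Hello"`. Setting up the mapping and handling page faults still involve the kernel, which is where overheads of the kind the abstract refers to can arise.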

Efficient External Memory Algorithm for Finding the Maximum Suffix of a String (스트링의 최대 서픽스를 계산하는 효율적인 외부 메모리 알고리즘)

  • Kim, Sung-Kwon;Kim, Soo-Cheol;Cho, Jung-Sik
    • The KIPS Transactions:PartA / v.15A no.4 / pp.239-242 / 2008
  • We study the problem of finding the maximum suffix of a string on the external memory model of computation with one disk. In this model, we are primarily interested in designing algorithms that reduce the number of I/Os between the disk and the internal memory. A string of length N has N suffixes and among these, the lexicographically largest one is called the maximum suffix of the string. Finding the maximum suffix of a string plays a crucial role in solving some string problems. In this paper, we present an external memory algorithm for computing the maximum suffix of a string of length N. The algorithm uses four blocks in the internal memory and performs at most 4(N/L) disk I/Os, where L is the size of a block.
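To make the definition of the maximum suffix concrete, here is a sketch of a well-known linear-time, in-memory two-pointer method. It is not the external memory algorithm of the paper and does not model disk blocks or I/O counts; the function name is hypothetical.

```python
def maximum_suffix_start(s: str) -> int:
    """Return the start index of the lexicographically largest suffix of s."""
    n = len(s)
    i = 0  # start of the best candidate suffix so far
    j = 1  # start of the competing suffix
    k = 0  # number of characters on which the two suffixes agree so far
    while j + k < n:
        a, b = s[i + k], s[j + k]
        if a == b:
            k += 1
        elif a > b:
            # The competitor loses; no start in j..j+k can beat the candidate.
            j += k + 1
            k = 0
        else:
            # The candidate loses; skip every start that is already ruled out.
            i = max(i + k + 1, j)
            j = i + 1
            k = 0
    return i
```

For `"banana"` the function returns 2, which corresponds to the suffix `"nana"`.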

Design and Analysis of User's Libraries for Parallel Computing based on the Internet (인터넷 기반의 병렬 컴퓨팅을 위한 사용자 라이브러리 설계 및 성능 분석)

  • Sin, Pil-Seop;Jeong, Jun-Mok;Maeng, Hye-Seon;Hong, Won-Gi;Kim, Sin-Deok
    • The Transactions of the Korea Information Processing Society / v.6 no.11 / pp.2932-2945 / 1999
  • As the Internet and Java technology have grown, parallel processing approaches that utilize idle resources connected to the Internet have become quite attractive. In this paper, JICE (Java Internet Computing Environment) was implemented as an Internet-based parallel computing platform using the multithreading and RMI mechanisms provided by Java. The basic model of JICE consists of three components: a client, a set of workers, and a broker. A worker communicates with other workers via a globally shared memory system. JICE provides users with a master-slave programming model and a collection of library functions. The basic model of JICE is also extended into a multimanaging system, whose effectiveness is evaluated analytically. Numerical analysis and experiments with several benchmarks show that the performance of the basic model depends on the shared memory reference ratio and that the user library is quite promising.

TPMP: A Privacy-Preserving Technique for DNN Prediction Using ARM TrustZone (TPMP : ARM TrustZone을 활용한 DNN 추론 과정의 기밀성 보장 기술)

  • Song, Suhyeon;Park, Seonghwan;Kwon, Donghyun
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.3 / pp.487-499 / 2022
  • Machine learning techniques such as deep learning have been widely used in recent years. Recently, deep learning has been performed in a trusted execution environment (TEE) such as ARM TrustZone to improve security on edge and embedded devices with limited computing resources. However, the memory available to a TEE is limited. To mitigate this problem, we propose TPMP, which uses the limited memory of the TEE efficiently through DNN model partitioning. Through optimized memory scheduling, TPMP runs DNN models in the TEE that could not be run with existing memory scheduling methods, thereby achieving high confidentiality for the DNN. TPMP required a similar amount of computational resources to previous methodologies.
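The abstract does not describe how TPMP partitions a model, so the following is only a generic sketch of the underlying idea: splitting a sequence of layers into consecutive groups that each fit within a fixed memory budget. The function name, the greedy strategy, and the sizes are assumptions made for illustration.

```python
from typing import List


def partition_layers(layer_sizes: List[int], budget: int) -> List[List[int]]:
    """Greedily group consecutive layer indices so each group fits in budget.

    layer_sizes[i] is the (hypothetical) memory footprint of layer i, in the
    same unit as budget. Layers stay in order because inference runs them
    sequentially.
    """
    parts: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, size in enumerate(layer_sizes):
        if size > budget:
            raise ValueError(f"layer {idx} alone exceeds the memory budget")
        if used + size > budget:  # the current group is full; start a new one
            parts.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        parts.append(current)
    return parts
```

With sizes `[4, 3, 5, 2, 6]` and a budget of 8, the sketch yields `[[0, 1], [2, 3], [4]]`, so only one group needs to be resident in secure memory at a time.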

Designing Hybrid HDD using SLC/MLC combined Flash Memory (SLC/MLC 혼합 플래시 메모리를 이용한 하이브리드 하드디스크 설계)

  • Hong, Seong-Cheol;Shin, Dong-Kun
    • Journal of KIISE:Computing Practices and Letters / v.16 no.7 / pp.789-793 / 2010
  • Recently, flash memory-based non-volatile cache (NVC) has been emerging as an effective solution for improving both the I/O performance and the energy consumption of storage systems. To obtain significant performance and energy gains from NVC, it is preferable to use multi-level-cell (MLC) flash memory, since it can provide a large NVC capacity at low cost. However, MLC flash memory supports fewer program/erase cycles than single-level-cell (SLC) flash memory, which limits the lifespan of the NVC. SLC/MLC combined flash memory is a promising way to overcome this limitation. In this paper, we propose an effective management scheme for the heterogeneous SLC and MLC regions of the combined flash memory.

Page Replacement Algorithm for Improving Performance of Hybrid Main Memory (하이브리드 메인 메모리의 성능 향상을 위한 페이지 교체 기법)

  • Lee, Minhoe;Kang, Dong Hyun;Kim, Junghoon;Eom, Young Ik
    • KIISE Transactions on Computing Practices / v.21 no.1 / pp.88-93 / 2015
  • In modern computer systems, DRAM is commonly used as main memory due to its low read/write latency and high endurance. However, DRAM is volatile memory that requires a periodic power supply (i.e., memory refresh) to sustain the data stored in it. On the other hand, PCM is a promising candidate to replace DRAM because it is non-volatile memory, which can sustain the stored data without memory refresh. PCM also supports byte-addressable access and in-place updates. However, PCM is unsuitable for use as the main memory of a computer system because it has two limitations: high read/write latency and low endurance. To take advantage of both DRAM and PCM, a hybrid main memory consisting of DRAM and PCM has been suggested and actively studied. In this paper, we propose a novel page replacement algorithm for hybrid main memory. To cope with the weaknesses of PCM, our scheme focuses on reducing the number of PCM writes in the hybrid main memory. Experimental results show that our proposed page replacement algorithm reduces the number of PCM writes by up to 80.5% compared with the other page replacement algorithms.

System Software Modeling Based on Dual Priority Scheduling for Sensor Network (센서네트워크를 위한 Dual Priority Scheduling 기반 시스템 소프트웨어 모델링)

  • Hwang, Tae-Ho;Kim, Dong-Sun;Moon, Yeon-Guk;Kim, Seong-Dong;Kim, Jung-Guk
    • IEMEK Journal of Embedded Systems and Applications / v.2 no.4 / pp.260-273 / 2007
  • Wireless sensor network (WSN) nodes are required to operate for several months with limited system resources such as memory and power. The WSN hardware platform has 128 Kbytes of program memory and 8 Kbytes of data memory, and a node must run for several months on two AA batteries. The MAC, the network protocol, and a small application must all run on this platform. We examine the memory and power problems posed by WSN requirements and then propose a new computing model of system software for WSN nodes: the Atomic Object Model (AOM) with Dual Priority Scheduling. To verify the model, we design and implement the IEEE 802.15.4 MAC protocol with the proposed model.

Lifetime Extension Method for Non-Volatile Memory based Deep Learning System by analyzing Data Write Pattern (데이터 쓰기 패턴 분석을 통한 비휘발성 메모리 기반 딥러닝 시스템의 수명 연장 기법)

  • Choi, Juhee
    • Journal of the Semiconductor & Display Technology / v.21 no.3 / pp.1-6 / 2022
  • Modern computer systems usually have special hardware for the operations used in deep learning workloads, even in edge computing environments. Non-volatile memories (NVMs) have been considered as alternative memory storage because they consume little static energy and occupy a small area. However, NVMs cannot be adopted directly: an NVM cell has limited write endurance, so the lifetime of an NVM-based memory system is much shorter than that of a conventional memory system. To overcome this problem for deep learning systems, this paper proposes a novel method that extends the lifetime based on an analysis of deep learning workloads. If an incoming block has more than a predefined number of frequently used values, the cacheline is classified as a write-friendly block. During victim selection, such a cacheline is less likely to be chosen as the victim. The experimental results show that the lifetime is increased by about 50% and energy consumption is decreased by 3%, with a small performance loss.
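The classification and victim-selection steps described above can be sketched roughly as follows. The abstract gives no concrete parameters, so the frequent-value set, the threshold, the `CacheLine` fields, and the strict "prefer non-write-friendly lines, then LRU" rule are all assumptions; the paper only states that write-friendly lines are less likely to be evicted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class CacheLine:
    tag: str
    last_used: int        # smaller means less recently used
    write_friendly: bool


def is_write_friendly(words: Iterable[int], frequent_values: Set[int],
                      threshold: int) -> bool:
    """A block is write friendly if more than `threshold` of its words
    belong to the set of frequently used values."""
    return sum(1 for w in words if w in frequent_values) > threshold


def choose_victim(cache_set: List[CacheLine]) -> CacheLine:
    """Evict the LRU line among non-write-friendly lines; fall back to plain
    LRU over the whole set when every line is write friendly."""
    candidates = [line for line in cache_set if not line.write_friendly]
    return min(candidates or cache_set, key=lambda line: line.last_used)
```

In this sketch a write-friendly line is evicted only when the set contains no other choice, which is a stronger preference than the probabilistic wording of the abstract.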

Roofline-based Data Migration Methodology for Hybrid Memories

  • Jongmin Lee;Kwangho Lee;Mucheol Kim;Geunchul Park;Chan Yeol Park
    • Journal of Internet Technology / v.21 no.3 / pp.849-859 / 2020
  • High-performance computing (HPC) systems provide huge computational resources and large memories. Hybrid memory is a promising memory technology that contains different types of memory devices, which have different characteristics regarding access time, retention time, and capacity. However, increasing performance and employing hybrid memories also introduce more complexity. In this paper, we propose a roofline-based data migration methodology called HyDM to use hybrid memories effectively, targeting the Intel Knights Landing (KNL) processor. HyDM monitors the status of applications running on a system and migrates pages of selected applications to the High Bandwidth Memory (HBM). To select appropriate applications at system runtime, we adopt the roofline performance model, a visually intuitive method. HyDM also employs a feedback mechanism to change the target application dynamically. Experimental results show that HyDM improves execution time by up to 44% over the baseline execution.
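The roofline model mentioned above bounds attainable performance by min(peak compute, arithmetic intensity × memory bandwidth). The sketch below shows how that bound could be used to flag memory-bound applications as migration candidates. It is not HyDM's actual selection logic, which the abstract does not detail, and the peak and bandwidth figures in the usage note are illustrative rather than KNL specifications.

```python
def attainable_gflops(arithmetic_intensity: float, peak_gflops: float,
                      bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, intensity * memory bandwidth).

    arithmetic_intensity is in FLOPs per byte moved to or from memory.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def is_memory_bound(arithmetic_intensity: float, peak_gflops: float,
                    bandwidth_gbs: float) -> bool:
    """True when the application sits on the sloped (bandwidth) part of the
    roofline, so a higher-bandwidth memory could raise its ceiling."""
    return arithmetic_intensity * bandwidth_gbs < peak_gflops
```

With a hypothetical peak of 1000 GFLOP/s and 100 GB/s of bandwidth, an application at 2 FLOPs/byte is capped at 200 GFLOP/s and counts as memory bound, while one at 20 FLOPs/byte reaches the compute ceiling and would gain little from faster memory.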