• Title/Summary/Keyword: Memory Bandwidth

Bandwidth-aware Memory Placement on Hybrid Memories targeting High Performance Computing Systems

  • Lee, Jongmin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Modern computers provide tremendous computing capability and large memory systems. Hybrid memories combine next-generation memory devices and are adopted in high-performance systems. However, the increased complexity of the microprocessor makes it difficult to operate such systems effectively. In this paper, we propose a simple data migration method called Bandwidth-aware Data Migration (BDM) to use memory systems efficiently in high-performance processors with hybrid memory. BDM monitors the status of applications running on the system using hardware performance monitoring tools and migrates the appropriate pages of selected applications to High Bandwidth Memory (HBM). BDM selects applications whose bandwidth usage is high and also evenly distributed among their threads. Experimental results show that BDM improves execution time by an average of 20% over baseline execution.
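
The abstract does not spell out BDM's selection rule, so the following plain-C sketch is only one plausible reading of it: per-thread bandwidth samples (hypothetical numbers here, standing in for hardware performance-counter readings) are aggregated, and an application scores highest when its total bandwidth is large and its per-thread distribution is even (low coefficient of variation).

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical per-application sample: bandwidth (GB/s) observed per thread,
 * as it might be read from hardware performance counters. */
struct app_sample {
    const char *name;
    int nthreads;
    double bw[8];
};

/* Score = total bandwidth penalized by imbalance (coefficient of variation).
 * A higher score marks a better candidate for migration to HBM. */
static double bdm_score(const struct app_sample *a)
{
    double total = 0.0, var = 0.0;
    for (int i = 0; i < a->nthreads; i++)
        total += a->bw[i];
    double mean = total / a->nthreads;
    for (int i = 0; i < a->nthreads; i++)
        var += (a->bw[i] - mean) * (a->bw[i] - mean);
    double cv = (mean > 0.0) ? sqrt(var / a->nthreads) / mean : 0.0;
    return total / (1.0 + cv);   /* even per-thread usage keeps the penalty small */
}

int main(void)
{
    struct app_sample apps[] = {
        { "stream-like", 4, { 5.1, 5.0, 4.9, 5.2 } },   /* high and even   */
        { "skewed",      4, { 9.0, 0.4, 0.3, 0.5 } },   /* high but uneven */
        { "idle",        4, { 0.2, 0.1, 0.2, 0.1 } },   /* low bandwidth   */
    };
    for (size_t i = 0; i < sizeof apps / sizeof apps[0]; i++)
        printf("%-12s score %.2f\n", apps[i].name, bdm_score(&apps[i]));
    return 0;
}
```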

Memory Design for Artificial Intelligence

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.1
    • /
    • pp.90-94
    • /
    • 2020
  • Artificial intelligence (AI) is software that learns from large amounts of data and provides the desired results for certain patterns. In other words, learning from large amounts of data is very important, and so is the role of memory in the computing system. Massive data implies wide bandwidth, and the design of a memory system that can provide it becomes even more important. Providing wide bandwidth in AI systems is also tied to power consumption. AlphaGo, for example, consumes 170 kW of power using 1202 CPUs and 176 GPUs. Since memory usually accounts for more than 50% of the power consumed by system chips, a lot of investment is being made in memory technology for AI chips. MRAM, PRAM, ReRAM, and hybrid RAM are mainly studied. This study presents various memory technologies being studied for artificial intelligence chip design. In particular, MRAM and PRAM are being commercialized as next-generation memories. They have two significant advantages: ultra-low power consumption and nearly zero leakage power. This paper describes a comparative analysis of the four representative new memory technologies.

Satellite Link Simulator Development in 100 MHz Bandwidth to Simulate Satellite Communication Environment in the Geostationary Orbit (정지궤도 위성통신 환경모의를 위한 100 MHz 대역폭의 위성링크 시뮬레이터 개발)

  • Lee, Sung-Jae;Kim, Yong-Sun;Han, Tae-Kyun
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.14 no.5
    • /
    • pp.842-849
    • /
    • 2011
  • The transponder simulator designed to emulate the transponder of military satellite communication systems in geostationary orbit must provide a time-delay function, because a delay of about 250 ms occurs when a radio wave travels the 36,000 km distance in free space. However, it is very difficult to build a 250 ms time-delay device for a transponder simulator with 100 MHz bandwidth, due to unstable FPGA operation and loss of memory data during high-speed signal processing. To date, the bandwidth of such time-delay devices has been limited to 45 MHz. To solve this problem, we propose new time-delay techniques that support up to 100 MHz bandwidth without data loss. The proposed techniques are low-speed down-scaling and high-speed up-scaling methods for reading and writing the external memory, and a matrix-structured FPGA memory design for handling data at a high rate. We developed a satellite link simulator with 100 MHz bandwidth using the proposed time-delay techniques, integrated it into the transponder simulator, and verified the function of the 265 ms time-delay device over the 100 MHz bandwidth.
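
As a rough illustration of why the wider band is demanding (this arithmetic is our own, not the paper's), the sketch below estimates the buffer size and sustained memory traffic needed to delay a 100 MHz-wide signal by 250 ms, assuming complex baseband sampling at the signal bandwidth and 16-bit I/Q samples.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative parameters, not taken from the paper. */
    const double sample_rate    = 100e6;  /* complex samples/s, matching 100 MHz bandwidth */
    const double delay_s        = 0.250;  /* satellite-link delay to reproduce             */
    const double bytes_per_samp = 4.0;    /* 16-bit I + 16-bit Q                           */

    double samples = sample_rate * delay_s;
    double bytes   = samples * bytes_per_samp;
    double traffic = 2.0 * sample_rate * bytes_per_samp;  /* every sample is written, then read */

    printf("delay buffer  : %.0f Msamples (%.0f MB)\n", samples / 1e6, bytes / 1e6);
    printf("memory traffic: %.0f MB/s sustained\n", traffic / 1e6);
    return 0;
}
```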

Memory-Efficient Belief Propagation for Stereo Matching on GPU (GPU 에서의 고속 스테레오 정합을 위한 메모리 효율적인 Belief Propagation)

  • Choi, Young-Kyu;Williem, Williem;Park, In Kyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2012.11a
    • /
    • pp.52-53
    • /
    • 2012
  • Belief propagation (BP) is a commonly used global energy minimization algorithm for solving the stereo matching problem in 3D reconstruction. However, it requires a large memory bandwidth and data size. In this paper, we propose a novel memory-efficient BP algorithm for stereo matching on the Graphics Processing Unit (GPU). The data size and transfer bandwidth are significantly reduced by storing only a part of the whole message set. To maintain the accuracy of the matching result, the local messages are reconstructed using the shared memory available on the GPU. Experimental results show almost an order-of-magnitude reduction in global memory consumption and a 21 to 46% saving in memory bandwidth compared to the conventional algorithm. Implementation on a recent GPU achieves a 22.8 times speedup in execution time compared to execution on the CPU.
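
The paper's GPU kernel is not reproduced here; the plain-C sketch below only shows the core BP message update that would be recomputed in shared memory when a message has not been kept in global memory. The label count and the truncated-linear smoothness cost are generic choices, not the paper's exact formulation.

```c
#include <float.h>
#include <stdio.h>

#define NLABELS 16   /* disparity hypotheses (illustrative) */

/* One BP message update:
 *   out[l] = min over lp of ( data[lp] + sum_in[lp] + V(l, lp) ),
 * where V is a truncated linear smoothness cost.  In a scheme that stores
 * only part of the messages, updates like this are recomputed on the fly
 * (e.g. in GPU shared memory) instead of being re-read from global memory. */
static void bp_message(const float data[NLABELS],
                       const float sum_in[NLABELS],
                       float out[NLABELS])
{
    const float lambda = 1.0f, trunc = 2.0f;
    for (int l = 0; l < NLABELS; l++) {
        float best = FLT_MAX;
        for (int lp = 0; lp < NLABELS; lp++) {
            float d = (float)(l > lp ? l - lp : lp - l);
            float v = lambda * (d < trunc ? d : trunc);
            float c = data[lp] + sum_in[lp] + v;
            if (c < best)
                best = c;
        }
        out[l] = best;
    }
}

int main(void)
{
    float data[NLABELS], sum_in[NLABELS], out[NLABELS];
    for (int l = 0; l < NLABELS; l++) {
        data[l]   = 0.5f * (float)l;   /* toy matching costs       */
        sum_in[l] = 0.0f;              /* no neighbour messages yet */
    }
    bp_message(data, sum_in, out);
    printf("m[0] = %.1f, m[%d] = %.1f\n", out[0], NLABELS - 1, out[NLABELS - 1]);
    return 0;
}
```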

BLOCK-BASED ADAPTIVE BIT ALLOCATION FOR REFERENCE MEMORY REDUCTION

  • Park, Sea-Nae;Nam, Jung-Hak;Sim, Dong-Gy;Joo, Young-Hun;Kim, Yong-Serk;Kim, Hyun-Mun
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.258-262
    • /
    • 2009
  • In this paper, we propose an effective memory reduction algorithm to reduce the size of the reference frame buffer and the memory bandwidth in video encoders and decoders. In general video codecs, previously decoded frames must be stored and referenced to reduce temporal redundancy. Recently, reference frames have been recompressed for memory efficiency and for bandwidth reduction between the main processor and external memory. However, such algorithms can hurt coding efficiency. Several algorithms have been proposed to reduce the amount of reference memory with minimal quality degradation, but they still suffer from quality degradation due to fixed bit allocation. In this paper, we propose an adaptive block-based min-max quantization that considers the local characteristics of the image. In the proposed algorithm, the basic processing unit is 8×8 for memory alignment, and adaptive quantization is applied to each 4×4 block to minimize quality degradation. We found that the proposed algorithm improves coding efficiency by approximately 37.5% compared with an existing memory reduction algorithm at the same memory reduction rate.
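
A minimal sketch of per-block min-max quantization follows; the paper adapts the bit budget per 4×4 block, whereas this illustration fixes it at 4 bits. Each block stores its minimum and maximum, and every pixel is requantized relative to that range.

```c
#include <stdint.h>
#include <stdio.h>

#define BLK   16   /* one 4x4 block                                  */
#define QBITS 4    /* illustrative fixed budget; the paper adapts it */

/* Encode: keep min/max per block, map each pixel into QBITS levels. */
static void minmax_encode(const uint8_t px[BLK], uint8_t *mn, uint8_t *mx,
                          uint8_t q[BLK])
{
    *mn = *mx = px[0];
    for (int i = 1; i < BLK; i++) {
        if (px[i] < *mn) *mn = px[i];
        if (px[i] > *mx) *mx = px[i];
    }
    int range = *mx - *mn;
    for (int i = 0; i < BLK; i++)
        q[i] = range ? (uint8_t)(((px[i] - *mn) * ((1 << QBITS) - 1) + range / 2) / range)
                     : 0;
}

/* Decode: reconstruct an approximation from min/max and the quantized index. */
static void minmax_decode(uint8_t mn, uint8_t mx, const uint8_t q[BLK],
                          uint8_t rec[BLK])
{
    int range = mx - mn;
    for (int i = 0; i < BLK; i++)
        rec[i] = (uint8_t)(mn + (q[i] * range + ((1 << QBITS) - 1) / 2)
                                / ((1 << QBITS) - 1));
}

int main(void)
{
    uint8_t px[BLK] = { 10, 12, 14, 16, 18, 20, 22, 24,
                        26, 28, 30, 32, 34, 36, 38, 250 };
    uint8_t mn, mx, q[BLK], rec[BLK];
    minmax_encode(px, &mn, &mx, q);
    minmax_decode(mn, mx, q, rec);
    for (int i = 0; i < BLK; i++)
        printf("%3u->%3u ", px[i], rec[i]);
    printf("\n");
    return 0;
}
```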

A New Embedded Compression Algorithm for Memory Size and Bandwidth Reduction in Wavelet Transform Appliable to JPEG2000 (JPEG2000의 웨이블릿 변환용 메모리 크기 및 대역폭 감소를 위한 새로운 Embedded Compression 알고리즘)

  • Son, Chang-Hoon;Song, Sung-Gun;Kim, Ji-Won;Park, Seong-Mo;Kim, Young-Min
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.1
    • /
    • pp.94-102
    • /
    • 2011
  • To alleviate the memory size and bandwidth requirements of a JPEG2000 system, a new Embedded Compression (EC) algorithm with only a minor image-quality drop is proposed. For both random accessibility and low latency, a very simple and efficient Hadamard-transform-based compression algorithm is devised. We reduced the LL intermediate memory and the code-block memory to about half their size, and achieved significant memory-bandwidth reductions (about 52~73%) through the proposed multi-mode algorithms, without requiring any modification to the JPEG2000 standard algorithm.
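
For background (this is a generic illustration, not the paper's exact codec), a 4-point Hadamard transform uses only additions and subtractions, which is what makes it attractive for cheap embedded compression; the sketch below applies it twice to show that it inverts up to a scale factor of 4.

```c
#include <stdio.h>

/* 4-point Hadamard transform: only additions and subtractions, so the
 * hardware cost is tiny.  The matrix is symmetric and orthogonal, so
 * applying it twice returns 4 times the input. */
static void hadamard4(const int in[4], int out[4])
{
    int a = in[0] + in[2], b = in[0] - in[2];
    int c = in[1] + in[3], d = in[1] - in[3];
    out[0] = a + c;   /* + + + + */
    out[1] = b + d;   /* + + - - */
    out[2] = a - c;   /* + - + - */
    out[3] = b - d;   /* + - - + */
}

int main(void)
{
    int x[4] = { 52, 54, 55, 60 }, X[4], y[4];
    hadamard4(x, X);   /* forward: coefficients to quantize / entropy-code */
    hadamard4(X, y);   /* applying it again reconstructs 4 * x             */
    for (int i = 0; i < 4; i++)
        printf("x=%2d  coeff=%4d  recon=%2d\n", x[i], X[i], y[i] / 4);
    return 0;
}
```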

A Study on Efficient Use of Dual Data Memory Banks in Flight Control Computers

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.9 no.1
    • /
    • pp.29-34
    • /
    • 2017
  • Over the past several decades, embedded system and flight control computer technologies have evolved to meet the diverse needs of the mobile device market. Current embedded systems are at the heart of technologies that take advantage of small, specialized hardware while still providing high-efficiency performance at low cost. One of these key technologies is multiple memory banks. For example, dual memory banks can provide twice the memory bandwidth in the same memory space, which means the same bandwidth can be delivered at lower cost. However, there are still few software techniques that support the efficient use of multiple memory banks. In this study, we present a technique to exploit multiple memory banks efficiently through software support. Specifically, our technique uses an interference graph in an optimizing compiler to allocate data optimally to different memory banks. As a result, execution time can be improved by up to 7% with the proposed technique.
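
The abstract does not give the formulation, so the sketch below only illustrates the general idea under stated assumptions: arrays accessed together in the same loop body interfere, and a greedy two-coloring of that interference graph assigns each array to bank X or bank Y so that interfering arrays can be fetched in parallel. The conflict matrix is hypothetical.

```c
#include <stdio.h>

#define NARR 5

/* Hypothetical interference matrix: conflict[i][j] = 1 if arrays i and j are
 * accessed together in the same loop body and would benefit from living in
 * different banks. */
static const int conflict[NARR][NARR] = {
    /*        a  b  c  d  e */
    /* a */ { 0, 1, 0, 1, 0 },
    /* b */ { 1, 0, 1, 0, 0 },
    /* c */ { 0, 1, 0, 0, 1 },
    /* d */ { 1, 0, 0, 0, 0 },
    /* e */ { 0, 0, 1, 0, 0 },
};

int main(void)
{
    const char *name[NARR] = { "a", "b", "c", "d", "e" };
    int bank[NARR];

    /* Greedy 2-coloring: place each array in the bank where it conflicts
     * with the fewest already-placed arrays (ties go to bank 0). */
    for (int i = 0; i < NARR; i++) {
        int cost[2] = { 0, 0 };
        for (int j = 0; j < i; j++)
            if (conflict[i][j])
                cost[bank[j]]++;
        bank[i] = (cost[1] < cost[0]) ? 1 : 0;
    }

    for (int i = 0; i < NARR; i++)
        printf("array %s -> bank %c\n", name[i], bank[i] ? 'Y' : 'X');
    return 0;
}
```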

Multi-mode Embedded Compression Algorithm and Architecture for Code-block Memory Size and Bandwidth Reduction in JPEG2000 System (JPEG2000 시스템의 코드블록 메모리 크기 및 대역폭 감소를 위한 Multi-mode Embedded Compression 알고리즘 및 구조)

  • Son, Chang-Hoon;Park, Seong-Mo;Kim, Young-Min
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.41-52
    • /
    • 2009
  • In Motion JPEG2000 encoding, the huge bandwidth requirement of data-memory access is the bottleneck for system performance. To alleviate this bandwidth requirement, a new embedded compression (EC) algorithm with only a slight image-quality drop is devised. For both random accessibility and low latency, a very simple and efficient entropy coding algorithm is proposed. We achieved significant memory-bandwidth reductions (about 53~81%) and reduced the code-block memory to about half its size through the proposed multi-mode algorithms, without requiring any modification to the JPEG2000 standard algorithm.
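
The entropy coder itself is not specified in the abstract, so the sketch below illustrates only the "multi-mode" idea in generic terms: for each small group of samples, two candidate modes that meet the same fixed bit budget are evaluated and the one with the lower reconstruction error is kept, with a one-bit header recording the choice. Both modes and all constants are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define GRP 4   /* samples per independently decodable group */

/* Mode 0: drop the two LSBs of every sample (plain truncation).
 * Mode 1: store the group minimum once and 6-bit offsets from it.
 * Both meet the same fixed budget, so any group can be decoded on its own
 * (random access); a 1-bit header records the chosen mode. */

static long err_trunc(const uint8_t s[GRP])
{
    long e = 0;
    for (int i = 0; i < GRP; i++) {
        int rec = (s[i] >> 2) << 2;
        e += (long)(s[i] - rec) * (s[i] - rec);
    }
    return e;
}

static long err_offset(const uint8_t s[GRP])
{
    uint8_t mn = s[0];
    for (int i = 1; i < GRP; i++)
        if (s[i] < mn) mn = s[i];
    long e = 0;
    for (int i = 0; i < GRP; i++) {
        int off = s[i] - mn;
        if (off > 63) off = 63;               /* offset clipped to 6 bits */
        e += (long)(s[i] - (mn + off)) * (s[i] - (mn + off));
    }
    return e;
}

int main(void)
{
    uint8_t flat[GRP] = { 120, 121, 123, 122 };  /* smooth group */
    uint8_t edge[GRP] = { 10, 11, 200, 201 };    /* strong edge  */
    printf("flat group: mode %d\n", err_offset(flat) < err_trunc(flat) ? 1 : 0);
    printf("edge group: mode %d\n", err_offset(edge) < err_trunc(edge) ? 1 : 0);
    return 0;
}
```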

An Analysis of Memory Access Complexity for HEVC Decoder (HEVC 복호화기의 메모리 접근 복잡도 분석)

  • Jo, Song Hyun;Kim, Youngnam;Song, Yong Ho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.5
    • /
    • pp.114-124
    • /
    • 2014
  • HEVC is a state-of-the-art video coding standard developed by JCT-VC. HEVC provides about two times higher subjective coding efficiency than H.264/AVC. One of the main goals of HEVC development is to code UHD-resolution video efficiently, so HEVC is expected to be widely used for UHD video. Decoding such high-resolution video generates a large number of memory accesses, so a decoding system needs a high-bandwidth memory system and/or internal communication architecture. To determine these requirements, this paper presents an analysis of the memory-access complexity of the HEVC decoder. We first estimate the amount of memory access performed by a software HEVC decoder on an embedded system and a desktop computer. Then, we present memory-bandwidth models for the HEVC decoder by analyzing the data flow of the HEVC decoding tools. Experimental results show that the software decoder produces 6.9-40.5 GB/s of DRAM accesses, and the analysis reveals that a hardware decoder requires 2.4 GB/s of DRAM bandwidth.
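
As a back-of-envelope illustration of where such bandwidth figures come from (the parameters below are assumptions, not the paper's model), motion compensation dominates: each predicted block may require a reference block plus an interpolation-filter margin to be fetched from DRAM, in addition to writing and later re-reading the reconstructed frame.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions, not the paper's exact model. */
    const double width = 3840, height = 2160, fps = 60;    /* UHD video      */
    const double bytes_per_pixel = 1.5;                     /* 4:2:0, 8 bit   */
    const int    blk = 8;                                   /* 8x8 prediction */
    const int    tap = 7;                                   /* interp. margin */

    /* Reference bytes fetched per block, worst case, with the interpolation
     * margin on both axes; bi-prediction doubles the fetch. */
    double per_block = (double)(blk + tap) * (blk + tap) * bytes_per_pixel * 2.0;
    double blocks    = (width / blk) * (height / blk) * fps;
    double mc_read   = per_block * blocks;                            /* B/s */
    double recon_rw  = width * height * fps * bytes_per_pixel * 2.0;  /* write, then re-read */

    printf("motion-compensation reads : %.1f GB/s\n", mc_read  / 1e9);
    printf("reconstruction write+read : %.1f GB/s\n", recon_rw / 1e9);
    return 0;
}
```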

CUDA based Lossless Asynchronous Compression of Ultra High Definition Game Scenes using DPCM-GR (DPCM-GR 방식을 이용한 CUDA 기반 초고해상도 게임 영상 무손실 비동기 압축)

  • Kim, Youngsik
    • Journal of Korea Game Society
    • /
    • v.14 no.6
    • /
    • pp.59-68
    • /
    • 2014
  • Memory-bandwidth requirements of UHD (Ultra High Definition, 4096×2160) game scenes have been increasing rapidly. This paper presents a lossless DPCM-GR-based compression algorithm using CUDA that addresses the memory-bandwidth problem without sacrificing image quality; it is modified from DDPCM-GR [4] to support bit-parallel pipelining. Memory-bandwidth efficiency increases through the use of CUDA shared memory. Various asynchronous transfer configurations, which can overlap kernel execution with data transfer between the host and the GPU, are implemented using page-locked host memory. Experimental results show that a maximum speedup of 31.3 is obtained with respect to CPU time, and that computation time decreases by up to 30.3% across the various configurations.
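
A minimal plain-C sketch of the DPCM-GR idea follows: scanline DPCM residuals are mapped to unsigned values and counted under Golomb-Rice coding. The CUDA kernel, the bit-parallel pipelining, and the paper's parameter choices are not reproduced; the Rice parameter and sample data are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define K 2   /* Golomb-Rice parameter (illustrative fixed choice) */

/* Map a signed DPCM residual to an unsigned value (zig-zag mapping). */
static unsigned zigzag(int v)
{
    return (v >= 0) ? (unsigned)(2 * v) : (unsigned)(-2 * v - 1);
}

/* Bits Golomb-Rice coding needs for value u with parameter K:
 * a unary quotient (u >> K, plus a stop bit) followed by K remainder bits. */
static unsigned gr_bits(unsigned u)
{
    return (u >> K) + 1 + K;
}

int main(void)
{
    uint8_t line[16] = { 50, 51, 51, 53, 52, 52, 54, 55,
                         55, 57, 58, 58, 60, 61, 61, 63 };
    unsigned total = 8;                                   /* first sample stored verbatim */
    for (int i = 1; i < 16; i++) {
        int residual = (int)line[i] - (int)line[i - 1];   /* DPCM step */
        total += gr_bits(zigzag(residual));
    }
    printf("raw: %d bits, DPCM-GR: %u bits\n", 16 * 8, total);
    return 0;
}
```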