• Title/Summary/Keyword: in-memory computing

Search Result 766, Processing Time 0.026 seconds

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal
    • /
    • v.3 no.1
    • /
    • pp.33-38
    • /
    • 2014
  • Even though the performance of microprocessor is improved continuously, the performance improvement of computing system becomes hard to increase, in order to some drawbacks including increased power consumption. To solve the problem, general-purpose computing on graphics processing units(GPGPUs), which execute general-purpose applications by using specialized parallel-processing device representing graphics processing units(GPUs), have been focused. However, the characteristics of applications related with graphics is substantially different from the characteristics of general-purpose applications. Therefore, GPUs cannot exploit the outstanding computational resources sufficiently due to various constraints, when they execute general-purpose applications. When designing GPUs for GPGPU, memory system is important to effectively exploit the GPUs since typically general-purpose applications requires more memory accesses than graphics applications. Especially, external memory access requiring long latency impose a big overhead on the performance of GPUs. Therefore, the GPU performance must be improved if hierarchical memory architecture which can reduce the number of external memory access is applied. For this reason, we will investigate the analysis of GPU performance according to hierarchical cache architectures in executing various benchmarks.

Built-In Redundancy Analysis Algorithm for Embedded Memory Built-In Self Repair with 2-D Redundancy (내장 메모리 자가 복구를 위한 여분의 메모리 분석 알고리즘)

  • Shim, Eun-Sung;Chang, Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.2
    • /
    • pp.113-120
    • /
    • 2007
  • With the advance of VLSI technology, the capacity and density of memories is rapidly growing. In this paper we proposed reallocation algorithm. All faulty cell of embedded memory is reallocated into the row and column spare memory. This work implements reallocation algorithm and BISR to verify its design.

Parallel String Matching and Optimization Using OpenCL on FPGA (FPGA 상에서 OpenCL을 이용한 병렬 문자열 매칭 구현과 최적화 방향)

  • Yoon, Jin Myung;Choi, Kang-Il;Kim, Hyun Jin
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.1
    • /
    • pp.100-106
    • /
    • 2017
  • In this paper, we propose a parallel optimization method of Aho-Corasick (AC) algorithm and Parallel Failureless Aho-Corasick (PFAC) algorithm using Open Computing Language (OpenCL) on Field Programmable Gate Array (FPGA). The low throughput of string matching engine causes the performance degradation of network process. Recently, many researchers have studied the string matching engine using parallel computing. FPGA's vendors offer a parallel computing platform using OpenCL. In this paper, we apply the AC and PFAC algorithm on DE1-SoC board with Cyclone V FPGA, where the optimization that considers FPGA architecture is performed. Experiments are performed considering global id, local id, local memory, and loop unrolling optimizations using PFAC algorithm. The performance improvement using loop unrolling is 129 times greater than AC algorithm that not adopt loop unrolling. The performance improvements using loop unrolling are 1.1, 0.2, and 1.5 times greater than those using global id, local id, and local memory optimizations mentioned above.

Two-Level Scratchpad Memory Architectures to Achieve Time Predictability and High Performance

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering
    • /
    • v.8 no.4
    • /
    • pp.215-227
    • /
    • 2014
  • In modern computer architectures, caches are widely used to shorten the gap between processor speed and memory access time. However, caches are time-unpredictable, and thus can significantly increase the complexity of worst-case execution time (WCET) analysis, which is crucial for real-time systems. This paper proposes a time-predictable two-level scratchpad-based architecture and an ILP-based static memory objects assignment algorithm to support real-time computing. Moreover, to exploit the load/store latencies that are known statically in this architecture, we study a Scratch-pad Sensitive Scheduling method to further improve the performance. Our experimental results indicate that the performance and energy consumption of the two-level scratchpad-based architecture are superior to the similar cache based architecture for most of the benchmarks we studied.

Comparative and Combined Performance Studies of OpenMP and MPI Codes (OpenMP와 MPI 코드의 상대적, 혼합적 성능 고찰)

  • Lee Myung-Ho
    • The KIPS Transactions:PartA
    • /
    • v.13A no.2 s.99
    • /
    • pp.157-162
    • /
    • 2006
  • Recent High Performance Computing (HPC) platforms can be classified as Shared-Memory Multiprocessors (SMP), Massively Parallel Processors (MPP), and Clusters of computing nodes. These platforms are deployed in many scientific and engineering applications which require very high demand on computing power. In order to realize an optimal performance for these applications, it is crucial to find and use the suitable computing platforms and programming paradigms. In this paper, we use SPEC HPC 2002 benchmark suite developed in various parallel programming models (MPI, OpenMP, and hybrid of MPI/OpenMP) to find an optimal computing environments and programming paradigms for them through their performance analyses.

Algorithm for Economic Load Dispatch by the Nonlinear Programming Method (비선형계획법에 의한 자동경제급전 알고리즘의 개발에 관한 연구)

  • 박영문;김건중
    • 전기의세계
    • /
    • v.26 no.1
    • /
    • pp.77-81
    • /
    • 1977
  • This paper aims to develope a new algorithm to overcome the disadvantages of the conventional E.L.D system based on the B-Constants and Penalty-Factors scheme. The main features of this paper are that the Variabiable Decoupled Method usually employed in the Load-Flow studies is introduced to the E.L.D. algorithm developed by Sasson, using the Powell's Nonlinear Programming Scheme. Besides this, other minor refinements are made to reduce memory spaces and computing time. Case studies show that the method suggested here has the remarkable advantages of computing efficiency and memory requirements over Sasson's.

  • PDF

Memory Design for Artificial Intelligence

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.1
    • /
    • pp.90-94
    • /
    • 2020
  • Artificial intelligence (AI) is software that learns large amounts of data and provides the desired results for certain patterns. In other words, learning a large amount of data is very important, and the role of memory in terms of computing systems is important. Massive data means wider bandwidth, and the design of the memory system that can provide it becomes even more important. Providing wide bandwidth in AI systems is also related to power consumption. AlphaGo, for example, consumes 170 kW of power using 1202 CPUs and 176 GPUs. Since more than 50% of the consumption of memory is usually used by system chips, a lot of investment is being made in memory technology for AI chips. MRAM, PRAM, ReRAM and Hybrid RAM are mainly studied. This study presents various memory technologies that are being studied in artificial intelligence chip design. Especially, MRAM and PRAM are commerciallized for the next generation memory. They have two significant advantages that are ultra low power consumption and nearly zero leakage power. This paper describes a comparative analysis of the four representative new memory technologies.

TLC NAND-type Flash Memory Built-in Self Test (TLC NAND-형 플래시 메모리 내장 자체테스트)

  • Kim, Jin-Wan;Chang, Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.12
    • /
    • pp.72-82
    • /
    • 2014
  • Recently, the size of semiconductor industry market is constantly growing, due to the increase in diffusion of smart-phone, tablet PC and SSD(Solid State Drive). Also, it is expected that the demand for TLC NAND-type flash memory would gradually increase, with the recent release of TLC NAND-type flash memory in the SSD market. There have been a lot of studies on SLC NAND flash memory, but no research on TLC NAND flash memory has been conducted, yet. Also, a test of NAND-type flash memory is depending on a high-priced external equipment. Therefore, this study aims to suggest a structure for an autonomous test with no high-priced external test device by modifying the existing SLC NAND flash memory and MLC NAND flash memory test algorithms and patterns and applying them to TLC NAND flash memory.

An Accuracy Improvement in Solving Scalar Wave Equation by Finite Difference Method in Frequency Domain Using 49 Points Weighted Average Method (주파수영역에서 49점 가중평균을 이용한 scalar 파동방정식의 유한차분식 정확도 향상을 위한 연구)

  • Jang, Seong Hyung;Shin, Chang Soo;Yang, Dong Woo;Yang, Sung Jin
    • Economic and Environmental Geology
    • /
    • v.29 no.2
    • /
    • pp.183-192
    • /
    • 1996
  • Much computing time and large computer memory are needed to solve the wave equation in a large complex subsurface layer using finite difference method. The time and memory can be reduced by decreasing the number of grid per minimun wave length. However, decrease of grid may cause numerical dispersion and poor accuracy. In this study, we present 49 points weighted average method which save the computing time and memory and improve the accuracy. This method applies a new weighted average to the coordinate determined by transforming the coordinate of conventional 5 points finite difference stars to $0^{\circ}$ and $45^{\circ}$, 25 points finite differenc stars to $0^{\circ}$, $26.56^{\circ}$, $45^{\circ}$, $63.44^{\circ}$ and 49 finite difference stars to $0^{\circ}$, $18.43^{\circ}$, $33.69^{\circ}$, $45^{\circ}$, $56.30^{\circ}$, $71.56^{\circ}$. By this method, the grid points per minimum wave length can be reduced to 2.5, the computing time to $(2.5/13)^3$, and the required core memory to $(2.5/13)^4$ computing with the conventional method.

  • PDF

Characterization Studies on Data Access Bias in Mobile Platforms

  • Bahn, Hyokyung
    • International journal of advanced smart convergence
    • /
    • v.10 no.4
    • /
    • pp.52-58
    • /
    • 2021
  • Data access bias can be observed in various types of computing systems. In this paper, we characterize the data access bias in modern mobile computing platforms. In particular, we focus on the access bias of data observed at three different subsystems based on our experiences. First, we show the access bias of file data in mobile platforms. Second, we show the access bias of memory data in mobile platforms. Third, we show the access bias of web data and web servers. We expect that the characterization study in this paper will be helpful in the efficient management of mobile computing systems.