• Title/Summary/Keyword: In-Memory Computing


FPGA 상에서 OpenCL을 이용한 병렬 문자열 매칭 구현과 최적화 방향 (Parallel String Matching and Optimization Using OpenCL on FPGA)

  • 윤진명;최강일;김현진
    • 전기학회논문지
    • Vol. 66, No. 1
    • pp.100-106
    • 2017
  • In this paper, we propose a parallel optimization method for the Aho-Corasick (AC) algorithm and the Parallel Failureless Aho-Corasick (PFAC) algorithm using the Open Computing Language (OpenCL) on a Field Programmable Gate Array (FPGA). The low throughput of a string matching engine degrades network processing performance, so many researchers have recently studied string matching engines based on parallel computing, and FPGA vendors now offer parallel computing platforms using OpenCL. In this paper, we implement the AC and PFAC algorithms on a DE1-SoC board with a Cyclone V FPGA and apply optimizations that take the FPGA architecture into account. Experiments with the PFAC algorithm consider global-id, local-id, local-memory, and loop-unrolling optimizations. The performance improvement from loop unrolling is 129 times greater than that of the AC algorithm that does not adopt loop unrolling, and 1.1, 0.2, and 1.5 times greater than those of the global-id, local-id, and local-memory optimizations mentioned above.
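The key idea behind PFAC, as summarized above, is to drop the Aho-Corasick failure transitions and instead start one matching thread at every text position, each of which simply terminates on a mismatch. Below is a minimal CPU-only Python sketch of that failureless matching step; the pattern set, trie construction, and text are illustrative placeholders, not the paper's OpenCL kernel.

```python
# Minimal sketch of Parallel Failureless Aho-Corasick (PFAC) matching.
# Each logical "thread" starts at one text position and follows only goto
# transitions; on a missing transition it terminates (no failure links),
# which is what makes the algorithm map well to GPU/FPGA work-items.

def build_goto(patterns):
    """Build the goto table (a trie) and record which state ends which pattern."""
    goto = [{}]                      # state -> {char: next_state}
    output = {}                      # final state -> matched pattern
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = pat
    return goto, output

def pfac_match(text, goto, output):
    """One logical thread per starting position; each stops on its first miss."""
    matches = []
    for start in range(len(text)):   # each iteration corresponds to one work-item
        state = 0
        for ch in text[start:]:
            if ch not in goto[state]:
                break                # failureless: give up instead of following a failure link
            state = goto[state][ch]
            if state in output:
                matches.append((start, output[state]))
    return matches

if __name__ == "__main__":
    goto, output = build_goto(["he", "she", "his", "hers"])
    print(pfac_match("ushers", goto, output))   # [(1, 'she'), (2, 'he'), (2, 'hers')]
```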

Two-Level Scratchpad Memory Architectures to Achieve Time Predictability and High Performance

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering
    • Vol. 8, No. 4
    • pp.215-227
    • 2014
  • In modern computer architectures, caches are widely used to bridge the gap between processor speed and memory access time. However, caches are time-unpredictable and thus can significantly increase the complexity of worst-case execution time (WCET) analysis, which is crucial for real-time systems. This paper proposes a time-predictable two-level scratchpad-based architecture and an ILP-based static memory-object assignment algorithm to support real-time computing. Moreover, to exploit the load/store latencies that are known statically in this architecture, we study a Scratchpad-Sensitive Scheduling method to further improve performance. Our experimental results indicate that the performance and energy consumption of the two-level scratchpad-based architecture are superior to those of a similar cache-based architecture for most of the benchmarks we studied.
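The static assignment step mentioned above decides at compile time which memory objects are placed in the small but fast scratchpad levels. The paper formulates this as an ILP over two levels; the sketch below instead solves a single-level version as a 0/1 knapsack to illustrate the same benefit-versus-capacity trade-off. Object names, sizes, access counts, and latencies are invented example values.

```python
# Simplified single-level scratchpad (SPM) assignment as a 0/1 knapsack.
# benefit(obj) = accesses * (main_memory_latency - spm_latency); choose the
# subset of objects that maximizes total benefit within the SPM capacity.
# The paper uses an ILP over a two-level SPM; this is only an illustration.

def assign_to_spm(objects, spm_capacity, mem_lat=100, spm_lat=1):
    # objects: list of (name, size_in_bytes, access_count)
    benefit = [acc * (mem_lat - spm_lat) for _, _, acc in objects]
    size = [s for _, s, _ in objects]
    # dp[c] = (best benefit achievable with capacity c, chosen object names)
    dp = [(0, frozenset()) for _ in range(spm_capacity + 1)]
    for i, (name, _, _) in enumerate(objects):
        for c in range(spm_capacity, size[i] - 1, -1):
            candidate = dp[c - size[i]][0] + benefit[i]
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - size[i]][1] | {name})
    return dp[spm_capacity]

if __name__ == "__main__":
    objs = [("a", 256, 5000), ("b", 512, 1200), ("c", 128, 9000), ("d", 640, 3000)]
    saved_cycles, chosen = assign_to_spm(objs, spm_capacity=1024)
    print("cycles saved:", saved_cycles, "objects in SPM:", sorted(chosen))
```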

OpenMP와 MPI 코드의 상대적, 혼합적 성능 고찰 (Comparative and Combined Performance Studies of OpenMP and MPI Codes)

  • 이명호
    • 정보처리학회논문지A
    • Vol. 13A, No. 2
    • pp.157-162
    • 2006
  • Recent high-performance computing platforms are classified into shared memory multiprocessor (SMP) systems, massively parallel processor (MPP) systems, and cluster systems that connect multiple computing nodes. These high-performance computing systems are used for scientific and engineering applications that demand a high level of computing power. To obtain optimal performance when running such applications, it is important to choose an appropriate computing platform and programming model. In this paper, we identify the optimal computing platform and programming model for the SPEC HPC2002 benchmark suite, which was developed using several parallel programming models, through performance analysis and evaluation.

비선형계획법에 의한 자동경제급전 알고리즘의 개발에 관한 연구 (Algorithm for Economic Load Dispatch by the Nonlinear Programming Method)

  • 박영문;김건중
    • 전기의세계
    • Vol. 26, No. 1
    • pp.77-81
    • 1977
  • This paper aims to develop a new algorithm that overcomes the disadvantages of the conventional E.L.D. system based on the B-constants and penalty-factors scheme. The main feature of this paper is that the variable decoupled method usually employed in load-flow studies is introduced into the E.L.D. algorithm developed by Sasson, using Powell's nonlinear programming scheme. Besides this, other minor refinements are made to reduce memory space and computing time. Case studies show that the method suggested here has remarkable advantages in computing efficiency and memory requirements over Sasson's.
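For context, the underlying economic load dispatch problem minimizes total fuel cost subject to a power-balance constraint. The sketch below solves a lossless version with quadratic cost curves by the classical equal-incremental-cost (lambda-iteration) method; it only illustrates the problem being solved and is not the Powell-based nonlinear programming algorithm of the paper. The cost coefficients and generation limits are invented.

```python
# Classical lambda-iteration solution of lossless economic load dispatch with
# quadratic cost curves C_i(P) = a_i + b_i*P + c_i*P**2. At the optimum all
# unconstrained units run at equal incremental cost dC_i/dP = b_i + 2*c_i*P.
# This is NOT the Powell-based nonlinear programming method of the paper.

def dispatch(units, demand, tol=1e-6):
    # units: list of (a, b, c, p_min, p_max)
    def outputs(lam):
        # equal incremental cost: b + 2*c*P = lam  ->  P = (lam - b) / (2*c),
        # clipped to each unit's generation limits
        return [min(max((lam - b) / (2 * c), pmin), pmax)
                for _, b, c, pmin, pmax in units]

    lo, hi = 0.0, 1000.0                     # bracket for the incremental cost
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if sum(outputs(lam)) < demand:       # too little generation: raise lambda
            lo = lam
        else:
            hi = lam
    return outputs((lo + hi) / 2)

if __name__ == "__main__":
    units = [(500, 5.3, 0.004, 200, 450),
             (400, 5.5, 0.006, 150, 350),
             (200, 5.8, 0.009, 100, 225)]
    P = dispatch(units, demand=800)
    print([round(p, 1) for p in P], "total =", round(sum(P), 1))
```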


Memory Design for Artificial Intelligence

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • Vol. 12, No. 1
    • pp.90-94
    • 2020
  • Artificial intelligence (AI) is software that learns from large amounts of data and provides the desired results for certain patterns. In other words, learning large amounts of data is very important, and so is the role of memory in the computing system. Massive data means wider bandwidth, so the design of a memory system that can provide it becomes even more important. Providing wide bandwidth in AI systems is also related to power consumption; AlphaGo, for example, consumes 170 kW of power using 1202 CPUs and 176 GPUs. Since memory usually accounts for more than 50% of a system chip's power consumption, a lot of investment is being made in memory technology for AI chips. MRAM, PRAM, ReRAM, and hybrid RAM are the main candidates being studied. This study presents the various memory technologies being studied for artificial intelligence chip design. In particular, MRAM and PRAM are being commercialized as next-generation memories; they offer two significant advantages, ultra-low power consumption and nearly zero leakage power. This paper presents a comparative analysis of the four representative new memory technologies.

TLC NAND-형 플래시 메모리 내장 자체테스트 (TLC NAND-type Flash Memory Built-in Self Test)

  • 김진완;장훈
    • 전자공학회논문지
    • Vol. 51, No. 12
    • pp.72-82
    • 2014
  • The memory semiconductor market continues to grow with the increasing penetration of smartphones, tablet PCs, and solid state drives (SSDs). With the recent release of TLC NAND-type flash memory products in the SSD market, demand for TLC NAND-type flash memory is expected to increase gradually. While much research has been conducted on SLC NAND flash memory, little has been done on TLC NAND flash memory. Moreover, NAND-type flash memories are currently tested using expensive external equipment. Therefore, this paper adapts previously proposed SLC and MLC NAND flash memory test algorithms and patterns to TLC NAND flash memory and proposes a structure that can perform self-testing without expensive external test equipment.
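To give a concrete flavor of the pattern-based memory testing the paper adapts, the sketch below runs a checkerboard write/read-verify pass over a simulated page-organized array. It is a generic software illustration only, not the TLC-specific algorithms or the on-chip BIST structure proposed in the paper, and the page geometry is made up.

```python
# Generic checkerboard write/verify pass over a simulated page-organized
# memory array. This only illustrates pattern-based testing; the paper
# proposes TLC NAND-specific algorithms realized as built-in self-test logic.

PAGES, BYTES_PER_PAGE = 64, 32               # invented geometry for the sketch

def run_checkerboard(memory):
    faults = []
    for base in (0x55, 0xAA):                # alternating bit patterns
        for p in range(PAGES):               # "program" pass
            pattern = base if p % 2 == 0 else base ^ 0xFF
            memory[p] = [pattern] * BYTES_PER_PAGE
        for p in range(PAGES):               # "read/verify" pass
            expected = base if p % 2 == 0 else base ^ 0xFF
            for offset, value in enumerate(memory[p]):
                if value != expected:
                    faults.append((p, offset, expected, value))
    return faults

if __name__ == "__main__":
    simulated_memory = [[0] * BYTES_PER_PAGE for _ in range(PAGES)]
    print("faults detected:", run_checkerboard(simulated_memory))
```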

주파수영역에서 49점 가중평균을 이용한 scalar 파동방정식의 유한차분식 정확도 향상을 위한 연구 (An Accuracy Improvement in Solving Scalar Wave Equation by Finite Difference Method in Frequency Domain Using 49 Points Weighted Average Method)

  • 장성형;신창수;양동우;양승진
    • 자원환경지질
    • Vol. 29, No. 2
    • pp.183-192
    • 1996
  • Much computing time and large computer memory are needed to solve the wave equation for a large, complex subsurface model using the finite difference method. The time and memory can be reduced by decreasing the number of grid points per minimum wavelength; however, coarsening the grid may cause numerical dispersion and poor accuracy. In this study, we present a 49-point weighted average method that saves computing time and memory while improving accuracy. This method applies a new weighted average to the coordinates obtained by rotating the conventional 5-point finite difference star to $0^{\circ}$ and $45^{\circ}$, the 25-point finite difference star to $0^{\circ}$, $26.56^{\circ}$, $45^{\circ}$, $63.44^{\circ}$, and the 49-point finite difference star to $0^{\circ}$, $18.43^{\circ}$, $33.69^{\circ}$, $45^{\circ}$, $56.30^{\circ}$, $71.56^{\circ}$. With this method, the number of grid points per minimum wavelength can be reduced to 2.5, the computing time to $(2.5/13)^3$, and the required core memory to $(2.5/13)^4$ of those of the conventional method.
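For reference, the conventional 5-point frequency-domain scheme that the 49-point weighted-average operator improves on discretizes the 2-D scalar (Helmholtz) wave equation as in the sketch below. The grid size, frequency, velocity, and boundary treatment are arbitrary example choices, and this is not the paper's weighted-average operator.

```python
# Conventional 5-point frequency-domain finite-difference scheme for the
# 2-D scalar (Helmholtz) wave equation  d2u/dx2 + d2u/dz2 + (w/v)^2 u = f.
# The paper's 49-point weighted-average operator replaces this stencil so
# that far fewer grid points per minimum wavelength are needed.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

nx = nz = 101
h = 10.0                          # grid spacing [m]
v = 2000.0                        # constant velocity [m/s]
w = 2 * np.pi * 10.0              # angular frequency for a 10 Hz component
k2 = (w / v) ** 2

N = nx * nz
A = sp.lil_matrix((N, N), dtype=complex)
rhs = np.zeros(N, dtype=complex)

def idx(i, j):
    return i * nz + j

for i in range(nx):
    for j in range(nz):
        m = idx(i, j)
        if i in (0, nx - 1) or j in (0, nz - 1):
            A[m, m] = 1.0                     # simple Dirichlet boundary (u = 0)
            continue
        A[m, m] = -4.0 / h**2 + k2            # center of the 5-point star
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            A[m, idx(ni, nj)] = 1.0 / h**2

rhs[idx(nx // 2, nz // 2)] = 1.0              # point source in the middle
u = spla.spsolve(A.tocsr(), rhs)              # monochromatic wavefield
print("wavefield sample near the source:", u[idx(nx // 2 + 1, nz // 2)])
```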


Characterization Studies on Data Access Bias in Mobile Platforms

  • Bahn, Hyokyung
    • International journal of advanced smart convergence
    • Vol. 10, No. 4
    • pp.52-58
    • 2021
  • Data access bias can be observed in various types of computing systems. In this paper, we characterize the data access bias in modern mobile computing platforms. In particular, we focus on the access bias of data observed at three different subsystems based on our experiences. First, we show the access bias of file data in mobile platforms. Second, we show the access bias of memory data in mobile platforms. Third, we show the access bias of web data and web servers. We expect that the characterization study in this paper will be helpful in the efficient management of mobile computing systems.
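One simple way to quantify access bias of this kind is to measure how much of the total access volume falls on the most frequently accessed fraction of blocks. The sketch below computes that skew for a synthetic, Zipf-like trace; the trace generator and block count are illustrative assumptions, not measurements from the paper.

```python
# Quantify access bias: what share of all accesses hits the top x% most
# frequently accessed blocks? A strongly biased workload concentrates most
# of its accesses on a small fraction of blocks. The trace is synthetic.

import random
from collections import Counter

def top_fraction_share(trace, fraction=0.2):
    counts = Counter(trace)
    ranked = sorted(counts.values(), reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

if __name__ == "__main__":
    random.seed(0)
    # Zipf-like synthetic trace over at most 1000 blocks: low block ids are hot.
    trace = [min(int(random.paretovariate(1.2)), 999) for _ in range(100_000)]
    share = top_fraction_share(trace, fraction=0.2)
    print(f"top 20% of blocks receive {share:.1%} of all accesses")
```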

저지연 서비스를 위한 Multi-access Edge Computing 스케줄러 (Multi-access Edge Computing Scheduler for Low Latency Services)

  • 김태현;김태영;진성근
    • 대한임베디드공학회논문지
    • Vol. 15, No. 6
    • pp.299-305
    • 2020
  • We have developed a scheduler that additionally considers network performance by extending Kubernetes, which was developed to manage large numbers of containers on cloud computing nodes. The network-delay characteristics of the compute nodes are learned during server operation, and the learned results are used in a placement algorithm that considers them together with the existing metrics of CPU, memory, and volume. We confirmed that low-latency network service is provided through this placement algorithm.
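The placement decision described above scores candidate nodes on CPU, memory, and volume availability together with the learned network delay. The sketch below shows one plausible weighted scoring function of that kind; the weights, node records, and delay normalization are assumptions for illustration, not the scheduler's actual implementation or the Kubernetes API.

```python
# Hypothetical node scoring for a latency-aware scheduler that extends
# Kubernetes-style resource scoring with a learned network-delay term.
# Weights, node records, and the 50 ms delay scale are illustrative.

def score_node(node, weights=(0.25, 0.25, 0.1, 0.4)):
    w_cpu, w_mem, w_vol, w_net = weights
    cpu_free = 1.0 - node["cpu_used"] / node["cpu_total"]
    mem_free = 1.0 - node["mem_used"] / node["mem_total"]
    vol_free = 1.0 - node["vol_used"] / node["vol_total"]
    # Learned average network delay, mapped so that lower delay scores higher.
    net = max(0.0, 1.0 - node["learned_delay_ms"] / 50.0)
    return w_cpu * cpu_free + w_mem * mem_free + w_vol * vol_free + w_net * net

def pick_node(nodes):
    return max(nodes, key=score_node)

if __name__ == "__main__":
    nodes = [
        {"name": "edge-1", "cpu_used": 2, "cpu_total": 8,
         "mem_used": 4, "mem_total": 16,
         "vol_used": 50, "vol_total": 200, "learned_delay_ms": 2.0},
        {"name": "cloud-1", "cpu_used": 1, "cpu_total": 32,
         "mem_used": 8, "mem_total": 128,
         "vol_used": 100, "vol_total": 1000, "learned_delay_ms": 25.0},
    ]
    print("selected node:", pick_node(nodes)["name"])   # favors the low-delay edge node
```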

머신러닝 기반 메모리 성능 개선 연구 (Study on Memory Performance Improvement based on Machine Learning)

  • 조두산
    • 문화기술의 융합
    • Vol. 7, No. 1
    • pp.615-619
    • 2021
  • This study focuses on the memory system, which is optimized to improve performance and energy efficiency in many embedded systems such as Internet of Things, cloud computing, and edge computing platforms, and proposes a technique for improving its performance. The proposed technique improves memory system performance based on machine learning algorithms, which have recently come into wide use. Machine learning techniques can be applied to various applications through training, and in memory system performance improvement they can be applied to the task of classifying data. Highly accurate machine-learning-based data classification makes it possible to place data appropriately according to its usage pattern, thereby improving overall system performance.
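As a concrete illustration of the classification task described above, the sketch below trains a small decision tree to label data objects as hot (place in fast memory) or cold (place in slow memory) from simple usage-pattern features. The features, the rule used to generate synthetic training labels, and the tier names are assumptions for illustration, not the study's actual model or dataset.

```python
# Sketch: classify data objects into fast- vs slow-memory placement from
# usage-pattern features (access count, reuse distance, read ratio) using a
# decision tree. The features and synthetic training data are illustrative.

import random
from sklearn.tree import DecisionTreeClassifier

def synthetic_object():
    accesses = random.randint(1, 10_000)
    reuse_distance = random.randint(1, 1_000)
    read_ratio = random.random()
    # Assumed ground-truth rule, used only to create training labels here.
    hot = accesses > 2_000 and reuse_distance < 300
    return [accesses, reuse_distance, read_ratio], int(hot)

random.seed(42)
samples = [synthetic_object() for _ in range(2_000)]
X = [features for features, _ in samples]
y = [label for _, label in samples]

clf = DecisionTreeClassifier(max_depth=4).fit(X[:1500], y[:1500])
print(f"placement accuracy on held-out objects: {clf.score(X[1500:], y[1500:]):.1%}")

# Place a new object according to the predicted class.
tier = "fast tier" if clf.predict([[5000, 120, 0.8]])[0] else "slow tier"
print("new object placed in:", tier)
```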