• Title/Summary/Keyword: in-memory computing

EFFICIENT COMPUTATION OF COMPRESSIBLE FLOW BY HIGHER-ORDER METHOD ACCELERATED USING GPU (고차 정확도 수치기법의 GPU 계산을 통한 효율적인 압축성 유동 해석)

  • Chang, T.K.; Park, J.S.; Kim, C.
    • Journal of computational fluids engineering, v.19 no.3, pp.52-61, 2014
  • The present paper deals with the efficient computation of higher-order CFD methods for compressible flow using graphics processing units (GPUs). Higher-order CFD methods, such as discontinuous Galerkin (DG) methods and correction procedure via reconstruction (CPR) methods, can achieve arbitrarily high orders of accuracy with a compact stencil on unstructured meshes. However, they are far more computationally expensive than the widely used finite volume methods (FVM). A GPU, consisting of hundreds or thousands of small cores, is well suited to the massively parallel computation of compressible flow with higher-order CFD methods and can greatly reduce computation time. A higher-order multi-dimensional limiting process (MLP) is applied to robustly control numerical oscillations around shock discontinuities and is implemented efficiently on the GPU. The program is written and optimized using NVIDIA's CUDA. All algorithms are implemented to ensure accurate and efficient computation under the GPU's shared-memory parallel programming model. Extensive numerical experiments confirm that the GPU successfully accelerates the computation of compressible flow with higher-order methods.
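
The role of a limiter can be illustrated with a deliberately simpler scheme. The sketch below is a classic one-dimensional minmod slope limiter, not the paper's multi-dimensional limiting process (MLP), and the function names are hypothetical. Each cell's limited slope depends only on its immediate neighbours, which is the kind of per-element independence that maps naturally onto one GPU thread per cell.

```python
def minmod(a: float, b: float) -> float:
    """Return the argument with the smaller magnitude if a and b share a sign, else 0."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def limited_slopes(u: list[float], dx: float) -> list[float]:
    """Minmod-limited slopes for 1D cell averages u; boundary cells get zero slope."""
    n = len(u)
    slopes = [0.0] * n
    # Every interior cell is independent: in CUDA this loop would become one thread per cell.
    for i in range(1, n - 1):
        left = (u[i] - u[i - 1]) / dx
        right = (u[i + 1] - u[i]) / dx
        slopes[i] = minmod(left, right)
    return slopes


if __name__ == "__main__":
    # A smooth ramp followed by a jump from 2.0 to 10.0.
    u = [0.0, 1.0, 2.0, 10.0, 10.0]
    print(limited_slopes(u, dx=1.0))  # [0.0, 1.0, 1.0, 0.0, 0.0]
```

Next to the jump the limiter picks the smaller one-sided slope (1.0 rather than 8.0), and at the local extremum it zeroes the slope, which is what suppresses spurious oscillations in a reconstruction.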

Programmable Multimedia Platform for Video Processing of UHD TV (UHD TV 영상신호처리를 위한 프로그래머블 멀티미디어 플랫폼)

  • Kim, Jaehyun; Park, Goo-man
    • Journal of Broadcast Engineering, v.20 no.5, pp.774-777, 2015
  • This paper introduces the world's first programmable video-processing platform for enhancing the video quality of 8K (7680x4320) UHD (Ultra High Definition) TVs operating at up to 60 frames per second. To provide the required computing capacity and memory bandwidth, the proposed platform implements several key features: a symmetric multi-cluster architecture for parallel data processing, a ring data path between the clusters for data pipelining, and hardware accelerators for filter operations. Based on an RP (Reconfigurable Processor), the platform runs video-quality enhancement algorithms and effectively handles new UHD broadcasting standards and display panels.
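
The scale of the "required computing capacity and memory bandwidth" follows from simple arithmetic. The sketch below computes the pixel rate for 8K at 60 frames per second and the bandwidth of one pass over the video, under the stated assumption of 3 bytes per pixel (8-bit samples, three components, no chroma subsampling); the real platform's pixel format is not given in the abstract.

```python
WIDTH, HEIGHT, FPS = 7680, 4320, 60  # 8K UHD at up to 60 frames per second (from the abstract)
BYTES_PER_PIXEL = 3  # assumption: 8-bit samples, three components, no chroma subsampling


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be processed per second."""
    return width * height * fps


def stream_bandwidth(width: int, height: int, fps: int, bytes_per_pixel: int) -> int:
    """Bytes per second needed to read (or write) the full video stream once."""
    return pixel_rate(width, height, fps) * bytes_per_pixel


if __name__ == "__main__":
    rate = pixel_rate(WIDTH, HEIGHT, FPS)
    bw = stream_bandwidth(WIDTH, HEIGHT, FPS, BYTES_PER_PIXEL)
    print(f"{rate:,} pixels/s")             # 1,990,656,000 pixels/s
    print(f"{bw / 1e9:.2f} GB/s per pass")  # 5.97 GB/s per pass
```

Roughly two billion pixels per second leaves about half a nanosecond per pixel for a single sequential core, and every extra filtering pass that goes through external memory adds another multiple of the bandwidth figure.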

Control Flow Checking at Virtual Edges

  • Liu, LiPing; Ci, LinLin; Liu, Wei; Yang, Hui
    • KSII Transactions on Internet and Information Systems (TIIS), v.11 no.1, pp.396-413, 2017
  • Dynamically checking the integrity of software at run time has long been an active and challenging topic in trusted computing. Control-flow integrity is a basic and important safety property of software integrity. Many classic and emerging security attacks that introduce illegal control flow into applications can cause unpredictable behavior in computer-based systems. In this paper, we present a software-based approach to detecting control-flow integrity violations at run time: control flow checking at virtual edges (CFCVE), a high-performance, low-overhead software control-flow checking solution. At compile time, CFCVE assigns a unique signature to each basic block and inserts a virtual vertex into each edge, together with signature-updating and checking instructions in the corresponding vertices and virtual vertices. Control-flow faults are detected by comparing the run-time signature with the one saved at compile time. Our experimental results show that CFCVE incurs only 10.61% performance overhead on average for several C benchmark programs, with an average undetected error rate of only 9.29%. Compared with previous techniques, CFCVE achieves both high fault coverage and low memory and performance overhead.
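
The signature mechanism can be sketched as a small simulation. This is a hypothetical illustration, not the paper's instrumentation: it assumes an XOR-based update in which the virtual vertex on edge (src, dst) transforms the run-time signature from src's signature into dst's, and a check at each block entry. The CFG, block names, and signature values are made up.

```python
from typing import Dict, List, Sequence

# Hypothetical control-flow graph: basic block -> legal successor blocks.
CFG: Dict[str, List[str]] = {
    "entry": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}


def assign_signatures(cfg: Dict[str, List[str]]) -> Dict[str, int]:
    """Compile time: give each basic block a unique signature."""
    return {block: 0x100 + i for i, block in enumerate(sorted(cfg))}


def run_trace(cfg: Dict[str, List[str]], sigs: Dict[str, int], trace: Sequence[str]) -> bool:
    """Run time: replay a block trace; return True if no control-flow error is detected."""
    g = sigs[trace[0]]  # run-time signature register starts at the first block's signature
    for src, dst in zip(trace, trace[1:]):
        if dst in cfg[src]:
            # Legal edge: control passes through the virtual vertex on (src, dst),
            # whose update instruction turns sigs[src] into sigs[dst].
            g ^= sigs[src] ^ sigs[dst]
        # Illegal jump: no virtual vertex lies on this path, so g is left unchanged.
        if g != sigs[dst]:  # checking instruction at the entry of dst
            return False
    return True


if __name__ == "__main__":
    sigs = assign_signatures(CFG)
    print(run_trace(CFG, sigs, ["entry", "loop", "body", "loop", "exit"]))  # True
    print(run_trace(CFG, sigs, ["entry", "body"]))  # False: entry -> body is not an edge
```

Placing the update on the edge rather than in the source block lets each edge carry its own transformation, so a jump that skips the edge leaves a signature that no longer matches the destination.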

Convergence Study on Fabrication and Plasma Module Process Technology of ReRAM Device for Neuromorphic Based (뉴로모픽 기반의 저항 변화 메모리 소자 제작 및 플라즈마 모듈 적용 공정기술에 관한 융합 연구)

  • Kim, Geunho; Shin, Dongkyun; Lee, Dong-Ju; Kim, Eundo
    • Journal of the Korea Convergence Society, v.11 no.10, pp.1-7, 2020
  • The manufacturing process for resistive-switching memory devices, the basis of neuromorphic devices, maintained the continuity of the vacuum process and applied a plasma module suited to producing ReRAM (resistive random access memory), together with process technology for neuromorphic computing that ensures high integration and high reliability. An oxide thin-film ReRAM device was fabricated using the plasma module, and the device properties were improved through various experiments that varied materials and process methods. A ReRAM device based on a TiO2/TiOx oxide thin film was completed using the plasma module. XRD measurement showed rutile crystallinity, the HRS:LRS current ratio was 2.99 × 10³ or higher, and driving-voltage measurements with a semiconductor parameter analyzer confirmed that the device can be driven at a low voltage of 0.3 V or less. It was thus possible to fabricate a neuromorphic ReRAM device using oxygen gas in a previously developed plasma module, and TiOx thin films were deposited to confirm its performance.

IpCSB+-tree: An Enhanced Main Memory Index Structure Employing the Level Prefetching Technique (레벨 프리페칭 기법을 이용한 향상된 주기억장치 상주형 색인구조)

  • Hong Hyun-Taek; Kang Tae-Ho; Yoo Jae-Soo
    • Journal of Internet Computing and Services, v.4 no.6, pp.75-86, 2003
  • In main-memory resident index structures, secondary cache misses have a considerable effect on performance. Several recently proposed cache-conscious main-memory index structures aim to reduce the impact of secondary cache misses. However, they still incur a full secondary cache miss at every level of the index tree they visit. In this paper, we propose a new index structure that eliminates these cache misses even as it visits each level of the index tree. The proposed index structure prefetches the grandchildren of the current node. Its basic structure derives from the CSB+-tree, which uses the concept of node groups to increase fan-out. In addition, the insert algorithm of the proposed index structure significantly reduces the cost of a split. We demonstrate the superiority of our algorithm through various performance evaluations.
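
Why prefetching grandchildren helps can be shown with an idealized cost model. The sketch below is not the paper's implementation: it assumes every prefetch issued two levels ahead completes before the node is needed and that the simulated cache never evicts. Under those assumptions, a search that visits L levels incurs L misses without prefetching but only 2 with grandchild prefetching. Node identities and function names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Node = Tuple[int, ...]  # a node is identified by the child indices on its path from the root


def grandchildren(node: Node, fanout: int) -> List[Node]:
    """All nodes two levels below `node` (in a CSB+-tree, `fanout` contiguous node groups)."""
    return [node + (i, j) for i, j in product(range(fanout), repeat=2)]


def count_misses(path: List[int], fanout: int, prefetch: bool) -> int:
    """Count simulated cache misses while descending from the root along `path`."""
    cache: Set[Node] = set()
    misses = 0
    node: Node = ()
    for depth in range(len(path) + 1):
        if node not in cache:
            misses += 1
            cache.add(node)
        if prefetch:
            # Level prefetching: request the grandchildren before the search reaches them.
            # The simulated tree is unbounded, so prefetches past the leaves are simply unused.
            cache.update(grandchildren(node, fanout))
        if depth < len(path):
            node = node + (path[depth],)
    return misses


if __name__ == "__main__":
    path = [0, 1, 2, 1]  # visits 5 levels: the root plus 4 more
    print(count_misses(path, fanout=3, prefetch=False))  # 5
    print(count_misses(path, fanout=3, prefetch=True))   # 2
```

Real hardware only overlaps miss latency with useful work, so the actual savings depend on how long each level's processing takes relative to memory latency.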

CTIS: Cross-platform Tester Interface Software for Memory Semiconductor (메모리 반도체 검사 장비 인터페이스를 위한 크로스플랫폼 소프트웨어 기술)

  • Kim, Dong Su; Kang, Dong Hyun; Lee, Eun Seok; Lee, Kyu Sung; Eom, Young Ik
    • KIISE Transactions on Computing Practices, v.21 no.10, pp.645-650, 2015
  • Tester Interface Software (TIS) provides all the software functions a testing device needs to perform the test process on a memory semiconductor package, from the time the package is loaded into the test equipment until it is discharged. TIS should perform the same work on all types of equipment regardless of tester model. However, because the test process uses many different equipment and computer models, TIS has been developed and maintained separately for each tester model. As a result, development requires more maintenance effort, time, and cost, which degrades software quality, and the problem becomes more serious whenever a new tester model is introduced. In this paper, we propose the Cross-platform Tester Interface Software (CTIS) framework, which can be integrated and operated on heterogeneous equipment and operating systems.

Performance Analysis of Flash Translation Layer Algorithms for Windows-based Flash Memory Storage Device (윈도우즈 기반 플래시 메모리의 플래시 변환 계층 알고리즘 성능 분석)

  • Park, Won-Joo; Park, Sung-Hwan; Park, Sang-Won
    • Journal of KIISE:Computing Practices and Letters, v.13 no.4, pp.213-225, 2007
  • Flash memory is widely used as a storage device in portable equipment such as digital cameras, MP3 players, and cellular phones because of its large capacity, non-volatility, low power consumption, and good performance. However, because of a hardware characteristic called the erase-before-write architecture, a block in flash memory must be erased before it can be rewritten, and the erase operation is much slower than read or write operations. A flash translation layer (FTL) is used to overcome this problem. We compared the performance of existing FTL algorithms on a Windows-based OS, developing a tool called FTL APAT to gather disk I/O patterns and analyze the performance of the FTL algorithms. The log buffer scheme with fully associative sector translation (FAST) performed best.
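
The core idea of an FTL can be sketched with the simplest page-mapping scheme rather than FAST: a rewrite goes to a fresh physical page and the old copy is merely marked invalid, so the write path never needs an erase. This is a hypothetical, minimal sketch; it omits garbage collection, wear leveling, and the data-block/log-block structure that hybrid schemes such as FAST use.

```python
from typing import Dict, List, Optional


class PageMappingFTL:
    """Minimal page-level FTL sketch (not FAST): out-of-place updates, no garbage collection."""

    def __init__(self, num_blocks: int, pages_per_block: int) -> None:
        total_pages = num_blocks * pages_per_block
        self.flash: List[Optional[bytes]] = [None] * total_pages  # None = erased page
        self.valid: List[bool] = [False] * total_pages
        self.mapping: Dict[int, int] = {}  # logical page number -> physical page number
        self.next_free = 0  # erased pages are programmed in order, each only once

    def write(self, lpn: int, data: bytes) -> int:
        """Write a logical page to a fresh physical page and return its physical page number."""
        if self.next_free >= len(self.flash):
            raise RuntimeError("no erased pages left; a real FTL would garbage-collect here")
        old = self.mapping.get(lpn)
        if old is not None:
            self.valid[old] = False  # invalidate the stale copy instead of erasing its block
        ppn = self.next_free
        self.flash[ppn] = data
        self.valid[ppn] = True
        self.mapping[lpn] = ppn
        self.next_free += 1
        return ppn

    def read(self, lpn: int) -> Optional[bytes]:
        """Return the current data for a logical page, or None if it was never written."""
        ppn = self.mapping.get(lpn)
        return None if ppn is None else self.flash[ppn]


if __name__ == "__main__":
    ftl = PageMappingFTL(num_blocks=2, pages_per_block=4)
    print(ftl.write(7, b"v1"), ftl.write(7, b"v2"))  # 0 1: the rewrite lands on a new page
    print(ftl.read(7))  # b'v2'
```

The cost of avoiding erases is deferred: invalid pages accumulate and must eventually be reclaimed by erasing whole blocks, and how an FTL organizes that reclamation is largely what distinguishes schemes like FAST.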

An Attention Method-based Deep Learning Encoder for the Sentiment Classification of Documents (문서의 감정 분류를 위한 주목 방법 기반의 딥러닝 인코더)

  • Kwon, Sunjae; Kim, Juae; Kang, Sangwoo; Seo, Jungyun
    • KIISE Transactions on Computing Practices, v.23 no.4, pp.268-273, 2017
  • Recently, deep learning encoder-based approaches have been actively applied to sentiment classification. However, the commonly used Long Short-Term Memory (LSTM) network encoder produces lower-quality vector representations as documents grow longer. For effective sentiment classification of documents, this study proposes an attention-based deep learning encoder that generates a document vector as an importance-weighted sum of the LSTM network's outputs. We also propose modifications that adapt this encoder to sentiment classification, consisting of a window attention component and an attention weight adjustment component. The window attention component computes weights over windows of words so that sentiment features spanning more than one word are recognized effectively. The attention weight adjustment component smooths the learned weights. Experimental results show that the proposed method outperformed the LSTM network encoder, reaching 89.67% accuracy.
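
The pooling step can be sketched as follows, assuming the LSTM outputs are already computed and that importance is scored by a dot product with a learned query vector (one common attention form; the paper's scoring function is not given in the abstract). The window variant and the use of a softmax temperature to smooth the weights are hypothetical readings of the abstract, not the authors' exact method.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax; a temperature above 1 flattens (smooths) the resulting weights."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden: Sequence[Vector], query: Vector, temperature: float = 1.0) -> Vector:
    """Document vector = importance-weighted sum of the per-word LSTM outputs in `hidden`."""
    weights = softmax([dot(h, query) for h in hidden], temperature)
    dim = len(hidden[0])
    return [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(dim)]


def window_attention_pool(hidden: Sequence[Vector], query: Vector, window: int,
                          temperature: float = 1.0) -> Vector:
    """Hypothetical window variant: attend over the mean vectors of consecutive `window`-word spans."""
    spans = [hidden[i:i + window] for i in range(0, len(hidden), window)]
    means = [[sum(col) / len(span) for col in zip(*span)] for span in spans]
    return attention_pool(means, query, temperature)


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [0.0, 1.0]]  # toy outputs for a two-word document
    print(attention_pool(hidden, [10.0, 0.0]))  # dominated by the first word
    print(attention_pool(hidden, [10.0, 0.0], temperature=1000.0))  # close to a plain average
```

Scoring windows instead of single words lets a multi-word phrase such as "not good" receive one weight as a unit, which is the motivation the abstract gives for window attention.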

Modeling of Virtual Switch in Cloud System (클라우드 시스템의 가상 스위치 모델링)

  • Ro, Cheul-Woo
    • Journal of Digital Convergence, v.11 no.12, pp.479-485, 2013
  • Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines (VMs). Through server virtualization software, application servers are encapsulated into VMs and deployed, with APIs, on top of generalized pools of CPU and memory resources. Network virtualization moves networking and security into the same kind of software abstraction layer that transformed computing, paving the way for enterprises to rapidly deploy networking and security for any application by creating virtual networks. A stochastic reward net (SRN) is an extension of stochastic Petri nets that provides compact modeling facilities for system analysis. In this paper, we develop an SRN model of network virtualization based on a virtual switch. Measures of interest such as switching delay and throughput are considered and expressed as expected values of reward rate functions defined on the SRN. Numerical results are obtained as a function of the virtual switch capacity and the number of active VMs.
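
SRN measures are computed by solving the underlying continuous-time Markov chain and taking the expected value of a reward rate function over its steady-state distribution. The sketch below shows that computation on a much smaller model than the paper's SRN: a single virtual-switch buffer treated as an M/M/1/K birth-death chain, assuming Poisson arrivals and exponential service. Throughput is a reward equal to the service rate in non-empty states, and the mean switching delay follows from Little's law. All names are hypothetical.

```python
from typing import List, Tuple


def mm1k_distribution(arrival: float, service: float, capacity: int) -> List[float]:
    """Steady-state probabilities p[n] of n packets in a switch buffer holding at most `capacity`."""
    rho = arrival / service
    # Birth-death chain: the balance equations give p[n] proportional to rho**n.
    unnormalized = [rho ** n for n in range(capacity + 1)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def expected_reward(probs: List[float], reward: List[float]) -> float:
    """Expected steady-state reward rate: sum over states of P(state) * reward(state)."""
    return sum(p * r for p, r in zip(probs, reward))


def switch_measures(arrival: float, service: float, capacity: int) -> Tuple[float, float]:
    """Return (throughput, mean switching delay) for the assumed M/M/1/K switch model."""
    probs = mm1k_distribution(arrival, service, capacity)
    # Throughput: reward equal to the service rate in every non-empty state.
    throughput = expected_reward(probs, [0.0] + [service] * capacity)
    # Mean number of packets: reward n in state n; Little's law turns it into mean delay.
    mean_packets = expected_reward(probs, [float(n) for n in range(capacity + 1)])
    return throughput, mean_packets / throughput


if __name__ == "__main__":
    for capacity in (1, 2, 4, 8):
        tp, delay = switch_measures(arrival=1.0, service=2.0, capacity=capacity)
        print(capacity, round(tp, 4), round(delay, 4))
```

Sweeping `capacity` and `arrival` (for example, scaling the arrival rate with the number of active VMs) produces curves of the same general kind the abstract describes, although the paper's SRN captures more structure than this single queue.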

A Fault-tolerant Inertial Navigation System for UAVs Based on Partition Computing (파티션 컴퓨팅 기반의 무인기 고장 감내 관성 항법 시스템)

  • Jung, Byeongyong; Kim, Jungguk
    • KIISE Transactions on Computing Practices, v.21 no.1, pp.29-39, 2015
  • When new inertial navigation systems for unmanned aerial vehicles (UAVs) are being developed and tested, a fault-tolerant system is required because of the various hazards caused by software and hardware faults. In this paper, we introduce a new fault-tolerant flight system that can be deployed on one or more FCCs (Flight Control Computers), based on a partition scheme in which each OFP (Operational Flight Program) partition uses an independent CPU and memory slot. The new fault-tolerant navigation system uses one or two FCCs and runs, on each node, a primary navigation OFP under development alongside a stable shadow OFP partition. The single-FCC configuration can be used for UAVs with small payloads. For larger UAVs, an additional FCC with two OFP partitions can be added to provide both hardware and software fault tolerance. The developed system significantly reduces the hazards of testing new navigation software for UAVs.
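
In the real system the partitions run on separate CPUs and memory slots, so a crash in the primary cannot take down the shadow; the single-process sketch below only illustrates the selection logic of a primary/shadow arrangement. The plausibility check, the (position, velocity) output, and all names are hypothetical, since the abstract does not say how faults in the primary OFP are detected.

```python
import math
from typing import Callable, Tuple

Estimate = Tuple[float, float]  # hypothetical (position, velocity) output of a navigation OFP


def is_plausible(est: Estimate, max_speed: float) -> bool:
    """Hypothetical sanity check: finite values and velocity within the vehicle's envelope."""
    pos, vel = est
    return math.isfinite(pos) and math.isfinite(vel) and abs(vel) <= max_speed


def navigate(primary: Callable[[float], Estimate], shadow: Callable[[float], Estimate],
             sensor: float, max_speed: float) -> Tuple[Estimate, str]:
    """Run the primary OFP; fall back to the stable shadow OFP if it fails or looks wrong."""
    try:
        est = primary(sensor)
        if is_plausible(est, max_speed):
            return est, "primary"
    except Exception:  # a software fault in the OFP under development
        pass
    return shadow(sensor), "shadow"


if __name__ == "__main__":
    stable = lambda s: (s, 5.0)
    buggy = lambda s: (s, float("nan"))
    print(navigate(buggy, stable, sensor=1.0, max_speed=100.0))  # ((1.0, 5.0), 'shadow')
```

Keeping the shadow OFP unchanged while the primary evolves means a fault in new navigation code degrades the vehicle to known-good behavior instead of losing navigation entirely.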