• Title/Summary/Keyword: in-memory computing

Performance Evaluation and Optimization of Dual-Port SDRAM Architecture for Mobile Embedded Systems (모바일 내장형 시스템을 위한 듀얼-포트SDRAM의 성능 평가 및 최적화)

  • Yang, Hoe-Seok;Kim, Sung-Chan;Park, Hae-Woo;Kim, Jin-Woo;Ha, Soon-Hoi
    • Journal of KIISE:Computing Practices and Letters / v.14 no.5 / pp.542-546 / 2008
  • Recently, a dual-port SDRAM (DPSDRAM) architecture tailored for dual-processor-based mobile embedded systems has been announced, in which a single memory chip serves as both the local memories and the shared memory for the two processors. To maintain memory consistency under simultaneous accesses from both ports, every access to the shared memory must be protected by a synchronization mechanism, which can result in substantial access latency. We propose two optimization techniques that exploit the communication patterns of target applications: a lock-priority scheme and a static-copy scheme. Further, by dividing the shared bank into multiple blocks, we allow simultaneous accesses to different blocks and thus achieve a considerable performance gain. Experiments on a virtual prototyping system show promising results: we achieve a performance gain of about 20-50% over the base DPSDRAM architecture.
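The block-partitioning idea in this abstract can be illustrated with a small software analogy. The sketch below is not the paper's hardware mechanism: it assumes one lock per block of a shared array, and the names `BlockedSharedBank` and `fill` are hypothetical. It only shows that two "ports" (threads here) touching different blocks never contend for the same lock.

```python
import threading


class BlockedSharedBank:
    """Shared bank split into equally sized blocks, one lock per block (assumed model)."""

    def __init__(self, size, num_blocks):
        if size % num_blocks != 0:
            raise ValueError("size must be a multiple of num_blocks")
        self.block_size = size // num_blocks
        self.cells = [0] * size
        self.locks = [threading.Lock() for _ in range(num_blocks)]

    def _lock_for(self, addr):
        # Each address is guarded only by the lock of the block it falls in.
        return self.locks[addr // self.block_size]

    def write(self, addr, value):
        with self._lock_for(addr):
            self.cells[addr] = value

    def read(self, addr):
        with self._lock_for(addr):
            return self.cells[addr]


def fill(bank, start, stop, value):
    """Write `value` to every address in [start, stop)."""
    for addr in range(start, stop):
        bank.write(addr, value)


# Two "ports" write to different blocks, so they never wait on each other's lock.
bank = BlockedSharedBank(size=8, num_blocks=2)
port_a = threading.Thread(target=fill, args=(bank, 0, 4, 1))
port_b = threading.Thread(target=fill, args=(bank, 4, 8, 2))
port_a.start()
port_b.start()
port_a.join()
port_b.join()
```

With a single lock over the whole bank, the two writers would serialize on every access; with per-block locks they serialize only when they target the same block.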

Efficient Prefetching and Asynchronous Writing for Flash Memory (플래시 메모리를 위한 효율적인 선반입과 비동기 쓰기 기법)

  • Park, Kwang-Hee;Kim, Deok-Hwan
    • Journal of KIISE:Computing Practices and Letters / v.15 no.2 / pp.77-88 / 2009
  • As the NAND flash memory used as the storage system of mobile devices grows in size, the performance of address translation and life cycle management in the FTL (Flash Translation Layer), which interacts with the file system, becomes very important. In this paper, we propose continuity counters, which represent the number of continuous physical blocks whose logical addresses are consecutive, to reduce the number of address translations. Furthermore, we propose a prefetching method that preloads frequently accessed pages into main memory to enhance the I/O performance of flash memory. In addition, we use 2-bit write prediction and an asynchronous writing method to predict addresses repeatedly referenced by the host and to avoid write overhead. The experiments show that the proposed method improves I/O performance and extends the life cycle of flash memory. The address translation of the proposed CFTL (Clustered Flash Translation Layer) is 20% faster than that of conventional FTLs. Furthermore, CFTL reduces writing time by about 50% compared with conventional FTLs.
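One possible reading of the continuity counters described above can be sketched as follows. This is an illustrative interpretation, not the paper's CFTL implementation: the table layout `l2p` (logical block index to physical block number) and the function names are assumptions. Each counter stores how many consecutive logical blocks, starting at that entry, map to consecutive physical blocks, so a sequential request needs one mapping-table lookup per run rather than one per block.

```python
def continuity_counters(l2p):
    """Return counters[i] = length of the run starting at logical block i
    in which consecutive logical blocks map to consecutive physical blocks.

    l2p[i] is the physical block number mapped to logical block i (assumed layout).
    """
    n = len(l2p)
    counters = [1] * n
    # Scan backwards so each entry can extend the run of its successor.
    for i in range(n - 2, -1, -1):
        if l2p[i + 1] == l2p[i] + 1:
            counters[i] = counters[i + 1] + 1
    return counters


def translate_range(l2p, counters, start, length):
    """Translate logical blocks [start, start + length).

    Returns the physical block numbers and the number of mapping-table
    lookups used (one per contiguous run instead of one per block).
    """
    physical = []
    lookups = 0
    i = start
    end = start + length
    while i < end:
        base = l2p[i]          # single lookup for the whole run
        lookups += 1
        run = min(counters[i], end - i)
        physical.extend(base + k for k in range(run))
        i += run
    return physical, lookups


l2p = [10, 11, 12, 40, 41, 7]
counters = continuity_counters(l2p)
physical, lookups = translate_range(l2p, counters, 0, 6)
```

In this example the six logical blocks form three contiguous runs, so the range is translated with three lookups instead of six.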

Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

  • Huangfu, Yijie;Zhang, Wei
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.69-77 / 2017
  • Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically have considerably less temporal/spatial locality. Moreover, the L1 data cache is used by many threads that access a data size typically considerably larger than the L1 cache, making it critical to bypass L1 data cache intelligently to enhance GPU cache performance. In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling programs. Moreover, we introduce a hybrid method that integrates static profiling information and hardware-based bypassing to further enhance performance. Our experimental results reveal that hardware-based cache bypassing can boost performance for most benchmarks, and the hybrid method can achieve performance comparable to state-of-the-art compiler-based bypassing with considerably less profiling cost.

Stent modeling and simulation of truss structure using SMA (형상기억합금 트러스 구조물을 이용한 스텐트의 설계 및 해석)

  • Yang, Seong-Pil;Kim, Sang-Haun;Cho, Mang-Hyo
    • Proceedings of the KSME Conference / 2008.11a / pp.518-522 / 2008
  • Recently, many patients with heart disease have undergone surgery that expands a blood vessel to treat angiostenosis. So far, most angioplasties have been performed using balloon-expandable stents made of stainless steel. Some researchers are studying stents made of shape memory alloy (SMA) to make angioplasty easier, and several papers introduce angioplasty using SMA. However, most analysis models for stents are constructed using solid elements, so solving them requires much computing time. In this study, we suggest an SMA stent model using 1D truss elements, which is much faster than a stent model using 3D solid elements. To represent the nonlinear behavior of SMA, we apply the 1D SMA constitutive equation of Lagoudas. The pseudo-elastic behavior of stent structures is presented as a numerical example.

A Study on Optimal Memory Configuration and the Number of Channels for In-Memory Computing (인메모리 컴퓨팅을 위한 최적의 메모리 구성 및 채널 개수에 대한 연구)

  • Kim, Bong-jeong;Kim, Young-Kyu;Moon, Byungin
    • Annual Conference of KIPS / 2012.11a / pp.268-270 / 2012
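  • With falling DRAM prices, research and development on in-memory computing has become active again, but research on efficient memory system configuration is still lacking. This paper therefore models an in-memory computing system consisting of 64-bit multiprocessors and large-capacity memory, and simulates system performance as a function of memory size and number of channels. Using the number of processed transactions as the performance criterion, we propose an optimal memory system architecture for in-memory computing that takes into account the cost associated with memory size and number of channels.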

Sentiment Orientation Using Deep Learning Sequential and Bidirectional Models

  • Alyamani, Hasan J.
    • International Journal of Computer Science & Network Security / v.21 no.11 / pp.23-30 / 2021
  • Sentiment analysis has become a very important field of research because posting reviews is becoming a trend. Supervised, unsupervised, and semi-supervised machine learning methods have been used extensively to mine this data. Feature engineering is a complex and technical part of machine learning. Deep learning is a new trend in which this laborious work can be done automatically. Many researchers have worked on deep learning with Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) neural networks. These require high processing speed and memory. Here the author suggests two models, simple and bidirectional deep learning, which can work on text data with normal processing speed. Finally, both models are compared, and the bidirectional model is found to be better: the simple model achieves 50% accuracy, while the bidirectional deep learning model achieves 99% accuracy on training data and 78% accuracy on test data. These results are based on 10 epochs and a batch size of 40. The accuracy could be increased further by trying different numbers of epochs and batch sizes.

Parallel Integration for Real-Time Simulation (실시간 시뮬레이션을 위한 병렬적분)

  • Lee, W.S.;Samson, J.
    • Transactions of the Korean Society of Automotive Engineers / v.2 no.1 / pp.106-115 / 1994
  • A parallel integration approach is proposed for real-time simulation of controlled mechanical systems. The proposed approach, which employs the dual-rate integration method in a parallel computing environment, is developed to deal effectively with the stiffness and high-frequency characteristics of controlled mechanical systems. Numerical experiments are performed to demonstrate the effectiveness of the approach on the shared-memory multiprocessors Alliant FX/8 and Alliant FX/80.
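As a rough illustration of dual-rate integration, the sketch below advances a slow state with a large step `H` and a fast state with `m` smaller steps `H / m`, using explicit Euler for both. It is a sequential sketch under stated assumptions, not the paper's algorithm: the function name `dual_rate_euler`, the choice of Euler, the zero-order hold of the slow state, and the test system are all hypothetical. Within one macro step the slow and fast updates depend only on values from the start of that step, which is the property that would let them run on separate processors.

```python
def dual_rate_euler(f_slow, f_fast, xs, xf, H, m, steps):
    """Integrate a coupled slow/fast system with two step sizes.

    xs, xf : initial slow and fast states (floats)
    H      : macro step used for the slow state
    m      : number of fast substeps per macro step (fast step h = H / m)
    """
    h = H / m
    for _ in range(steps):
        # Slow update: one large Euler step using the states at the step start.
        xs_new = xs + H * f_slow(xs, xf)
        # Fast update: m small Euler steps with the slow state held constant.
        for _ in range(m):
            xf = xf + h * f_fast(xs, xf)
        xs = xs_new
    return xs, xf


# Hypothetical test system: a slowly decaying state and a fast state tracking it.
def f_slow(xs, xf):
    return -xs


def f_fast(xs, xf):
    return -100.0 * (xf - xs)


xs_end, xf_end = dual_rate_euler(f_slow, f_fast, xs=1.0, xf=0.0, H=0.01, m=10, steps=100)
```

Integrating the fast state with the macro step `H` alone would put explicit Euler at its stability limit for this stiff term, which is why only the fast part is given the smaller step.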

Confidence Interval Estimation Using SV in LS-SVM

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.14 no.3 / pp.451-459 / 2003
  • The present paper suggests a method for estimating confidence intervals using SVs (support vectors) in the LS-SVM (least-squares support vector machine). To derive the proposed method, we use the fact that the values of the Hessian matrix obtained from the full data set and from the SVs do not differ significantly. Since the suggested method uses only the SVs, a part of the full data, it saves computing time and memory space. We justify the proposed method through a simulation study.

Computation of Circuit Parameters of Multiconductor Transmission Lines with Arbitrary Cross Section (임의 단말을 가진 전송선의 회로정수 산출)

  • 김종민;김종해;하상욱;라극환
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.12 / pp.1918-1925 / 1989
  • A method is presented for computing the capacitance and inductance matrices of 2-D multiconductor transmission lines with arbitrary cross section in a dielectric medium. The integral equation is obtained by using a free-space Green's function in conjunction with the free and bound charges existing on boundary surfaces. The numerical analysis is based on the moment method using point matching and the Galerkin method. A scheme to reduce memory and computation time is also presented for symmetric structures.

A Study on Effect of Code Distribution and Data Replication for Multicore Computing Architectures

  • Cho, Doosan
    • International Journal of Advanced Culture Technology / v.9 no.4 / pp.282-287 / 2021
  • A multicore system must be able to take full advantage of a program's instruction and data parallelism. This study introduces data replication as a supporting technique to maximize that parallelism. Instruction-level parallelism can be limited by data dependency. In such cases, replicating the data to each processor core allows instruction-level parallelism to be exploited to the maximum. The proposed technique yields the greatest performance improvement when applied to scientific applications such as matrix multiplication.