• Title/Summary/Keyword: in-memory computing


An Algorithm For Approximating The Performance of Multi-mode Network System Using Algebraic Property of System States (시스템 상태의 대수적 성질을 이용한 다중모드 네트워크 시스템 성능 근사계산 알고리즘)

  • Oh, Dae-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.12
    • /
    • pp.127-137
    • /
    • 2009
  • A practical algorithm that generates the most probable system states in decreasing order of probability is suggested for approximating the performance of a multi-mode network system, exploiting the algebraic structure of the system states. For most complex systems with a network structure and multi-mode units, evaluating performance or reliability exactly is difficult because the size of the state space grows exponentially; an approximate computation is therefore a more reasonable approach than an exact one. To this end, the network system states must be enumerated in order of probability as a pre-processing step. In this paper, we propose an improved algorithm for generating the most probable multi-mode states so that the ordered system states can be obtained efficiently. The method is compared with previous algorithms with respect to memory requirement and empirical computing time. The experiments show that the proposed method has advantages under these criteria of algorithm performance, and we examine its advantages and disadvantages through illustrative examples.
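A minimal sketch of the general idea, not the paper's algorithm: for independent multi-mode units, the joint states can be enumerated in decreasing probability with a best-first search over a max-heap, where each successor demotes one unit to its next-less-probable mode. All names and parameters below are illustrative assumptions.

```python
# Best-first enumeration of the most probable joint states of independent
# multi-mode units, yielded in decreasing order of probability.
import heapq

def most_probable_states(unit_probs, k):
    """unit_probs: list of per-unit mode-probability lists (units assumed independent).
    Returns up to k (probability, modes) pairs in decreasing order of probability."""
    # Sort each unit's modes so index 0 is its most probable mode.
    order = [sorted(range(len(p)), key=lambda m: -p[m]) for p in unit_probs]
    sorted_p = [[p[m] for m in o] for p, o in zip(unit_probs, order)]

    def prob(idx):
        out = 1.0
        for u, i in enumerate(idx):
            out *= sorted_p[u][i]
        return out

    top = tuple(0 for _ in unit_probs)           # every unit in its most probable mode
    heap = [(-prob(top), top)]                   # max-heap via negated probability
    seen = {top}
    results = []
    while heap and len(results) < k:
        neg_p, idx = heapq.heappop(heap)
        results.append((-neg_p, tuple(order[u][i] for u, i in enumerate(idx))))
        # Successors: demote exactly one unit to its next-less-probable mode.
        for u in range(len(idx)):
            if idx[u] + 1 < len(sorted_p[u]):
                nxt = idx[:u] + (idx[u] + 1,) + idx[u + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prob(nxt), nxt))
    return results

# Example: two units with three modes each; the top-5 joint states are printed.
print(most_probable_states([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]], k=5))
```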

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.7
    • /
    • pp.303-309
    • /
    • 2016
  • Recently, an increasing number of embedded processors, such as the ARM Mali, have begun to support GPGPU programming frameworks such as OpenCL. Thus, GPGPU technologies that have been used in PC and server environments are beginning to be applied to embedded systems. However, many embedded systems have architectural characteristics that differ from traditional PCs, and low power consumption and real-time performance are also important metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare its performance against various baselines. Experimental results show that the parallel GPU AES implementation reduces response time to about 1/150 and energy consumption to approximately 1/290 of an OpenMP implementation for a 1000 KB input. Furthermore, an additional 100% performance improvement was achieved by exploiting the characteristics of embedded GPUs, such as eliminating data copies between GPU and host memory. Our results also show that the performance improvement grows with the size of the input data.
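A hedged CPU analogue of why block ciphers parallelize well on a GPU (this is not the paper's OpenCL code): in a counter-mode-style construction every 16-byte block is independent, so each GPU work-item can process one block. A toy SHA-256 keystream stands in for AES so the sketch stays self-contained; the key, chunking, and thread pool are illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16

def encrypt_block(args):
    key, counter, block = args
    # Derive a per-block keystream from (key, counter); a GPU kernel would run
    # the real AES rounds here, one work-item per block.
    pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()[:BLOCK]
    return bytes(b ^ p for b, p in zip(block, pad))

def encrypt(key: bytes, data: bytes, workers: int = 4) -> bytes:
    # Blocks are independent, so they can be mapped to workers in any order.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    jobs = [(key, ctr, blk) for ctr, blk in enumerate(blocks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(encrypt_block, jobs))

if __name__ == "__main__":
    ct = encrypt(b"sixteen byte key", b"hello embedded GPGPU world" * 10)
    print(len(ct), "bytes encrypted")
```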

Design & Implementation of the RMMC and Global Time based on the RT-eCos 3.0 (RT-eCos 3.0 기반의 RMMC 및 글로벌 타임 설계 및 구현)

  • Han, Seoung-Yeon;Kim, Jung-Guk
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.7
    • /
    • pp.759-767
    • /
    • 2010
  • RT-eCos 3.0 is a micro-sized embedded real-time kernel developed on top of the open-source eCos 3.0 to support the basic task model of the well-known distributed real-time object model TMO (Time-triggered Message-triggered Object). This paper describes the design and implementation of the RMMC (Real-time Multicast & Memory replication Channel), the standard distributed IPC model of TMO, on RT-eCos 3.0. It also describes how global time is supported so that nodes in a distributed environment share the same time base through the RMMC. The resulting global-time-based RMMC provides a highly abstracted distributed IPC environment for wide-area distributed computing with RT-eCos 3.0.
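A minimal sketch of the memory-replication-channel idea only, under stated assumptions (this is not the RT-eCos/RMMC API): a writer publishes named values over UDP multicast and every subscriber replicates them into a local dictionary. The multicast group, port, and message format are hypothetical.

```python
import json, socket, struct, time

GROUP, PORT = "224.1.1.1", 50000   # hypothetical multicast group and port

def open_subscriber():
    """Join the multicast group so updates can be replicated locally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def publish(sock, name, value):
    # Timestamp with the sender's local clock; a real RMMC would stamp updates
    # with the shared global time base instead.
    msg = json.dumps({"name": name, "value": value, "t": time.time()}).encode()
    sock.sendto(msg, (GROUP, PORT))

def replicate_once(sock, local_store):
    """Receive one update and replicate it into the local memory image."""
    data, _ = sock.recvfrom(4096)
    upd = json.loads(data)
    local_store[upd["name"]] = (upd["value"], upd["t"])
```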

Dynamic Bandwidth Distribution Method for High Performance Non-volatile Memory in Cloud Computing Environment (클라우드 환경에서 고성능 저장장치를 위한 동적 대역폭 분배 기법)

  • Kwon, Piljin;Ahn, Sungyong
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.3
    • /
    • pp.97-103
    • /
    • 2020
  • Linux cgroups plays a fundamental role in sharing system resources among multiple containers in container-based cloud computing environments. For I/O resources in particular, Linux cgroups provides a mechanism for sharing I/O bandwidth in proportion to I/O weight. However, the current mechanism, which relies on the BFQ I/O scheduler, seriously degrades I/O performance on high-bandwidth storage devices such as NVMe SSDs. In this paper, we propose a new feedback-based I/O bandwidth sharing scheme for Linux cgroups that allocates I/O credits to containers according to their I/O weights and adjusts the amount of credit in response to performance fluctuations of the NVMe SSD. The proposed scheme is implemented on Linux kernel 5.3 and evaluated. The evaluation shows that it shares I/O bandwidth among multiple containers in proportion to their I/O weights while delivering more than twice the I/O performance of the existing scheme.
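A hedged sketch of the two steps the abstract describes, not the paper's kernel code: each epoch the total credit budget is nudged by feedback toward what the device is actually delivering, then split among containers in proportion to their I/O weights. The weights, budget, gain, and bandwidth numbers are illustrative assumptions.

```python
def allocate_credits(weights, total_credits):
    """Split the per-epoch credit budget proportionally to cgroup I/O weights."""
    total_w = sum(weights.values())
    return {c: total_credits * w / total_w for c, w in weights.items()}

def adjust_budget(total_credits, measured_bw, target_bw, gain=0.5):
    """Feedback step: grow or shrink the budget as the SSD speeds up or slows down."""
    error = (measured_bw - target_bw) / target_bw
    return max(1.0, total_credits * (1.0 + gain * error))

# Example epoch: three containers with 100/200/700 weights (hypothetical numbers).
weights = {"c1": 100, "c2": 200, "c3": 700}
budget = 10_000                      # credits per epoch (arbitrary unit)
budget = adjust_budget(budget, measured_bw=2.8e9, target_bw=3.2e9)
print(allocate_credits(weights, budget))
```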

Light-weight Signal Processing Method for Detection of Moving Object based on Magnetometer Applications (이동 물체 탐지를 위한 자기센서 응용 신호처리 기법)

  • Kim, Ki-Taae;Kwak, Chul-Hyun;Hong, Sang-Gi;Park, Sang-Jun;Kim, Keon-Wook
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.46 no.6
    • /
    • pp.153-162
    • /
    • 2009
  • This paper proposes a novel lightweight signal processing algorithm for wireless sensor network applications, which require low computational complexity and power consumption. The exponential average (EA) method is applied in real time to the magnetometer signal, whose physical characteristics are analyzed in the time domain. EA is robust against noise, temperature-induced magnetic drift, and interference, and it keeps memory consumption and computational complexity low on an embedded processor. The optimal parameters of the proposed algorithm are determined by statistical analysis. With general-purpose and precision magnetometers, a detection probability above 90% is obtained at a 5% false-alarm rate in simulation; with our own magnetometer hardware, a detection probability of 60~70% is obtained at a 1~5% false-alarm rate in simulation and experiment.
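A minimal sketch of the assumed form of the EA step (not the paper's exact detector or parameters): an exponential average tracks the slowly varying magnetic background, and a detection is declared when a sample deviates from that background by more than a threshold.

```python
def detect_moving_object(samples, alpha=0.02, threshold=5.0):
    """samples: magnetometer readings; alpha: EA smoothing factor (0 < alpha <= 1)."""
    detections = []
    background = samples[0]                                    # initial background estimate
    for x in samples:
        background = alpha * x + (1.0 - alpha) * background    # exponential average
        detections.append(abs(x - background) > threshold)     # deviation test
    return detections

# Example: a flat background with a short magnetic disturbance around index 50.
signal = [100.0] * 50 + [112.0] * 5 + [100.0] * 45
print([i for i, d in enumerate(detect_moving_object(signal)) if d])
```

Because the background adapts slowly (small alpha), the estimate is robust to noise and drift while a passing object still produces a large transient deviation.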

Design of visitor counting system using edge computing method

  • Kim, Jung-Jun;Kim, Min-Gyu;Kim, Ju-Hyun;Lee, Man-Gi;Kim, Da-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.7
    • /
    • pp.75-82
    • /
    • 2022
  • Exhibition halls, shopping malls, and theme parks are all around us, and interest in exhibits or content is mainly analyzed through questionnaires. Such questionnaires depend largely on the subjective memory of the respondent, which leads to inaccurate statistics. By tracking visitor movement and counting visitors instead, exhibition spaces that attract little interest can be identified, and the result can serve as quantitative evidence for exhibits that need replacement. In this paper, we use deep-learning-based algorithms to detect visitors, assign an ID to each detected visitor, and track them continuously to recover their movement paths. When a visitor crosses the counting line, the system increments the count and transmits the data to a server for integrated management.
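A hedged sketch of only the counting step (the detection and tracking model is assumed and not shown): given per-frame centroids keyed by track ID, a visitor is counted once when their trajectory crosses a horizontal counting line. Coordinates, track IDs, and the line position are illustrative.

```python
def count_line_crossings(tracks, line_y):
    """tracks: {track_id: [(x, y), ...]} per-frame centroids from the tracker."""
    counted, total = set(), 0
    for tid, path in tracks.items():
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            # Crossing from above to below the line counts the visitor once.
            if y0 < line_y <= y1 and tid not in counted:
                counted.add(tid)
                total += 1
    return total

# Example: track 1 crosses the line at y=200, track 2 stays above it.
tracks = {1: [(50, 180), (52, 195), (53, 210)], 2: [(90, 100), (92, 120)]}
print(count_line_crossings(tracks, line_y=200))   # -> 1
```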

Hypernetwork Memory-Based Model for Infant's Language Learning (유아 언어학습에 대한 하이퍼망 메모리 기반 모델)

  • Lee, Ji-Hoon;Lee, Eun-Seok;Zhang, Byoung-Tak
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.12
    • /
    • pp.983-987
    • /
    • 2009
  • One of the critical themes in language acquisition is exposure to linguistic environments. The linguistic environments that interact with an infant include not only humans, such as its parents, but also artificially crafted linguistic media. An infant learns a language by exploring these extensive language environments around it. Motivated by such large-scale linguistic exposure, we propose a machine-learning method, grounded in this cognitive mechanism, that flexibly and appropriately simulates infant language learning. The initial stage of an infant's language learning involves learning and creating sentences, which can be simulated by exposing the model to a language corpus. The core of the simulation is a memory-based learning model with a language hypernetwork structure. The language hypernetwork models developmental, progressive language learning over a stream of new data by representing higher-order connections between language components. In this paper, we simulate an infant's gradual, developmental learning progress by incrementally training the language hypernetwork on 32,744 sentences extracted from the video scripts of commercial animated films for children.
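A minimal, assumption-laden sketch of a hypernetwork-style memory (an illustration of the idea, not the authors' model): each sentence contributes randomly sampled word combinations (hyperedges) of order k, and the memory grows incrementally as new sentences stream in. The class name, order k, and sampling rate are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

class HypernetworkMemory:
    def __init__(self, k=3, samples_per_sentence=10):
        self.k = k
        self.samples = samples_per_sentence
        self.edges = Counter()                      # hyperedge -> frequency

    def learn(self, sentence):
        """Incrementally add sampled order-k hyperedges from one sentence."""
        words = sentence.lower().split()
        if len(words) < self.k:
            return
        all_edges = list(combinations(words, self.k))
        for edge in random.sample(all_edges, min(self.samples, len(all_edges))):
            self.edges[edge] += 1                   # strengthen the sampled hyperedge

    def score(self, sentence):
        """Higher when the sentence shares many stored hyperedges with memory."""
        words = sentence.lower().split()
        return sum(self.edges[e] for e in combinations(words, self.k))

memory = HypernetworkMemory()
for s in ["the cat sat on the mat", "the dog sat on the rug"]:   # toy corpus
    memory.learn(s)
print(memory.score("the cat sat on the rug"))
```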

Apply Locally Weight Parameter Elimination for CNN Model Compression (지역적 가중치 파라미터 제거를 적용한 CNN 모델 압축)

  • Lim, Su-chang;Kim, Do-yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.9
    • /
    • pp.1165-1171
    • /
    • 2018
  • A CNN requires a large amount of computation and memory to extract object features. It is also trained on a network configured by the user, and because the network structure is fixed, it cannot be modified during training and is difficult to use on mobile devices with limited computing power. To address these problems, we apply a pruning method to the pre-trained weight file to reduce computation and memory requirements. The method consists of three steps. First, all weights of the pre-trained network are retrieved layer by layer. Second, for each layer the mean of the absolute weight values is computed and used as a threshold, and weights below the threshold are removed. Finally, the pruned network is retrained. Experiments on LeNet-5 and AlexNet achieve compression ratios of 31x and 12x, respectively.
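A minimal numpy sketch of the layer-wise thresholding step described above; the retraining of the pruned network is assumed to follow and is not shown, and the scale factor is an illustrative knob rather than part of the paper.

```python
import numpy as np

def prune_layer(weights, scale=1.0):
    """Zero out weights whose magnitude falls below the layer's mean |w|."""
    threshold = scale * np.mean(np.abs(weights))
    mask = np.abs(weights) >= threshold          # keep only sufficiently large weights
    return weights * mask, mask

# Example on a random "layer": roughly the weights below mean |w| are removed.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
pruned, mask = prune_layer(w)
print(f"kept {mask.mean():.1%} of weights")
```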

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

  • Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.9 no.4
    • /
    • pp.73-76
    • /
    • 2014
  • With their increasing popularity, multi-core processors have been adopted even in embedded systems. In this setting, many multimedia applications can be parallelized on multi-core platforms because they typically require heavy computation and extensive memory access. This paper proposes an efficient thread-level parallel implementation of color space conversion on a multi-core CPU. Thread-level parallelism has become a very useful parallel processing paradigm, especially on shared-memory systems; here it is exploited by assigning different input pixels to each thread for concurrent loop execution. For the performance evaluation, the paper measures the speedup of color conversion on multi-core processors by comparing the processing speed of the serial implementation with that of the parallel ones. The results show that the thread-level parallel implementations achieve broadly similar speedup ratios across different multi-core processors.
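A hedged sketch of the parallelization strategy (different pixels per thread), using an RGB-to-luma conversion as the example kernel; the paper's exact color conversion and threading library are not shown, and the image size and thread count are arbitrary.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def rgb_to_luma(chunk):
    # BT.601 luma weights applied to one horizontal stripe of the image.
    return 0.299 * chunk[..., 0] + 0.587 * chunk[..., 1] + 0.114 * chunk[..., 2]

def parallel_convert(image, num_threads=4):
    """Split the image into row stripes and convert each stripe in its own thread."""
    stripes = np.array_split(image, num_threads, axis=0)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return np.vstack(list(pool.map(rgb_to_luma, stripes)))

img = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(parallel_convert(img).shape)   # (480, 640)
```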

Quad Tree Based 2D Smoke Super-resolution with CNN (CNN을 이용한 Quad Tree 기반 2D Smoke Super-resolution)

  • Hong, Byeongsun;Park, Jihyeok;Choi, Myungjin;Kim, Changhun
    • Journal of the Korea Computer Graphics Society
    • /
    • v.25 no.3
    • /
    • pp.105-113
    • /
    • 2019
  • Physically based fluid simulation takes a long time at high resolution. To address this, several studies compensate for the limitations of low-resolution fluid simulation using deep learning; among them, super-resolution, which converts low-resolution simulation data into high-resolution data, is actively being pursued. However, conventional techniques process the entire domain, including regions with no density data, so they are inefficient in terms of overall simulation speed and run out of GPU memory as the input resolution increases. In this paper, we propose a new method that partitions and classifies 2D smoke simulation data using a quad tree, one of the spatial partitioning methods, and performs super-resolution only on the regions that need it. This accelerates the simulation by computing only the necessary regions, and because the input is processed in subdivided pieces, it also alleviates the GPU memory problem.
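A minimal sketch of the partitioning idea only (the CNN super-resolution model is assumed and not shown): recursively split the 2D density field with a quad tree and keep only leaf blocks that actually contain smoke. The block size, density threshold, and example field are illustrative assumptions.

```python
import numpy as np

def quadtree_blocks(density, top=0, left=0, min_size=16, eps=1e-6):
    """Return (top, left, h, w) blocks that contain smoke and need super-resolution."""
    h, w = density.shape
    if density.max() <= eps:
        return []                                    # empty region: skip it entirely
    if h <= min_size or w <= min_size:
        return [(top, left, h, w)]                   # small non-empty leaf: process it
    hh, hw = h // 2, w // 2
    blocks = []
    # Recurse into the four quadrants of the current region.
    for dr, dc, bh, bw in [(0, 0, hh, hw), (0, hw, hh, w - hw),
                           (hh, 0, h - hh, hw), (hh, hw, h - hh, w - hw)]:
        sub = density[dr:dr + bh, dc:dc + bw]
        blocks += quadtree_blocks(sub, top + dr, left + dc, min_size, eps)
    return blocks

field = np.zeros((128, 128))
field[40:70, 40:70] = 1.0                            # a single smoke puff
print(len(quadtree_blocks(field)), "blocks selected for super-resolution")
```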