• Title/Summary/Keyword: in-memory computing


Extracting optimal moving patterns of edge devices for efficient resource placement in an FEC environment (FEC 환경에서 효율적 자원 배치를 위한 엣지 디바이스의 최적 이동패턴 추출)

  • Lee, YonSik;Nam, KwangWoo;Jang, MinSeok
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.1 / pp.162-169 / 2022
  • In a dynamically changing, time-varying network environment, the optimal moving patterns of edge devices can be applied to distributing computing resources to edge cloud servers or to deploying new edge servers in the FEC (Fog/Edge Computing) environment. They can also be used to build an environment capable of efficient computation offloading, alleviating the latency problems that are a disadvantage of cloud computing. This paper proposes an algorithm that extracts the optimal moving pattern by analyzing, on the basis of frequency, the moving paths of multiple edge devices requesting application services in an arbitrary spatio-temporal environment. Comparative experiments with the A* and Dijkstra algorithms show that the proposed algorithm executes faster, uses less memory, and extracts a more accurate optimal path. Furthermore, the comparison with the A* algorithm suggests that applying weights (preference, congestion, etc.) together with frequency can further increase path extraction accuracy.
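The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of frequency-based path extraction, not the paper's method: transitions between locations are counted across device trajectories, and a Dijkstra-style search then prefers frequently travelled transitions by weighting each edge with the inverse of its observed frequency. All names and the toy trajectories are assumptions.

```python
# Illustrative sketch only: frequency-weighted path extraction over recorded
# device trajectories. Not the paper's algorithm; names and data are made up.
from collections import Counter
import heapq

def transition_frequencies(trajectories):
    """Count how often each directed transition (a -> b) occurs."""
    freq = Counter()
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            freq[(a, b)] += 1
    return freq

def most_frequent_path(freq, start, goal):
    """Dijkstra-style search that prefers frequently travelled transitions
    by using cost = 1 / frequency for each observed transition."""
    graph = {}
    for (a, b), f in freq.items():
        graph.setdefault(a, []).append((b, 1.0 / f))
    heap = [(0.0, start, [start])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return path, cost
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None, float("inf")

# Example: three devices moving over labelled grid cells.
trajs = [["A", "B", "C", "D"], ["A", "B", "C", "E"], ["A", "B", "F", "D"]]
freq = transition_frequencies(trajs)
print(most_frequent_path(freq, "A", "D"))   # prefers the frequently used A-B-C-D route
```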

Enhancing the Performance of Multiple Parallel Applications using Heterogeneous Memory on the Intel's Next-Generation Many-core Processor (인텔 차세대 매니코어 프로세서에서의 다중 병렬 프로그램 성능 향상기법 연구)

  • Rho, Seungwoo;Kim, Seoyoung;Nam, Dukyun;Park, Geunchul;Kim, Jik-Soo
    • Journal of KIISE / v.44 no.9 / pp.878-886 / 2017
  • This paper discusses performance bottlenecks that may occur when executing high-performance computing MPI applications on Intel's next-generation many-core processor, Knights Landing (KNL), as well as effective resource allocation techniques to solve this problem. Unlike earlier many-core products that acted only as accelerators, KNL includes a host processor that enables self-booting, and it was released with a new type of on-package memory with improved bandwidth on top of the existing DDR4-based memory. By studying a resource allocation method optimized for this new many-core architecture, we empirically verified improvements in the execution performance of multiple MPI applications and in the overall system utilization ratio.

WWCLOCK: Page Replacement Algorithm Considering Asymmetric I/O Cost of Flash Memory (WWCLOCK: 플래시 메모리의 비대칭적 입출력 비용을 고려한 페이지 교체 알고리즘)

  • Park, Jun-Seok;Lee, Eun-Ji;Seo, Hyun-Min;Koh, Kern
    • Journal of KIISE: Computing Practices and Letters / v.15 no.12 / pp.913-917 / 2009
  • Flash memories have asymmetric I/O costs for read and write in terms of latency and energy consumption, and the ratio of these costs depends on the type of storage. Moreover, it is becoming more common to use two flash memories in one system, one as internal memory and one as an external memory card. For these reasons, buffer cache replacement algorithms should consider the I/O costs of each device as well as the likelihood of reference. This paper presents the WWCLOCK (Write-Weighted CLOCK) algorithm, which directly uses the I/O costs of devices, along with the recency and frequency of cache blocks, to select a victim to evict from the buffer cache. WWCLOCK can be used for a wide range of storage devices with different I/O costs and for systems that use two or more memory devices at the same time. In addition, its time and space complexity is low, comparable to that of the CLOCK algorithm. Trace-driven simulations show that the proposed algorithm reduces the total I/O time by 36.2% on average compared with LRU.
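As a rough illustration of the general idea of weighting write cost in a CLOCK-style policy (a hedged sketch, not the authors' exact WWCLOCK, whose victim selection also incorporates device I/O costs and reference frequency), the snippet below gives dirty pages extra chances before eviction, since evicting them incurs a costly flash write-back. The class name, fields, and weighting rule are illustrative assumptions.

```python
# Hedged sketch of a write-weighted CLOCK-style buffer cache; not WWCLOCK itself.
class WriteWeightedClock:
    def __init__(self, capacity, write_weight=2):
        self.capacity = capacity
        self.write_weight = write_weight   # extra chances granted to dirty pages
        self.frames = []                   # dicts: page, ref, dirty, credit
        self.hand = 0

    def _find_victim(self):
        while True:
            frame = self.frames[self.hand]
            if frame["ref"]:
                frame["ref"] = False       # second chance based on recency
            elif frame["credit"] > 0:
                frame["credit"] -= 1       # extra chance: eviction means a flash write
            else:
                victim = self.hand
                self.hand = (self.hand + 1) % self.capacity
                return victim
            self.hand = (self.hand + 1) % self.capacity

    def access(self, page, is_write):
        for frame in self.frames:
            if frame["page"] == page:
                frame["ref"] = True
                frame["dirty"] = frame["dirty"] or is_write
                frame["credit"] = self.write_weight if frame["dirty"] else 0
                return
        new = {"page": page, "ref": True, "dirty": is_write,
               "credit": self.write_weight if is_write else 0}
        if len(self.frames) < self.capacity:
            self.frames.append(new)
        else:
            self.frames[self._find_victim()] = new

cache = WriteWeightedClock(capacity=3)
for page, is_write in [(1, False), (2, True), (3, False), (4, False), (2, True)]:
    cache.access(page, is_write)
print([f["page"] for f in cache.frames])   # the dirty page 2 survives eviction
```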

Improving the Read Performance of OneNAND Flash Memory using Virtual I/O Segment (가상 I/O 세그먼트를 이용한 OneNAND 플래시 메모리의 읽기 성능 향상 기법)

  • Hyun, Seung-Hwan;Koh, Kern
    • Journal of KIISE: Computing Practices and Letters / v.14 no.7 / pp.636-645 / 2008
  • OneNAND flash is a high-performance hybrid flash memory that combines the advantages of NAND flash and NOR flash. It has not only all the virtues of NAND flash but also greatly enhanced read performance, which is considered a weakness of NAND flash. As a result, it is widely used in mobile devices such as mobile phones, digital cameras, PMPs, and portable game players. However, most general-purpose operating systems, such as Linux, cannot exploit the read performance of OneNAND flash because of the restrictions imposed by their virtual memory system and block I/O architecture. To solve this problem, we suggest a new approach called the virtual I/O segment. Using virtual I/O segments, the superior read performance of OneNAND flash can be exploited without modifying the existing block I/O architecture and MTD subsystem. Experiments with an implementation show that this approach reduces the read latency of OneNAND flash by as much as 54%.

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast and general-purpose cluster computing package. It provides a new abstraction named the resilient distributed dataset (RDD), which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup compared to the legacy large-scale data processing framework MapReduce. In particular, the Spark framework is suitable for iterative machine learning applications, such as logistic regression and K-means clustering, and for interactive data querying. Thanks to its versatility, Spark also provides high-level libraries for various applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show implementations of simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.
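A small PySpark sketch of the in-memory RDD model described above: transformations build a lineage lazily, cache() keeps a dataset in memory, and an iterative loop re-scans the cached data without re-reading it from disk. The data and the toy iteration are illustrative, not taken from the paper.

```python
# Minimal PySpark sketch of RDD transformations, actions, and caching.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# Classic word count expressed as lazy RDD transformations plus one action.
lines = sc.parallelize(["in memory computing", "memory computing with spark"])
counts = (lines.flatMap(lambda s: s.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())

# Iterative workloads (logistic regression, K-means, ...) benefit from caching,
# because the same RDD is scanned on every iteration. Toy example: gradient
# ascent toward the mean of the cached values.
values = sc.parallelize([1.0, 2.0, 8.0, 9.0]).cache()
n = values.count()
mu = 0.0
for _ in range(20):
    grad = values.map(lambda x: x - mu).sum()
    mu += 0.1 * grad / n
print(mu)   # approaches 5.0, the mean of the cached values

spark.stop()
```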

Optimization of Parallel Code for Noise Prediction in an Axial Fan Using MPI One-Sided Communication (MPI 일방향통신을 이용한 축류 팬 주위 소음해석 병렬프로그램 최적화)

  • Kwon, Oh-Kyoung;Park, Keuntae;Choi, Haecheon
    • KIPS Transactions on Computer and Communication Systems / v.7 no.3 / pp.67-72 / 2018
  • Noise reduction in an axial fan, a type of turbomachine that produces a small pressure rise and a large flow rate, has recently been recognized as essential. This study describes the design and optimization of an MPI parallel program to simulate flow-induced noise in an axial fan. To run simulations with 100 million grid points for the flow and 70,000 points for the noise sources, we parallelize the code using 2D domain decomposition. However, when many computing cores are involved, the code slows down because of MPI communication overhead among nodes, especially in the noise simulation. We therefore adopt one-sided communication to reduce this overhead. In addition, memory allocation and inter-core communication are optimized, improving performance by 2.97x over the original code. Finally, on KISTI's Tachyon2, the flow simulation runs 12x faster on 6,144 computing cores than on 256, and the noise simulation runs 6x faster on 128 computing cores than on 16.
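For readers unfamiliar with MPI one-sided communication (RMA), the sketch below shows the basic pattern in mpi4py: each rank exposes a window of memory, and a neighbouring rank writes into it with Put between two Fence calls, with no matching receive on the target side. The buffer size and ring-shaped exchange are illustrative, not the paper's decomposition.

```python
# Hedged mpi4py sketch of one-sided (RMA) communication with Put and Fence.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank exposes a small buffer as an RMA window.
local = np.zeros(4, dtype="d")
win = MPI.Win.Create(local, 1, MPI.INFO_NULL, comm)

# Every rank puts its data into the window of the next rank (ring exchange).
send = np.full(4, float(rank), dtype="d")
target = (rank + 1) % size

win.Fence()                          # open the access epoch
win.Put([send, MPI.DOUBLE], target)  # write directly into the target's window
win.Fence()                          # close the epoch; the data is now visible

print(f"rank {rank} window now holds data from rank {(rank - 1) % size}: {local}")
win.Free()

# Run with, e.g.:  mpiexec -n 4 python rma_demo.py
```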

Dynamic Clustering based Optimization Technique and Quality Assessment Model of Mobile Cloud Computing (동적 클러스터링 기반 모바일 클라우드 컴퓨팅의 최적화 기법 및 품질 평가 모델)

  • Kim, Dae Young;La, Hyun Jung;Kim, Soo Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.6 / pp.383-394 / 2013
  • As a way of augmenting the constrained resources of mobile devices, such as CPU and memory, many works on mobile cloud computing (MCC), in which mobile devices utilize the remote resources of cloud services or PCs, have been proposed. Typically, an MCC environment contains many nodes with different operating systems and platforms, along with diverse mobile applications and services, and a central manager autonomously performs several management tasks to maintain a consistent level of overall MCC quality. However, as the number of nodes, mobile applications, subscribed services, and their interactions grows very large, the traditional MCC management method reveals a fundamental problem: overall performance degrades because management tasks overload the central manager, i.e., a bottleneck arises. Therefore, in this paper, we propose a clustering-based optimization method to solve the performance-related problems of large-scale MCC and to stabilize its overall quality. With the proposed method, management overhead is minimized and the quality of MCC is stabilized in an active and autonomous way.

Relation Based Bayesian Network for NBNN

  • Sun, Mingyang;Lee, YoonSeok;Yoon, Sung-eui
    • Journal of Computing Science and Engineering / v.9 no.4 / pp.204-213 / 2015
  • Under the assumption of conditional independence among local features, the Naive Bayes Nearest Neighbor (NBNN) classifier was recently proposed; it performs classification without any training or quantization phase. While the original NBNN shows high classification accuracy without an explicit training phase, the conditional independence of local features conflicts with the compositionality of objects, i.e., the fact that different but related parts of an object appear together. As a result, this assumption weakens the accuracy of classification techniques based on NBNN. In this work, we look into this issue and propose a novel Bayesian network for NBNN-based classification that considers the conditional dependence among features. To achieve this, we extract a high-level feature and multiple corresponding low-level features for each image patch. We then represent them with a simple, two-level layered Bayesian network and design a classification function based on it. To achieve a low memory requirement and fast query-time performance, we further optimize our representation and classification function, named the relation-based Bayesian network, by encoding the relationship between a high-level feature and its low-level features in a compact relation vector whose dimensionality equals the number of low-level features, e.g., four elements in our tests. We demonstrate the benefits of our method over the original NBNN and its recent improvement, local NBNN, on two different benchmarks. Our method improves accuracy by up to 27% over the tested methods, mainly because it accounts for the conditional dependence between a high-level feature and its corresponding low-level features.
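For context, the baseline NBNN decision rule that this work builds on can be written in a few lines (this is the standard NBNN rule, not the proposed relation-based Bayesian network): for each class, sum the squared distances from every local descriptor of the query image to its nearest descriptor of that class, and pick the class with the smallest total. The descriptor dimensions and data below are illustrative.

```python
# Compact numpy sketch of the baseline NBNN decision rule (not the paper's model).
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """query_descriptors: (n, d) array of local features of one image.
    class_descriptors: dict mapping class name -> (m, d) array of features."""
    best_class, best_cost = None, np.inf
    for label, feats in class_descriptors.items():
        # distance from each query descriptor to its nearest neighbour in this class
        d2 = ((query_descriptors[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
        cost = d2.min(axis=1).sum()
        if cost < best_cost:
            best_class, best_cost = label, cost
    return best_class

rng = np.random.default_rng(0)
classes = {"cat": rng.normal(0.0, 1.0, (50, 8)),
           "dog": rng.normal(3.0, 1.0, (50, 8))}
query = rng.normal(3.0, 1.0, (20, 8))   # descriptors drawn near the "dog" cluster
print(nbnn_classify(query, classes))    # expected: dog
```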

Garbage Collection Method using Proxy Block considering Index Data Structure based on Flash Memory (플래시 메모리 기반 인덱스 구조에서 대리블록 이용한 가비지 컬렉션 기법)

  • Kim, Seon Hwan;Kwak, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.20 no.6 / pp.1-11 / 2015
  • Recently, NAND flash memories have been used in storage devices because of their fast access speed and low power consumption. However, running an FTL (Flash Translation Layer) on low-power computing devices imposes a heavy workload, resulting in significant memory requirements and implementation overhead. Consequently, B+-Tree designs for embedded devices without an FTL have been proposed; these designs optimize insert and update performance in view of the fact that NAND flash memory cannot support in-place updates. However, if a general garbage collection method is applied to such B+-Trees, their performance drops, because changing page positions on the NAND flash memory forces the B+-Tree to be rearranged. Therefore, we propose a novel garbage collection method applicable to B+-Trees on NAND flash memory without an FTL. The proposed garbage collection method avoids rearranging the B+-Tree by using a block information table and a proxy block. We implemented the B+-Tree and the μ-Tree with the proposed garbage collection on physical devices with NAND flash memory. In the experiments, compared to a greedy garbage collection scheme, the proposed scheme increased the number of inserted keys by up to about 73% on the B+-Tree and decreased the elapsed time of garbage collection by up to about 39% on the μ-Tree.
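To make the proxy-block idea concrete, here is a deliberately simplified sketch (an assumption-laden illustration, not the authors' implementation): when a victim block is collected, its valid pages are copied into a spare proxy block at the same in-block offsets, and a block information table redirects the old block number to the proxy, so index nodes that recorded the old block number never have to be rewritten.

```python
# Simplified illustration of proxy-block garbage collection; not the paper's code.
class ProxyBlockGC:
    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        # physical flash: block -> list of pages (None = free, "INVALID" = stale)
        self.flash = {b: [None] * pages_per_block for b in range(num_blocks)}
        self.block_table = {b: b for b in range(num_blocks)}  # logical -> physical
        self.free_blocks = [num_blocks - 1]                   # spare proxy block

    def read(self, logical_block, page):
        return self.flash[self.block_table[logical_block]][page]

    def collect(self, logical_block):
        """Copy valid pages of a victim block into a proxy block and redirect."""
        old = self.block_table[logical_block]
        proxy = self.free_blocks.pop()
        for i, page in enumerate(self.flash[old]):
            if page not in (None, "INVALID"):
                self.flash[proxy][i] = page              # keep in-block offsets stable
        self.flash[old] = [None] * self.pages_per_block  # erase the victim block
        self.free_blocks.append(old)
        # Index nodes keep referring to `logical_block`; only the table changes.
        self.block_table[logical_block] = proxy

gc = ProxyBlockGC(num_blocks=4, pages_per_block=4)
gc.flash[0] = ["keyA", "INVALID", "keyB", "INVALID"]
gc.collect(0)
print(gc.read(0, 0), gc.read(0, 2))   # keyA keyB; addresses seen by the index unchanged
```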

An Efficient Data Block Replacement and Rearrangement Technique for Hybrid Hard Disk Drive (하이브리드 하드디스크를 위한 효율적인 데이터 블록 교체 및 재배치 기법)

  • Park, Kwang-Hee;Lee, Geun-Hyung;Kim, Deok-Hwan
    • Journal of KIISE: Computing Practices and Letters / v.16 no.1 / pp.1-10 / 2010
  • Recently, heterogeneous storage systems such as the hybrid hard disk drive (H-HDD), which combines flash memory and a magnetic disk, have been launched, as the read performance of NAND flash memory has approached that of the hard disk drive (HDD) while its power consumption has fallen below that of the HDD. However, the read and write operations of NAND flash memory are slower than those of a rotating disk, and serious overhead is incurred on the CPU and main memory when intensive write requests to flash memory occur repeatedly. In this paper, we propose the Least Frequently Used-Hot (LFU-Hot) scheme, which replaces data blocks whose read reference frequency is low and write update frequency is high, and a data flushing scheme that rearranges data blocks into the multiple zones of the rotating disk. Experimental results show that, in terms of I/O performance, the execution time of the proposed method is 38% shorter than that of the conventional LRU and LFU block replacement schemes, and that the proposed method extends the lifespan of the non-volatile cache by 40% compared to the conventional LRU, LFU, and FIFO block replacement schemes.
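The victim-selection idea can be illustrated with a short sketch (the scoring formula and block statistics below are assumptions, not the paper's exact LFU-Hot policy): among cached blocks, the one with low read frequency and high write frequency is the preferred eviction candidate, since keeping it in the non-volatile cache yields little read benefit while attracting repeated writes.

```python
# Hedged sketch of read/write-frequency-based victim selection; illustrative only.
def choose_victim(blocks):
    """blocks: dict mapping block id -> (read_count, write_count).
    Return the block with the lowest read-to-write benefit score."""
    def score(counts):
        reads, writes = counts
        return reads / (writes + 1)   # low reads and high writes -> small score
    return min(blocks, key=lambda b: score(blocks[b]))

cache = {
    "blk-17": (120, 3),   # read-hot, rarely updated: worth keeping in the flash cache
    "blk-42": (4, 95),    # write-hot, rarely read: good eviction candidate
    "blk-80": (30, 30),
}
print(choose_victim(cache))   # expected: blk-42
```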