• Title/Summary/Keyword: in-memory computing

Search Result 766, Processing Time 0.027 seconds

Implementation of a TCP/IP Offload Engine Using High Performance Lightweight TCP/IP (고성능 경량 TCP/IP를 이용한 소프트웨어 기반 TCP/IP 오프로드 엔진 구현)

  • Jun, Yong-Tae;Chung, Sang-Hwa;Yoon, In-Su
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.4
    • /
    • pp.369-377
    • /
    • 2008
  • Today, Ethernet technology is rapidly developing to have a bandwidth of 10Gbps beyond 1Gbps. In such high-speed networks, the existing method that host CPU processes TCP/IP in the operating system causes numerous overheads. As a result of the overheads, user applications cannot get the enough computing power from the host CPU. To solve this problem, the TCP/IP Offload Engine(TOE) technology was emerged. TOE is a specialized NIC which processes the TCP/IP instead of the host CPU. In this paper, we implemented a high-performance, lightweight TCP/IP(HL-TCP) for the TOE and applied it to an embedded system. The HL-TCP supports existing fundamental TCP/IP functions; flow control, congestion control, retransmission, delayed ACK, processing out-of-order packets. And it was implemented to utilize Ethernet MAC's hardware features such as TCP segmentation offload(TSO), checksum offload(CSO) and interrupt coalescing. Also we eliminated the copy overhead from the host memory to the NIC memory when sending data and we implemented an efficient DMA mechanism for the TCP retransmission. The TOE using the HL-TCP has the CPU utilization of less than 6% and the bandwidth of 453Mbps.

GPU Resource Contention Management Technique for Simultaneous GPU Tasks in the Container Environments with Share the GPU (GPU를 공유하는 컨테이너 환경에서 GPU 작업의 동시 실행을 위한 GPU 자원 경쟁 관리기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.10
    • /
    • pp.333-344
    • /
    • 2022
  • In a container-based cloud environment, multiple containers can share a graphical processing unit (GPU), and GPU sharing can minimize idle time of GPU resources and improve resource utilization. However, in a cloud environment, GPUs, unlike CPU or memory, cannot logically multiplex computing resources to provide users with some of the resources in an isolated form. In addition, containers occupy GPU resources only when performing GPU operations, and resource usage is also unknown because the timing or size of each container's GPU operations is not known in advance. Containers unrestricted use of GPU resources at any given point in time makes managing resource contention very difficult owing to where multiple containers run GPU tasks simultaneously, and GPU tasks are handled in black box form inside the GPU. In this paper, we propose a container management technique to prevent performance degradation caused by resource competition when multiple containers execute GPU tasks simultaneously. Also, this paper demonstrates the efficiency of container management techniques that analyze and propose the problem of degradation due to resource competition when multiple containers execute GPU tasks simultaneously through experiments.

A parallel tasks Scheduling heuristic in the Cloud with multiple attributes

  • Wang, Qin;Hou, Rongtao;Hao, Yongsheng;Wang, Yin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.1
    • /
    • pp.287-307
    • /
    • 2018
  • There are two targets to schedule parallel jobs in the Cloud: (1) scheduling the jobs as many as possible, and (2) reducing the average execution time of the jobs. Most of previous work mainly focuses on the computing speed of resources without considering other attributes, such as bandwidth, memory and so on. Especially, past work does not consider the supply-demand condition from those attributes. Resources have different attributes, considering those attributes together makes the scheduling problem more difficult. This is the problem that we try to solve in this paper. First of all, we propose a new parallel job scheduling method based on a classification method of resources from different attributes, and then a scheduling method-CPLMT (Cloud parallel scheduling based on the lists of multiple attributes) is proposed for the parallel tasks. The classification method categories resources into different kinds according to the number of resources that satisfy the job from different attributes of the resource, such as the speed of the resource, memory and so on. Different kinds have different priorities in the scheduling. For the job that belongs to the same kinds, we propose CPLMT to schedule those jobs. Comparisons between our method, FIFO (First in first out), ASJS (Adaptive Scoring Job Scheduling), Fair and CMMS (Cloud-Minmin) are executed under different environments. The simulation results show that our proposed CPLMT not only reduces the number of unfinished jobs, but also reduces the average execution time.

S-PARAFAC: Distributed Tensor Decomposition using Apache Spark (S-PARAFAC: 아파치 스파크를 이용한 분산 텐서 분해)

  • Yang, Hye-Kyung;Yong, Hwan-Seung
    • Journal of KIISE
    • /
    • v.45 no.3
    • /
    • pp.280-287
    • /
    • 2018
  • Recently, the use of a recommendation system and tensor data analysis, which has high-dimensional data, is increasing, as they allow us to analyze the tensor and extract potential elements and patterns. However, due to the large size and complexity of the tensor, it needs to be decomposed in order to analyze the tensor data. While several tools are used for tensor decomposition such as rTensor, pyTensor, and MATLAB, since such tools run on a single machine, they are unable to handle large data. Also, while distributed tensor decomposition tools based on Hadoop can handle a scalable tensor, its computing speed is too slow. In this paper, we propose S-PARAFAC, which is a tensor decomposition tool based on Apache Spark, in distributed in-memory environments. We converted the PARAFAC algorithm into an Apache Spark version that enables rapid processing of tensor data. We also compared the performance of the Hadoop based tensor tool and S-PARAFAC. The result showed that S-PARAFAC is approximately 4~25 times faster than the Hadoop based tensor tool.

A Comparison of Performance between STMP/MST and Existing Spatio-Temporal Moving Pattern Mining Methods (STMP/MST와 기존의 시공간 이동 패턴 탐사 기법들과의 성능 비교)

  • Lee, Yon-Sik;Kim, Eun-A
    • Journal of Internet Computing and Services
    • /
    • v.10 no.5
    • /
    • pp.49-63
    • /
    • 2009
  • The performance of spatio-temporal moving pattern mining depends on how to analyze and process the huge set of spatio-temporal data due to the nature of it. The several method was presented in order to solve the problems in which existing spatio-temporal moving pattern mining methods[1-10] have, such as increasing execution time and required memory size during the pattern mining, but they did not solve properly yet. Thus, we proposed the STMP/MST method[11] as a preceding research in order to extract effectively sequential and/or periodical frequent occurrence moving patterns from the huge set of spatio-temporal moving data. The proposed method reduces patterns mining execution time, using the moving sequence tree based on hash tree. And also, to minimize the required memory space, it generalizes detailed historical data including spatio-temporal attributes into the real world scopes of space and time by using spatio-temporal concept hierarchy. In this paper, in order to verify the effectiveness of the STMP/MST method, we compared and analyzed performance with existing spatio-temporal moving pattern mining methods based on the quantity of mining data and minimum support factor.

  • PDF

EEG Dimensional Reduction with Stack AutoEncoder for Emotional Recognition using LSTM/RNN (LSTM/RNN을 사용한 감정인식을 위한 스택 오토 인코더로 EEG 차원 감소)

  • Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.15 no.4
    • /
    • pp.717-724
    • /
    • 2020
  • Due to the important role played by emotion in human interaction, affective computing is dedicated in trying to understand and regulate emotion through human-aware artificial intelligence. By understanding, emotion mental diseases such as depression, autism, attention deficit hyperactivity disorder, and game addiction will be better managed as they are all associated with emotion. Various studies for emotion recognition have been conducted to solve these problems. In applying machine learning for the emotion recognition, the efforts to reduce the complexity of the algorithm and improve the accuracy are required. In this paper, we investigate emotion Electroencephalogram (EEG) feature reduction and classification using Stack AutoEncoder (SAE) and Long-Short-Term-Memory/Recurrent Neural Networks (LSTM/RNN) classification respectively. The proposed method reduced the complexity of the model and significantly enhance the performance of the classifiers.

Improving Performance of I/O Virtualization Framework based on Multi-queue SSD (다중 큐 SSD 기반 I/O 가상화 프레임워크의 성능 향상 기법)

  • Kim, Tae Yong;Kang, Dong Hyun;Eom, Young Ik
    • Journal of KIISE
    • /
    • v.43 no.1
    • /
    • pp.27-33
    • /
    • 2016
  • Virtualization has become one of the most helpful techniques in computing systems, and today it is prevalent in several computing environments including desktops, data-centers, and enterprises. However, since I/O layers are implemented to be oblivious to the I/O behaviors on virtual machines (VM), there still exists an I/O scalability issue in virtualized systems. In particular, when a multi-queue solid state drive (SSD) is used as a secondary storage, each system reveals a semantic gap that degrades the overall performance of the VM. This is due to two key problems, accelerated lock contentions and the I/O parallelism issue. In this paper, we propose a novel approach, including the design of virtual CPU (vCPU)-dedicated queues and I/O threads, which efficiently distributes the lock contentions and addresses the parallelism issue of Virtio-blk-data-plane in virtualized environments. Our approach is based on the above principle, which allocates a dedicated queue and an I/O thread for each vCPU to reduce the semantic gap. Our experimental results with various I/O traces clearly show that our design improves the I/O operations per second (IOPS) in virtualized environments by up to 155% over existing QEMU-based systems.

Analysis on Lightweight Methods of On-Device AI Vision Model for Intelligent Edge Computing Devices (지능형 엣지 컴퓨팅 기기를 위한 온디바이스 AI 비전 모델의 경량화 방식 분석)

  • Hye-Hyeon Ju;Namhi Kang
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.1-8
    • /
    • 2024
  • On-device AI technology, which can operate AI models at the edge devices to support real-time processing and privacy enhancement, is attracting attention. As intelligent IoT is applied to various industries, services utilizing the on-device AI technology are increasing significantly. However, general deep learning models require a lot of computational resources for inference and learning. Therefore, various lightweighting methods such as quantization and pruning have been suggested to operate deep learning models in embedded edge devices. Among the lightweighting methods, we analyze how to lightweight and apply deep learning models to edge computing devices, focusing on pruning technology in this paper. In particular, we utilize dynamic and static pruning techniques to evaluate the inference speed, accuracy, and memory usage of a lightweight AI vision model. The content analyzed in this paper can be used for intelligent video control systems or video security systems in autonomous vehicles, where real-time processing are highly required. In addition, it is expected that the content can be used more effectively in various IoT services and industries.

Design and Implementation of Transactional Write Buffer Cache with Storage Class Memory (트랜잭션 단위 쓰기를 보장하는 스토리지 클래스 메모리 쓰기 버퍼캐시의 설계 및 구현)

  • Kim, Young-Jin;Doh, In-Hwan;Kim, Eun-Sam;Choi, Jong-Moo;Lee, Dong-Hee;Noh, Sam-H.
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.2
    • /
    • pp.247-251
    • /
    • 2010
  • Using SCM in storage systems introduce new potentials for improving I/O performance and reliability. In this paper, we study the use of SCM as a buffer cache that guarantees transactional unit writes. Our proposed method can improve storage system reliability and performance at the same time and can recover the storage system immediately upon a system crash. The Proposed method is based on the LINUX JBD(Journaling Block Device), thus reliability is equivalent to JBD. In our experiments, the file system that adopts our method shows better I/O performance even while guaranteeing high reliability and shows fast file system recovery time (about 0.2 seconds).

A Secure Deletion Method for NAND Flash File System (NAND 플래시 파일 시스템을 위한 안전 삭제 기법)

  • Lee, Jae-Heung;Oh, Jin-Ha;Kim, Seok-Hyun;Yi, Sang-Ho;Heo, Jun-Young;Cho, Yoo-Kun;Hong, Ji-Man
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.3
    • /
    • pp.251-255
    • /
    • 2008
  • In most file systems, if a file is deleted, only the metadata of the file is deleted or modified and the file's data is still stored on the physical media. Some users require that deleted files no longer be accessible. This requirement is more important in embedded systems that employ flash memory as a storage medium. In this paper, we propose a secure deletion method for NAND flash file system and apply the method to YAFFS. Our method uses encryption to delete files and forces all keys of a specific file to be stored in the same block. Therefore, only one erase operation is required to securely delete a file. Our simulation results show that the amortized number of block erases is smaller than the simple encryption method. Even though we apply our method only to the YAFFS, our method can be easily applied to other NAND flash file systems.