• Title/Summary/Keyword: high throughput computing


Design and Implementation of Workflow-based User Environment on Computational Grid (계산 그리드에서 워크플로우 기반의 사용자 환경 설계 및 구현)

  • Hwang, Sun-Tae;Sim, Gyu-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.4 s.36
    • /
    • pp.165-171
    • /
    • 2005
  • High-speed computers, large-scale storage devices, and high-speed networks are computing infrastructure that is now easily accessible. However, many computer simulations in the natural and applied sciences, such as molecular simulation, require more computing power and larger-scale storage. Grid computing, a next-generation distributed computing environment, is one solution to these new requirements. Although much research has been done on Grid computing, it has focused on communication interfaces and protocols and on middleware such as the Globus Toolkit [2, 3]. Research on application-level platforms and on applications themselves is therefore still immature, which makes it difficult for real users to use Grid systems in their research. In this paper, we propose a new user environment and an abstract job model for simulation experiments on MGrid (Molecular Simulation Grid). It will enable users to use Grid resources efficiently and reliably.


Task Migration in Cooperative Vehicular Edge Computing (협력적인 차량 엣지 컴퓨팅에서의 태스크 마이그레이션)

  • Moon, Sungwon;Lim, Yujin
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.10 no.12
    • /
    • pp.311-318
    • /
    • 2021
  • With the recent rapid development of Internet of Things (IoT) technology, multi-access edge computing (MEC) has emerged as a next-generation technology for real-time, high-performance services. The high mobility of users between MECs with limited service areas is considered one of the issues in the MEC environment. In this paper, we consider a vehicle edge computing (VEC) environment with high mobility and propose a task migration algorithm that uses DQN, a reinforcement learning method, to decide whether to migrate a task and where to migrate it. The objective of the proposed algorithm is to improve system throughput while satisfying QoS (Quality of Service) requirements by minimizing the difference between queueing delays in vehicle edge computing servers (VECSs). The results show that, compared with other algorithms, the proposed algorithm achieves approximately 14-49% better QoS satisfaction and an approximately 14-38% lower service blocking rate.

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

  • Han, Jinho;Choi, Minseok;Kwon, Youngsu
    • ETRI Journal
    • /
    • v.42 no.4
    • /
    • pp.468-479
    • /
    • 2020
  • The proposed AI processor architecture has high throughput for accelerating neural networks and reduces the external memory bandwidth required to process them. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronics of autonomous cars. A general-purpose processor (GPP) core is integrated with the STC to control the STC and to process the AI algorithm. It has a self-recovering cache and a dynamic lockstep function. The function-safe design has been shown to reach ASIL D among the fault-tolerance levels of the ISO26262 standard. The entire AI processor is fabricated in a 28-nm CMOS process as a prototype chip. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V. The measured energy efficiency is 1.3 TOPS/W. A GPP for control with a function-safe design can achieve ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.
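
    The quoted peak figure can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check, not a statement of how the authors count operations: it assumes each nano core completes one multiply-accumulate per cycle, counted as two floating-point operations. That assumption and all names in the code are not taken from the abstract.

```python
# Back-of-the-envelope check of the peak-throughput figure in the abstract.
# Taken from the abstract: 128 x 128 nano cores, 1.2 GHz clock.
NANO_CORES = 128 * 128
CLOCK_HZ = 1.2e9

# ASSUMPTION (not stated in the abstract): one multiply-accumulate per core
# per cycle, counted as 2 floating-point operations.
OPS_PER_CORE_PER_CYCLE = 2


def peak_tflops(cores, clock_hz, ops_per_cycle):
    """Peak throughput in TFLOPS if every core is busy on every cycle."""
    return cores * clock_hz * ops_per_cycle / 1e12


if __name__ == "__main__":
    value = peak_tflops(NANO_CORES, CLOCK_HZ, OPS_PER_CORE_PER_CYCLE)
    print(f"{value:.1f} TFLOPS")  # prints "39.3 TFLOPS"
```

    Under that assumption the result is about 39.3 TFLOPS, which is consistent with the rounded 40 TFLOPS reported.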

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy (CPU-GPU 메모리 계층을 고려한 고처리율 병렬 KMP 알고리즘)

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.5
    • /
    • pp.656-662
    • /
    • 2018
  • Pattern matching algorithms are widely used in many application fields, such as bioinformatics and intrusion detection. Among the many string matching algorithms, the KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution time on large texts. However, the processing speed of the KMP algorithm is still limited when the text size increases significantly. In this paper, we propose a high-throughput parallel KMP algorithm that considers the CPU-GPU memory hierarchy, based on OpenCL in GPGPU (General-Purpose computing on Graphics Processing Units). We focus on optimizing the allocation of work-items and work-groups, the local memory copy of the pattern data and the failure table, and the overlapping of data transfer with string matching operations. Experimental results show that the optimized parallel KMP algorithm is about 3.6 times faster than the non-optimized parallel KMP algorithm.
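
    For readers unfamiliar with KMP, the following is a minimal CPU-only Python sketch of the failure table and of one common way to split a text into independent chunks. It is not the authors' OpenCL implementation: the chunking scheme (each chunk overlaps the next by `len(pattern) - 1` characters so that boundary matches are not lost), the function names, and the `chunk_size` parameter are all assumptions made for illustration.

```python
def build_failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern, fail):
    """Return the start index of every occurrence of pattern in text."""
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # allow overlapping matches
    return matches


def chunked_search(text, pattern, chunk_size):
    """Search each chunk independently (hypothetical partitioning).

    Every chunk is extended by len(pattern) - 1 characters, so a match that
    starts inside the chunk is found even if it crosses the boundary, and no
    match is reported twice. Each loop iteration is independent and could be
    given to a separate worker."""
    if not pattern or chunk_size <= 0:
        raise ValueError("pattern must be non-empty and chunk_size positive")
    fail = build_failure_table(pattern)  # shared, read-only for all chunks
    overlap = len(pattern) - 1
    matches = []
    for start in range(0, len(text), chunk_size):
        segment = text[start:start + chunk_size + overlap]
        matches.extend(start + i for i in kmp_search(segment, pattern, fail))
    return matches


if __name__ == "__main__":
    print(chunked_search("abababcabab", "abab", chunk_size=3))  # [0, 2, 7]
```

    The failure table is built once and only read afterwards, which is what makes it a natural candidate for the kind of fast shared memory the abstract mentions.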

Sub-Frame Analysis-based Object Detection for Real-Time Video Surveillance

  • Jang, Bum-Suk;Lee, Sang-Hyun
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.4
    • /
    • pp.76-85
    • /
    • 2019
  • We introduce a vision-based object detection method for real-time video surveillance systems in low-end edge computing environments. Recently, the accuracy of object detection has improved thanks to deep learning approaches such as the Region Convolutional Neural Network (R-CNN), which uses two stages for inference. One-stage detection algorithms such as single-shot detection (SSD) and you only look once (YOLO), on the other hand, have been developed at the expense of some accuracy and can be used in real-time systems. However, high-performance hardware such as General-Purpose computing on Graphics Processing Units (GPGPU) is still required to achieve excellent object detection performance and speed. To address this hardware requirement, which is burdensome for low-end edge computing environments, we propose a sub-frame analysis method for object detection. Specifically, we divide a whole image frame into smaller ones and run inference on them with a Convolutional Neural Network (CNN)-based image detection network, which is much faster than a conventional network designed for full-frame images. With the proposed method, we reduce the computational requirement significantly without losing throughput or object detection accuracy.
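
    The abstract does not say how frames are divided, so the following is only a sketch of one plausible scheme: a regular, non-overlapping grid. The grid layout, the function names, and the step that maps a detection back to full-frame coordinates are assumptions for illustration; the detector itself is left out.

```python
def split_into_subframes(width, height, cols, rows):
    """Return (left, top, right, bottom) boxes that tile the frame.

    Assumed scheme: a cols x rows grid with no overlap. Integer division
    spreads any remainder so the boxes cover the frame exactly."""
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


def to_frame_coords(subframe_box, detection_box):
    """Shift a detection found inside a sub-frame to full-frame coordinates."""
    off_x, off_y = subframe_box[0], subframe_box[1]
    x1, y1, x2, y2 = detection_box
    return (x1 + off_x, y1 + off_y, x2 + off_x, y2 + off_y)


if __name__ == "__main__":
    for box in split_into_subframes(1920, 1080, cols=2, rows=2):
        print(box)
```

    A non-overlapping grid can cut an object in two at a tile border; whether and how the paper handles that case is not stated in the abstract.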

A Lightweight Software-Defined Routing Scheme for 5G URLLC in Bottleneck Networks

  • Math, Sa;Tam, Prohim;Kim, Seokhoon
    • Journal of Internet Computing and Services
    • /
    • v.23 no.2
    • /
    • pp.1-7
    • /
    • 2022
  • Machine learning (ML) algorithms are intended to collaborate seamlessly to enable intelligent networking in terms of massive service differentiation and prediction, and to provide high-accuracy recommendation systems. Mobile edge computing (MEC) servers are located close to the edge networks to take on massive requests from user devices and to perform local service offloading. Moreover, lightweight methods are required for handling real-time Internet of Things (IoT) communication, especially for ultra-reliable low-latency communication (URLLC) and optimal resource utilization. To address these issues, this paper proposes an intelligent traffic steering scheme based on the integration of MEC and a lightweight ML method, namely the support vector machine (SVM), for effective routing in lightweight, resource-constrained networks. The scheme provides dynamic resource handling for real-time IoT user systems based on awareness of network status. The system was evaluated using computer software simulations, and the proposed approach remarkably outperformed conventional schemes on significant QoS metrics, including communication latency, reliability, and communication throughput.

Candidate Path Selection Method for TCP Performance Improvement in Fixed Robust Routing

  • Fukushima, Yukinobu;Matsumura, Takashi;Urushibara, Kazutaka;Yokohira, Tokumi
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.6
    • /
    • pp.445-453
    • /
    • 2016
  • Fixed robust routing is attracting attention as a routing approach that achieves high robustness against changes in traffic patterns without traffic measurement or dynamic route changes. Fixed robust routing minimizes the worst-case maximum link load by distributing the traffic of every source-destination (s-d) router pair onto multiple candidate paths (multipath routing). Multipath routing, however, can degrade the performance of the Transmission Control Protocol (TCP) because of frequent out-of-order packet arrivals. In this paper, we first investigate the influence of multipath routing on TCP performance under fixed robust routing with a simulation using ns-2. The simulation results show that TCP throughput degrades greatly with multipath routing. We next propose a candidate path selection method that improves TCP throughput while keeping the worst-case maximum link load below the allowed level under fixed robust routing. The method selects a single candidate path for each of a predetermined ratio of s-d router pairs in order to avoid TCP performance degradation, and it selects multiple candidate paths for each of the other router pairs in order to suppress the worst-case maximum link load. Numerical examples show that, provided the worst-case maximum link load is less than 1.0, our proposed method achieves about six times the TCP throughput of the original fixed robust routing.

Proteomics Data Analysis using Representative Database

  • Kwon, Kyung-Hoon;Park, Gun-Wook;Kim, Jin-Young;Park, Young-Mok;Yoo, Jong-Shin
    • Bioinformatics and Biosystems
    • /
    • v.2 no.2
    • /
    • pp.46-51
    • /
    • 2007
  • In proteomics research using mass spectrometry, a protein database search yields protein information from the peptide sequences that best match the tandem mass spectra. The protein sequence database has been a powerful knowledge base for this protein identification. However, as protein sequence information accumulates, the database becomes very large, and it becomes hard to consider all the protein sequences in a database search because doing so consumes too much computing time. For high-throughput analysis of the proteome, we have usually used a non-redundant refined database such as the IPI human database of the European Bioinformatics Institute. While a non-redundant database returns search results quickly, it misses the variation in protein sequences. In this study, we examined proteomics data in terms of protein similarity and used a network analysis tool to build a new analysis method. This method can reduce the computing time of the database search while retaining the sequence variation needed to catch modified peptides.


Forecasting Chemical Tanker Freight Rate with ANN

  • Lim, Sangseop;Kim, Seokhun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.4
    • /
    • pp.113-118
    • /
    • 2021
  • In this paper, we propose an efficient dynamic workload balancing strategy that improves the performance of high-performance computing systems. The key idea is to minimize the execution time of each job and to maximize system throughput by making effective use of system resources such as CPU and memory. The strategy allocates jobs dynamically by considering the memory demand of each job and the workload status of each node. If a node becomes overloaded by an allocated job, the proposed scheme migrates jobs running on the overloaded node to other free nodes, and reduces the waiting time and execution time of jobs by balancing the workload across nodes. Through simulation, we show that the proposed strategy, based on CPU and memory, improves the performance of high-performance computing systems compared with previous strategies.
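
    The allocate-then-migrate idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the strategy evaluated in the paper: the `Job` and `Node` types, the load metric (summed CPU demand divided by capacity), the overload threshold of 1.0, and the choice to migrate the job with the smallest CPU demand first are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpu: float  # CPU demand, arbitrary units (assumed metric)
    mem: int    # memory demand in MB


@dataclass
class Node:
    name: str
    cpu_capacity: float
    mem_capacity: int
    jobs: list = field(default_factory=list)

    @property
    def cpu_load(self):
        return sum(j.cpu for j in self.jobs) / self.cpu_capacity

    @property
    def free_mem(self):
        return self.mem_capacity - sum(j.mem for j in self.jobs)


def allocate(job, nodes):
    """Place the job on the least-loaded node that has enough free memory."""
    candidates = [n for n in nodes if n.free_mem >= job.mem]
    if not candidates:
        return None  # no node can hold the job right now
    target = min(candidates, key=lambda n: n.cpu_load)
    target.jobs.append(job)
    return target


def rebalance(nodes, overload=1.0):
    """Migrate jobs off overloaded nodes; return (job, source, target) names."""
    moves = []
    for src in nodes:
        while src.cpu_load > overload and src.jobs:
            job = min(src.jobs, key=lambda j: j.cpu)  # assumed: cheapest first
            targets = [
                n for n in nodes
                if n is not src
                and n.free_mem >= job.mem
                and n.cpu_load + job.cpu / n.cpu_capacity <= overload
            ]
            if not targets:
                break  # nowhere to migrate without overloading another node
            dst = min(targets, key=lambda n: n.cpu_load)
            src.jobs.remove(job)
            dst.jobs.append(job)
            moves.append((job.name, src.name, dst.name))
    return moves
```

    A target is accepted only if it stays at or below the threshold after the move, so a migration never creates a new overloaded node.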

Concurrent Support Vector Machine Processor (Concurrent Support Vector Machine 프로세서)

  • 위재우;이종호
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.8
    • /
    • pp.578-584
    • /
    • 2004
  • We propose the CSVM (Concurrent Support Vector Machine), a digital architecture that performs all phases of the recognition process, including kernel computing, learning, and recall of the SVM (Support Vector Machine), on a single chip. Concurrent operation through a parallel architecture of elements yields high speed and throughput. Classification problems on high-dimensional bio data are solved quickly and easily using the CSVM. Quadratic programming in the original SVM learning algorithm is not suitable for hardware implementation because of its complexity and large memory consumption. Hardware-friendly SVM learning algorithms, kernel adatron and kernel perceptron, are embedded on the chip. Experiments on the fixed-point algorithm, which has quantization error, are performed, and the results are compared with those of the floating-point algorithm. The CSVM implemented on an FPGA chip produces fast and accurate results on high-dimensional cancer data.
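
    To show why the kernel perceptron is considered hardware-friendly, here is a minimal floating-point software sketch of the textbook algorithm: training needs only kernel evaluations, sums, and counter increments, with no quadratic programming solver. It is not the CSVM hardware design or its fixed-point variant; the RBF kernel, the `gamma` value, and the XOR toy data are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an arbitrary illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, X, y, alpha, kernel=rbf_kernel):
    """Weighted sum of kernel evaluations against the training points."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))


def train_kernel_perceptron(X, y, kernel=rbf_kernel, epochs=20):
    """Textbook kernel perceptron: alpha[i] counts the mistakes on sample i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision_value(xi, X, y, alpha, kernel) <= 0:
                alpha[i] += 1  # the only update is an integer increment
                mistakes += 1
        if mistakes == 0:
            break  # every training sample is classified correctly
    return alpha


def predict(x, X, y, alpha, kernel=rbf_kernel):
    return 1 if decision_value(x, X, y, alpha, kernel) > 0 else -1


if __name__ == "__main__":
    # XOR is not linearly separable, but the RBF kernel separates it.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [-1, 1, 1, -1]
    alpha = train_kernel_perceptron(X, y)
    print([predict(x, X, y, alpha) for x in X])  # [-1, 1, 1, -1]
```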