• Title/Summary/Keyword: 다중CPU (multi-CPU)


Design of Crossbar Switch On-chip Bus for Performance Improvement of SoC (SoC의 성능 향상을 위한 크로스바 스위치 온칩 버스 설계)

  • Heo, Jung-Burn; Ryoo, Kwang-Ki
    • Journal of the Korea Institute of Information and Communication Engineering, v.14 no.3, pp.684-690, 2010
  • Most existing SoCs have a shared bus architecture, which inevitably creates a bottleneck: the more IPs an SoC contains, the lower its performance, so overall performance is governed by on-chip communication rather than CPU speed. In this paper, we propose a crossbar switch bus architecture that reduces this bottleneck and improves performance. The crossbar switch bus supports up to 8 masters and 16 slaves and enables parallel communication through a multi-channel bus architecture. Each slave has an arbiter that stores priority information about the masters, which prevents any single master from monopolizing a slave and supports efficient communication. We compared the WISHBONE on-chip shared bus architecture with the crossbar switch bus architecture on an SoC platform consisting of an OpenRISC processor, a VGA/LCD controller, an AC97 controller, a debug interface, and a memory interface; the crossbar switch bus improved performance by 26.58% over the shared bus.
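
As a rough illustration of the per-slave arbitration idea, the following Python sketch (not the paper's RTL; class and function names are invented for illustration) models a crossbar in which every slave keeps its own rotating priority list of masters, so distinct master-slave pairs can be granted in the same cycle:

```python
# Minimal sketch: per-slave arbitration for a crossbar bus. Each slave
# keeps its own priority list of masters, so several master-slave pairs
# can transfer in parallel instead of contending for one shared bus.

class SlaveArbiter:
    def __init__(self, num_masters):
        # Lower index = higher priority; the list is rotated after each
        # grant (round-robin) so no single master monopolizes the slave.
        self.priority = list(range(num_masters))

    def grant(self, requests):
        """requests: set of master ids requesting this slave."""
        for m in self.priority:
            if m in requests:
                # Demote the granted master to the lowest priority.
                self.priority.remove(m)
                self.priority.append(m)
                return m
        return None

def crossbar_cycle(arbiters, request_map):
    """request_map: {slave_id: set of requesting master ids}.
    Each slave grants independently, so distinct slaves can serve
    distinct masters in the same cycle. (For simplicity this toy does
    not mask a master after its first grant, as real hardware would.)"""
    return {s: arbiters[s].grant(reqs) for s, reqs in request_map.items()}

arbiters = [SlaveArbiter(num_masters=8) for _ in range(16)]
print(crossbar_cycle(arbiters, {0: {1, 3}, 5: {2}}))  # -> {0: 1, 5: 2}
```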

Neural networks optimization for multi-dimensional digital signal processing in IoT devices (IoT 디바이스에서 다차원 디지털 신호 처리를 위한 신경망 최적화)

  • Choi, KwonTaeg
    • Journal of Digital Contents Society, v.18 no.6, pp.1165-1173, 2017
  • Deep learning, one of the best-known machine learning methods, has proven its applicability in various applications and is widely used in digital signal processing. However, it is difficult to apply deep learning to IoT devices with limited CPU performance and memory capacity, because a large number of training samples demands substantial memory and computation time. In particular, on an Arduino with a very small memory capacity of 2 KB to 8 KB, there are many limitations in implementing such algorithms. In this paper, we propose a method to optimize the extreme learning machine (ELM) algorithm, which has proven accurate and efficient in various fields, on an Arduino board. Experiments show that multi-class learning is possible with data of up to 15 dimensions on an Arduino UNO with 2 KB of memory, and up to 42 dimensions on an Arduino MEGA with 8 KB of memory. To evaluate the approach, we demonstrated the effectiveness of the proposed algorithm using data sets generated with Gaussian mixture modeling as well as public UCI data sets.
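
For context, a minimal NumPy sketch of the standard ELM training step (hidden layer fixed at random, output weights from one least-squares solve) shows why the method fits memory-constrained devices; this is the textbook algorithm, not the paper's Arduino-specific optimization:

```python
# Standard ELM: the hidden layer is random and never trained, so the
# whole training phase reduces to a single pseudo-inverse solve.
import numpy as np

def elm_train(X, T, n_hidden=20, rng=np.random.default_rng(0)):
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                       # hidden activations
    beta = np.linalg.pinv(H) @ T                 # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy two-class example with one-hot targets.
X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
T = np.vstack([np.tile([1, 0], (50, 1)), np.tile([0, 1], (50, 1))])
W, b, beta = elm_train(X, T)
acc = (elm_predict(X, W, b, beta).argmax(1) == T.argmax(1)).mean()
print(f"train accuracy: {acc:.2f}")
```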

Acceleration for Removing Sea-fog using Graphic Processors and Parallel Processing (그래픽 프로세서를 이용한 병렬연산 기반 해무 제거 고속화)

  • Kim, Young-doo; Kwak, Jae-min; Seo, Young-ho; Choi, Hyun-jun
    • Journal of Advanced Navigation Technology, v.21 no.5, pp.485-490, 2017
  • In this paper, we propose a technique for high-speed removal of sea fog using graphics processors. The technique uses a host processor (CPU) and several graphics processors (GPUs) capable of parallel processing to remove sea fog from the input image. In the removal process, dark channel extraction, maximum brightness channel extraction, and calculation of the transmission are performed by the host processor, while refinement of the transmission with a bilateral filter is performed in parallel on the graphics processors. To verify the proposed parallel processing method, a test environment was built with three NVIDIA GTX 1070 GPUs. As a result, processing takes about 140 ms with one graphics processor and 26 ms when implemented with OpenMP and multiple GPGPUs. The proposed GPU-based parallel processing algorithm can be used for safe navigation, port control, and monitoring systems.
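
A minimal NumPy sketch of the CPU-side steps named in the abstract, following the standard dark channel prior (the paper's exact formulation may differ, and the GPU-side bilateral refinement is omitted):

```python
# Dark channel prior steps: per-pixel channel minimum, patch-wise minimum,
# atmospheric light from the brightest dark-channel pixels, then the
# transmission estimate t = 1 - omega * dark_channel(I / A).
import numpy as np

def dark_channel(img, patch=15):
    """img: HxWx3 float array in [0, 1]."""
    mins = img.min(axis=2)                 # per-pixel minimum over RGB
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    out = np.empty_like(mins)
    for y in range(mins.shape[0]):
        for x in range(mins.shape[1]):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def transmission(img, omega=0.95, patch=15):
    dark = dark_channel(img, patch)
    # Atmospheric light A: image color among the brightest 0.1% of
    # dark-channel pixels.
    k = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-k:], dark.shape)
    A = img[idx].max(axis=0)
    return 1.0 - omega * dark_channel(img / A, patch)
```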

Streaming Device and App Development to Transmit and Play without store for Multimedia Contents (멀티미디어 콘텐츠를 저장 없이 스트리밍 전송 및 재생 가능한 스트리밍 기기 및 스마트 앱 개발)

  • Kang, Young-Man; Cho, Hyug-Hyun
    • The Journal of the Korea institute of electronic communication sciences, v.12 no.2, pp.287-294, 2017
  • Recently, many kinds of multimedia-based TV content require transmission systems that do not store the content. In this study, we develop a streaming device and a smart app that transmit and play multimedia content such as broadcasts, internet content, VOD, video, and animation without storing it. To do this, we compare and analyze the main CPU products, broadcast tuners and interfaces for streaming sources, and memory products. We then design and implement a streaming device that supports 30fps@FHD H.264 decoding and streaming with three or more multimedia sources, and develop the software for these requirements. This system is useful for supporting billing services for multimedia content.

Proposal For Improving Data Processing Performance Using Python (파이썬 활용한 데이터 처리 성능 향상방법 제안)

  • Kim, Hyo-Kwan; Hwang, Won-Yong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.13 no.4, pp.306-311, 2020
  • This paper deals with how to improve the performance of the Python language with various libraries when developing a model using big data. Python uses the Pandas library for processing spreadsheet-format data such as Excel files. Python processes data on an in-memory basis, so there is no performance issue with small-scale data, but performance problems arise with large-scale data. Therefore, this paper introduces a method for distributed processing of execution tasks on a single cluster and on multiple clusters using the Dask library, which can be used together with Pandas. The experiment compares, on hardware of the same specification, the speed of processing a simple exponential model using only Pandas with the speed of processing using Dask as well. This paper presents a method to develop a model by distributing large-scale data across CPU cores for performance, while preserving Python's advantage of easy access to various libraries.
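
A minimal sketch of the kind of comparison described: the same element-wise exponential model evaluated once in-memory with Pandas and once partitioned across CPU cores with dask.dataframe (the column name and data size are illustrative, not from the paper):

```python
# Pandas keeps the whole frame in memory and runs single-threaded;
# Dask splits it into partitions, maps the function over each partition
# in parallel, and collects the result with .compute().
import numpy as np
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({"x": np.random.rand(10_000_000)})

# Pandas: in-memory, single pass.
y_pandas = np.exp(df["x"])

# Dask: partitioned, processed in parallel, then materialized.
ddf = dd.from_pandas(df, npartitions=8)
y_dask = ddf["x"].map_partitions(np.exp).compute()

assert np.allclose(y_pandas, y_dask)
```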

GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information, v.27 no.2, pp.1-8, 2022
  • In this paper, we design and implement a GPU-based parallel algorithm to effectively solve the traveling salesman problem (TSP) with an ant colony system. The repeated process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, while the pheromone-trail update actively exploits data parallelism with 32x32 thread blocks. In particular, through simultaneous memory accesses by multiple threads, coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory are supported. The experiments used instances of 127 to 1002 cities provided by TSPLIB and compared the performance of the sequential and parallel algorithms on an Intel Core i9-9900K CPU and an Nvidia Titan RTX system. GPU parallelization yields speedups of about 10.13 to 11.37 times.
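
A serial NumPy sketch of the two ACS phases the paper maps to the GPU: independent tour construction (task-level parallelism, one tour per thread/block) and the pheromone update (data parallelism over the matrix). Parameter values are illustrative, not taken from the paper:

```python
# Ant colony system core loop, written serially; on the GPU each call to
# build_tour would run as an independent task, and update_pheromone would
# touch matrix entries in parallel.
import numpy as np

def build_tour(dist, pher, alpha=1.0, beta=2.0, rng=np.random.default_rng()):
    n = dist.shape[0]
    tour = [int(rng.integers(n))]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        i = tour[-1]
        cand = np.array(sorted(unvisited))
        # Selection probability ~ pheromone^alpha * (1/distance)^beta.
        p = pher[i, cand] ** alpha * (1.0 / dist[i, cand]) ** beta
        tour.append(int(rng.choice(cand, p=p / p.sum())))
        unvisited.remove(tour[-1])
    return tour

def update_pheromone(pher, tours, lengths, rho=0.1):
    pher *= (1.0 - rho)  # evaporation: one thread per matrix entry on GPU
    for tour, L in zip(tours, lengths):
        for i, j in zip(tour, tour[1:] + tour[:1]):  # close the cycle
            pher[i, j] += 1.0 / L
            pher[j, i] += 1.0 / L
    return pher
```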

GPU Resource Contention Management Technique for Simultaneous GPU Tasks in the Container Environments with Share the GPU (GPU를 공유하는 컨테이너 환경에서 GPU 작업의 동시 실행을 위한 GPU 자원 경쟁 관리기법)

  • Kang, Jihun
    • KIPS Transactions on Computer and Communication Systems, v.11 no.10, pp.333-344, 2022
  • In a container-based cloud environment, multiple containers can share a graphics processing unit (GPU), and GPU sharing can minimize idle time of GPU resources and improve resource utilization. However, in a cloud environment, GPUs, unlike CPUs or memory, cannot logically multiplex their computing resources to provide each user with an isolated share. In addition, containers occupy GPU resources only while performing GPU operations, and resource usage cannot be predicted because the timing and size of each container's GPU operations are not known in advance. This unrestricted use of GPU resources makes resource contention very difficult to manage when multiple containers run GPU tasks simultaneously, since GPU tasks are handled as a black box inside the GPU. In this paper, we propose a container management technique to prevent the performance degradation caused by resource contention when multiple containers execute GPU tasks simultaneously. Experiments analyze this degradation and demonstrate the efficiency of the proposed management technique.
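
As a toy illustration only (the abstract does not detail the paper's actual technique), one generic way to bound this kind of contention is admission control: cap how many containers may run GPU tasks concurrently and queue the rest. The class and names below are invented for illustration:

```python
# Hypothetical admission-control sketch, NOT the paper's method: a
# semaphore caps concurrent GPU tasks so containers queue instead of
# competing inside the GPU's black box.
import threading

class GpuAdmissionController:
    def __init__(self, max_concurrent=2):
        self._slots = threading.Semaphore(max_concurrent)

    def run(self, container_id, gpu_task):
        with self._slots:  # blocks while the concurrency cap is reached
            print(f"[{container_id}] admitted to the GPU")
            return gpu_task()

ctl = GpuAdmissionController(max_concurrent=2)
threads = [threading.Thread(target=ctl.run, args=(f"c{i}", lambda: None))
           for i in range(5)]
for t in threads: t.start()
for t in threads: t.join()
```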

Fast Multi-GPU based 3D Backprojection Method (다중 GPU 기반의 고속 삼차원 역전사 기법)

  • Lee, Byeong-Hun; Lee, Ho; Kye, Hee-Won; Shin, Yeong-Gil
    • Journal of Korea Multimedia Society, v.12 no.2, pp.209-218, 2009
  • 3D backprojection is a reconstruction algorithm that generates volume data consisting of tomographic images, providing spatial information about the original 3D data from hundreds of 2D projections. The computation time of backprojection grows in proportion to the size of the volume data and the number of projection images, since the value of every voxel is calculated from the corresponding pixels of hundreds of projections. To reduce computation time, fast GPU-based 3D backprojection methods have been studied recently, and their performance has improved significantly. This paper presents two multi-GPU methods that maximize GPU parallelism and compares their efficiency with respect to both the number of projections and the size of the volume data. The first method allocates half of the volume data on each GPU and generates the partial volumes independently using all projections. The second method allocates the full volume on each GPU, generates an incomplete volume on each GPU using half of the projections, and merges the incomplete volumes into the final volume on the CPU. In the experimental results, the first method performed better when the entire volume data could be allocated on the GPU; otherwise, the second method was more efficient.
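
A serial NumPy sketch of the two partitioning strategies, with one loop iteration standing in for each GPU; backproject() is a placeholder for the real voxel-driven kernel, and shapes are illustrative:

```python
# Strategy 1 splits the volume across GPUs (all projections each, no
# merge); strategy 2 splits the projections (full volume each, merged
# by summation on the CPU).
import numpy as np

def backproject(volume_slab, projections):
    # Placeholder: the real kernel accumulates, per voxel, the
    # contribution of the corresponding pixel in every projection.
    return volume_slab + projections.sum()

def method1(projections, vol_shape, n_gpus=2):
    # Each "GPU" holds a slab of the volume and sees all projections;
    # the partial volumes are independent and just concatenated.
    slabs = np.array_split(np.zeros(vol_shape), n_gpus, axis=0)
    return np.concatenate([backproject(s, projections) for s in slabs],
                          axis=0)

def method2(projections, vol_shape, n_gpus=2):
    # Each "GPU" holds the full volume but only a share of the
    # projections; the incomplete volumes are merged (summed) on the CPU.
    parts = np.array_split(projections, n_gpus, axis=0)
    return sum(backproject(np.zeros(vol_shape), p) for p in parts)
```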


Development of a Remotely Sensed Image Processing/Analysis System : GeoPixel Ver. 1.0 (JAVA를 이용한 위성영상처리/분석 시스템 개발 : GeoPixel Ver. 1.0)

  • 안충현; 신대혁
    • Korean Journal of Remote Sensing, v.13 no.1, pp.13-30, 1997
  • Recent improvements in satellite remote sensing sensors, represented by hyperspectral imaging sensors and high-spatial-resolution sensors, provide a large amount of data, typically several hundred megabytes per scene. Moreover, increasing information exchange via the internet and the information superhighway calls for more active service systems for processing and analyzing remote sensing data in order to provide value-added products. In this context, an advanced satellite data processing system is being developed to achieve high computing speed and efficiency in processing huge volumes of data, and to enable network computing and easy improvement, upgrading, and management of the system. The JAVA internet programming language offers several advantages for such software, including object-oriented programming, multi-threading, and robust memory management. Using these features, a satellite data processing system named GeoPixel has been developed in JAVA. GeoPixel adopts newly developed techniques, including an object-pipe connection method between processes and a multi-threaded structure. In other words, the system is platform-independent and handles huge volumes of remote sensing data efficiently and robustly. In the evaluation of its data processing capability, satisfactory results were shown in the utilization of computer resources (CPU and memory) and in processing speed.
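
A rough Python analog of the object-pipe idea (the original system is JAVA, and the stage names and functions here are invented): processing stages run as threads connected by queues, so scene data streams through the pipeline piece by piece instead of being held whole in memory:

```python
# Threaded pipeline sketch: each stage pulls objects from its inbox,
# processes them, and pushes them to the next stage's queue; a None
# sentinel shuts the pipeline down in order.
import threading, queue

def stage(name, func, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:        # sentinel: propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(func(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage,
                 args=("calibrate", lambda x: x * 2, q1, q2)).start()
threading.Thread(target=stage,
                 args=("classify", lambda x: x + 1, q2, q3)).start()

for tile in range(3):           # stand-ins for image tiles of a scene
    q1.put(tile)
q1.put(None)
while (out := q3.get()) is not None:
    print("processed tile:", out)
```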

Implementation of Ring Topology Interconnection Network with PCIe Non-Transparent Bridge Interface (PCIe Non-Transparent Bridge 인터페이스 기반 링 네트워크 인터커넥트 시스템 구현)

  • Kim, Sang-Gyum; Lee, Yang-Woo; Lim, Seung-Ho
    • KIPS Transactions on Computer and Communication Systems, v.8 no.3, pp.65-72, 2019
  • High performance computing (HPC) refers to computing systems that connect many computing nodes with a high-performance interconnect network. In HPC, the interconnect is one of the key factors in building high-performance systems, and InfiniBand or Ethernet is mainly used as the interconnect technology. Nowadays, PCIe is the main interface within a computer system, in that the host CPU connects to high-performance peripheral devices through the PCIe bridge interface. To connect two computing nodes, the PCIe Non-Transparent Bridge (NTB) standard can be used; however, in its original form it connects only two hosts. To provide a cost-effective interconnect network interface based on PCIe technology, we developed a prototype interconnect network system with PCIe NTB. In the prototype, computing nodes are connected to one another via the PCIe NTB interface, forming a switchless interconnect network such as a ring. We also implemented a prototype data-sharing mechanism on this interconnect network. The designed PCIe NTB-based interconnect network system is cost-effective and provides competitive data transfer bandwidth within the interconnect network.
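
A toy model (not the actual NTB driver or the paper's data-sharing mechanism) of how forwarding works on a switchless one-way ring, where each node's only NTB link goes to its successor:

```python
# On a switchless ring, a node cannot reach an arbitrary peer directly:
# a message hops successor to successor until it arrives, so the worst
# case is n-1 hops on a one-way ring of n nodes.
def ring_send(src, dst, n_nodes, payload):
    hops, node = [], src
    while node != dst:
        node = (node + 1) % n_nodes   # the only NTB link: next node
        hops.append(node)
    return hops, payload

hops, _ = ring_send(src=0, dst=3, n_nodes=5, payload=b"data")
print("route:", hops)                 # -> [1, 2, 3]
```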