• Title/Summary/Keyword: and Parallel Processing

Search Result 2,013, Processing Time 0.033 seconds

Deployment and Performance Analysis of Data Transfer Node Cluster for HPC Environment (HPC 환경을 위한 데이터 전송 노드 클러스터 구축 및 성능분석)

  • Hong, Wontaek;An, Dosik;Lee, Jaekook;Moon, Jeonghoon;Seok, Woojin
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.9
    • /
    • pp.197-206
    • /
    • 2020
  • Collaborative research in science applications based on HPC service needs rapid transfers of massive data between research colleagues over wide area network. With regard to this requirement, researches on enhancing data transfer performance between major superfacilities in the U.S. have been conducted recently. In this paper, we deploy multiple data transfer nodes(DTNs) over high-speed science networks in order to move rapidly large amounts of data in the parallel filesystem of KISTI's Nurion supercomputer, and perform transfer experiments between endpoints with approximately 130ms round trip time. We have shown the results of transfer throughput in different size file sets and compared them. In addition, it has been confirmed that the DTN cluster with three nodes can provide about 1.8 and 2.7 times higher transfer throughput than a single node in two types of concurrency and parallelism settings.

A Scheduling Algorithm for Parsing of MPEG Video on the Heterogeneous Distributed Environment (이질적인 분산 환경에서의 MPEG비디오의 파싱을 위한 스케줄링 알고리즘)

  • Nam Yunyoung;Hwang Eenjun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.12
    • /
    • pp.673-681
    • /
    • 2004
  • As the use of digital videos is getting popular, there is an increasing demand for efficient browsing and retrieval of video. To support such operations, effective video indexing should be incorporated. One of the most fundamental steps in video indexing is to parse video stream into shots and scenes. Generally, it takes long time to parse a video due to the huge amount of computation in a traditional single computing environment. Previous studies had widely used Round Robin scheduling which basically allocates tasks to each slave for a time interval of one quantum. This scheduling is difficult to adapt in a heterogeneous environment. In this paper, we propose two different parallel parsing algorithms which are Size-Adaptive Round Robin and Dynamic Size-Adaptive Round Robin for the heterogeneous distributed computing environments. In order to show their performance, we perform several experiments and show some of the results.

GLOVE: Distributed Shared Memory Based Parallel Visualization Tool for Massive Scientific Dataset (GLOVE: 대용량 과학 데이터를 위한 분산공유메모리 기반 병렬 가시화 도구)

  • Lee, Joong-Youn;Kim, Min Ah;Lee, Sehoon;Hur, Young Ju
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.6
    • /
    • pp.273-282
    • /
    • 2016
  • Visualization tool can be divided by three components - data I/O, visual transformation and interactive rendering. In this paper, we present requirements of three major components on visualization tools for massive scientific dataset and propose strategies to develop the tool which satisfies those requirements. In particular, we present how to utilize open source softwares to efficiently realize our goal. Furthermore, we also study the way to combine several open source softwares which are separately made to produce a single visualization software and optimize it for realtime visualization of massiv espatio-temporal scientific dataset. Finally, we propose a distributed shared memory based scientific visualization tool which is called "GLOVE". We present a performance comparison among GLOVE and well known open source visualization tools such as ParaView and VisIt.

A Software VIA based PC Cluster System on SCI Network (SCI 네트워크 상의 소프트웨어 VIA기반 PC글러스터 시스템)

  • Shin, Jeong-Hee;Chung, Sang-Hwa;Park, Se-Jin
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.4
    • /
    • pp.192-200
    • /
    • 2002
  • The performance of a PC cluster system is limited by the use of traditional communication protocols, such as TCP/IP because these protocols are accompanied with significant software overheads. To overcome the problem, systems based on user-level interface for message passing without intervention of kernel have been developed. The VIA(Virtual Interface Architecture) is one of the representative user-level interfaces which provide low latency and high bandwidth. In this paper, a VIA system is implemented on an SCI(Scalable Coherent Interface) network based PC cluster. The system provides both message-passing and shared-memory programming environments and shows the maximum bandwidth of 84MB/s and the latency of $8{\mu}s$. The system also shows better performance in comparison with other comparable computer systems in carrying out parallel benchmark programs.

Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing (터커 분해 및 은닉층 병렬처리를 통한 임베디드 시스템의 다중 DNN 가속화 기법)

  • Kim, Ji-Min;Kim, In-Mo;Kim, Myung-Sun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.6
    • /
    • pp.842-849
    • /
    • 2022
  • With the development of deep learning technology, there are many cases of using DNNs in embedded systems such as unmanned vehicles, drones, and robotics. Typically, in the case of an autonomous driving system, it is crucial to run several DNNs which have high accuracy results and large computation amount at the same time. However, running multiple DNNs simultaneously in an embedded system with relatively low performance increases the time required for the inference. This phenomenon may cause a problem of performing an abnormal function because the operation according to the inference result is not performed in time. To solve this problem, the solution proposed in this paper first reduces the computation by applying the Tucker decomposition to DNN models with big computation amount, and then, make DNN models run in parallel as much as possible in the unit of hidden layer inside the GPU. The experimental result shows that the DNN inference time decreases by up to 75.6% compared to the case before applying the proposed technique.

Optimal Design Space Exploration of Multi-core Architecture for Real-time Lane Detection Algorithm (실시간 차선인식 알고리즘을 위한 최적의 멀티코어 아키텍처 디자인 공간 탐색)

  • Jeong, Inkyu;Kim, Jongmyon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.3
    • /
    • pp.339-349
    • /
    • 2017
  • This paper proposes a four-stage algorithm for detecting lanes on a driving car. In the first stage, it extracts region of interests in an image. In the second stage, it employs a median filter to remove noise. In the third stage, a binary algorithm is used to classify two classes of backgrond and foreground of an input image. Finally, an image erosion algorithm is utilized to obtain clear lanes by removing noises and edges remained after the binary process. However, the proposed lane detection algorithm requires high computational time. To address this issue, this paper presents a parallel implementation of a real-time line detection algorithm on a multi-core architecture. In addition, we implement and simulate 8 different processing element (PE) architectures to select an optimal PE architecture for the target application. Experimental results indicate that 40×40 PE architecture show the best performance, energy efficiency and area efficiency.

The optimization of output coupler reflectivity of high repetitive pulsed Nd:YAG laser system adopted 3-mesh parallel sequential charge and discharge method (3단 병렬 충.방전 방식을 적용한 고반복 펄스형 Nd:YAG 레이저 출력거울 반사율의 최적화)

  • 김휘영;홍수열;김동수
    • Journal of the Korea Computer Industry Society
    • /
    • v.2 no.3
    • /
    • pp.369-376
    • /
    • 2001
  • The optimization of resonator and laser power supply has been considered to be significant for improving the efficiency of a pulsed Nd:YAG laser system. We have proposed a new method of 3-mesh parallel sequential charge and discharge circuit as a laser power supply; more compact than conventional power supply, competitive in price, easy to control the laser power density according to various material processing, and equipped with the optimum reflectivity of output coupler. In this study, we could find that the maximum laser output was obtained by using 85% of reflectivity in the case of 50[W]-class. In addition using the power supply of new method, it's possible to charge each capacitor bank with a higher energy within the given charging time adopted a new method mentioned above; namely, we can allow each capacitor to have much more charging time and storage energy. So, higher laser output was obtained than conventional power supply.

  • PDF

A High-Speed Low-Complexity 128/64-point $Radix-2^4$ FFT Processor for MIMO-OFDM Systems (MIMO-OFDM 시스템을 위한 고속 저면적 128/64-point $Radix-2^4$ FFT 프로세서 설계)

  • Hang, Liu;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.2
    • /
    • pp.15-23
    • /
    • 2009
  • This paper presents a novel high-speed, low-complexity flexible 128/64-point $radix-2^4$ FFT/IFFT processor for the applications in high-throughput MIMO-OFDM systems. The high radix multi-path delay feed-back (MDF) FFT architecture provides a higher throughput rate and low hardware complexity by using a four-parallel data-path scheme. The proposed processor not only supports the operation of FFT/IFFT in 128-point and 64-point but can also provide a high data processing rate by using a four-parallel data-path scheme. Furthermore, the proposed design has a less hardware complexity compared with traditional 128/64-point FFT/IFFT processors. Our proposed processor has a high throughput rate of up to 560Msample/s at 140MHz while requiring much smaller hardware expenditure satisfying IEEE 802.11n standard requirements.

A Simple Multi-rate Parallel Interference Canceller for the IMT-2000 3GPP System (IMT-2000 3GPP 시스템을 위한 간단한 다중 전송률 병렬형 간섭제거기)

  • Kim, Jin-Kyeom;Oh, Seong-Keun;Sunwoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.38 no.12
    • /
    • pp.10-19
    • /
    • 2001
  • In this paper, we propose an effective but simple multi-rate parallel interference canceller(PIC) for the international mobile telecommunications-2000(IMT-2000) 3rd generation partnership project (3GPP) system. For effective multi-rate processing, we define the basic block as one symbol period of the dedicated physical control channel(DPCCH) having the lowest data rate and common to all users. Then, decision and interference cancellation are performed at every basic block. For an asynchronous channel, we propose an advance removal scheme that removes in advance multiple access interference(MAI) due to the next blockof other users with shorter delay. Introducing a pipeline structure at a sample base, we can implement efficiently the PIC using the advance removal scheme with a minimum hardware and no extra computations. Through computer simulations, we analyze the bit error rate(BER) performance of the proposed PIC with respect to signal-to-noise ratio(SNR) and the number of users.

  • PDF

Parallelizing 3D Frequency-domain Acoustic Wave Propagation Modeling using a Xeon Phi Coprocessor (제온 파이 보조 프로세서를 이용한 3차원 주파수 영역 음향파 파동 전파 모델링 병렬화)

  • Ryu, Donghyun;Jo, Sang Hoon;Ha, Wansoo
    • Geophysics and Geophysical Exploration
    • /
    • v.20 no.3
    • /
    • pp.129-136
    • /
    • 2017
  • 3D seismic data processing methods such as full waveform inversion or reverse-time migration require 3D wave propagation modeling and heavy calculations. We compared efficiency and accuracy of a Xeon Phi coprocessor to those of a high-end server CPU using 3D frequency-domain wave propagation modeling. We adopted the OpenMP parallel programming to the time-domain finite difference algorithm by considering the characteristics of the Xeon Phi coprocessors. We applied the Fourier transform using a running-integration to obtain the frequency-domain wavefield. A numerical test on frequency-domain wavefield modeling was performed using the 3D SEG/EAGE salt velocity model. Consequently, we could obtain an accurate frequency-domain wavefield and attain a 1.44x speedup using the Xeon Phi coprocessor compared to the CPU.