• Title/Summary/Keyword: Parallel data processing

Task Balancing Scheme of MPI Gridding for Large-scale LiDAR Data Interpolation (대용량 LiDAR 데이터 보간을 위한 MPI 격자처리 과정의 작업량 발란싱 기법)

  • Kim, Seon-Young;Lee, Hee-Zin;Park, Seung-Kyu;Oh, Sang-Yoon
    • Journal of the Korea Society of Computer and Information / v.19 no.9 / pp.1-10 / 2014
  • In this paper, we propose an MPI gridding algorithm for LiDAR data that minimizes communication between cores. LiDAR data collected from aircraft is 3D spatial information used in various applications. Because the raw data often has a higher resolution than is actually required, or includes non-surface information, it must be filtered. To use the filtered data, it is reconstructed by interpolation, using a data structure that supports searching adjacent locations. Since the processing time of LiDAR data is directly proportional to its size, there have been many studies on high-performance parallel processing systems using MPI. However, previously proposed parallel methods can suffer performance degradation from imbalanced data sizes among cores or from the communication overhead of resolving boundary-condition inconsistencies. We conduct empirical experiments to verify the effectiveness of the proposed algorithm. The results show that the proposed method reduces the total execution time by a factor of up to 4.2 compared with the conventional method on heterogeneous clusters.
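
The load-imbalance problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only assumes that points are cut into x-ordered strips holding near-equal point counts, instead of strips of equal width, so that each rank receives a similar amount of work. The function name `balanced_strips` and the sample data are hypothetical.

```python
def balanced_strips(points, n_ranks):
    """Split (x, y, z) points into n_ranks x-ordered strips of near-equal size."""
    ordered = sorted(points, key=lambda p: p[0])
    base, extra = divmod(len(ordered), n_ranks)
    strips, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional point each.
        size = base + (1 if rank < extra else 0)
        strips.append(ordered[start:start + size])
        start += size
    return strips


# Hypothetical clustered sample: 90 points near x = 0, 10 points spread far away.
sample = [(0.01 * i, 0.0, 1.0) for i in range(90)]
sample += [(10.0 + i, 0.0, 1.0) for i in range(10)]

strips = balanced_strips(sample, 4)
sizes = [len(s) for s in strips]
print(sizes)
```

Equal-width strips over the same sample would put almost all points on one rank; cutting by count keeps the per-rank sizes even, at the cost of strips with unequal spatial extent.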

GPU-Based ECC Decode Unit for Efficient Massive Data Reception Acceleration

  • Kwon, Jisu;Seok, Moon Gi;Park, Daejin
    • Journal of Information Processing Systems / v.16 no.6 / pp.1359-1371 / 2020
  • When transmitting and receiving large amounts of data, reliable data communication is crucial for the normal operation of a device and for preventing abnormal operations caused by errors. Therefore, this paper assumes that an error correction code (ECC), which can detect and correct errors by itself, is used in an environment where massive data is received sequentially. Because an embedded system has limited resources, such as a low-performance processor or a small memory, its applications must operate efficiently. We propose accelerating ECC decoding with a graphics processing unit (GPU) built into the embedded system when receiving a large amount of data. In the matrix-vector multiplication that forms the Hamming code used for the ECC operation, the matrix is expressed in compressed sparse row (CSR) format, and a sparse matrix-vector product is used. The multiplication is performed in a GPU kernel, and the Hamming code computation is also accelerated so that the ECC operation can be performed in parallel. The proposed technique is implemented with CUDA on a GPU-embedded target board, the NVIDIA Jetson TX2, and its execution time is compared with that of the CPU.
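
The CSR-based syndrome computation described in the abstract can be sketched on the CPU as follows. This is an illustration under assumptions, not the paper's CUDA kernel: it uses a Hamming(7,4) code whose parity-check matrix has the binary numbers 1 to 7 as columns, and the names `H_INDICES`, `H_INDPTR`, `csr_matvec_mod2`, and `correct_single_error` are hypothetical.

```python
# Parity-check matrix H of Hamming(7,4) in CSR form (assumed layout:
# column j, 1-based, is the binary representation of j, LSB in row 0).
H_DATA = [1] * 12
H_INDICES = [0, 2, 4, 6,   # row 0: positions 1, 3, 5, 7
             1, 2, 5, 6,   # row 1: positions 2, 3, 6, 7
             3, 4, 5, 6]   # row 2: positions 4, 5, 6, 7
H_INDPTR = [0, 4, 8, 12]


def csr_matvec_mod2(data, indices, indptr, vec):
    """Sparse matrix-vector product over GF(2); each row is independent."""
    out = []
    for row in range(len(indptr) - 1):
        acc = 0
        for k in range(indptr[row], indptr[row + 1]):
            acc ^= data[k] & vec[indices[k]]
        out.append(acc)
    return out


def correct_single_error(received):
    """Return (corrected word, 1-based error position or 0 if none)."""
    s = csr_matvec_mod2(H_DATA, H_INDICES, H_INDPTR, received)
    position = s[0] + 2 * s[1] + 4 * s[2]
    fixed = list(received)
    if position:
        fixed[position - 1] ^= 1  # flip the bit the syndrome points at
    return fixed, position


codeword = [1, 1, 1, 0, 0, 0, 0]   # a valid codeword for this H
received = [1, 1, 1, 0, 1, 0, 0]   # same word with bit 5 flipped
print(correct_single_error(received))
```

Because every row of the product is computed independently, the outer loop is the part a GPU kernel could assign to separate threads.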

Parallel Testing Circuits with Versatile Data Patterns for SOP Image SRAM Buffer (SOP Image SRAM Buffer용 다양한 데이터 패턴 병렬 테스트 회로)

  • Jeong, Kyu-Ho;You, Jae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.9 / pp.14-24 / 2009
  • A memory cell array and peripheral circuits are designed for a system-on-panel (SOP) style frame buffer. Moreover, a parallel test methodology that tests multiple blocks of memory cells is proposed to overcome the low yield of system-on-panel processing technologies. It detects faults faster than conventional memory tests and is also applicable to testing various embedded memories and conventional SRAMs. Various patterns of conventional test vectors can be used to enhance fault coverage. The proposed testing method is also applicable to hierarchical bit lines and divided word lines, a design trend in recent memory architectures.

A Parallel Video Encoding Technique for U-HDTV (U-HDTV를 위한 향상된 병렬 비디오 부호화 기법)

  • Jung, Seung-Won;Ko, Sung-Jea
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.1 / pp.132-140 / 2011
  • Ultra-High Definition Television (U-HDTV) is a promising candidate for next-generation television. Since the U-HDTV video signal requires a huge amount of data, a parallel implementation of the U-HDTV compression system is in high demand. In a conventional parallel video codec, a video is divided into sub-sequences, and the sub-sequences are encoded independently. In this paper, for efficient parallel processing, we propose a pipelined encoding structure that exploits cross-correlation among the sub-sequences. The experimental results demonstrate that the proposed technique improves coding efficiency and yields sub-sequences with balanced visual quality.

The Construction of A Parallel type Bloom Filter (병렬 구조의 블룸필터 설계)

  • Jang, Young-dal;Kim, Ji-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.6 / pp.1113-1120 / 2017
  • As data sizes grow ever larger with improvements in telecommunication techniques, developing and processing databases becomes a major issue. The Bloom filter, used to look up a particular element in a given set, is a very useful structure because of its space efficiency. In this paper, we analyze the main factor behind false positives and propose a new parallel-type Bloom filter that minimizes the false positives caused by other hash functions. The proposed method uses as much memory as the conventional Bloom filter, but it can improve processing speed through parallel processing. In addition, if a perfect hash function is used, insertion and deletion become possible in the proposed Bloom filter.
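
One way to picture a parallel-type Bloom filter is a partitioned layout in which each hash function owns a separate slice of the bit array, so one hash function never sets a bit that another one reads. The sketch below assumes that layout; the abstract does not spell out the paper's exact construction, and the class name `PartitionedBloomFilter` and the salted SHA-256 hashing are hypothetical choices.

```python
import hashlib


class PartitionedBloomFilter:
    """Bloom filter whose k hash functions each write to their own partition."""

    def __init__(self, total_bits, num_hashes):
        self.k = num_hashes
        self.part_bits = total_bits // num_hashes  # same total memory budget
        # One byte per bit, for readability rather than compactness.
        self.parts = [bytearray(self.part_bits) for _ in range(num_hashes)]

    def _slot(self, i, item):
        # Hash function i is emulated by salting SHA-256 with the index i.
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.part_bits

    def add(self, item):
        for i in range(self.k):
            self.parts[i][self._slot(i, item)] = 1

    def __contains__(self, item):
        # The k partition probes are independent and could run in parallel.
        return all(self.parts[i][self._slot(i, item)] for i in range(self.k))


bf = PartitionedBloomFilter(total_bits=1024, num_hashes=4)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
print("alpha" in bf, "beta" in bf)
```

As with any Bloom filter, membership tests never produce false negatives, but a positive answer can still be a false positive.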

Implementation of a Parallel Viterbi Decoder for High Speed Multimedia Communications (멀티미디어 통신용 병렬 아키텍쳐 고속 비터비 복호기 설계)

  • Lee, Byeong-Cheol
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.2 / pp.78-84 / 2000
  • Viterbi decoders can be classified into serial and parallel Viterbi decoders. Parallel Viterbi decoders can handle higher data rates than serial Viterbi decoders. This paper designs and implements a fully parallel Viterbi decoder for high-speed multimedia communications. For high-speed operation, the ACS (Add-Compare-Select) module, consisting of 64 PEs (Processing Elements), can compute one stage per clock. In addition, a systolic array structure with 32 pipeline stages is developed for the TB (traceback) module. The implemented Viterbi decoder supports code rates of 1/2, 2/3, 3/4, 5/6, and 7/8 using punctured codes. We have developed Verilog HDL models and performed logic synthesis. The 0.6 $\mu\mathrm{m}$ SAMSUNG KG75000 SOG cell library has been used. The implemented Viterbi decoder has about 100,400 gates and runs at 70 MHz in the worst-case simulation.


Multi-Scale Contact Analysis Between Net and Numerous Particles (그물망과 대량입자의 멀티 스케일 접촉해석)

  • Jun, Chul Woong;Sohn, Jeong Hyun
    • Transactions of the Korean Society of Mechanical Engineers A / v.38 no.1 / pp.17-23 / 2014
  • Graphics processing units (GPUs) are ideal for solving problems involving parallel data computations. In this study, the GPU is used to carry out a multi-body dynamic simulation with particle dynamics effectively. The Hilber-Hughes-Taylor (HHT) implicit integration algorithm is used to solve the integral equations. For detecting collisions among particles, the spatial subdivision algorithm and the discrete-element method (DEM) are employed. The developed program is verified by comparing its results with those of ADAMS. The numerical efficiencies of the serial program using the CPU and the parallel program using the GPU are compared in terms of the number of particles, and it is observed that the more particles there are, the more computing time is saved by using the GPU. In the present example, with 1,300 particles, the parallel analysis program is about 5 times faster than the serial analysis program.
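
The spatial subdivision idea used for collision detection can be sketched with a uniform grid: particles are binned into cells one diameter wide, and only pairs in the same or adjacent cells are tested. This is a serial 2D illustration under those assumptions, not the paper's GPU program; `find_contacts` and the sample positions are hypothetical.

```python
from collections import defaultdict
from itertools import product


def find_contacts(positions, radius):
    """Return index pairs (i, j), i < j, of equal-radius discs that overlap."""
    cell = 2.0 * radius  # cell edge = particle diameter
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[(int(x // cell), int(y // cell))].append(idx)

    contacts = set()
    for (cx, cy), members in grid.items():
        # Overlapping particles can only sit in the same or a neighbouring cell.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        (xi, yi), (xj, yj) = positions[i], positions[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 < cell ** 2:
                            contacts.add((i, j))
    return contacts


# Hypothetical sample: only particles 1 and 2 are closer than one diameter.
positions = [(0.2, 0.2), (0.9, 0.2), (1.3, 0.2), (5.0, 5.0)]
contacts = find_contacts(positions, radius=0.3)
print(contacts)
```

Compared with testing every pair, the grid limits each particle to a constant number of neighbouring cells, and the per-cell work is independent, which suits GPU execution.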

A real-time high speed full search block matching motion estimation processor (고속 실시간 처리 full search block matching 움직임 추정 프로세서)

  • 유재희;김준호
    • Journal of the Korean Institute of Telematics and Electronics A / v.33A no.12 / pp.110-119 / 1996
  • A novel high-speed VLSI architecture and its VLSI realization methodologies are presented for a motion estimation processor based on the full search block matching algorithm. The architecture is designed to suit highly parallel and pipelined processing with identical PEs, and its performance and hardware amount can be adjusted to various application areas. The throughput is maximized by raising PE utilization up to 100%, and the chip pin count is reduced by reusing image data with embedded image memories. In addition, the uniform and identical data processing structure of the PEs eases VLSI implementation, and the clock rate of external I/O data can be made slower than the internal clock rate to resolve the I/O bottleneck. The logic and SPICE simulation results of the proposed architecture are presented. The performance of the proposed architecture is evaluated and compared with that of other architectures. Finally, the chip layout is shown.

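
Full search block matching, the algorithm the processor implements in hardware, can be sketched in software as an exhaustive search over displacements that minimizes the sum of absolute differences (SAD). The matching criterion, function names, and synthetic frames below are assumptions for illustration; the abstract does not state the cost function used.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[by + r][bx + c] - ref[ry + r][rx + c])
               for r in range(n) for c in range(n))


def full_search(cur, ref, bx, by, n, search):
    """Try every displacement in [-search, search]^2; return (cost, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


# Synthetic 8x8 reference frame in which every 2x2 block is unique.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
# Current frame: the reference shifted by (dx, dy) = (2, 1), zero-padded.
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]

match = full_search(cur, ref, bx=2, by=2, n=2, search=2)
print(match)
```

Every candidate displacement is evaluated independently with the same arithmetic, which is the regularity that arrays of identical PEs exploit.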

Transonic/Supersonic Nonlinear Aeroelastic Analysis of a Complete Aircraft Using High Speed Parallel Processing Technique (고속 병렬처리 기법을 이용한 전기체 항공기 형상의 천음속/초음속 비선형 공탄성 해석)

  • Kim, Dong-Hyun;Kwon, Hyuk-Jun;Lee, In;Kwon, Oh-Joon;Paek, Seung-Kil;Hyun, Yong-Hee
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.30 no.8 / pp.46-55 / 2002
  • A nonlinear aeroelastic analysis system for transonic and supersonic flows has been developed using a high-speed parallel processing technique on network-based PC-clustered machines. This paper includes the coupling of advanced numerical techniques such as computational structural dynamics (CSD), the finite element method (FEM), and computational fluid dynamics (CFD). An unsteady Euler solver on dynamic unstructured meshes is employed and coupled with computational aeroelastic solvers. It can thus provide very accurate engineering data for the structural and aeroelastic design of flight vehicles. To show its potential for practical application, transonic and supersonic flutter analyses have been conducted for a complete aircraft model under development in Korea.

PARALLEL IMAGE RECONSTRUCTION FOR NEW VACUUM SOLAR TELESCOPE

  • Li, Xue-Bao;Wang, Feng;Xiang, Yong Yuan;Zheng, Yan Fang;Liu, Ying Bo;Deng, Hui;Ji, Kai Fan
    • Journal of The Korean Astronomical Society / v.47 no.2 / pp.43-47 / 2014
  • Many advanced ground-based solar telescopes improve the spatial resolution of observation images using an adaptive optics (AO) system. As any AO correction remains only partial, it is necessary to use post-processing image reconstruction techniques such as speckle masking or shift-and-add (SAA) to reconstruct a high-spatial-resolution image from atmospherically degraded solar images. In the New Vacuum Solar Telescope (NVST), the spatial resolution in solar images is improved by frame selection and SAA. In order to overcome the burden of massive speckle data processing, we investigate the possibility of using the speckle reconstruction program in a real-time application at the telescope site. The code has been written in the C programming language and optimized for parallel processing in a multi-processor environment. We analyze the scalability of the code to identify possible bottlenecks, and we conclude that the presented code is capable of being run in real-time reconstruction applications at NVST and future large aperture solar telescopes if care is taken that the multi-processor environment has low latencies between the computation nodes.
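
The shift-and-add (SAA) step named in the abstract can be sketched as follows. The sketch assumes the classic variant that aligns each short-exposure frame on its brightest pixel before averaging; the NVST pipeline's actual alignment and frame-selection criteria are not given in the abstract, and the function names are hypothetical.

```python
def peak(frame):
    """Return (row, col) of the brightest pixel in a 2D list."""
    best = (frame[0][0], 0, 0)
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value > best[0]:
                best = (value, r, c)
    return best[1], best[2]


def shift_and_add(frames):
    """Shift each frame so its peak lands at the centre, then average."""
    h, w = len(frames[0]), len(frames[0][0])
    cy, cx = h // 2, w // 2
    acc = [[0.0] * w for _ in range(h)]
    for frame in frames:               # frames are independent of each other
        py, px = peak(frame)
        dy, dx = cy - py, cx - px
        for r in range(h):
            for c in range(w):
                sr, sc = r - dy, c - dx  # source pixel before the shift
                if 0 <= sr < h and 0 <= sc < w:
                    acc[r][c] += frame[sr][sc]
    return [[v / len(frames) for v in row] for row in acc]


def single_spot(h, w, r, c, value):
    frame = [[0.0] * w for _ in range(h)]
    frame[r][c] = value
    return frame


# Two hypothetical frames with the same spot displaced by seeing.
frames = [single_spot(5, 5, 1, 1, 4.0), single_spot(5, 5, 3, 2, 4.0)]
result = shift_and_add(frames)
print(result[2][2])
```

Since each frame is shifted independently and the results are only summed at the end, the per-frame work can be distributed across processors and combined with a reduction.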