• Title/Summary/Keyword: Parallel data processing

Efficient Implementation of a Pseudorandom Sequence Generator for High-Speed Data Communications

  • Hwang, Soo-Yun;Park, Gi-Yoon;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal / v.32 no.2 / pp.222-229 / 2010
  • A conventional pseudorandom sequence generator creates only one bit of data per clock cycle, which may cause a delay in data communications. In this paper, we propose an efficient implementation method for a pseudorandom sequence generator with parallel outputs. By virtue of simple matrix multiplications, we derive a well-organized recursive formula and realize a pseudorandom sequence generator with multiple outputs. Experimental results show that, although the total area of the proposed scheme is 3% to 13% larger than that of the existing scheme, our parallel architecture improves the throughput by 2, 4, and 6 times compared with the existing single-output scheme. In addition, we apply our approach to a 2×2 multiple input/multiple output (MIMO) detector targeting the 3rd Generation Partnership Project Long Term Evolution (3GPP LTE) system. As a result, the throughput of the MIMO detector is significantly enhanced by parallel processing of data communications.
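
As a concrete illustration of the idea above, the sketch below builds an LFSR state-transition matrix over GF(2), precomputes its k-th power and the k output projection rows, and then emits k bits per clock with a single matrix-vector multiply. The register length, tap set, and output stage are illustrative assumptions, not the authors' design.

```python
import numpy as np

def companion_matrix(taps, n):
    """State-transition matrix T of an n-bit Fibonacci LFSR over GF(2).
    `taps` lists the state positions XORed into the new feedback bit."""
    T = np.zeros((n, n), dtype=np.uint8)
    T[0, list(taps)] = 1          # feedback row
    for i in range(1, n):
        T[i, i - 1] = 1           # plain shift for the remaining stages
    return T

def gf2_matmul(A, B):
    return (A @ B) % 2

def parallel_lfsr(state, taps, k, clocks):
    """Emit k pseudorandom bits per clock instead of one.

    P holds, as rows, the output row of T^0 .. T^(k-1), so `P @ s` yields the
    next k serial output bits in a single matrix-vector multiply.
    Tk == T^k advances the register state by k steps in one update.
    """
    n = len(state)
    T = companion_matrix(taps, n)
    P = np.zeros((k, n), dtype=np.uint8)
    Tj = np.eye(n, dtype=np.uint8)
    for j in range(k):
        P[j] = Tj[n - 1]          # serial output is taken from the last stage
        Tj = gf2_matmul(T, Tj)
    Tk = Tj                       # the loop leaves Tj == T^k
    s = np.array(state, dtype=np.uint8)
    blocks = []
    for _ in range(clocks):
        blocks.append([int(b) for b in gf2_matmul(P, s)])   # k bits at once
        s = gf2_matmul(Tk, s)                               # k-step state update
    return blocks

# Illustrative 8-bit register with an arbitrary tap set, 4 bits per clock.
print(parallel_lfsr([1, 0, 0, 1, 0, 1, 1, 0], taps=(0, 2, 3, 4), k=4, clocks=3))
```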

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM FOR HIGH-RESOLUTION SOLAR IMAGES

  • Zheng, Yanfang;Li, Xuebao;Tian, Huifeng;Zhang, Qiliang;Su, Chong;Shi, Lingyi;Zhou, Ta
    • Journal of The Korean Astronomical Society / v.51 no.3 / pp.65-71 / 2018
  • The near real-time speckle masking reconstruction technique has been developed to accelerate the processing of solar images to achieve high resolutions for ground-based solar telescopes. However, the reconstruction of solar subimages in such a speckle reconstruction is very time-consuming. We design and implement a new parallel speckle masking reconstruction algorithm based on the Compute Unified Device Architecture (CUDA) on General Purpose Graphics Processing Units (GPGPU). Tests are performed to validate the correctness of our program on NVIDIA GPGPU. Details of several parallel reconstruction steps are presented, and the parallel implementation of the various modules shows a significant speed increase compared to the previous serial implementations. In addition, we present a comparison of runtimes across the serial program, the OpenMP-based method, and the new parallel method. The new parallel method shows a clear advantage for large-scale data processing, and a speedup of around 9 to 10 is achieved in reconstructing one solar subimage of 256×256 pixels. The speedup of the new parallel method exceeds that of the OpenMP-based method overall. We conclude that the new parallel method is of value and contributes to real-time reconstruction of an entire solar image.
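
The sketch below mimics only the coarse-grained parallel structure described above: the field of view is tiled into 256×256 subimages and each tile is reconstructed independently, here with a CPU process pool rather than CUDA, and with a placeholder per-tile routine instead of the real speckle-masking (bispectrum) reconstruction.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

TILE = 256   # subimage size quoted in the timing example above

def reconstruct_tile(frames_tile):
    """Placeholder for the per-subimage reconstruction: it merely combines the
    burst-averaged Fourier amplitude with the phase of the first frame. The
    real algorithm estimates amplitudes and bispectrum phases instead."""
    spectra = np.fft.fft2(frames_tile, axes=(-2, -1))
    avg_amp = np.mean(np.abs(spectra), axis=0)
    phase = np.angle(spectra[0])
    return np.real(np.fft.ifft2(avg_amp * np.exp(1j * phase)))

def reconstruct_image(frames, workers=4):
    """Tile the field of view into TILE x TILE subimages and reconstruct the
    tiles in parallel. `frames` has shape (n_frames, H, W); H and W are
    assumed to be multiples of TILE."""
    _, H, W = frames.shape
    coords = [(y, x) for y in range(0, H, TILE) for x in range(0, W, TILE)]
    out = np.zeros((H, W))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        tiles = [frames[:, y:y + TILE, x:x + TILE] for y, x in coords]
        for (y, x), rec in zip(coords, pool.map(reconstruct_tile, tiles)):
            out[y:y + TILE, x:x + TILE] = rec
    return out

if __name__ == "__main__":
    burst = np.random.rand(16, 512, 512)      # synthetic burst of short exposures
    print(reconstruct_image(burst).shape)
```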

A linear array SliM-II image processor chip (선형 어레이 SliM-II 이미지 프로세서 칩)

  • 장현만;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.2 / pp.29-35 / 1998
  • This paper describes the architecture and design of a SIMD-type parallel image processing chip called SliM-II. The chip has a linear array of 64 processing elements (PEs), operates at 30 MHz in the worst-case simulation, and delivers at least 1.92 GIPS. In contrast to existing array processors such as IMAP, MGAP-2, and VIP, each PE has a multiplier that is quite effective for convolution, template matching, etc. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simultaneously in a single instruction cycle. In addition, during ALU/multiplier operations, SliM-II provides parallel moves between the register file and on-chip memory, as in DSP chips. SliM-II can greatly reduce the inter-PE communication overhead thanks to the idea of sliding, a technique of overlapping inter-PE communication with computation. Moreover, the bandwidth of data I/O and inter-PE communication increases due to bit-parallel data paths. We used the COMPASS™ 3.3 V 0.6 µm standard cell library (v8r4.10). The total number of transistors is about 1.5 million, the core size is 13.2 × 13.0 mm², and the package type is a 208-pin PQ2 (Power Quad 2). The performance evaluation shows that, compared to existing array processors, the proposed architecture gives a significant improvement for algorithms requiring multiplications.
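
A tiny software simulation of the "sliding" idea mentioned above: every PE multiply-accumulates the value currently sitting in its communication register while the register contents shift by one PE per step. The 3-tap kernel, circular boundary, and one-sample-per-PE mapping are illustrative assumptions.

```python
import numpy as np

N_PE = 64   # linear array of processing elements, one sample per PE

def slim_convolution(x, w):
    """Software simulation of a SIMD 1D convolution on a linear PE array.

    Each PE holds one sample of x in its 'sliding' register. At every step all
    PEs multiply-accumulate the resident value in lock-step while the register
    contents shift to the neighbouring PE - the overlap of computation and
    inter-PE communication that 'sliding' refers to. Boundary handling is
    circular here purely for brevity."""
    assert len(x) == N_PE
    acc = np.zeros(N_PE)
    sliding = np.asarray(x, dtype=float).copy()
    for wk in w:
        acc += wk * sliding              # all 64 PEs compute simultaneously
        sliding = np.roll(sliding, -1)   # data slides to the next PE
    return acc

x = np.arange(N_PE, dtype=float)
w = [1.0, 2.0, 1.0]                      # simple 3-tap kernel
print(slim_convolution(x, w)[:8])
```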

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS: Journal of Semiconductor Technology and Science / v.14 no.4 / pp.391-406 / 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support for handling irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of the memory accesses present in a serial loop nest to the underlying data-parallel architecture, based on a comprehensive static memory access pattern analysis. To that end, we present a simple yet powerful mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work-group size from a large design space. To evaluate the effectiveness of our methodology, we report execution speedups using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry-standard heterogeneous programming language OpenCL, targeting the NVIDIA GT200 architecture.
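
A deliberately simplified sketch of the kind of static access-pattern reasoning described above: affine per-thread accesses are classified and mapped to a memory space heuristically. The pattern model and the memory-space rules here are assumptions for illustration, not the paper's actual methodology.

```python
from dataclasses import dataclass

@dataclass
class Access:
    """Affine access A[base + stride * thread_id] made by every work-item.
    A drastically simplified stand-in for the paper's pattern model."""
    array: str
    stride: int        # coefficient of the flattened thread index
    read_only: bool

def classify(acc: Access) -> str:
    if acc.stride == 0:
        return "uniform"            # every thread reads the same element
    if abs(acc.stride) == 1:
        return "coalesced"          # consecutive threads touch consecutive words
    return "strided/irregular"      # candidate for layout change or staging

def suggest_memory_space(acc: Access) -> str:
    """Heuristic memory-space choice (OpenCL terminology); the rules are
    illustrative, not the paper's."""
    kind = classify(acc)
    if kind == "uniform" and acc.read_only:
        return "constant memory"
    if kind == "coalesced":
        return "global memory, coalesced as-is"
    if acc.read_only:
        return "image/texture memory or local-memory staging"
    return "global memory plus local-memory tiling with a transposed layout"

for a in (Access("A", 1, True), Access("B", 0, True), Access("C", 32, False)):
    print(a.array, classify(a), "->", suggest_memory_space(a))
```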

An Empirical Study on Jitter between Two Servers Port Connections

  • Lee, Sang-Bock;Kim, Hyun-Soo
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.87-94 / 2006
  • The purpose of this paper is to measure jitter between two server systems. Given three empirical models of user and port-parallel server types, under the conditions of 100 Mbps and the optimal CPU temperature suggested by Lee and Kim (2005), our results show that jitter was usually above 3000 ms in most empirical cases, jumping points were observed at around 250 processing traffics, and the port-parallel model was optimal in our cases.
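
For context, the snippet below shows one common way to estimate inter-arrival jitter from timestamp pairs (the RFC 3550 smoothed estimator); the paper's own measurement setup between the two server ports is not reproduced.

```python
def interarrival_jitter(send_times, recv_times):
    """Smoothed inter-arrival jitter in the spirit of RFC 3550, in the same
    units as the timestamps."""
    j = 0.0
    prev_transit = None
    for s, r in zip(send_times, recv_times):
        transit = r - s
        if prev_transit is not None:
            j += (abs(transit - prev_transit) - j) / 16.0   # gain 1/16
        prev_transit = transit
    return j

# Toy example: packets sent every 1 ms with varying one-way delay (seconds).
send = [i * 0.001 for i in range(10)]
recv = [s + 0.010 + 0.003 * (i % 3) for i, s in enumerate(send)]
print(f"estimated jitter: {interarrival_jitter(send, recv) * 1000:.2f} ms")
```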

Efficient Processing Technique for Unavailable Data in Hardware Implementation of Motion Estimator with Parallel Processing Architecture (움직임 추정기의 병렬처리 구조 하드웨어 구현시비유효 데이터의 효율적인처리 방법)

  • Park, Jong-Hwa;Kang, Hyun-Soo
    • The Journal of the Korea Contents Association / v.9 no.2 / pp.1-9 / 2009
  • In this paper, we propose an efficient processing technique for unavailable data in the hardware implementation of an H.264/AVC motion estimator with a parallel processing architecture. Since motion estimation in hardware is generally pipelined, some MV data of neighboring blocks are not available, whereas all MV data are valid in software processing, where the data are processed sequentially. In this paper, we solve the problem of unavailable data in MVp computation. To minimize the quality degradation caused by unavailable MVs, the proposed method replaces the unavailable MV of a neighboring block with an integer-pel MV, an MVp of neighboring blocks, or an MVcol (MV of the co-located block). Compared to the conventional method [7], our method achieves gains of up to 0.832 dB and 0.179 dB for QCIF and CIF, respectively, in terms of BDPSNR.
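
A minimal sketch of the substitution idea described above: the H.264 median MV predictor is computed with any unavailable neighbor MV replaced by an integer-pel MV, a neighboring MVp, or the co-located MV. The substitution priority used here is an assumption, not the paper's exact rule.

```python
def median_mv(a, b, c):
    """Component-wise median of three motion vectors (the H.264 MVp rule)."""
    return tuple(sorted(v)[1] for v in zip(a, b, c))

def mvp_with_fallback(mv_left, mv_top, mv_topright,
                      int_pel_mv=None, neighbor_mvp=None, mv_col=None):
    """Median MV prediction when the pipeline has not yet produced a
    neighbour's final MV. Unavailable inputs (None) are replaced, in this
    illustrative order of preference, by an integer-pel MV, a neighbour's MVp,
    or the co-located block's MV."""
    substitutes = [m for m in (int_pel_mv, neighbor_mvp, mv_col) if m is not None]
    fallback = substitutes[0] if substitutes else (0, 0)
    mvs = [mv if mv is not None else fallback
           for mv in (mv_left, mv_top, mv_topright)]
    return median_mv(*mvs)

# The top-right MV is not yet available in the pipeline; fall back to MVcol.
print(mvp_with_fallback((2, -1), (3, 0), None, mv_col=(4, 1)))
```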

PARALLEL IMPROVEMENT IN STRUCTURED CHIMERA GRID ASSEMBLY FOR PC CLUSTER (PC 클러스터를 위한 정렬 중첩 격자의 병렬처리)

  • Kim, Eu-Gene;Kwon, Jang-Hyuk
    • Proceedings of the Korean Society of Computational Fluids Engineering Conference / 2005.10a / pp.157-162 / 2005
  • Parallel implementation and performance assessment of the grid assembly in a structured chimera grid approach is studied. The grid assembly process, involving hole cutting and donor search, is parallelized on a PC cluster. A message-passing programming model based on the MPI library is implemented using the single program multiple data (SPMD) paradigm. The coarse-grained communication is optimized with minimized memory allocation, because in a distributed memory system such as a PC cluster the parallel grid assembly can access the decomposed geometry data on other processors only by message passing. The grid assembly workload is based on static load balancing tied to the flow solver. The goal of this work is the development of a parallelized grid assembly suited for handling multiple moving-body problems with large grid sizes.
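
A rough sketch of the lightweight exchange described above, assuming mpi4py: ranks exchange only small block bounding boxes and then identify which receptor points may need off-rank donors. Hole cutting and the detailed donor search are not reproduced.

```python
import numpy as np
from mpi4py import MPI               # assumes mpi4py; run with mpiexec -n <p>

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns one block of the overset system; a random point cloud stands
# in for its grid nodes (neighbouring, slightly overlapping regions).
rng = np.random.default_rng(rank)
my_block = rng.uniform(rank, rank + 1.2, size=(1000, 3))

# Exchange only the small block bounding boxes, not the grids themselves.
my_bbox = (my_block.min(axis=0), my_block.max(axis=0))
all_bboxes = comm.allgather(my_bbox)

# For each receptor point, a cheap bounding-box test tells us which remote
# blocks might donate to it; only those ranks need a detailed donor search.
receptors = rng.uniform(0, size, size=(50, 3))
candidates = {
    i: [r for r, (lo, hi) in enumerate(all_bboxes)
        if r != rank and np.all(p >= lo) and np.all(p <= hi)]
    for i, p in enumerate(receptors)
}

n_off_rank = sum(1 for c in candidates.values() if c)
print(f"rank {rank}: {n_off_rank} receptor points have off-rank donor candidates")
```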

Benchmarks for Performance Testing of MPI-IO on the General Parallel File System (범용 병렬화일 시스템 상에서 MPI-IO 방안의 성능 평가 벤치마크)

  • Park, Seong-Sun
    • The KIPS Transactions: Part A / v.8A no.2 / pp.125-132 / 2001
  • IBM developed an implementation of MPI-IO (part of the MPI-2 standard) on its General Parallel File System. We designed and implemented various matrix multiplication benchmarks to evaluate its performance. MPI-IO on the General Parallel File System offers four kinds of data access methods: non-collective blocking, collective blocking, non-collective non-blocking, and split collective operations. In this paper, we propose benchmarks to measure the I/O time and the computation time for these data access methods. We describe not only their implementation but also the performance evaluation results.
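
A small timing harness, assuming mpi4py, that exercises the four data access methods named above on a scratch file (file name and transfer size are arbitrary); the paper's matrix multiplication benchmarks themselves are not reproduced.

```python
import numpy as np
from mpi4py import MPI               # assumes mpi4py; run with mpiexec -n <p>

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
block = np.full(1 << 18, rank, dtype=np.float64)     # ~2 MB written per rank
offset = rank * block.nbytes

def timed(label, write):
    comm.Barrier()
    t0 = MPI.Wtime()
    write()
    comm.Barrier()
    if rank == 0:
        print(f"{label:28s} {MPI.Wtime() - t0:.4f} s")

fh = MPI.File.Open(comm, "bench.dat", MPI.MODE_CREATE | MPI.MODE_WRONLY)

timed("independent Write_at", lambda: fh.Write_at(offset, block))          # non-collective, blocking
timed("collective Write_at_all", lambda: fh.Write_at_all(offset, block))   # collective, blocking
timed("non-blocking Iwrite_at", lambda: fh.Iwrite_at(offset, block).Wait())

def split_collective():
    fh.Write_at_all_begin(offset, block)
    fh.Write_at_all_end(block)

timed("split collective", split_collective)
fh.Close()
```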

The Efficient Method of Parallel Genetic Algorithm using MapReduce of Big Data (빅 데이터의 MapReduce를 이용한 효율적인 병렬 유전자 알고리즘 기법)

  • Hong, Sung-Sam;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.5 / pp.385-391 / 2013
  • Big Data refers to data whose size exceeds what existing database management systems can process, collect, store, search, and analyze. A parallel genetic algorithm on Hadoop, the representative Big Data platform, can be realized easily by implementing the GA (Genetic Algorithm) with MapReduce on the Hadoop Distributed System. Previous studies proposed genetic algorithms transformed to suit MapReduce, but they did not show good performance because of frequent data input and output. In this paper, we propose the MRPGA (MapReduce Parallel Genetic Algorithm), which uses improved Map and Reduce processes and the parallel processing characteristics of MapReduce. The optimal solution can be found by using the topology and migration of the parallel genetic algorithm together with a local search algorithm. The convergence speed of the proposed method is 1.5 times faster than that of the existing MapReduce SGA, and the optimal solution can be found quickly within a small number of sub-generation iterations. In addition, the MRPGA is able to improve the processing and analysis performance of Big Data technology.
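
A toy, single-process sketch of the island-style map/reduce split described above: the "map" step evolves each sub-population for several local sub-generations (so intermediate populations are not written out), and the "reduce" step performs ring migration. The objective function and all parameters are illustrative assumptions.

```python
import random

GENOME, POP, ISLANDS, SUBGENS, ROUNDS = 32, 20, 4, 10, 5

def fitness(ind):                         # toy objective: count of 1-bits
    return sum(ind)

def evolve_island(island):
    """'Map' task: evolve one sub-population for SUBGENS local generations, so
    intermediate populations never have to be written back between map steps."""
    key, pop = island
    for _ in range(SUBGENS):
        pop = sorted(pop, key=fitness, reverse=True)[:POP // 2]   # selection
        children = []
        while len(pop) + len(children) < POP:
            a, b = random.sample(pop, 2)
            cut = random.randrange(1, GENOME)
            child = a[:cut] + b[cut:]                             # crossover
            child[random.randrange(GENOME)] ^= 1                  # mutation
            children.append(child)
        pop = pop + children
    return key, pop

def migrate(islands):
    """'Reduce' step: ring migration of each island's best individual."""
    bests = [max(pop, key=fitness) for _, pop in islands]
    for k, (_, pop) in enumerate(islands):
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        pop[worst] = bests[(k - 1) % len(islands)]
    return islands

islands = [(k, [[random.randint(0, 1) for _ in range(GENOME)]
                for _ in range(POP)]) for k in range(ISLANDS)]
for _ in range(ROUNDS):                   # outer MapReduce rounds
    islands = migrate([evolve_island(isl) for isl in islands])
print("best fitness:", max(fitness(max(pop, key=fitness)) for _, pop in islands))
```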

Visual Inspection of Surface Mount Boards Using Image Processing (화상처리를 이용한 표면 실장 기판 외관 검사)

  • 백갑환;김현곤;김기현;유건희
    • Proceedings of the Korean Society of Precision Engineering Conference / 1992.04a / pp.343-348 / 1992
  • Using a real-time image processing technique, we have developed an automatic visual inspection system which detects the defects of surface mounted components on PCBs (missing components, mislocation, mismounts, reverse polarity, etc.) and collects quality control and production management data. An image processing system based on a commercial parallel processor, the TRANSPUTER, by which the image processing time can be largely reduced, was designed. By analyzing the collected data, the proposed inspection system contributes to productivity improvement through the reduction of the defective rate.
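
A minimal sketch of one presence check such a system might perform: each component's region of interest is compared against a known-good board image by normalized cross-correlation. Regions, thresholds, and synthetic images are assumptions for illustration; the actual TRANSPUTER-based system is not reproduced.

```python
import numpy as np

def inspect_board(image, golden, rois, threshold=0.6):
    """Presence/placement check: each component's region of interest is
    compared with the same region of a known-good ('golden') board image by
    normalised cross-correlation. Real systems also check offsets and
    polarity marks; the threshold here is arbitrary."""
    defects = []
    for name, (y, x, h, w) in rois.items():
        a = image[y:y + h, x:x + w].astype(float).ravel()
        b = golden[y:y + h, x:x + w].astype(float).ravel()
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        ncc = float(a @ b / denom) if denom > 0 else 0.0
        if ncc < threshold:
            defects.append((name, round(ncc, 3)))
    return defects

# Synthetic example: one component region of the board image is blanked out.
golden = np.random.rand(200, 300)
board = golden.copy()
board[50:70, 100:140] = 0.0                       # simulate a missing component
rois = {"C1": (50, 100, 20, 40), "R5": (120, 30, 10, 25)}
print(inspect_board(board, golden, rois))         # expect C1 to be reported
```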