• Title/Summary/Keyword: Parallel Implementation


On a Parallel-Structured High-Speed Implementation of the Word-Based Stream Cipher (워드기반 스트림암호의 병렬화 고속 구현 방안)

  • Lee, Hoon-Jae;Do, Kyung-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.4 / pp.859-867 / 2010
  • In this paper, we propose parallel structures for the word-based nonlinear combining functions used in word-based stream ciphers, which are high-speed versions of general (bit-based) nonlinear combining functions. In particular, we propose high-speed structures for four popular kinds of word-based nonlinear combiner using the PS-WFSR (Parallel-Shifting or Parallel-Structured Word-based FSR): the m-parallel word-based nonlinear combiner without memory, the m-parallel word-based nonlinear combiner with memory, the m-parallel word-based nonlinear filter function, and the m-parallel word-based clock-controlled function. In addition, we present an implementation example of the m-parallel word-based DRAGON stream cipher and evaluate its cryptographic security and performance.

Finite element analysis of welding process by parallel computation (병렬 처리를 이용한 용접 공정 유한 요소 해석)

  • 임세영;김주완;최강혁;임재혁
    • Proceedings of the KWS Conference / 2003.11a / pp.156-158 / 2003
  • An implicit finite element implementation of Leblond's transformation plasticity constitutive equations, which are widely used for welded steel structures, is proposed in the framework of parallel computing. The implementation is based on the multiplicative decomposition of the deformation gradient and a hyperelastic formulation. We examine the efficiency of parallel computation for the finite element analysis of a welded structure using a domain-wise multi-frontal solver.


An experimental study on parallel implementation of an iterative method for large scale, sparse linear system (반복기법을 이용한 대규모, 소선형시스템의 병렬처리에 관한 연구)

  • 김상원;장수영
    • Proceedings of the Korean Operations and Management Science Society Conference / 1991.10a / pp.6-22 / 1991
  • This thesis presents a parallel implementation of an iterative method for large-scale, sparse linear systems and gives the results of computational experiments performed on both single-transputer and multi-transputer parallel computers. To solve the linear system, we use the conjugate gradient method and develop a data storage technique and a data communication scheme. In addition to an explanation of the conjugate gradient method, the results of the computational experiments are summarized.
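
For orientation, the conjugate gradient iteration named in this abstract can be sketched serially as follows. This is a minimal illustration, not the thesis's transputer implementation: the row-wise dictionary storage, the function names `spmv`, `dot`, and `conjugate_gradient`, and the small test matrix are all assumptions made for the example. The matrix-vector product and the inner products are the steps a parallel version would distribute across processors.

```python
def spmv(rows, x):
    """Multiply a sparse matrix, stored as a list of {column: value} dicts, by a dense vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def dot(a, b):
    """Inner product of two dense vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite sparse matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = spmv(rows, p)
        alpha = rs_old / dot(p, Ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                            # direction update factor
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 symmetric, diagonally dominant test system.
A = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 3.0, 2: 1.0}, {1: 1.0, 2: 2.0}]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, b)
```

Storing only the nonzero entries of each row keeps the cost of one iteration proportional to the number of nonzeros, which is what makes the method attractive for large sparse systems.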


Three dimensional finite element analysis of arc-welding process via parallel computing (아크 용접 공정의 3차원 병렬처리 유한 요소 해석)

  • 임세영;김주완;김현규;조영삼
    • Proceedings of the KWS Conference / 2002.05a / pp.161-163 / 2002
  • An implicit finite element implementation of Leblond's transformation plasticity constitutive equations, which are widely used for welded steel structures, is proposed in the framework of parallel computing. The implementation is based on the updated Lagrangian formulation. We examine the efficiency of parallel computation for the finite element analysis of a welded structure using a multi-frontal solver.


Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

  • Thi, Huyen Pham;Lee, Hanho
    • IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.210-219 / 2017
  • This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture corresponding to the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. The decoding implementation on a GPU for NB-QC-LDPC codes achieves improvements in both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. The implementation results for NB-QC-LDPC codes over GF(32) and GF(64) demonstrate that the parallel block-layered decoding on a GPU accelerates the decoding process to provide a faster decoding runtime, and obtains a higher coding gain under a low $10^{-10}$ bit error rate and low $10^{-7}$ frame error rate, compared to existing methods.

Parallel Implementation of Radon Transform on TMS320C80-based System (TMS320C80시스템에서 Radon 변환의 병렬 구현)

  • 송정호;성효경;최흥문
    • Proceedings of the IEEK Conference / 1998.10a / pp.727-730 / 1998
  • In this paper, we propose an efficient parallel implementation of the Radon transform on a TMS320C80-based system. For an $N \times N$ SAR image, we can obtain O(NM/p) of the conventional parallel Radon transform by representing the projection patterns in Radon space variables instead of image space variables and by pipelining the algorithm, where p is the number of processors and M is the number of projection angles. We can also reduce the time for dynamic load distribution among the nodes and the communication overhead of accessing the global memories by pipelining the memory and processing operations with a triple-buffer structure. Experimental results show an efficient parallel Radon transform with speedup $S_p = 3.9$ and efficiency E = 97.5% for a $256 \times 256$ image when implemented on a TMS320C80 composed of four parallel slave processors with three memory blocks.
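
The two figures reported in this abstract are consistent under the standard definitions of speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of processors). The short sketch below checks the arithmetic. The function names are illustrative assumptions, and counting the four slave processors as p = 4 is an interpretation of the abstract rather than something it states explicitly.

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(s_p, processors):
    """Parallel efficiency E: speedup divided by the number of processors."""
    return s_p / processors


# Values taken from the abstract: S_p = 3.9 on four slave processors.
reported_efficiency = efficiency(3.9, 4)
print(f"{reported_efficiency:.1%}")
```

An efficiency this close to 100% suggests that the load distribution and memory access overheads mentioned in the abstract are almost entirely hidden by the pipelining.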


Parallel Implementation Strategy for Content Based Video Copy Detection Using a Multi-core Processor

  • Liao, Kaiyang;Zhao, Fan;Zhang, Mingzhu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.10 / pp.3520-3537 / 2014
  • Video copy detection methods have emerged in recent years for a variety of applications. However, the lack of efficiency in the usual retrieval systems restricts their use. In this paper, we propose a parallel implementation strategy for content based video copy detection (CBCD) using a multi-core processor. This strategy supports video copy detection effectively, and the processing time tends to decrease linearly as the number of processors increases. Experiments show that our approach speeds up computation while maintaining performance.

A design of synchronous nonlinear and parallel for pipeline stage on IP-based H.264 decoder implementation (IP기반 H.264 디코더 설계를 위한 동기식 비선형 및 병렬화 파이프라인 설계)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference / 2008.06a / pp.409-410 / 2008
  • This paper presents a nonlinear and parallel design for synchronous pipelining in an IP-based H.264 decoder implementation. Since the dataflow of the H.264 decoder includes a feedback loop, the data dependency requires one NOP stage per pipeline latency, which reduces the throughput to 1/2. Further, it is found that, in execution time, the stage scheduled for MC is more heavily occupied than those for CAVLD/ITQ/DF. The less efficient stages can be improved by nonlinear scheduling, while the fully utilized stage can be accelerated by parallel scheduling of IPs. The optimization yields a 3 nonlinear {CAVLD&ITQ} | 3 parallel (MC/IP&Rec.) | 3 nonlinear {DF} pipelined architecture for the IP-based H.264 decoder. In experiments, the nonlinear and parallel pipelined H.264 decoder, including existing IPs, could process full HD video in real time at 41.86 MHz.


PARALLEL IMPLEMENTATION OF HYBRID ITERATIVE METHODS FOR NONSYMMETRIC LINEAR SYSTEMS

  • Yun, Jae-Heon;Kim, Sang-Wook
    • Journal of applied mathematics & informatics / v.4 no.1 / pp.1-16 / 1997
  • In this paper we study efficient parallel implementations of the hybrid iterative methods BICGSTAB and BICGSTAB($\ell$) with $\ell = 2$ on the CRAY C90, and evaluate the efficiency of their parallel performance. Numerical experiments suggest that, on the CRAY C90, a parallel inner product algorithm called PDOTB should be used for the parallelization of hybrid iterative methods containing sensitive values of inner products. Lastly, it is shown that the number of iterations in which parallel hybrid iterative methods satisfy a certain convergence criterion depends on the number of processors used.
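
As a rough illustration of what a parallel inner product involves, the sketch below splits two vectors into blocks, forms one partial sum per block (the work each processor would do), and then reduces the partial sums. It is not PDOTB, whose details the abstract does not give: the function name `blocked_dot` and the block partitioning are assumptions, and the blocks are processed one after another here rather than concurrently.

```python
import math


def blocked_dot(x, y, num_blocks):
    """Inner product computed as a sum of per-block partial sums.

    Each block stands in for the share of work one processor would
    handle; the final sum stands in for the reduction step.
    """
    size = max(1, math.ceil(len(x) / num_blocks))
    partials = [
        sum(a * b for a, b in zip(x[start:start + size], y[start:start + size]))
        for start in range(0, len(x), size)
    ]
    return sum(partials)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0]
results = [blocked_dot(x, y, p) for p in (1, 2, 4)]
```

Because floating-point addition is not associative, changing the number of blocks changes the order of summation and can change the rounded result slightly for general data. That may be one reason an iteration count could vary with the number of processors, although the abstract itself does not state the cause.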

Analysis of a Parallel 3 Degree-of-Freedom Spherical Module and its Implementation as a Force Reflecting Manual Controller (병렬형 3자유도 구형 모듈의 해석과 힘반영 원격조종기로의 구현)

  • 김희국;이병주
    • Transactions of the Korean Society of Mechanical Engineers / v.18 no.10 / pp.2501-2513 / 1994
  • In this paper, a compact, light-weight, universal, spherical 3-degree-of-freedom, parallel-structured manual controller with high reflecting-force capability is implemented. First, the position analysis, kinematic modeling and analysis, force reflecting transformation, and applied force control schemes for a parallel-structured 3 degree-of-freedom spherical system are described. Then, a brief description of the system integration, its actual implementation hardware, and its preliminary analysis results is presented. The implemented parallel 3 degree-of-freedom spherical module is equipped with high gear-ratio reducers, and the friction due to the reducers is minimized by employing a force control algorithm, which results in a "power steering" effect for enhanced smoothness and transparency (for compactness and reduced weight).