• Title/Summary/Keyword: parallel implementation

Search Result 877, Processing Time 0.03 seconds

Optimized Implementation of PIPO Lightweight Block Cipher on 32-bit RISC-V Processor (32-bit RISC-V상에서의 PIPO 경량 블록암호 최적화 구현)

  • Eum, Si Woo;Jang, Kyung Bae;Song, Gyeong Ju;Lee, Min Woo;Seo, Hwa Jeong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.6
    • /
    • pp.167-174
    • /
    • 2022
  • PIPO lightweight block ciphers were announced in ICISC'20. In this paper, a single-block optimization implementation and parallel optimization implementation of PIPO lightweight block cipher ECB, CBC, and CTR operation modes are performed on a 32-bit RISC-V processor. A single block implementation proposes an efficient 8-bit unit of Rlayer function implementation on a 32-bit register. In a parallel implementation, internal alignment of registers for parallel implementation is performed, and a method for four different blocks to perform Rlayer function operations on one register is described. In addition, since it is difficult to apply the parallel implementation technique to the encryption process in the parallel implementation of the CBC operation mode, it is proposed to apply the parallel implementation technique in the decryption process. In parallel implementation of the CTR operation mode, an extended initialization vector is used to propose a register internal alignment omission technique. This paper shows that the parallel implementation technique is applicable to several block cipher operation modes. As a result, it is confirmed that the performance improvement is 1.7 times in a single-block implementation and 1.89 times in a parallel implementation compared to the performance of the existing research implementation that includes the key schedule process in the ECB operation mode.

Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.298-306
    • /
    • 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of such an approach. Experimental simulation results indicate that the proposed method is capable of real-time applications, and the result of our parallel approach is outstanding in terms of the processing performance.

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3543-3557
    • /
    • 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. The lossless compression based on the recursive least square (RLS), which eliminates hyperspectral images' redundancy using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but the relatively high computational complexity limits its application to time-critical scenarios. In order to improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). Namely, an optimized recursive least square based on optimal number of prediction bands is introduced firstly. Then we use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method properly exploits the low-level architecture of GPUs and has been carried out using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate with regard to the serial version of the compressor.

Implementation and Performance Analysis of a Parallel SIMPLER Model Based on Domain Decomposition (영역 분할에 의한 SIMPLER 모델의 병렬화와 성능 분석)

  • Kwak Ho Sang;Lee Sangsan
    • Journal of computational fluids engineering
    • /
    • v.3 no.1
    • /
    • pp.22-29
    • /
    • 1998
  • Parallel implementation is conducted for a SIMPLER finite volume model. The present parallelism is based on domain decomposition and explicit message passing using MPI and SHMEM. Two parallel solvers to tridiagonal matrix equation are employed. The implementation is verified on the Cray T3E system for a benchmark problem of natural convection in a sidewall-heated cavity. The test results illustrate good scalability of the present parallel models. Performance issues are elaborated in view of convergence as well as conventional parallel overheads and single processor performance. The effectiveness of a localized matrix solution algorithm is demonstrated.

  • PDF

Parallel Implementation of SIMPLER by Using Domain Decomposition Technique (영역분할법에 의한 SIMPLER 기법의 병렬화)

  • Kwak Ho Sang
    • 한국전산유체공학회:학술대회논문집
    • /
    • 1997.10a
    • /
    • pp.23-28
    • /
    • 1997
  • A parallel implementation is made of a two-dimensional finite volume model based on the SIMPLER. The solution domain is decomposed into several subdomains and the solution at each subdomain is acquired by parallel use of multiple processors. Communications between processors are accomplished by using the standard MPI and the Cray-specific SHMEM. The parallelization method for the overall solution procedure to the Navier-Stokes equations is described in detail, The parallel implementation is validated on the Cray T3E system for a benchmark problem of natural convection in a sidewall-heated cavity. The parallel performance is assessed and the issues encountered in achieving a high-performance parallel model are elaborated.

  • PDF

A Study on the Implementation of GPSS Program on a Parallel Computer (GPSS 프로그램의 병렬화에 관한 연구)

  • 윤정미
    • Journal of the Korea Society for Simulation
    • /
    • v.8 no.2
    • /
    • pp.57-72
    • /
    • 1999
  • With the rapidly increasing complexity of decision-marking or system development in the fields of industry, management, etc., modelling techniques using simulation has become more highlighted. Particularly, the advent of parallel computer systems not only has opened a new horizon of parallel simulation, but also has greatly contributed to the speed-up of the execution of simulation. The implementation of parallel simulation, however, is not a easy job for those who accustomed to the existing computer systems. And it is also necessarily confronted with the problem of synchronization conflict in the process. Thus, how to allow a wider community of users to gain access to parallel simulation while solving synchronization conflicts has become an important issue in simulation study. As a method to solve these problems, this paper is primarily concerned with the implementation of GPSS which is a generally used simulation language for discrete event simulation, onto a parallel computer using C-LINDA. For that, this paper, is to suggest a model and algorithm and to experiment it using a case.

  • PDF

A study on the m-Parallel Nonlinear Combine functions for the Parallel Stream Cipher (병렬 스트림암호를 위한 m-병렬 비선형 결합함수에 관한 연구)

  • 이훈재;문상재
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.4A
    • /
    • pp.301-309
    • /
    • 2002
  • In this paper, we propose the effective implementation of various nonlinear combiners using by PS-LFSR: m-parallel memoryless-nonlinear combiner, m-parallel memory-nonlinear combiner, m-parallel nonlinear filter function, and m-parallel clock-controlled function. Finally, we propose m-parallel LILI-128 stream cipher as an example of the parallel implementation, and we determine its cryptographic security and performance.

Parallel implementation of a neural network-based realtime ATR system using a multicomputer (다중컴퓨터를 이용한 신경회로망 기반 실시간 자동 표적인식시스템의 병렬구현)

  • 전준형;김성완;김진호;최흥문
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.2
    • /
    • pp.197-208
    • /
    • 1996
  • A neural network-based PSRI(position, scale, and rotation invariant) feature extraction and ATR (automatic target recognition) system are proposed and an efficient parallel implementatio of the proposed system using multicomputer is also presented. In the proposed system, the scale and rotationinvariant features are extracted from the contour projection of the number of edge pixels on each of the concentric circles, which is input t the cooperative network. We proposed how to decide the optimum depth and the width of the parallel pipeline system for real time applications by modeling the proposed system into a parallel pipeline implementation method using transputers is also proposed. The implementation results show that we can extract PSRI features less sensitive to input variations, and the speedup of the proposed ATR system is about 7.55 for the various rotated and scaled targets using 8-node transputer system.

  • PDF

Design and Implementation of a Latency Efficient Encoder for LTE Systems

  • Hwang, Soo-Yun;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • v.32 no.4
    • /
    • pp.493-502
    • /
    • 2010
  • The operation time of an encoder is one of the critical implementation issues for satisfying the timing requirements of Long Term Evolution (LTE) systems because the encoder is based on binary operations. In this paper, we propose a design and implementation of a latency efficient encoder for LTE systems. By virtue of 8-bit parallel processing of the cyclic redundancy checking attachment, code block (CB) segmentation, and a parallel processor, we are able to construct engines for turbo codings and rate matchings of each CB in a parallel fashion. Experimental results illustrate that although the total area and clock period of the proposed scheme are 19% and 6% larger than those of a conventional method based on a serial scheme, respectively, our parallel structure decreases the latency by about 32% to 65% compared with a serial structure. In particular, our approach is more latency efficient when the encoder processes a number of CBs. In addition, we apply the proposed scheme to a real system based on LTE, so that the timing requirement for ACK/NACK transmission is met by employing the encoder based on the parallel structure.

A Parallel Search Algorithm and Its Implementation for Digital k-Winners-Take-All Circuit

  • Yoon, Myungchul
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.4
    • /
    • pp.477-483
    • /
    • 2015
  • The k-Winners-Take-All (kWTA) is an operation to find the largest k (>1) inputs among N inputs. Parallel search algorithm of kWTA for digital inputs is not invented yet, so most of digital kWTA architectures have O(N) time complexity. A parallel search algorithm for digital kWTA operation and the circuits for its VLSI implementation are presented in this paper. The proposed kWTA architecture can compare all inputs simultaneously in parallel. The time complexity of the new architecture is O(logN), so that it is scalable to a large number of digital data. The high-speed kWTA operation and its O(logN) dependency of the new architecture are verified by simulations. It takes 290 ns in searching for 5 winners among 1024 of 32 bit data, which is more than thousands of times faster than existing digital kWTA circuits, as well as existing analog kWTA circuits.