• Title/Summary/Keyword: Parallel Optimal Implementation

Search Result 43, Processing Time 0.032 seconds

EFFICIENT PARALLEL GAUSSIAN NORMAL BASES MULTIPLIERS OVER FINITE FIELDS

  • Kim, Young-Tae
    • Honam Mathematical Journal
    • /
    • v.29 no.3
    • /
    • pp.415-425
    • /
    • 2007
  • The normal basis has the advantage that the result of squaring an element is simply the right cyclic shift of its coordinates in hardware implementation over finite fields. In particular, the optimal normal basis is the most efficient to hardware implementation over finite fields. In this paper, we propose an efficient parallel architecture which transforms the Gaussian normal basis multiplication in GF($2^m$) into the type-I optimal normal basis multiplication in GF($2^{mk}$), which is based on the palindromic representation of polynomials.

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3543-3557
    • /
    • 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. The lossless compression based on the recursive least square (RLS), which eliminates hyperspectral images' redundancy using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but the relatively high computational complexity limits its application to time-critical scenarios. In order to improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). Namely, an optimized recursive least square based on optimal number of prediction bands is introduced firstly. Then we use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method properly exploits the low-level architecture of GPUs and has been carried out using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate with regard to the serial version of the compressor.

Implementing Distributed Optimal Power Flow Using the Alternating Direction Method

  • Chung Koohyung;Kim Balho H.;Song Kyung-Bin
    • KIEE International Transactions on Power Engineering
    • /
    • v.5A no.4
    • /
    • pp.412-415
    • /
    • 2005
  • The recent requirement for faster and more frequent solutions has encouraged the consideration of parallel implementations using decentralized processors. Distributed multi-processor environments can potentially greatly increase the available computational capacity and decrease the communication burden, allowing for faster Optimal Power Flow (OPF) solutions. This paper presents a mathematical approach to implementing distributed OPF using the alternating direction method (ADM) to parallelize the OPF. Several IEEE Reliability Test Systems were adopted to demonstrate the proposed algorithm.

Optimal Parallel Implementation of an Optimization Neural Network by Using a Multicomputer System (다중 컴퓨터 시스템을 이용한 최적화 신경회로망의 최적 병렬구현)

  • 김진호;최흥문
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.28B no.12
    • /
    • pp.75-82
    • /
    • 1991
  • We proposed an optimal parallel implementation of an optimization neural network with linear increase of speedup by using multicomputer system and presented performance analysis model of the system. We extracted the temporal-and the spatial-parallelism from the optimization neural network and constructed a parallel pipeline processing model using the parallelism in order to achieve the maximum speedup and efficiency on the CSP architecture. The results of the experiments for the TSP using the Transputer system, show that the proposed system gives linear increase of speedup proportional to the size of the optimization neural network for more than 140 neurons, and we can have more than 98% of effeciency upto 16-node system.

  • PDF

An Efficient Implementation of Decentralized Optimal Power Flow

  • Kim, Balho H.
    • Journal of Electrical Engineering and Technology
    • /
    • v.2 no.3
    • /
    • pp.335-341
    • /
    • 2007
  • In this study, we present an approach to parallelizing OPF that is suitable for distributed implementation and is applicable to very large inter-connected power systems. The approach could be used by utilities for optimal economy interchange without disclosing details of their operating costs to competitors. It could also be used to solve several other computational tasks, such as state estimation and power flow, in a distributed manner. The proposed algorithm was demonstrated with several case study systems.

Design and Implementation of 256-Point Radix-4 100 Gbit/s FFT Algorithm into FPGA for High-Speed Applications

  • Polat, Gokhan;Ozturk, Sitki;Yakut, Mehmet
    • ETRI Journal
    • /
    • v.37 no.4
    • /
    • pp.667-676
    • /
    • 2015
  • The third-party FFT IP cores available in today's markets do not provide the desired speed demands for optical communication. This study deals with the design and implementation of a 256-point Radix-4 100 Gbit/s FFT, where computational steps are reconsidered and optimized for high-speed applications, such as radar and fiber optics. Alternative methods for FFT implementation are investigated and Radix-4 is decided to be the optimal solution for our fully parallel FPGA application. The algorithms that we will implement during the development phase are to be tested on a Xilinx Virtex-6 FPGA platform. The proposed FFT core has a fully parallel architecture with a latency of nine clocks, and the target clock rate is 312.5 MHz.

A Low Complexity Bit-Parallel Multiplier over Finite Fields with ONBs (최적정규기저를 갖는 유한체위에서의 저 복잡도 비트-병렬 곱셈기)

  • Kim, Yong-Tae
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.9 no.4
    • /
    • pp.409-416
    • /
    • 2014
  • In H/W implementation for the finite field, the use of normal basis has several advantages, especially the optimal normal basis is the most efficient to H/W implementation in $GF(2^m)$. The finite field $GF(2^m)$ with type I optimal normal basis(ONB) has the disadvantage not applicable to some cryptography since m is even. The finite field $GF(2^m)$ with type II ONB, however, such as $GF(2^{233})$ are applicable to ECDSA recommended by NIST. In this paper, we propose a bit-parallel multiplier over $GF(2^m)$ having a type II ONB, which performs multiplication over $GF(2^m)$ in the extension field $GF(2^{2m})$. The time and area complexity of the proposed multiplier is the same as or partially better than the best known type II ONB bit-parallel multiplier.

Optimized Implementation of Block Cipher PIPO in Parallel-Way on 64-bit ARM Processors (64-bit ARM 프로세서 상에서의 블록암호 PIPO 병렬 최적 구현)

  • Eum, Si Woo;Kwon, Hyeok Dong;Kim, Hyun Jun;Jang, Kyoung Bae;Kim, Hyun Ji;Park, Jae Hoon;Song, Gyeung Ju;Sim, Min Joo;Seo, Hwa Jeong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.10 no.8
    • /
    • pp.223-230
    • /
    • 2021
  • The lightweight block cipher PIPO announced at ICISC'20 has been effectively implemented by applying the bit slice technique. In this paper, we propose a parallel optimal implementation of PIPO for ARM processors. The proposed implementation enables parallel encryption of 8-plaintexts and 16-plaintexts. The implementation targets the A10x fusion processor. On the target processor, the existing reference PIPO code has performance of 34.6 cpb and 44.7 cpb in 64/128 and 64/256 standards. Among the proposed methods, the general implementation has a performance of 12.0 cpb and 15.6 cpb in the 8-plaintexts 64/128 and 64/256 standards, and 6.3 cpb and 8.1 cpb in the 16-plaintexts 64/128 and 64/256 standards. Compared to the existing reference code implementation, the 8-plaintexts parallel implementation for each standard has about 65.3%, 66.4%, and the 16-plaintexts parallel implementation, about 81.8%, and 82.1% better performance. The register minimum alignment implementation shows performance of 8.2 cpb and 10.2 cpb in the 8-plaintexts 64/128 and 64/256 specifications, and 3.9 cpb and 4.8 cpb in the 16-plaintexts 64/128 and 64/256 specifications. Compared to the existing reference code implementation, the 8-plaintexts parallel implementation has improved performance by about 76.3% and 77.2%, and the 16-plaintext parallel implementation is about 88.7% and 89.3% higher for each standard.

A New Parallel Multiplier for Type II Optimal Normal Basis (타입 II 최적 정규기저를 갖는 유한체의 새로운 병렬곱셈 연산기)

  • Kim Chang-Han;Jang Sang-Woon;Lim Jong-In;Ji Sung-Yeon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.16 no.4
    • /
    • pp.83-89
    • /
    • 2006
  • In H/W implementation for the finite field, the use of normal basis has several advantages, especially, the optimal normal basis is the most efficient to H/W implementation in GF($2^m$). In this paper, we propose a new, simpler, parallel multiplier over GF($2^m$) having a type II optimal normal basis, which performs multiplication over GF($2^m$) in the extension field GF($2^{2m}$). The time and area complexity of the proposed multiplier is same as the best of known type II optimal normal basis parallel multiplier.

Efficient Implementation of Optimal Extension Fields Using Parallel Computation (병렬 연산을 이용한 최적 확장체의 효율적 구현)

  • 이문규;박근수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04a
    • /
    • pp.269-271
    • /
    • 2003
  • 본 논문에서는 타원 곡선 암호의 성능을 향상시키기 위한 효율적인 최적 확장체 연산 알고리즘을 제안한다. 제안하는 알고리즘은 CPU에서 제공되는 정수 곱셈 명령 1회 실행에 두 개의 하위체 연산을 병렬적으로 수행하도록 함으로써 최적 확장체에서의 곱셈, 제곱, 역원 연산의 속도를 향상시킨다.

  • PDF