• Title/Summary/Keyword: parallel architectures

Search Result 127, Processing Time 0.02 seconds

High-Performance Low-Power FFT Cores

  • Han, Wei;Erdogan, Ahmet T.;Arslan, Tughrul;Hasan, Mohd.
    • ETRI Journal
    • /
    • v.30 no.3
    • /
    • pp.451-460
    • /
    • 2008
  • Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high throughput and power efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures.

  • PDF

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.2
    • /
    • pp.542-558
    • /
    • 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a general processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, the parallel computation of the histogram is conventionally inefficient because of the bank conflict in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for MSA, which is performed through parallel relative rating of the image pixel pairs and halved the rating in every step. The array size is then decreased to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open source software to make it relatively platform independent. The kernels are tested in a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case and the MCP exhibited a higher performance.

Performance Analysis of Parallel Database Machine Architectures (병렬 데이타베이스 컴퓨터 구조의 성능 분석)

  • Lee, Yong-Kyu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.4
    • /
    • pp.873-882
    • /
    • 1998
  • The parallel database machine approach is currently widely and successfully used. There are four major architectures which are used in this approach: shared-nothing architecture, shared-evertying architecture, shared-disk architecture, and hybrid architecture. In this paper, we use an analytical model to evaluate the performance of these database machine architectures. We define an abstract model for each type of database machine design to obtain performance equatons describing the execution times with respect to the hybrid hash join poeration. Using the performance equations, we evaluate the execution times of the various database machine design models.

  • PDF

An Aggressive Register Allocation Algorithm for EPIC Architectures (EPIC 아키텍쳐를 위한 적극적 레지스터 할당 알고리듬)

  • Choe, Jun-Gi;Lee, Sang-Jeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.2
    • /
    • pp.497-511
    • /
    • 1999
  • Recently, many parallel processing technologies were developed, ILP(Instruction level Parallelism) processor's performance have been growed very rapidly. especially, EPIC(Explicitly Parallel Instruction computing) architectures attempt to enhance the performance in the predicated execution and speculative execution with the hardware. In this paper to improve the code scheduling possibility by applying to the characteristics of EPIC architectures, a new register allocation algorithm is proposed. And we proves that proposed register allocation algorithm is more efficient scheme than the conventional scheme when predicated execution is applied to our scheme by experiments. In experimental results, it shows much more performance enhancement, about 19% in proposed scheme than the conventional scheme. So, our scheme is verified that it is an effective register allocation method.

  • PDF

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal
    • /
    • v.43 no.5
    • /
    • pp.825-834
    • /
    • 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

A Development of Distributed Parallel Processing algorithm for Power Flow analysis (전력 조류 계산의 분산 병렬처리기법에 관한 연구)

  • Lee, Chun-Mo;Lee, Hae-Ki
    • Proceedings of the KIEE Conference
    • /
    • 2001.07e
    • /
    • pp.134-140
    • /
    • 2001
  • Parallel processing has the potential to be cost effectively used on computationally intense power system problems. But this technology is not still available is not only parallel computer but also parallel processing scheme. Testing these algorithms to ensure accuracy, and evaluation of their performance is also an issue. Although a significant amount of parallel algorithms of power system problem have been developed in last decade, actual testing on processor architectures lies in the beginning stages. This paper presents the parallel processing algorithm to supply the base being able to treat power flow by newton's method by the distributed memory type parallel computer. This method is to assign and to compute teared blocks of sparse matrix at each parallel processors. The testing to insure accuracy of developed method have been done on serial computer by trying to simulate a parallel environment.

  • PDF

A Study on High Speed LDPC Decoder Based on HSS (HSS기반의 고속 LDPC 복호기 연구)

  • Jung, Ji Won
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.5 no.3
    • /
    • pp.164-168
    • /
    • 2012
  • LDPC decoder architectures are generally classified into serial, parallel and partially parallel architectures. Conventional method of LDPC decoding in general give rise to a large number of computation operations, mass power consumption, and decoding delay. It is necessary to reduce the iteration numbers and computation operations without performance degradation. This paper studies Horizontal Shuffle Scheduling (HSS) algorithm. In the result, number of iteration is half than conventional algorithm without performance degradation. Finally, this paper present design methodology of high-speed LDPC decoder and confirmed its throughput is up to about 600Mbps.

A Development of Parallel Processing for Power Flow analysis (전력 조류 계산의 병렬처리에 관한 연구)

  • Lee, Chun-Mo
    • The Transactions of the Korean Institute of Electrical Engineers P
    • /
    • v.51 no.2
    • /
    • pp.55-59
    • /
    • 2002
  • Parallel processing is able to be used effectively on computationally intense power system problems. But this technology is not still available is not only parallel computer but also parallel processing scheme. Testing these algorithms to ensure accuracy, and evaluation of their performance is also an issue. Although a significant amount of parallel algorithms of power system problem have been developed in last decade, actual testing on parallel computer architectures lies in the beginning stages because no clear cut paths. This paper presents Jacobian modeling method to supply the base being able to treat power flow by newton's method by the computer. This method is to assign and to compute teared blocks of sparse matrix at each parallel processors. The testing to insure accuracy of developed method have been done on serial computer by trying to simulate a parallel environment.

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks
    • /
    • v.16 no.4
    • /
    • pp.436-446
    • /
    • 2014
  • The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.

Adaptive and optimized agent placement scheme for parallel agent-based simulation

  • Jin, Ki-Sung;Lee, Sang-Min;Kim, Young-Chul
    • ETRI Journal
    • /
    • v.44 no.2
    • /
    • pp.313-326
    • /
    • 2022
  • This study presents a noble scheme for distributed and parallel simulations with optimized agent placement for simulation instances. The traditional parallel simulation has some limitations in that it does not provide sufficient performance even though using multiple resources. The main reason for this discrepancy is that supporting parallelism inevitably requires additional costs in addition to the base simulation cost. We present a comprehensive study of parallel simulation architectures, execution flows, and characteristics. Then, we identify critical challenges for optimizing large simulations for parallel instances. Based on our cost-benefit analysis, we propose a novel approach to overcome the performance constraints of agent-based parallel simulations. We also propose a solution for eliminating the synchronizing cost among local instances. Our method ensures balanced performance through optimal deployment of agents to local instances and an adaptive agent placement scheme according to the simulation load. Additionally, our empirical evaluation reveals that the proposed model achieves better performance than conventional methods under several conditions.