• 제목/요약/키워드: parallel architectures

검색결과 127건 처리시간 0.029초

High-Performance Low-Power FFT Cores

  • Han, Wei;Erdogan, Ahmet T.;Arslan, Tughrul;Hasan, Mohd.
    • ETRI Journal
    • /
    • 제30권3호
    • /
    • pp.451-460
    • /
    • 2008
  • Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high throughput and power efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures.

  • PDF

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제17권2호
    • /
    • pp.542-558
    • /
    • 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a general processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, the parallel computation of the histogram is conventionally inefficient because of the bank conflict in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for MSA, which is performed through parallel relative rating of the image pixel pairs and halved the rating in every step. The array size is then decreased to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open source software to make it relatively platform independent. The kernels are tested in a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case and the MCP exhibited a higher performance.

병렬 데이타베이스 컴퓨터 구조의 성능 분석 (Performance Analysis of Parallel Database Machine Architectures)

  • 이용규
    • 한국정보처리학회논문지
    • /
    • 제5권4호
    • /
    • pp.873-882
    • /
    • 1998
  • 현재 병렬 데이타베이스 컴퓨터가 광범위하고 성공적으로 활용되고 있다. 이의 구조로는 주기억 장치와 디스크를 공유하지 않는 구조, 두가지를 모두 공유하는 구조, 디스크만을 공유하는 구조, 그리고 절충형 구조 등의 네가지 구조가 있다. 이 논문에서는 데이타베이스 컴퓨터 구조의 성능을 비교 분석하기 위하여 데이타베이스 컴퓨터 구조를 추상적인 모형으로 정의하고, 각각의 모형에 대하여 절충형 해쉬 조인 연산의 수행시간을 수식화한 성능식을 구하여 여러 가지 데이타베이스 컴퓨터 구조 모형의 수행시간을 비교 분석한다.

  • PDF

EPIC 아키텍쳐를 위한 적극적 레지스터 할당 알고리듬 (An Aggressive Register Allocation Algorithm for EPIC Architectures)

  • 최준기;이상정
    • 한국정보처리학회논문지
    • /
    • 제6권2호
    • /
    • pp.497-511
    • /
    • 1999
  • 최근 많은 명령어 수준 병렬 처리 기술들이 개발되면서 ILP 프로세서 성능이 급격히 증가하고 있다. 특히, 새로운 기술로 주목 받고 있는 EPIC(Explicitly Parallel Instruction Computing) 아키텍쳐는 조건실행 (Predicated Execution)과 투기적실행(Speculative execution)을 하드웨어와 접목하여 성능 향상을 시도하고 있다. 본 논문에서는 EPIC 아키텍쳐의 특성을 최대로 활용하여 코드 스케줄 가능성을 높이는 새로운 레지스터 할당 알고리듬을 제안한다. 그리고, 제안된 레지스터 할당 알고리듬은 조건실행의 적용으로 인하여 더욱 효율을 높일 수 있음을 실험을 통하여 입증한다. 실험 결과 기존의 레지스터 할당 방법에 비하여 평균 19%의 성능 향상을 보임으로써 제안된 레지스터 할당 방법이 효과적임을 검증한다.

  • PDF

New execution model for CAPE using multiple threads on multicore clusters

  • Do, Xuan Huyen;Ha, Viet Hai;Tran, Van Long;Renault, Eric
    • ETRI Journal
    • /
    • 제43권5호
    • /
    • pp.825-834
    • /
    • 2021
  • Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.

전력 조류 계산의 분산 병렬처리기법에 관한 연구 (A Development of Distributed Parallel Processing algorithm for Power Flow analysis)

  • 이춘모;이해기
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2001년도 학술대회 논문집 전문대학교육위원
    • /
    • pp.134-140
    • /
    • 2001
  • Parallel processing has the potential to be cost effectively used on computationally intense power system problems. But this technology is not still available is not only parallel computer but also parallel processing scheme. Testing these algorithms to ensure accuracy, and evaluation of their performance is also an issue. Although a significant amount of parallel algorithms of power system problem have been developed in last decade, actual testing on processor architectures lies in the beginning stages. This paper presents the parallel processing algorithm to supply the base being able to treat power flow by newton's method by the distributed memory type parallel computer. This method is to assign and to compute teared blocks of sparse matrix at each parallel processors. The testing to insure accuracy of developed method have been done on serial computer by trying to simulate a parallel environment.

  • PDF

HSS기반의 고속 LDPC 복호기 연구 (A Study on High Speed LDPC Decoder Based on HSS)

  • 정지원
    • 한국정보전자통신기술학회논문지
    • /
    • 제5권3호
    • /
    • pp.164-168
    • /
    • 2012
  • 본 논문에서는 DVB-S2 표준안에서 권고되고 있는 irregular LDPC 부호의 고속화 방안에 대한 연구를 하였다. LDPC 복호기에서는 많은 반복횟수와 많은 연산량이 복호 속도 저하의 원인이 되고 있으며, 성능 저하 없이 반복횟수와 연산량을 감소하기 위해서 HSS 기반의 LDPC 복호 구조를 제시하였다. 결과 반복횟수를 성능 저하 없이 절반으로 줄일 수 있으며, 이를 효율적인 설계방안을 제시하였다. 결과 600Mbps급의 throughput을 갖는 LDPC 복호기를 구현 가능케 하였다.

전력 조류 계산의 병렬처리에 관한 연구 (A Development of Parallel Processing for Power Flow analysis)

  • 이춘모
    • 전기학회논문지P
    • /
    • 제51권2호
    • /
    • pp.55-59
    • /
    • 2002
  • Parallel processing is able to be used effectively on computationally intense power system problems. But this technology is not still available is not only parallel computer but also parallel processing scheme. Testing these algorithms to ensure accuracy, and evaluation of their performance is also an issue. Although a significant amount of parallel algorithms of power system problem have been developed in last decade, actual testing on parallel computer architectures lies in the beginning stages because no clear cut paths. This paper presents Jacobian modeling method to supply the base being able to treat power flow by newton's method by the computer. This method is to assign and to compute teared blocks of sparse matrix at each parallel processors. The testing to insure accuracy of developed method have been done on serial computer by trying to simulate a parallel environment.

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks
    • /
    • 제16권4호
    • /
    • pp.436-446
    • /
    • 2014
  • The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.

Adaptive and optimized agent placement scheme for parallel agent-based simulation

  • Jin, Ki-Sung;Lee, Sang-Min;Kim, Young-Chul
    • ETRI Journal
    • /
    • 제44권2호
    • /
    • pp.313-326
    • /
    • 2022
  • This study presents a noble scheme for distributed and parallel simulations with optimized agent placement for simulation instances. The traditional parallel simulation has some limitations in that it does not provide sufficient performance even though using multiple resources. The main reason for this discrepancy is that supporting parallelism inevitably requires additional costs in addition to the base simulation cost. We present a comprehensive study of parallel simulation architectures, execution flows, and characteristics. Then, we identify critical challenges for optimizing large simulations for parallel instances. Based on our cost-benefit analysis, we propose a novel approach to overcome the performance constraints of agent-based parallel simulations. We also propose a solution for eliminating the synchronizing cost among local instances. Our method ensures balanced performance through optimal deployment of agents to local instances and an adaptive agent placement scheme according to the simulation load. Additionally, our empirical evaluation reveals that the proposed model achieves better performance than conventional methods under several conditions.