• Title/Summary/Keyword: Asynchronous Parallel Algorithm

Application of a Parallel Asynchronous Algorithm to Some Grid Problems on Workstation Clusters

  • Park, Pil-Seong
    • Ocean and Polar Research / v.23 no.2 / pp.173-179 / 2001
  • Parallel supercomputing is now a must for oceanographic numerical modelers. Most of today's parallel numerical schemes use synchronous algorithms, in which processors that finish their tasks earlier than others must wait at synchronization points for correct computation. Load balancing is therefore a crucial factor, but it is generally difficult to achieve on heterogeneous workstation clusters. We devise an asynchronous algorithm that reduces the idle times of faster processors, and discuss the application of the algorithm to some grid problems and its implementation on a workstation cluster using the Message Passing Interface (MPI).

A Synchronous/Asynchronous Hybrid Parallel Power Iteration for Large Eigenvalue Problems by the MPMD Methodology (MPMD 방식의 동기/비동기 병렬 혼합 멱승법에 의한 거대 고유치 문제의 해법)

  • Park, Pil-Seong
    • The KIPS Transactions:PartA / v.11A no.1 / pp.67-74 / 2004
  • Most of today's parallel numerical schemes use synchronous algorithms, in which processors that finish their tasks earlier than others must wait at synchronization points for correct computation. Hence the overall performance of the system depends on the speed of the slowest processor. In this paper, we devise a synchronous/asynchronous hybrid algorithm that accelerates convergence when finding the dominant eigenpair of a large matrix, by reducing the idle times of faster processors using the MPMD programming methodology.

Improving Performance of Large Sparse Linear System Solvers On Distributed Memory Systems By Asynchronous Algorithms (비동기 알고리즘을 이용한 분산 메모리 시스템에서의 초대형 선형 시스템 해법의 성능 향상)

  • Park, Pil-Seong;Sin, Sun-Cheol
    • The KIPS Transactions:PartA / v.8A no.4 / pp.439-446 / 2001
  • Mainstream parallel programming today uses synchronous algorithms, for which processor synchronization for correct computation and workload balance are essential. If the workload is not well balanced or heterogeneous clusters are used, the overall performance of the whole system depends on the performance of the slowest processor. Asynchronous iteration is a way to mitigate such problems, but most work so far has targeted shared memory systems. In this paper, we propose and implement a parallel solver for large sparse linear systems that improves performance on distributed memory systems such as clusters by using asynchronous iterations to reduce processor idle times as much as possible.
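
The idea of asynchronous iteration mentioned in this abstract can be illustrated with a small single-process simulation. The sketch below is not the authors' solver: it is a minimal asynchronous Jacobi iteration in which each update refreshes one component using a possibly outdated copy of the others, mimicking processors that do not wait for fresh data. The function name `async_jacobi`, the parameters `n_updates` and `max_delay`, and the 3×3 test system are all assumptions chosen for illustration; the matrix is strictly diagonally dominant, a standard condition under which such iterations converge.

```python
import random

def async_jacobi(A, b, n_updates=2000, max_delay=3, seed=0):
    """Simulate asynchronous Jacobi iteration on A x = b.

    Each step updates one randomly chosen component, reading the other
    components from a snapshot that may be up to max_delay updates old
    (a stand-in for stale data received from other processors).
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    history = [list(x)]  # recent iterates, newest last
    for _ in range(n_updates):
        i = rng.randrange(n)
        # Choose how stale the data read by this update is.
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        stale = history[-1 - delay]
        off_diag = sum(A[i][j] * stale[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diag) / A[i][i]
        history.append(list(x))
        if len(history) > max_delay + 1:
            history.pop(0)  # keep only the snapshots that can still be read
    return x

# Hypothetical strictly diagonally dominant test system with solution (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = async_jacobi(A, b)
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
```

In a real distributed-memory setting the stale snapshot would come from whatever values a neighbouring process last sent, for example through non-blocking message passing; the bounded random delay here only models that effect.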

Design of Asynchronous 16-Bit Divider Using NST Algorithm (NST알고리즘을 이용한 비동기식 16비트 제산기 설계)

  • 이우석;박석재;최호용
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.3 / pp.33-42 / 2003
  • This paper describes an efficient design of an asynchronous 16-bit divider using the NST (new Svoboda-Tung) algorithm. The divider is designed to reduce power consumption by using an asynchronous design scheme in which the division operation is performed only when it is requested. The divider consists of three blocks, namely a pre-scale block, an iteration step block, and an on-the-fly converter block, using an asynchronous pipeline structure. The pre-scale block is designed with a new subtracter for small area and high performance. The iteration step block consists of an asynchronous ring structure with 4 division steps for area reduction. In order to reduce hardware overhead, the part of the ring structure on the critical path is designed as a dual-rail circuit, and the rest as a single-rail circuit. The on-the-fly converter block is designed for high performance using the on-the-fly algorithm, which enables parallel operation with the iteration step block. Design results with a 0.6 μm CMOS process show that the divider consists of 12,956 transistors in an area of 1,480 × 1,200 μm² and has an average-case delay of 41.7 ns.

A Fault-Tolerant Linear System Solver in a Standard MPI Environment (표준 MPI 환경에서의 무정지형 선형 시스템 해법)

  • Park, Pil-Seong
    • Journal of Internet Computing and Services / v.6 no.6 / pp.23-34 / 2005
  • In a large-scale parallel computation, failures of some nodes or communication links waste computing resources. Several fault-tolerant MPI libraries have been proposed so far, but programs written with such libraries have a portability problem, since fault-tolerant features are not yet supported by the MPI standard. In this paper, we propose an application-level fault-tolerant linear system solver that uses the asynchronous iteration algorithm and only standard MPI functions. It has no portability problem and is more efficient because it adopts a simplified recovery mechanism.

CUDA based Lossless Asynchronous Compression of Ultra High Definition Game Scenes using DPCM-GR (DPCM-GR 방식을 이용한 CUDA 기반 초고해상도 게임 영상 무손실 비동기 압축)

  • Kim, Youngsik
    • Journal of Korea Game Society / v.14 no.6 / pp.59-68 / 2014
  • Memory bandwidth requirements of UHD (Ultra High Definition, 4096 × 2160) game scenes have been increasing rapidly. This paper presents a lossless DPCM-GR based compression algorithm using CUDA that addresses the memory bandwidth problem without sacrificing image quality. The algorithm is modified from DDPCM-GR [4] to support bit-parallel pipelining. Memory bandwidth efficiency increases through use of the CUDA shared memory. Various asynchronous transfer configurations, which can overlap kernel execution with data transfer between the host and CUDA, are implemented with page-locked host memory. Experimental results show a maximum speedup of 31.3 relative to CPU time. Among the various configurations, the computation time decreases by up to 30.3%.

Virtual Optimal Design of Satellite Adapter in Parallel Computing Environment (병렬 컴퓨팅 환경 하에서 인공위성 어댑터 가상최적설계)

  • Moon, Jong-Keun;Yoon, Young-Ha;Kim, Kyung-Won;Kim, Sun-Won;Kim, Jin-Hee;Kim, Seung-Jo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.35 no.11 / pp.973-982 / 2007
  • In this paper, an optimal design framework is developed using automatic mesh generation and the PSO (Particle Swarm Optimization) algorithm in a parallel computing environment, and is applied to the structural optimal design of a satellite adapter module. Automatic mesh generation made it possible to change the structural shape of the adapter module. The PSO algorithm was combined with the parallel computing environment, and an asynchronous PSO algorithm was developed to maximize computing performance, which reduced the computing time of the optimization process. Eigenfrequency and maximum stress were considered as constraint conditions. Finally, using the optimal design framework, a weight reduction of the satellite adapter module was obtained while satisfying structural safety.

Development of Asynchronous Blocking Algorithm through Asynchronous Case Study of Steam Turbine Generator (스팀터빈 발전기 비동기 투입 사례연구를 통한 비동기 방지 알고리즘 개발)

  • Lee, Jong-Hweon
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.10 / pp.1542-1547 / 2012
  • An asynchronous phenomenon occurs in synchronous generators in a power system when a generator's amplitude of electromagnetic force, phase angle, frequency, waveform, etc. become different from those of other synchronous generators that can instantly follow the varying speed of the turbine. Changes in the excitation condition for load sharing and in the shared load make these quantities differ from those of the other generators to be put into parallel operation; if reactive current then circulates among the generators in the internal circuit, the efficiency varies and the stator windings of the generators are overheated by resistance loss. When protection settings and logic for generator asynchronization are calculated, a distance relay scheme is commonly used for backup protection. This scheme, called step distance protection, comprises 3 steps for graded zones with different operating times. In the conventional step distance protection scheme, zone 2 can greatly exceed its ordinary coverage, especially in the case of a transformer protection relay. In this case, the protection area can overlap with that of a backup protection relay, and malfunctions can therefore occur when a fault occurs in the overlapped protection area. Distance relays and overcurrent relays are generally used for backup protection, and both normally have this problem of maloperation caused by a fault in the overlapped protection area. According to an IEEE standard, this problem can be solved by modifying the operating time. In Korea, on the other hand, zones are modified to cope with this problem under some specific conditions. Neither method is clearly a correct way to handle this problem, because both modify the common rules and can cause another coordination problem. To address this asynchronization protection problem, this paper describes an improved backup protection coordination scheme using a newly proposed logic.

A Genetic Algorithm Based Source Encoding Scheme for Distinguishing Incoming Signals in Large-scale Space-invariant Optical Networks

  • Hongki Sung;Yoonkeon Moon;Lee, Hagyu
    • Journal of Electrical Engineering and Information Science / v.3 no.2 / pp.151-157 / 1998
  • Free-space optical interconnection networks can be classified into two types, space variant and space invariant, according to the degree of space variance. In terms of physical implementation, the degree of space variance can be interpreted as the degree to which beam steering optics are shared among the nodes of a given network. This implies that all nodes in a totally space-invariant network can share a single beam steering optics to realize the given network topology, whereas in a totally space-variant network each node requires a distinct beam steering optics. However, space-invariant networks require mechanisms for distinguishing the origins of incoming signals detected at a node, since several signals may arrive at the same time if the node degree of the network is greater than one. This paper presents a signal source encoding scheme for distinguishing incoming signals efficiently, in terms of the number of detectors at each node or the number of unique wavelengths. The proposed scheme is solved by developing a new parallel genetic algorithm called the distributed asynchronous genetic algorithm (DAGA). Using the DAGA, we solved signal distinction schemes for various network sizes of several topologies, such as the hypercube, the mesh, and the de Bruijn network.

The Measurement and Analysis of Cost Error in Simulated Annealing (시뮬레이티드 어닐링에서의 비용오류 측정 및 분석)

  • Hong, Cheol-Ui;Kim, Yeong-Jun
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1141-1149 / 2000
  • This paper proposes a new cost error measurement method and statistically analyzes the optimistic and pessimistic cost errors that result from asynchronous parallel simulated annealing (SA) on distributed memory multicomputers. The traditional cost error measurement scheme has inherent problems, which the new method corrects. At each temperature, the new method predicts the amount of cost error that an algorithm will tolerate and still converge, owing to the hill-climbing nature of SA. The method also explains three interesting phenomena of the cost error analytically. The new cost error measurement method thus provides a single mechanism for the occurrence of cost error and its control.
