Integrated Search | Korea Science

Performance Comparison of Two Parallel LU Decomposition Algorithms on MasPar Machines

김영태
- 전기전자학회논문지 / Vol. 2, No. 2 / pp. 247-254 / 1998
This paper presents a performance study of two LU decomposition algorithms on two massively parallel SIMD machines: the 16K processor MasPar MP-1 and the 4K processor MasPar MP-2. It reports experimental results together with an analysis of the algorithms that explains those results. While the blocked and the nonblocked algorithms for LU decomposition have been studied individually by others, we compare the two algorithms and identify the tradeoffs between them. Our analysis of the blocked algorithm shows how the block size affects the interprocessor communication cost and the memory read/write overhead. The analysis in this paper is useful for determining an optimum block size for the blocked algorithm.
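To make the blocked/nonblocked distinction concrete, here is a minimal serial sketch in Python with NumPy. It is not the paper's MasPar implementation: it assumes no pivoting (so the test matrix should be diagonally dominant), and the names `lu_unblocked`, `lu_blocked`, `unpack`, and `block_size` are illustrative. The blocked variant does the same arithmetic but groups it into block solves and one matrix-matrix update per step, and `block_size` is the knob the abstract refers to.

```python
import numpy as np

def lu_unblocked(a):
    """Nonblocked right-looking LU without pivoting (assumed safe input).
    Returns L (unit diagonal, implicit) and U packed into one array."""
    a = np.array(a, dtype=float)
    n = a.shape[0]
    for k in range(n - 1):
        a[k + 1:, k] /= a[k, k]  # multipliers: column k of L
        # rank-1 update of the trailing submatrix
        a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])
    return a

def lu_blocked(a, block_size):
    """Blocked right-looking LU without pivoting, same packed output."""
    a = np.array(a, dtype=float)
    n = a.shape[0]
    for k in range(0, n, block_size):
        e = min(k + block_size, n)
        # 1. factor the diagonal block with the nonblocked kernel
        a[k:e, k:e] = lu_unblocked(a[k:e, k:e])
        if e == n:
            break
        l_kk = np.tril(a[k:e, k:e], -1) + np.eye(e - k)
        u_kk = np.triu(a[k:e, k:e])
        # 2. block row of U: solve L_kk @ U_12 = A_12
        a[k:e, e:] = np.linalg.solve(l_kk, a[k:e, e:])
        # 3. block column of L: solve L_21 @ U_kk = A_21
        a[e:, k:e] = np.linalg.solve(u_kk.T, a[e:, k:e].T).T
        # 4. trailing update as a single matrix-matrix product
        a[e:, e:] -= a[e:, k:e] @ a[k:e, e:]
    return a

def unpack(packed):
    """Split the packed factorization into explicit L and U."""
    lower = np.tril(packed, -1) + np.eye(packed.shape[0])
    return lower, np.triu(packed)
```

A general solver is used for the triangular systems only to keep the sketch short; a dedicated triangular solve would normally be used there.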

Shortest Path Calculation Using Parallel Processor System

서창진;이장규
- 대한전기학회논문지 / Vol. 34, No. 6 / pp. 230-237 / 1985
Shortest path calculations for a large-scale network have to be performed using a decomposition technique, since the calculations require a large memory size that increases with the square of the number of vertices in the network. The calculation time also increases with the cube of the number of vertices. In the decomposition technique, the network is broken into a number of smaller subnetworks, and shortest paths are computed for each of them. A union of the solutions provides the solution of the original network. In all of the decomposition algorithms developed up to now, the boundary vertices that divide all the subnetworks have to be included in computing shortest paths for each subnetwork. In this paper, an improved algorithm is developed to reduce the number of boundary vertices involved: only those boundary vertices that are directly connected to the subnetwork are included. The algorithm is suitable for real-time computation on a parallel processor system consisting of a number of microcomputers or processors. The algorithm has been applied to a 39-vertex network and a 232-vertex network. The results show that it is efficient and performs better than any other algorithm. A parallel processor system has been built from an MZ-80 microcomputer and two Z-80 microprocessor kits. The former is used as a master processor and the latter as slave processors. The algorithm is embedded into the system and proven effective for real-time shortest path computations.
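The quadratic memory and cubic time growth mentioned above match a dense all-pairs method such as Floyd-Warshall. The abstract does not name its baseline, so the following Python sketch is an assumption chosen only to illustrate those costs; it is not the paper's decomposition algorithm, and `all_pairs_shortest_paths` is a hypothetical name.

```python
import math

def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed graph with vertices 0..n-1.
    `edges` is a list of (u, v, weight) tuples.
    Memory: an n x n distance table. Time: three nested loops over n."""
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the lightest parallel edge
    for k in range(n):              # allow vertex k as an intermediate
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue            # no path i -> k, nothing to relax
            for j in range(n):
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
    return dist
```

Splitting the network into subnetworks shrinks `n` for each table, which is why decomposition reduces both the memory and the time per subproblem.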

Parallel Computation of a Flow Field Using FEM and Domain Decomposition Method

최형권;김범준;강성우;유정열
- 대한기계학회:학술대회논문집 / 대한기계학회 2002 Conference Proceedings / pp. 55-58 / 2002
A parallel finite element code has recently been developed for the analysis of the incompressible Navier-Stokes equations using a domain decomposition method. The Metis and MPI libraries are used for the domain partitioning of an unstructured mesh and for data communication between sub-domains, respectively. For unsteady computation of the incompressible Navier-Stokes equations, a 4-step splitting method is combined with a P1P1 finite element formulation. The Smagorinsky and dynamic models are implemented for the simulation of turbulent flows. For validation and performance estimation of the developed parallel code, the three-dimensional Laplace equation has been solved. A speed-up of 40 has been obtained from the present parallel code for the benchmark problem. Lastly, the turbulent flows around the MIRA model and the Tiburon model have been solved using 32 processors on an IBM SMP cluster with an unstructured mesh. The computed drag coefficient agrees better with the existing experiment as the mesh resolution increases in the region where the variation of pressure is severe.

A FASTER LU DECOMPOSITION FOR PARALLEL C PROGRAMS

Lee, Sang-Moon;Lee, Chin-Young
- Journal of applied mathematics & informatics / Vol. 3, No. 2 / pp. 217-234 / 1996
This report introduces a faster parallel LU decomposition algorithm that gives a speedup almost equal to the number of nodes used. The new algorithm takes advantage of an important C feature, the row-major layout of matrices, and is based on the currently widely used LU decomposition algorithm with one major modification that eliminates most of the communication overhead. Empirical results are included in this report. For example, solving a dense matrix that contains 100,000,000 elements gives a speedup of 50 when executed in parallel on 50 nodes of an Intel Paragon.
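The figures in the example can be sanity-checked with a few lines of Python. This sketch assumes the usual definition of parallel efficiency (speedup divided by node count) and assumes the dense matrix is square, neither of which the abstract states explicitly; the function names are illustrative.

```python
import math

def parallel_efficiency(speedup, nodes):
    """Efficiency under the common definition E = S / p (an assumption here)."""
    return speedup / nodes

def square_matrix_order(num_elements):
    """Order n of a square matrix holding num_elements entries (assumes n*n layout)."""
    n = math.isqrt(num_elements)
    if n * n != num_elements:
        raise ValueError("element count is not a perfect square")
    return n
```

Under these assumptions, a speedup of 50 on 50 nodes corresponds to an efficiency of 1.0, and 100,000,000 elements correspond to a 10,000 by 10,000 matrix.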

Efficient Detection of Space-Time Block Codes Based on Parallel Detection

김정창;전경훈
- 한국통신학회논문지 / Vol. 36, No. 2A / pp. 100-107 / 2011
Algorithms based on the QR decomposition of the equivalent space-time channel matrix have proved useful in the detection of V-BLAST systems. In particular, the parallel detection (PD) algorithm offers ML-approaching performance for up to 4 transmit antennas with reasonable complexity. We show that when directly applied to STBCs, the PD algorithm may suffer a rather significant SNR degradation relative to ML detection, especially at high SNRs. Simply extending the PD algorithm to allow $p \geq 2$ candidate layers, i.e., p-PD, regains almost all of the loss, but only at a significant increase in complexity. Here, we propose a simplification of the p-PD algorithm specific to STBCs without a corresponding sacrifice in performance. The proposed algorithm yields significant complexity reductions for moderate- to high-order modulations.
https://doi.org/10.7840/KICS.2011.36A.2.100

Optimal Control of Large-Scale Dynamic Systems using Parallel Processing

박기홍
- 제어로봇시스템학회논문지 / Vol. 5, No. 4 / pp. 403-410 / 1999
In this study, a parallel algorithm has been developed that can quickly solve the optimal control problem of large-scale dynamic systems. The algorithm adopts sequential quadratic programming methods and achieves domain decomposition-type parallelism in computing the sensitivities needed for the search direction. A silicon wafer thermal process problem has been solved using the algorithm, and a parallel efficiency of 45% has been achieved with 16 processors. Practical methods for further reducing the computation time have also been investigated.
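As a quick check of what the reported figure implies, assume the standard definition of parallel efficiency $E = S/p$, with speedup $S$ on $p$ processors (the abstract does not state which definition it uses):

$$S = E \cdot p = 0.45 \times 16 = 7.2$$

Under this assumption, the 16-processor run would be roughly 7.2 times faster than the single-processor run.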

High-Performance Givens Rotation-based QR Decomposition Architecture Applicable for MIMO Receiver

윤지환;이민우;박종선
- 전자공학회논문지SC / Vol. 49, No. 3 / pp. 31-37 / 2012
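This paper proposes a hardware architecture for high-speed Givens rotation-based QR decomposition. To increase throughput, the proposed approach does not fix a single pivot element for the Givens rotations while decomposing an arbitrary matrix into the product of an orthogonal matrix and an upper triangular matrix; instead, it uses as many pivot elements as possible. In addition, the Givens rotations are implemented with a high-speed SSL-CORDIC to further increase the processing speed. Compared with the conventional TSA (triangular systolic array) approach, the proposed method not only markedly improves QR decomposition performance but also reduces circuit area by lowering the number of flip-flops that store intermediate results. The proposed QR decomposition hardware was implemented in a TSMC $0.25\,\mu\text{m}$ process. Experimental results show that, for the QR decomposition of an $8 \times 8$ matrix, the proposed architecture achieves a 75.24% performance improvement over the TACR/TSA-based architecture.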
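For orientation, the following Python/NumPy sketch shows the conventional sequential form of Givens rotation-based QR decomposition, in which each column uses a single fixed pivot (the diagonal element). It is a software illustration only, not the proposed hardware architecture, and `givens_qr` is a hypothetical name. Rotations that act on disjoint pairs of rows are independent of one another, which is the property that makes using more than one pivot at a time possible.

```python
import numpy as np

def givens_qr(a):
    """QR decomposition by Givens rotations with one fixed pivot per column.
    Returns (q, r) such that q @ r equals the input, q orthogonal, r upper triangular."""
    r = np.array(a, dtype=float)
    m, n = r.shape
    q = np.eye(m)
    for j in range(n):                    # column being cleared
        for i in range(m - 1, j, -1):     # zero r[i, j] against pivot r[j, j]
            x, y = r[j, j], r[i, j]
            rho = np.hypot(x, y)
            if rho == 0.0:
                continue                  # both entries already zero
            c, s = x / rho, y / rho
            g = np.array([[c, s], [-s, c]])       # 2x2 rotation
            r[[j, i], :] = g @ r[[j, i], :]       # rotate rows j and i
            q[:, [j, i]] = q[:, [j, i]] @ g.T     # accumulate Q = G1^T G2^T ...
    return q, r
```

In hardware, the values `c` and `s` are typically not computed with divisions and square roots as here; CORDIC performs the rotation with shift-and-add iterations instead.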

Domain Decomposition Approach Applied for Two- and Three-dimensional Problems via Direct Solution Methodology

Kwak, Jun Young;Cho, Haeseong;Chun, Tae Young;Shin, SangJoon;Bauchau, Olivier A.
- International Journal of Aeronautical and Space Sciences / Vol. 16, No. 2 / pp. 177-189 / 2015
This paper presents an all-direct domain decomposition approach for large-scale structural analysis. The proposed approach achieves computational robustness and efficiency by enforcing the compatibility of the displacement field across the sub-domain boundaries via local Lagrange multipliers and an augmented Lagrangian formulation (ALF). The proposed domain decomposition approach is compared to the existing FETI approach in terms of computational time and memory usage. The parallel implementation of the proposed algorithm is described in detail. Finally, a preliminary validation of the proposed approach is attempted, and the numerical results for two- and three-dimensional problems are compared to those obtained through a dual-primal FETI approach. The results indicate an improvement in performance as a result of implementing the proposed approach.
https://doi.org/10.5139/IJASS.2015.16.2.177

SPARSE NULLSPACE COMPUTATION OF EQUILIBRIUM MATRICES

Jang, Ho-Jong;Cha, Kyung-Joon
- 대한수학회논문집 / Vol. 11, No. 4 / pp. 1175-1185 / 1996
We study the computation of sparse null bases of equilibrium matrices in the context of structural optimization and incompressible fluid flow. In our approach, we emphasize parallel computation and examine the applications. New block decomposition and node ordering schemes are suggested, and numerical examples are considered.

A Parallel Finite Element Procedure for Contact-Impact Problems

하재선
- 대한기계학회:학술대회논문집 / 대한기계학회 2003 Fall Conference / pp. 1286-1290 / 2003
This paper presents a newly implemented parallel finite element procedure for contact-impact problems. Three sub-algorithms are included in the proposed parallel contact-impact procedure: a parallel Belytschko-Lin-Tsay (BLT) shell element generation, a parallel explicit time integration scheme, and a parallel contact search algorithm based on the master-slave slide-line algorithm. The underlying focus of the algorithms is on their effectiveness and efficiency for inclusion in future finite element systems on parallel computers. In this research, a prototype code named GT-PARADYN is developed on the IBM SP2, a distributed-memory computer. Numerical examples are provided to demonstrate the timing results of the procedure and to discuss the accuracy and efficiency of the code.

Search results: 186 items; processing time: 0.023 s
