Parallel Implementation of Nonlinear Analysis Program of PSC Frame Using MPI (MPI를 이용한 PSC 프레임 비선형해석 프로그램의 병렬화)

  • 이재석;최규천
    • Proceedings of the Computational Structural Engineering Institute Conference / 2001.04a / pp.61-68 / 2001
  • A nonlinear analysis program for prestressed concrete (PSC) frames is parallelized and migrated to a PC cluster system and to a massively parallel processing system, the CRAY T3E, using MPI. The PC cluster is built from Pentium Ⅲ class PCs connected by Fast Ethernet. The CRAY T3E is composed of a set of nodes, each containing one Processing Element (PE), a memory subsystem, and its distributed-memory interconnect network. Parallel algorithms are applied to the element-wise processing stages: calculation of the stiffness matrix and element stresses, determination of material states, checking of material failure, and calculation of unbalanced loads. The parallel performance of the migrated program is evaluated on typical numerical examples.

A Systolic Parallel Simulation System for Dynamic Traffic Assignment : SPSS-DTA

  • Park, Kwang-Ho;Kim, Won-Kyu
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.113-128 / 2000
  • This paper presents the first-year report of an ongoing multi-year project to develop a systolic parallel simulation system for dynamic traffic assignment. The fundamental approach to the simulation is systolic parallel processing based on autonomous agent modeling. Agents continuously act on their own initiative and access a database to obtain the status of the simulation world. Various agents are defined to populate the simulation world. In particular, existing models and algorithms, such as the car-following model, headway distribution, and the Frank-Wolfe algorithm, were incorporated in designing the behavior of the relevant agents. The simulation is based on predetermined routes between centroids that are computed off-line by a conventional optimal path-finding algorithm. By iterating cycles of optimization followed by simulation, the proposed system is expected to provide a realistic and valuable traffic assignment. The Gangnam-gu district in Seoul is selected as the target area for the modeling. It is expected that real-time traffic assignment services can be provided on the Internet within three years.

Optimal Control of Large-Scale Dynamic Systems using Parallel Processing (병렬처리를 이용한 대규모 동적 시스템의 최적제어)

  • Park, Ki-Hong
    • Journal of Institute of Control, Robotics and Systems / v.5 no.4 / pp.403-410 / 1999
  • In this study, a parallel algorithm has been developed that can quickly solve the optimal control problem of large-scale dynamic systems. The algorithm adopts the sequential quadratic programming method and achieves domain-decomposition-type parallelism in computing the sensitivities needed for the search direction. A silicon wafer thermal process problem has been solved using the algorithm, and a parallel efficiency of 45% has been achieved with 16 processors. Practical methods for further reducing the computation time have also been investigated.
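
Assuming the conventional definition of parallel efficiency (speedup $S$ divided by processor count $p$; the abstract does not state which definition it uses), the reported figures correspond to a speedup of about 7.2 on 16 processors:

$$E = \frac{S}{p} \quad\Longrightarrow\quad S = E \cdot p = 0.45 \times 16 = 7.2$$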

STUDY OF THREE-DIMENSIONAL DETONATION WAVE STRUCTURES USING PARALLEL PROCESSING (병렬 처리를 이용한 3차원 데토네이션 파 구조 해석)

  • Cho D.R.;Choi J.Y.
    • Proceedings of the Korean Society for Computational Fluids Engineering Conference (한국전산유체공학회:학술대회논문집) / 2005.10a / pp.151-155 / 2005
  • Three-dimensional structures of an unsteady detonation wave propagating through a square tube are studied using computational methods and parallel processing. The inviscid fluid dynamics equations, coupled with a variable-${\gamma}$ formulation and a simplified one-step Arrhenius chemical reaction model, were solved with a MUSCL-type TVD scheme and four-stage Runge-Kutta time integration. The three-dimensional results show two unsteady propagation modes of the detonation wave: the rectangular and the diagonal modes of detonation wave instability. The two modes showed the same cell length but different cell widths, and geometric similarities in the smoked-foil records.
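
As a point of reference for the time integration mentioned in this abstract, here is a minimal sketch of one four-stage Runge-Kutta step applied to a scalar test equation. The abstract does not give the stage coefficients, so the classical RK4 tableau is an assumption, and the names `rk4_step`, `integrate`, and `decay` are hypothetical; this is not the authors' solver.

```python
import math


def rk4_step(f, t, y, dt):
    """Advance y by one step of the classical four-stage Runge-Kutta method."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with a fixed step size."""
    dt = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y


def decay(t, y):
    """Hypothetical test problem dy/dt = -y, whose exact solution is exp(-t)."""
    return -y


if __name__ == "__main__":
    approx = integrate(decay, 1.0, 0.0, 1.0, 100)
    print(approx, math.exp(-1.0))
```

In a flow solver the scalar `y` would be replaced by the array of conserved variables and `f` by the spatially discretized residual, which is where the TVD scheme would enter.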

NUMERICAL STUDY OF THREE-DIMENSIONAL DETONATION WAVES USING PARALLEL PROCESSING (병렬 처리를 이용한 3차원 테토네이션 파 수치해석)

  • Cho, D.R.;Choi, J.Y.
    • Proceedings of the Korean Society of Combustion Conference (한국연소학회:학술대회논문집) / 2005.10a / pp.15-19 / 2005
  • Three-dimensional structures of an unsteady detonation wave propagating through a square tube are studied using computational methods and parallel processing. The inviscid fluid dynamics equations, coupled with a variable-${\gamma}$ formulation and a simplified one-step Arrhenius chemical reaction model, were solved with a MUSCL-type TVD scheme and four-stage Runge-Kutta time integration. The three-dimensional results show two unsteady propagation modes of the detonation wave: the rectangular and the diagonal modes of detonation wave instability. The two modes showed the same cell length but different cell widths, and geometric similarities in the smoked-foil records.

From WiFi to WiMAX: Efficient GPU-based Parameterized Transceiver across Different OFDM Protocols

  • Li, Rongchun;Dou, Yong;Zhou, Jie;Li, Baofeng;Xu, Jinbo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.8 / pp.1911-1932 / 2013
  • Orthogonal frequency-division multiplexing (OFDM) has become a popular modulation scheme for wireless protocols because of its spectral efficiency and robustness against multipath interference. Although the components of various OFDM protocols are functionally similar, they remain distinct because of the characteristics of their target environments. Recently, graphics processing units (GPUs) have been used to accelerate the signal processing of the physical layer (PHY) because of their great computational power, high development efficiency, and flexibility. In this paper, we describe the implementation of parameterized baseband modules using GPUs for two different OFDM protocols, namely 802.11a and 802.16. First, we introduce the modules in the modulator/demodulator parts of the transmitter and receiver and analyze the computational complexity of each module. We then describe the integration of the GPU-based baseband modules of the two protocols using the parameterized method. The GPU-based implementations are presented to explain how the baseband processing is accelerated to achieve real-time throughput. Finally, the performance of each signal processing module is evaluated and analyzed. The experiments show that the GPU-based 802.11a and 802.16 PHYs meet the real-time requirement and demonstrate good bit error ratio (BER) performance. The performance comparison indicates that our GPU-based modules offer better flexibility and throughput than existing ones.

On Parallel Implementation of Lagrangean Approximation Procedure (Lagrangean 근사과정의 병렬계산)

  • 이호창
    • Journal of the Korean Operations Research and Management Science Society / v.18 no.3 / pp.13-34 / 1993
  • By operating on many parts of a software system concurrently, parallel processing computers may provide several orders of magnitude more computing power than traditional serial computers. If the Lagrangean approximation procedure is applied to a large-scale manufacturing problem that is decomposable into many subproblems, the procedure is a perfect candidate for parallel processing. By distributing the Lagrangean subproblems for a given multiplier to multiple processors, running the processors concurrently, and modifying the Lagrangean multipliers at the end of each iteration of a subgradient method, parallel processing of a Lagrangean approximation procedure may provide a significant speedup. The purpose of this research is to investigate the potential of the parallelized Lagrangean approximation procedure (PLAP) for certain combinatorial optimization problems in manufacturing systems. The framework of a PLAP is proposed for some combinatorial manufacturing problems that are decomposable into well-structured subproblems. The synchronous PLAP for the multistage dynamic lot-sizing problem is implemented on an Alliant FX/4 parallel computer, and its computational experience is reported as a promising application of vector-concurrent computing.
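
The synchronous scheme described in this abstract (solve the subproblems concurrently for a fixed multiplier, then update the multiplier in a subgradient step) can be sketched on a toy problem. Everything below is an illustrative assumption rather than the paper's PLAP: the toy 0-1 knapsack stands in for the lot-sizing problem, the step size 1/k is one common choice, and the names `solve_subproblem` and `lagrangean_bound` are hypothetical. A thread pool is used only to show the structure; in CPython a process pool or MPI would be needed for real speedup on compute-bound subproblems.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_subproblem(args):
    """Solve one item's subproblem: max over x in {0, 1} of (value - lam * weight) * x."""
    value, weight, lam = args
    reduced = value - lam * weight
    x = 1 if reduced > 0 else 0
    return x, reduced * x


def lagrangean_bound(values, weights, capacity, iterations=200, workers=4):
    """Return the best upper bound found on max sum(v*x) s.t. sum(w*x) <= capacity."""
    lam = 0.0  # multiplier on the relaxed capacity constraint
    best_bound = float("inf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(1, iterations + 1):
            # Subproblems are independent for a fixed multiplier: solve concurrently.
            tasks = [(v, w, lam) for v, w in zip(values, weights)]
            results = list(pool.map(solve_subproblem, tasks))
            xs = [x for x, _ in results]
            bound = sum(obj for _, obj in results) + lam * capacity
            best_bound = min(best_bound, bound)
            # Synchronization point: subgradient step on the multiplier.
            subgrad = capacity - sum(w * x for w, x in zip(weights, xs))
            lam = max(0.0, lam - subgrad / k)
    return best_bound, lam


if __name__ == "__main__":
    bound, lam = lagrangean_bound([10, 7, 4], [5, 4, 3], capacity=8)
    print(f"upper bound {bound:.3f} at final multiplier {lam:.3f}")
```

For this toy instance the best feasible selection has value 14 (the first and third items), so by weak duality every bound the loop produces is at least 14.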

Pattern Classification with the Analog Cellular Parallel Processing Networks (아날로그 셀룰라 병렬 처리 회로망(CPPN)을 이용한 Pattern Classification)

  • 오태완;이혜정;김형석
    • Proceedings of the IEEK Conference / 2003.07e / pp.2367-2370 / 2003
  • A fast pattern classification algorithm based on dynamic programming with Cellular Parallel Processing Networks (CPPN) is proposed. The CPPN is an analog parallel processing architecture, and dynamic programming is an efficient computational algorithm for optimization problems. Combining the merits of these two technologies yields fast pattern classification with optimization. In CPPN-based dynamic programming, if exemplars and test patterns are presented as the goals and the start positions, respectively, the optimal paths from the test patterns to their closest exemplars are found. These paths are used as aggregating keys for the classification. The classification performs well regardless of the degree of nonlinearity of the class borders.

Parallel Computing Environment for R with on Supercomputer Systems (빅데이터 분석을 위한 슈퍼컴퓨터 환경에서 R의 병렬처리)

  • Lee, Sang Yeol;Won, Joong Ho
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.4 / pp.19-31 / 2014
  • We study high-performance parallel processing techniques for the R programming language. In this study, we used a massively parallel computing system with 25,408 CPU cores. We evaluated the performance of a distributed memory system using MPI and of a shared memory system using OpenMP. Our findings are summarized as follows. First, for some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed memory system gets faster as the number of nodes increases, while the shared memory system is limited in its performance improvement by the number of CPUs available in a single system.

A Disk Allocation Scheme for High-Performance Parallel File System (고성능 병렬화일 시스템을 위한 디스크 할당 방법)

  • Park, Kee-Hyun
    • The Transactions of the Korea Information Processing Society / v.7 no.9 / pp.2827-2835 / 2000
  • In recent years, much attention has been focused on improving the processing speed of I/O devices, which is essential in large-scale data processing areas such as multimedia data processing. Studies on high-performance parallel file systems are one such effort. In this paper, an efficient disk allocation scheme is proposed for high-performance parallel file systems. Specifically, the parallelism of a parallel disk file is defined using the data declustering characteristic of the given file. Based on this concept, an efficient disk allocation scheme is proposed that calculates the appropriate degree of data declustering on disks for each parallel file in order to obtain the maximum throughput when more than one parallel file is used at the same time. Since the calculation for obtaining the maximum throughput becomes too complex as the number of parallel files increases, an approximate disk allocation algorithm is also proposed. The approximate algorithm is very simple and provides very good results, especially when the I/O workload is high. In addition, it is shown that the approximate algorithm provides the optimal disk allocation for maximum throughput when the arrival rate of I/O requests is infinite.
