• Title/Summary/Keyword: Parallel Programs

Search Result 218, Processing Time 0.023 seconds

Efficient Checkpoint Algorithm for Message-Passing Parallel Applications on Cloud Computing (클라우드컴퓨팅에서 메시지패싱방식 응용프로그램의 효율적인 체크포인트 알고리즘)

  • Le, Duc Tai;Dao, Manh Thuong Quan;Ahn, Min-Joon;Choo, Hyun-Seung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.156-157
    • /
    • 2011
  • In this work, we study the checkpoint/restart problem for message-passing parallel applications running on cloud computing environment. This is a new direction which arises from the trend of enabling the applications to run on the cloud computing environment. The main objective is to propose an efficient checkpoint algorithm for message-passing parallel applications considering communications with external systems. We further implement the novel algorithm by modifying gSOAP and OpenMPI (the open source libraries) which support service calls and checkpoint message-passing parallel programs, especially. The simulation showed that additional costs to the executing and checkpointing application of the algorithm are negligible. Ultimately, the algorithm supports efficiently the checkpoint/restart service for message-passing parallel applications, that send requests to external services.

Proposal for Decoding-Compatible Parallel Deflate Algorithm by Inserting Control Header Composed of Non-Compressed Blocks (비 압축 블록으로 구성된 제어 헤더 삽입을 통한 압축 해제 호환성 있는 병렬 처리 Deflate 알고리즘 제안)

  • Kim Jung Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.5
    • /
    • pp.207-216
    • /
    • 2023
  • For decoding-compatible parallel Deflate algorithm, this study proposed a new method of the control header being made in such a way that essential information for parallel compression and decompression are stored in the Disposed Bit Area (DBA) of the non-compression block and being inserted into the compressed blocks. Through this, parallel compression and decompression are possible while maintaining perfect compatibility with the existing decoder. After applying this method, the compression time was reduced by up to 71.2% compared to the sequential processing method, and the parallel decompression time was reduced by up to 65.7%. In particular, it is well known that parallel decompression is impossible due to the structural limitations of the Deflate algorithm. However, the decoder equipped with the proposed method enables high-speed parallel decompression at the algorithm level and maintains compatibility, so that parallelly compressed data can be decoded normally by existing decoder programs.

A Labeling Scheme for Efficient On-the-fly Detection of Race Conditions in Parallel Programs (병렬프로그램의 경합조건을 수행 중에 효율적으로 탐지하기 위한 레이블링 기법)

  • Park, So-Hee;Woo, Jong-Jung;Bae, Jong-Min;Jun, Yong-Kee
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.525-534
    • /
    • 2002
  • Race conditions, races in short, need to be detected for debugging parallel programs, because the races result in unintended non-deterministic executions. To detect the races in an execution of program, previous techniques use a centralized data structure which may incur serious bottleneck in generating concurrency information, or show inefficient time complexity which depends on the degree of nested parallelism in comparing any two of them. We propose a new labeling scheme in this paper, which is scalable in generating the concurrency information without bottleneck by using private data structure, and improves time complexity into constant in checking concurrency. The scalability and time efficiency therfore makes on-the-fly race detection efficient not only for programs with either shared-memory or message-passing, but also for programs with mixed model of the two.

Real time simulation using multiple DSPs for fossil power plants (병렬처리를 이용한 화력발전소의 실시간 시뮬레이션)

  • 박희준;김병국
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1997.10a
    • /
    • pp.480-483
    • /
    • 1997
  • A fossil power plant can be modeled by a lot of algebraic equations and differential equations. When we simulate a large, complicated fossil power plant by a computer such as workstation or PC, it takes much time until overall equations are completely calculated. Therefore, new processing systems which have high computing speed is ultimately needed to develope real-time simulators. Vital points of real-time simulators are accuracy, computing speed, and deadline observing. In this paper, we present a enhanced strategy in which we can provide powerful computing power by parallel processing of DSP processors with communication links. We designed general purpose DSP modules, and a VME interface module. Because the DSP module is designed for general purpose, we can easily expand the parallel system by just connecting new DSP modules to the system. Additionally we propose methods about downloading programs, initial data to each DSP module via VME bus, DPRAM and processing sequences about computing and updating values between DSP modules and CPU30 board when the simulator is working.

  • PDF

A Synchronous/Asynchronous Hybrid Parallel Power Iteration for Large Eigenvalue Problems by the MPMD Methodology (MPMD 방식의 동기/비동기 병렬 혼합 멱승법에 의한 거대 고유치 문제의 해법)

  • Park, Pil-Seong
    • The KIPS Transactions:PartA
    • /
    • v.11A no.1
    • /
    • pp.67-74
    • /
    • 2004
  • Most of today's parallel numerical schemes use synchronous algorithms, where some processors that have finished their tasks earlier than others must wait at synchronization points for correct computation. Hence overall performance of the system is dependent upon the speed of the slowest processor. In this paper, we det·ise a synchronous/asynchronous hybrid algorithm to accelerate convergence of the solution for finding the dominant eigenpair of a large matrix, by reducing the idle times of faster processors using MPMD programming methodology.

Comparison of Go and C++ TBB on Parallel Processing (Go와 C++ TBB의 병렬처리 비교)

  • Park, Dong-Ha;Moon, Bong-Kyo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.64-67
    • /
    • 2017
  • Applying concurrent structure and parallel processing are a common issue for these day's programs. In this research, Dynamic Programming is used to compare the parallel performance of Go language and Intel C++ Thread Building Blocks. The experiment was performed on 4 core machine and its result contains execution time under Simultaneous Multi-Threading environment. Static Optimal Binary Search Tree was used as an example. From the result, the speed-up of Go was higher than the number of cores, and that of TBB was close to it. TBB performed better in general, but for larger scale, Go was partially faster than the other.

A FASTER LU DECOMPOSITION FOR PARALLEL C PROGRAMS

  • Lee, Sang-Moon;Lee, Chin-Young
    • Journal of applied mathematics & informatics
    • /
    • v.3 no.2
    • /
    • pp.217-234
    • /
    • 1996
  • This report introduces a faster parallel LU decomposi-tion algorithm that gives a speedup almost equal to the number of nodes used. The new algorithm takes an advantage of an important C feature that lays out a matrix using a row major scheme and is based on the currently widely used LU decomposition algorithm with one major modification to eliminate most of the communication overhead. Empirical results are included in this report. For example solving a dense matrix that contains 100,000,000 elements gives a speedup of 50 when executed on 50 nodes of an intel Paragon in parallel.

Interprocedural Transformations for Parallel Computing

  • Park, Doo-Soon;Choi, Min-Hyung
    • Journal of Korea Multimedia Society
    • /
    • v.9 no.12
    • /
    • pp.1700-1708
    • /
    • 2006
  • Since the most program execution time is consumed in a loop structure, extracting parallelism from loop programs is critical for the taster program execution. In this paper, we proposed data dependency removal method for a single loop. The data dependency removal method can be applied to uniform and non-uniform data dependency distance in the single loop. Procedure calls parallelisms with only a single loop structure or procedure call most of other methods are concerned with the uniform code within the uniform data dependency distance. We also propose an algorithm, which can be applied to uniform, non-uniform, and complex data dependency distance among the multiple procedures. We compared our method with conventional methods using CRAY-T3E for the performance evaluation. The results show that the proposed algorithm is effective.

  • PDF

Computer simulation for the performance analysis of automobile air conditioning system (자동차용 에어컨 시스템의 성능해석을 위한 컴퓨터 시뮬레이션)

  • 이건호;유정열;정종대;최규환
    • Korean Journal of Air-Conditioning and Refrigeration Engineering
    • /
    • v.10 no.2
    • /
    • pp.202-216
    • /
    • 1998
  • A computer simulation for the performance analysis of automobile air conditioning components is carried out for the various operating conditions. The automobile air conditioning system consists of laminated type evaporator, swash plate type compressor, parallel flow type condenser, externally equalized thermostatic expansion valve and receiver drier. The overall heat transfer coefficient and the pressure drop in laminated type evaporator were obtained through experiments. In parallel flow type condenser, the performance analysis computer program using the empirical equation for heat transfer coefficient has been developed and the results are compared with experimental results. A model for matching the performance analysis programs of respective components .of automobile air conditioning system is introduced. Further, the effects of varying condenser size and refrigerant charge on the performance of automobile air conditioning system are discussed clearly.

  • PDF

A STUDY OF THE APPLICATION OF DELAUNAY GRID GENERATION ON GPU USING CUDA LIBRARY (GPU Library CUDA를 이용한 효율적인 Delaunay 격자 생성에 관한 연구)

  • Song, J.H.;Kang, S.H.;Kim, G.M.;Kim, B.S.
    • 한국전산유체공학회:학술대회논문집
    • /
    • 2011.05a
    • /
    • pp.194-198
    • /
    • 2011
  • In this study, an efficient algorithm for Delaunay triangulation of a number of points which can be used on a GPU-based parallel computation is studied The developed algorithm is programmed using CUDA library. and the program takes full advantage of parallel computation which are concurrently performed on each of the threads on GPU. The results of partitioned triangulation collected from the GPU computation requires proper stitching between neighboring partitions and calculation of connectivities among triangular cells on CPU In this study, the effect of number of threads on the efficiency and total duration for Delaunay grid generation is studied. And it is also shown that GPU computing using CUDA for Delaunay grid generation is feasible and it saves total time required for the triangulation of the large number points compared to the sequential CPU-based triangulation programs.

  • PDF