• Title/Summary/Keyword: nested parallelism

Search Result 30, Processing Time 0.022 seconds

A New Synchronization Scheme for Parallel Processing on Perfectly Nested Do Loops (완전 중첩 루프에서 병렬처리를 위한 새로운 동기화 기법)

  • 이광형;황종선;박두순;김병수
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.10
    • /
    • pp.1-10
    • /
    • 1994
  • In most application programs, loops usually contain most of the computation in a program and are the most improtant source of parallelism. When loops are executed on multiprocessors, the cross iteration data dependences need to be enforced by synchronization between processors. In this paper, we propose a new synchronization scheme(Free/Hold) for reducing overgeads occured by synchronization variables in data oriented scheme and delay of time occured by synchronization instruction in statement oriented scheme. The Free/Hold mechanism enforces the correct execution order by inserting synchronization instruction between each instance with data dependence relationship using the RD(Real dependence Distance). We also present an algorithm for removing unnecessary dependences in one-to-many dependences.

  • PDF

A Design of Parallel Compiler Using the Parafrase II (Parafrase II를 이용한 병렬 컴파일러 설계)

  • Song Worl-Bong
    • Journal of the Korea Computer Industry Society
    • /
    • v.7 no.3
    • /
    • pp.185-190
    • /
    • 2006
  • In this paper, a simple parallel compiler using of Parafrase II is presented. This is a new general method the extracting parallelism in order to parallel processing effectively in nested loop. For this, the source program of Parafrase II parallel compiler is analyzed and implemented. Moreover, this method can be applicable where the dependency relation is both uniform and non-uniform in distance.

  • PDF

Parallel Computation Algorithm of Gauss Elimination in Power system Analysis (전력계통해석을 위한 자코비안행렬 가우스소거의병렬계산 알고리즘)

  • 서의석;오태규
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.43 no.2
    • /
    • pp.189-196
    • /
    • 1994
  • This paper describes a parallel computing algorithm in Gauss elimination of Jacobian matrix to large-scale power system. The structure of Jacobian matrix becomes different according to ordering method of buses. In sequential computation buses are ordered to minimize the number of fill-in in the triangulation of the Jacobian matrix. The proposed method develops the parallelism in the Gauss elimination by using ND(nested dissection) ordering. In this procedure the level structure of the power system network is transformed to be long and narrow by using end buses which results in balance of computing load among processes and maximization of parallel computation. Each processor uses the sequential computation method to preserve the sqarsity of matrix.

  • PDF

Parallel Computation Algorithm of Gauss Elimination in Power system Analysis (전력계통의 자코비안행렬 가우스소거의 병렬계산)

  • Suh, Eui-Suk;Oh, Tae-Kyoo
    • Proceedings of the KIEE Conference
    • /
    • 1993.07a
    • /
    • pp.163-166
    • /
    • 1993
  • This paper describes an parallell computing algorithm in Gauss elimination of Jacobian matrix to large-scale power system. The structure of Jacobian matrix becomes different according to ordering method of buses. In sequential computation buses are ordered to minimize the number of fill-in in the triangulation of the Jacobian matrix. The proposed method using ND(nested dissection) ordering develops the parallelism in the Gauss elimination to have balance of computing load among processes and each processor uses the sequential computation method to preserve the sparsity of matrix.

  • PDF

A Study on Parallel Performance Optimization Method for Acceleration of High Resolution SAR Image Processing (고해상도 SAR 영상처리 고속화를 위한 병렬 성능 최적화 기법 연구)

  • Lee, Kyu Beom;Kim, Gyu Bin;An, Sol Bo Reum;Cho, Jin Yeon;Lim, Byoung-Gyun;Kim, Dong-Hyun;Kim, Jeong Ho
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.46 no.6
    • /
    • pp.503-512
    • /
    • 2018
  • SAR(Synthetic Aperture Radar) is a technology to acquire images by processing signals obtained from radar, and there is an increasing demand for utilization of high-resolution SAR images. In this paper, for high-speed processing of high-resolution SAR image data, a study for SAR image processing algorithms to achieve optimal performance in multi-core based computer architecture is performed. The performance deterioration due to a large amount of input/output data for high resolution images is reduced by maximizing the memory utilization, and the parallelization ratio of the code is increased by using dynamic scheduling and nested parallelism of OpenMP. As a result, not only the total computation time is reduced, but also the upper bound of parallel performance is increased and the actual parallel performance on a multi-core system with 10 cores is improved by more than 8 times. The result of this study is expected to be used effectively in the development of high-resolution SAR image processing software for multi-core systems with large memory.

A Detection Tool of First Races in OpenMP Programs with Directives (OpenMP 디렉티브 프로그램의 최초경합 탐지를 위한 도구)

  • Kang, Mun-Hye;Ha, Ok-Kyoon;Jun, Yong-Kee
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.1
    • /
    • pp.1-7
    • /
    • 2010
  • Detecting data races is important for debugging programs with OpenMP directives, because races result in unintended non-deterministic executions of the program. It is especially important to detect the first data races to occur for effective debugging, because the removal of such races may make other affected races disappear or appear. The previous tools for race detecting can not guarantee that detected races are the first races to occur. This paper suggests a tool what detects the first races to occur on the program with nested parallelism using the two-pass on-the-fly technique. To show functionality of this tool, we empirically compare with the previous tools using a set of the synthetic programs with OpenMP directives.

Check of Concurrency in Parallel Programs using Image Information (영상정보를 이용한 병렬 프로그램내의 병행성 판별)

  • Park, Myeong-Chul;Ha, Seok-Wun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.12
    • /
    • pp.2132-2139
    • /
    • 2006
  • A parallel program including a nested parallelism has a complex execution aspects and tasks are executed concurrently. This concurrency is a main cause raising most of errors. In this paper, a new method for checking concurrency between two tasks is proposed. The existing techniques for checking the concurrency have their limits to represent a global structure. A new labeling technique that appropriate for image visualization is proposed. To show the global structure by imaging of execution aspects through region partition on 2D plane. On the basis of it, each of the tasks that can distinguish the ordered relation create an independent image. Image information generated by the result simplifies semantic analysis of the related task, and provides an outline of a global execution aspects structure of the program to user effectively.

On-the -fly Detection of the First Races for Shared-Memory Parallel Programs with Ordered Synchronization (순서적 동기화를 포함하는 공유 메모리 병렬프로그램에서의 수행중 최초경합 탐지 기법)

  • Park, Hui-Dong;Jeon, Yong-Gi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.8
    • /
    • pp.884-894
    • /
    • 1999
  • 순서적 동기화 및 내포 병렬성을 포함하는 공유메모리 병렬 프로그램에서의 경합(race)은 프로그램 수행에서 원하지 않은 비결정성(nondeterminism)을 야기할 수 있기 때문에 반드시 탐지되어져야 한다. 특히 프로그램 수행에서 최초경합(first race)을 탐지하는 것은 중요한데, 그 이유는 이 경합을 제거하면 다른 경합이 나타나지 않을 수도 있기 때문이다. 본 논문에서는 결정적 공유메모리 병렬프로그램을 위한 2단계 수행중 (two-pass on-the-fly) 최초경합 탐지 기법을 제시하며, 이것은 공유메모리 병렬 프로그램의 특정 수행에서 "최초로 발생되는" 경합들을 탐지하는 기법이다. 그리고 HPF 컴파일러를 이용하여 본 탐지 프로토콜을 공인된 벤치마크 프로그램에 적용하여, 병렬 프로그램 디버깅 시 고려하여야 할 파라미터들에 대한 실험으로부터 본 기법의 효율성을 보였다.Abstract Detecting races is important in debugging shared-memory parallel programs which have ordered synchronization and nested parallelism, because the races result in unintended non- deterministic executions of the programs. The first races are important in debugging, because the removal of such races may make other races disappear. It is even possible that all races reported would disappear once the first races are removed. This paper presents a new two-pass on-the-fly algorithm to detect the first races in such parallel programs. The algorithm reported in this paper is an on-the-fly algorithm that detects the races that "occur first" in a particular execution of shared-memory parallel programs. The experiment has accomplished, where two certified benchmark programs which can be executed under High Performance Fortran environments to get some parameters which improve debugging performance with our algorithm. with our algorithm.

A Labeling Scheme for Efficient On-the-fly Detection of Race Conditions in Parallel Programs (병렬프로그램의 경합조건을 수행 중에 효율적으로 탐지하기 위한 레이블링 기법)

  • Park, So-Hee;Woo, Jong-Jung;Bae, Jong-Min;Jun, Yong-Kee
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.525-534
    • /
    • 2002
  • Race conditions, races in short, need to be detected for debugging parallel programs, because the races result in unintended non-deterministic executions. To detect the races in an execution of program, previous techniques use a centralized data structure which may incur serious bottleneck in generating concurrency information, or show inefficient time complexity which depends on the degree of nested parallelism in comparing any two of them. We propose a new labeling scheme in this paper, which is scalable in generating the concurrency information without bottleneck by using private data structure, and improves time complexity into constant in checking concurrency. The scalability and time efficiency therfore makes on-the-fly race detection efficient not only for programs with either shared-memory or message-passing, but also for programs with mixed model of the two.

Optimal Scheduling of SAD Algorithm on VLIW-Based High Performance DSP (VLIW 기반 고성능 DSP에서의 SAD 알고리즘 최적화 스케줄링)

  • Yu, Hui-Jae;Jung, Sou-Hwan;Chung, Sun-Tae
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.12
    • /
    • pp.262-272
    • /
    • 2007
  • SAD (Sum of Absolute Difference) algorithm is the most frequently executing routine in motion estimation, which is the most demanding process in motion picture encoding. To enhance the performance of motion picture encoding on a VLIW processor, an optimal implementation of SAD algorithm on VLIW processor should be accomplished. In this paper, we propose an implementation of optimal scheduling of SAD algorithm with conditional branch on a VLIW-based high performance DSP. We first transform the nested loop with conditional branch of SAD algorithm into a single loop with conditional branch which has a large enough loop body to utilize fully the ILP capability of VLIW DSP and has a conditional branch to make the escape from loop to be achieved as soon as possible. And then we apply a modulo scheduling technique to the transformed single loop. We test the proposed implementation on TMS320C6713, and analyze the code size and performance with respect to processing time. Through experiments, it is shown that the SAD implementation proposed in this paper has small code size appropriate for embedded applications, and the H.263 encoder with the proposed SAD implementation performs better than other H.263 encoder with other SAD implementations.