Title/Summary/Keyword: nested parallelism


Parallelizing Imperfectly Nested Loops

  • Kim, Ki-Chang
    • Journal of Electrical Engineering and Information Science / v.1 no.1 / pp.140-150 / 1996
  • Loops are among the richest program constructs in which parallelism is available. Exploiting fine-grain parallelism from these constructs is particularly important in light of the growing popularity of superscalar and VLIW machines. This paper explains how fine-grain parallelization techniques can be generalized to handle nested loops. Our technique integrates nested-loop parallelization techniques at the fine-grain level, thus exposing more fine-grain parallelism, and is flexible enough to handle non-perfectly nested loops (a sketch of such a loop follows below). Examples and some experimental results are presented to illustrate our approach.
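
As a hedged illustration (not the paper's technique itself), the following sketch shows the kind of non-perfectly nested loop the approach targets: the assignment to s[i] sits between the two loop headers, so transformations that assume a perfect nest do not apply directly. All names are illustrative.

```java
// An imperfectly (non-perfectly) nested loop: the statement s[i] = 0.0
// lies between the outer and inner loop headers, so classic perfect-nest
// transformations (interchange, tiling) do not apply directly.
public class ImperfectNest {
    public static void main(String[] args) {
        int n = 8;
        double[][] a = new double[n][n];
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            s[i] = 0.0;                    // code outside the inner loop body
            for (int j = 0; j < n; j++) {  // inner loop
                a[i][j] = i + j;
                s[i] += a[i][j];
            }
        }
        System.out.println(java.util.Arrays.toString(s));
    }
}
```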


Parallelism for Nested Loops with Simple Subscripts

  • Jeong, Sam-Jin
    • International Journal of Contents / v.4 no.4 / pp.1-6 / 2008
  • In this paper, we propose an improved loop-splitting method for maximizing the parallelism of single loops with non-constant dependence distances. Using the iteration and distance of the source of the first dependence, together with theorems we define, we present generalized and optimal algorithms for single loops with non-uniform dependences (MPSL). By extending the MPSL method, we also exploit parallelism from nested loops with simple subscripts, based on cycle shrinking and loop interchanging (a sketch of cycle shrinking follows below). The algorithms generalize how to transform general single loops with non-uniform dependences, as well as nested loops with simple subscripts, into parallel loops.
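
The sketch below illustrates cycle shrinking in its textbook form, under the assumption of a single uniform dependence of distance 3; it is not the paper's MPSL algorithm, which handles non-uniform dependences.

```java
// Cycle shrinking: in the loop  a[i + 3] = a[i] + 1  every dependence has
// distance 3, so each block of 3 consecutive iterations is dependence-free
// and can run in parallel; the blocks themselves execute serially.
import java.util.stream.IntStream;

public class CycleShrinking {
    public static void main(String[] args) {
        int n = 30, d = 3;                 // d = minimum dependence distance
        int[] a = new int[n + d];
        for (int block = 0; block < n; block += d) {
            int lo = block, hi = Math.min(block + d, n);
            // iterations lo..hi-1 are mutually independent: parallelize them
            IntStream.range(lo, hi).parallel()
                     .forEach(i -> a[i + d] = a[i] + 1);
        }
        System.out.println(a[n + d - 1]);
    }
}
```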

Efficient On-the-fly Detection of First Races in Shared-Memory Programs with Nested Parallelism (내포병렬성을 가진 공유메모리 프로그램의 수행중 최초경합 탐지를 위한 효율적 기법)

  • 하금숙;전용기;유기영
    • Journal of KIISE: Computer Systems and Theory / v.30 no.7_8 / pp.341-351 / 2003
  • To debug shared-memory programs with nested parallelism effectively, it is important to efficiently detect the first races, which cause non-deterministic executions of the programs. The previous on-the-fly technique detects first races in two passes, and is inefficient in both execution time and memory space because the size of the access history for each shared variable depends on the maximum parallelism of the program. This paper proposes a new on-the-fly technique that detects first races in two passes with a constant number of event comparisons and constant space complexity on each access to a shared variable, because the size of the access history for each shared variable is a small constant (a toy access-history sketch follows below). This technique therefore makes on-the-fly race detection more efficient and practical for debugging shared-memory programs with nested parallelism.
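
The toy below sketches only the general idea of a per-variable access history, not the authors' two-pass algorithm: each shared variable carries a constant-size record (here, just the last writer), so the per-access check is O(1) in time and space. A real detector would compare logical labels of the nested parallel regions rather than Thread identities, and this toy flags any cross-thread conflict, even correctly ordered ones.

```java
// Toy access history: a constant-size record per shared variable.
public class AccessHistory {
    private Thread lastWriter;            // O(1) space per shared variable

    public synchronized void onRead(String var) {
        if (lastWriter != null && lastWriter != Thread.currentThread())
            System.out.println("potential race on " + var + " (read vs. write)");
    }

    public synchronized void onWrite(String var) {
        if (lastWriter != null && lastWriter != Thread.currentThread())
            System.out.println("potential race on " + var + " (write vs. write)");
        lastWriter = Thread.currentThread();
    }

    public static void main(String[] args) throws InterruptedException {
        AccessHistory h = new AccessHistory();
        Thread t1 = new Thread(() -> h.onWrite("x"));
        Thread t2 = new Thread(() -> h.onRead("x"));
        t1.start(); t1.join();            // t1 writes x first ...
        t2.start(); t2.join();            // ... then t2 reads x: flagged
    }
}
```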

The 3-Dimensional Visualization in Shared-Memory Programs with Nested Parallelism (내포 병렬성을 가진 공유메모리 프로그램의 3차원 시각화)

  • Park, Myeong-Chul;Hur, Hwa-Ra;Ha, Seok-Wun
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.1 / pp.53-58 / 2008
  • A parallel program with nested parallelism can produce non-deterministic results because it executes concurrently without synchronization. Various visualization techniques are used to detect such errors, but their intuitiveness suffers from limits of space and excessive abstraction. This paper proposes a 3-D visualization engine that presents the user with the global structure of a complicated parallel program with nested parallelism. By making the program's global structure easy to understand, the proposed engine provides an effective debugging environment.

A Data Dependency Elimination Algorithm for Extracting Maximum Parallelism (최대 병렬성 추출을 위한 자료 종속성 제거 알고리즘)

  • 송월봉;박두순
    • Journal of KIISE: Software and Applications / v.26 no.1 / pp.139-139 / 1999
  • In most application programs, loops comprise most of the computation and are the most important source of parallelism. When the data dependence relation is uniform in terms of distance, several compile-time parallelization methods have been introduced. On the other hand, when the data dependence relation is non-uniform in distance, compile-time extraction of parallelism is much more complicated. In this paper, a general method for extracting parallelism from nested loops is presented. The algorithm is applicable whether the dependence relation is uniform or non-uniform in distance. Based on repeated execution of the statements in nested loops, an algorithm that effectively removes these kinds of data dependences is developed in order to achieve the total parallelization of nested loops (a sketch of one standard dependence-removal trick follows below).
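
As one concrete, hedged example of dependence removal (array renaming, also called node splitting; the paper's own algorithm is more general), the sketch below eliminates an anti-dependence so the loop can run fully parallel.

```java
// Array renaming: the original loop
//   for i: a[i] = b[i] + 1;  c[i] = a[i + 1] * 2;
// has an anti-dependence, because c[i] must read a[i + 1] BEFORE
// iteration i + 1 overwrites it.  Snapshotting the old values of a
// removes the dependence and makes every iteration independent.
import java.util.stream.IntStream;

public class RenameDependence {
    public static void main(String[] args) {
        int n = 10;
        int[] a = new int[n + 1], b = new int[n], c = new int[n];
        int[] aOld = a.clone();               // rename: copy of the old values
        IntStream.range(0, n).parallel().forEach(i -> {
            a[i] = b[i] + 1;                  // writes go to the new array
            c[i] = aOld[i + 1] * 2;           // reads use the renamed copy
        });
        System.out.println(c[0]);
    }
}
```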

Parallelism point selection in nested parallelism situations with focus on the bandwidth selection problem (평활량 선택문제 측면에서 본 중첩병렬화 상황에서 병렬처리 포인트선택)

  • Cho, Gayoung;Noh, Hohsuk
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.383-396 / 2018
  • Various parallel-processing R packages are used for the fast processing and analysis of big data. Parallel processing applies when the work can be decomposed into non-interdependent tasks. In some cases, each task decomposed for parallel processing can itself be decomposed into non-interdependent subtasks. In such nested parallelism situations, we have to choose whether to parallelize the decomposed tasks at the first step or the subtasks at the second step. This choice has a significant impact on computation speed, so it is important to understand the nature of the work and decide where to apply the parallel processing (a sketch of this trade-off follows below). In this paper, we provide an idea of how to apply parallel computing effectively by illustrating how to select the parallelism point for the bandwidth selection problem of nonparametric regression.
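
The paper works in R; the sketch below replays the same trade-off in Java under illustrative sizes: with only two long outer tasks on a many-core machine, outer-level parallelization leaves cores idle, while inner-level parallelization keeps them busy.

```java
// Nested parallelism: parallelize the outer tasks or the inner subtasks?
import java.util.stream.IntStream;

public class ParallelismPoint {
    static double work(int i, int j) {        // one independent subtask
        double s = 0;
        for (int k = 0; k < 200_000; k++) s += Math.sin(i + j + k);
        return s;
    }

    public static void main(String[] args) {
        int outer = 2, inner = 64;            // few tasks, many subtasks

        long t0 = System.nanoTime();          // option 1: parallelize outer
        IntStream.range(0, outer).parallel().forEach(i ->
            IntStream.range(0, inner).forEach(j -> work(i, j)));
        System.out.println("outer: " + (System.nanoTime() - t0) / 1e6 + " ms");

        long t1 = System.nanoTime();          // option 2: parallelize inner
        IntStream.range(0, outer).forEach(i ->
            IntStream.range(0, inner).parallel().forEach(j -> work(i, j)));
        System.out.println("inner: " + (System.nanoTime() - t1) / 1e6 + " ms");
    }
}
```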

Unfolding Nested Loops of Functional Languages for Multithreaded Architectures (다중스레드 구조를 위한 함수형 언어의 중첩루프 펼침)

  • 하상호
    • Journal of KIISE: Software and Applications / v.29 no.11 / pp.826-836 / 2002
  • We need an enormous amount of memory for name spaces, as well as additional processors, if we are to effectively exploit the massive parallelism in nested loops of functional languages such as Id. If there is not enough memory to exploit that parallelism, program execution can abort during loop unfolding. Conversely, if loops are over-unfolded relative to the number of processors available, system performance can degrade severely due to the overhead of loop unfolding. This paper suggests and analyzes an algorithm for effectively unfolding nested loops of functional languages on multithreaded architectures. The algorithm unfolds a given nested loop safely and near-optimally, considering the processors and memory available when the loop is to be unfolded (a sketch of this resource-bounded decision follows below).
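
The sketch below is only a guess at the shape of such a resource-bounded decision, not the paper's algorithm: the unfolding degree is capped by the iteration count, the free processors, and the memory available for name spaces. The frame size and all names are assumptions.

```java
// Resource-bounded loop unfolding: never unfold past what the
// processors or the memory for activation frames can support.
public class LoopUnfolding {
    static int unfoldDegree(int iterations, int freeProcessors,
                            long freeMemory, long frameSize) {
        long byMemory = freeMemory / frameSize;   // frames we can afford
        long degree = Math.min(iterations,
                      Math.min(freeProcessors, byMemory));
        return (int) Math.max(1, degree);         // always make progress
    }

    public static void main(String[] args) {
        int p = Runtime.getRuntime().availableProcessors();
        long mem = Runtime.getRuntime().freeMemory();
        // e.g. a loop of 10_000 iterations, each needing a 64 KiB name space
        System.out.println(unfoldDegree(10_000, p, mem, 64 * 1024));
    }
}
```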

Transform Nested Loops into MultiThread in Java Programming Language for Parallel Processing (자바 프로그래밍에서 병렬처리를 위한 중첩 루프 구조의 다중스레드 변환)

  • Hwang, Deuk-Young;Choi, Young-Keun
    • The Transactions of the Korea Information Processing Society / v.5 no.8 / pp.1997-2012 / 1998
  • To execute a sequential Java program on a parallel machine, it is necessary to find the parallelism in it. Loops are a fundamental source of parallelism, as they account for a large portion of the total execution time of a sequential Java program. However, complete parallel execution can hardly be achieved due to data dependences. This paper proposes a method of exploiting implicit parallelism by building a dependence graph through analysis of the data dependences in Java programs with nested loop structures. A parallel-code generation method based on a restructuring compiler is proposed, together with a method of translating the Java source program into the multithread statements supported by the Java language itself (a hand-written sketch of such a translation follows below). The performance of the program translated into thread statements is evaluated using the loop trip count and the thread count as parameters. The restructuring compiler exploits parallelism efficiently by reducing the manual overhead of converting a sequential Java program into parallel code. As a result, the execution time of the Java program can be reduced on a parallel machine.
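
The hand-written sketch below stands in for the restructuring compiler's output: the dependence-free outer loop of a nest is split into contiguous bands, one per Java thread. The names and the band scheme are illustrative.

```java
// A nested loop whose outer iterations are independent, rewritten by hand
// into the multithread form a restructuring compiler might emit.
public class NestedLoopThreads {
    public static void main(String[] args) throws InterruptedException {
        int n = 400, threads = 4;
        double[][] a = new double[n][n];
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool[t] = new Thread(() -> {
                // each thread gets a contiguous band of outer iterations
                for (int i = id * n / threads; i < (id + 1) * n / threads; i++)
                    for (int j = 0; j < n; j++)
                        a[i][j] = Math.sqrt(i * j + 1.0);
            });
            pool[t].start();
        }
        for (Thread t : pool) t.join();
        System.out.println(a[n - 1][n - 1]);
    }
}
```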


Extracting Maximum Parallelism for Parallel Computing (병렬 계산을 위한 최대 병렬성 추출 방법)

  • Park, Doo-Soon
    • The Journal of Korean Association of Computer Education / v.8 no.1 / pp.93-103 / 2005
  • Since most program execution time is consumed in loop structures, extracting parallelism from sequential loop programs is critical for faster program execution. Conventional studies on extracting parallelism focus mostly on uniform data dependence distances. In this paper, we propose a data dependence elimination method for nested loops, and extend it to extract parallelism from loops with procedure calls. Both the basic and the extended method can be applied to uniform and non-uniform data dependence distances (the distinction is illustrated below). We compared our method with conventional methods on a CRAY T3E for performance evaluation. The results show that the proposed algorithms are very effective.
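
The following small program illustrates the uniform vs. non-uniform distinction the paper builds on (the example subscripts are ours, not the paper's): for a write a[2i] and a read a[i'+6], the dependence distance i' - i varies with i.

```java
// Uniform:     a[i + 3] = a[i] + 1   -> the distance is always 3.
// Non-uniform: a[2 * i] = a[i + 6]   -> distances vary with i, as the
// enumeration below shows (distance 1 at i=7, 2 at i=8, 3 at i=9).
public class DependenceDistance {
    public static void main(String[] args) {
        for (int i = 0; i <= 12; i++) {
            int write = 2 * i;                 // write subscript f(i)  = 2i
            for (int i2 = i + 1; i2 <= 12; i2++) {
                if (i2 + 6 == write)           // read subscript g(i') = i' + 6
                    System.out.println("write at i=" + i
                        + " reaches read at i'=" + i2
                        + " (distance " + (i2 - i) + ")");
            }
        }
    }
}
```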


Locality-Conscious Nested-Loops Parallelization

  • Parsa, Saeed;Hamzei, Mohammad
    • ETRI Journal / v.36 no.1 / pp.124-133 / 2014
  • To speed up data-intensive programs, two complementary techniques, namely nested-loop parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse-grain parallelism through exploiting the largest possible groups of outer permutable loops, in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality by exploiting data reuse in multiple dimensions (a tiling sketch follows below).
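
The sketch below shows the target shape of such a transformation on a trivially parallel nest: the outer tile loop runs in parallel, and each T x T tile is swept serially for locality. The paper derives the transformation automatically; the tile size and loop body here are illustrative.

```java
// Tiled loop nest: coarse-grain parallelism across row tiles, data
// locality inside each T x T tile.
import java.util.stream.IntStream;

public class TiledNest {
    public static void main(String[] args) {
        int n = 512, T = 64;                          // T = tile size
        double[][] a = new double[n][n], b = new double[n][n];
        // row tiles are independent here, so run them in parallel
        IntStream.range(0, n / T).parallel().forEach(tile -> {
            int ii = tile * T;                        // this tile's row origin
            for (int jj = 0; jj < n; jj += T)         // serial sweep over j-tiles
                for (int i = ii; i < ii + T; i++)     // work inside one tile
                    for (int j = jj; j < jj + T; j++)
                        b[i][j] = a[i][j] * 2.0;
        });
        System.out.println(b[n - 1][n - 1]);
    }
}
```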