Integrated Search | Korea Science

Parallelizing Imperfectly Nested Loops

Kim, Ki-Chang
- Journal of Electrical Engineering and information Science
- /
- Vol. 1, No. 1
- /
- pp.140-150
- /
- 1996
Loops are among the richest sources of parallelism in programs. Exploiting fine-grain parallelism in these constructs is particularly important given the growing popularity of superscalar and VLIW machines. This paper explains how fine-grain parallelization techniques can be generalized to handle nested loops. Our technique integrates nested-loop parallelization techniques at the fine-grain level, thus exposing more fine-grain parallelism, and is flexible enough to handle non-perfectly nested loops. Examples and some experimental results are presented to illustrate our approach.
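For readers unfamiliar with the term, the sketch below shows what makes a loop nest "non-perfectly" (imperfectly) nested: a statement sits between the outer and inner loop headers. It also shows classic loop distribution, which splits such a nest into a simple loop plus a perfect nest. This is only an illustration of the terminology; it is not the fine-grain technique of the paper, and the function names are hypothetical.

```python
def row_sums_imperfect(a):
    """Imperfectly nested: the initialization sits between the two loop headers."""
    sums = [0] * len(a)
    for i in range(len(a)):
        sums[i] = 0                      # statement outside the inner loop
        for j in range(len(a[i])):
            sums[i] += a[i][j]
    return sums


def row_sums_distributed(a):
    """After loop distribution: one simple loop plus one perfect nest."""
    sums = [0] * len(a)
    for i in range(len(a)):              # distributed initialization loop
        sums[i] = 0
    for i in range(len(a)):              # perfect nest: all work is innermost
        for j in range(len(a[i])):
            sums[i] += a[i][j]
    return sums


matrix = [[1, 2, 3], [4, 5, 6]]
imperfect_result = row_sums_imperfect(matrix)
distributed_result = row_sums_distributed(matrix)
```

Distribution is legal here because the initialization of each `sums[i]` still precedes every accumulation into it; in general the legality depends on the dependences between the split statements.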

Parallelism for Nested Loops with Simple Subscripts

Jeong, Sam-Jin
- International Journal of Contents
- /
- Vol. 4, No. 4
- /
- pp.1-6
- /
- 2008
In this paper, we propose an improved loop splitting method for maximizing the parallelism of single loops with non-constant dependence distances. Using the iteration and distance of the source of the first dependence, together with the theorems we define, we present generalized and optimal algorithms for single loops with non-uniform dependences (MPSL). We also extend the MPSL method to exploit parallelism in nested loops with simple subscripts, based on cycle shrinking and loop interchanging. The algorithms generalize the transformation of single loops with non-uniform dependences, as well as nested loops with simple subscripts, into parallel loops.
https://doi.org/10.5392/IJoC.2008.4.4.001

Study on Task Scheduling for Parallel Processing of Nested Loops

허정연;손윤구
- 전자공학회논문지B
- /
- Vol. 29B, No. 1
- /
- pp.11-17
- /
- 1992
This paper proposes an analytical queuing model for the parallel processing of sequential programs with nested loops. The analytical results are compared with results from an implemented multiprocessor system composed of four Intel 8088 microprocessors, eight 2 KB shared common memories, and a hardware token ring. The comparison shows that the analytical model and the real system produce nearly the same results. The proposed analytical model can be applied to evaluate the parallel processing of sequential programs with nested loops.

Locality-Conscious Nested-Loops Parallelization

Parsa, Saeed;Hamzei, Mohammad
- ETRI Journal
- /
- Vol. 36, No. 1
- /
- pp.124-133
- /
- 2014
To speed up data-intensive programs, two complementary techniques, namely nested loops parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse grain parallelism through exploiting the largest possible groups of outer permutable loops in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.
https://doi.org/10.4218/etrij.14.0113.0266
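The idea of tiling a group of permutable loops, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes integer matrix multiplication as the example kernel (its three loops are fully permutable) and a hypothetical `tile` parameter; it is not the transformation-selection algorithm proposed in the paper.

```python
def matmul_naive(a, b):
    """Reference triple loop: c = a * b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation with all three loops tiled.

    Each (ii, jj) tile writes a disjoint block of c, so the two outermost
    tile loops could be distributed across processors, while the inner
    loops reuse the same small blocks of a and b.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        for k in range(kk, min(kk + tile, m)):
                            c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
naive_result = matmul_naive(a, b)
tiled_result = matmul_tiled(a, b, tile=2)
```

With integer inputs the two versions agree exactly; with floating-point data the reordered summation over `k` may round differently.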

Unfolding Nested Loops of Functional Languages for Multithreaded Architectures

하상호
- 한국정보과학회논문지:소프트웨어및응용
- /
- Vol. 29, No. 11
- /
- pp.826-836
- /
- 2002
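Exploiting the massive fine-grain parallelism contained in nested loops of functional languages such as Id on a multithreaded architecture requires not only processors but also additional resources, such as a considerable amount of memory for name spaces. If nested loops containing such parallelism are unfolded indiscriminately, without regard to system resource limits, program execution may abort when memory is exhausted. Moreover, because unfolding itself carries overhead, unfolding loops far more than the number of processors warrants can substantially reduce the benefit of parallel execution. This paper proposes and analyzes an algorithm that unfolds nested loops of functional languages effectively on multithreaded architectures. The distinguishing feature of the proposed algorithm is that, at the point where a given nested loop is to be unfolded, it bounds the unfolding by the number of processors and the memory currently available, so that unfolding is safe yet as close to optimal as possible.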

Enhanced Region Partitioning Method of Non-perfect nested Loops with Non-uniform Dependences

Jeong Sam-Jin
- International Journal of Contents
- /
- Vol. 1, No. 1
- /
- pp.40-44
- /
- 2005
This paper introduces a region partitioning method for non-perfect nested loops with non-uniform dependences. Loops of this kind normally cannot be parallelized by existing parallelizing compilers and transformations, and in the rare cases where they are parallelized, performance is very poor. Based on Convex Hull theory, which provides adequate information to handle non-uniform dependences, this paper proposes an enhanced region partitioning method that divides the iteration space into a minimum number of parallel regions, where all iterations inside each region can be executed in parallel by using variable renaming after copying.

The Study of the Method that to Choice Efficient Nested Loops Join Order and the Index Design

;여정모
- 한국정보처리학회:학술대회논문집
- /
- Korea Information Processing Society, 2013 Spring Conference
- /
- pp.877-880
- /
- 2013
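In relational databases, which underlie information systems, performance varies with the volume of data. Insufficient understanding of database features causes many performance problems, and join performance accounts for a large share of them, because all but the rarest data-processing tasks require more than one table. Using joins correctly can bring large performance gains. This study investigates how to select the join order and design indexes so that Nested Loops Join, the most basic join method in relational databases, executes efficiently. To evaluate the results, we extracted SQL Trace output and compared performance, demonstrating that the selected join order is efficient. We also show that evaluating performance by the number of data blocks accessed improves join performance more fundamentally than the conventional evaluation based on response time. Future work will study performance improvements for more complex join forms and other join methods.
https://doi.org/10.3745/PKIPS.y2013m05a.877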
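Why join order matters for an indexed nested loops join can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `customers` and `orders` tables, the helper names, and the use of index probes as a stand-in for the accessed data blocks that the study measures. It is not the selection method of the paper.

```python
def build_index(rows, key):
    """Hypothetical in-memory index: key value -> list of matching rows."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def nested_loops_join(outer_rows, inner_index, key):
    """Probe the inner index once per outer row; return (rows, probe count)."""
    joined, probes = [], 0
    for outer in outer_rows:
        probes += 1
        for inner in inner_index.get(outer[key], []):
            joined.append({**inner, **outer})
    return joined, probes


customers = [{"cust_id": c, "city": "Seoul" if c == 0 else "Busan"}
             for c in range(10)]
orders = [{"order_id": o, "cust_id": o % 10} for o in range(1000)]

# Order A: apply the selective filter first, then drive from the small result.
seoul = [c for c in customers if c["city"] == "Seoul"]
rows_a, probes_a = nested_loops_join(
    seoul, build_index(orders, "cust_id"), "cust_id")

# Order B: drive from the large table and filter only after joining.
rows_b, probes_b = nested_loops_join(
    orders, build_index(customers, "cust_id"), "cust_id")
rows_b = [r for r in rows_b if r["city"] == "Seoul"]
```

Both orders return the same 100 joined rows, but order A probes the index once while order B probes it 1000 times, which is the kind of difference an access-count metric exposes regardless of measured response time.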

A Data Dependency Elimination Algorithm for Extracting Maximum Parallelism

송월봉;박두순
- 한국정보과학회논문지:소프트웨어및응용
- /
- Vol. 26, No. 1
- /
- pp.139-139
- /
- 1999
In most application programs, loops account for most of the computation and are the most important source of parallelism. When the data dependency relation is uniform in distance, several compile-time parallelization methods have been introduced. On the other hand, when the data dependency relation is non-uniform in distance, extracting parallelism at compile time is much more complicated. In this paper, a general method for extracting parallelism from nested loops is presented. The algorithm is applicable whether the dependency relation is uniform or non-uniform in distance. Based on the repeated execution of the statements in nested loops, an algorithm that effectively removes these kinds of data dependencies is developed so that nested loops can be fully parallelized.

A New Synchronization Scheme for Parallel Processing on Perfectly Nested Do Loops

이광형;황종선;박두순;김병수
- 전자공학회논문지B
- /
- Vol. 31B, No. 10
- /
- pp.1-10
- /
- 1994
In most application programs, loops contain most of the computation and are the most important source of parallelism. When loops are executed on multiprocessors, cross-iteration data dependences must be enforced by synchronization between processors. In this paper, we propose a new synchronization scheme (Free/Hold) that reduces the overhead caused by synchronization variables in data-oriented schemes and the delay caused by synchronization instructions in statement-oriented schemes. The Free/Hold mechanism enforces the correct execution order by inserting synchronization instructions between instances related by a data dependence, using the RD (Real dependence Distance). We also present an algorithm for removing unnecessary dependences in one-to-many dependences.

An Improving Method of Restructuring Parallel Programs for Data Race Detection

Ha, Keum-Sook;Lee, Sung woo;Yoo, Kee-Young
- 대한전자공학회:학술대회논문집
- /
- The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -2
- /
- pp.715-718
- /
- 2000
Although shared memory parallel programs are designed to be deterministic in both their final results and intermediate states, races that occur when different processes access a common memory location in an order not guaranteed by synchronization can result in unintended non-deterministic executions. Detecting races, particularly first data races, is therefore important for debugging explicit shared memory parallel programs. It is possible that all data races reported by other on-the-fly algorithms would disappear once the first races were removed. To detect races in parallel programs with nested loops and inter-thread coordination, the order of synchronization operations in an execution instance must be guaranteed. In this paper, we propose an improved restructuring method that guarantees the ordering of an execution instance and preserves the semantics of the original program. This method requires O(np) time and (s + up) space, where n is the total number of operations, s is the number of synchronization operations, and p is the degree of parallelism in the execution. The method also makes on-the-fly race detection for parallel programs with nested loops and inter-thread coordination more efficient in space and time.
