Search | Korea Science

A Study on the Automatic Parallelization Method and Tool Development

Shin, Woochang
- International Journal of Internet, Broadcasting and Communication
- /
- v.12 no.3
- /
- pp.87-94
- /
- 2020
Recently, computer hardware is evolving toward increasing the number of computing cores, not increasing the clock speed. In order to use the performance of parallelized hardware to the maximum, the running program must also be parallelized. However, software developers are accustomed to sequential programs, and in most cases, write programs that operate sequentially. They also have a lot of difficulty designing and developing software in parallel. We propose a method to automatically convert a sequential C/C++ program into a parallelized program, and develop a parallelization tool that supports it. It supports open multiprocessing (OpenMP) and parallel patterns library (PPL) as a parallel framework. Perfect automatic parallelization is difficult due to dynamic features such as pointer operation and polymorphism in C/C++ language. This study focuses on verifying the conditions of parallelization rather than focusing on fully automatic parallelization, and providing advice to developers in detail if parallelization is not possible.
https://doi.org/10.7236/IJIBC.2020.12.3.87 인용 PDF KSCI

A Study on Efficient Parallel Programming (효율적인 병렬처리 프로그램 방식에 관한 연구)

Yoon, Sang-Hyuk;Kim, Youngtae
- Proceedings of the Korea Information Processing Society Conference
- /
- 2016.04a
- /
- pp.67-69
- /
- 2016
분산 병렬 프로그램의 성능을 향상시키기 위하여 분산 컴퓨터에서는 메시지 전송 방식(MPI)을 사용하고 독립적인 컴퓨터 내에서는 OpenMP를 사용하여 성능을 향상시키는 혼합형 병렬 방식이 많이 사용되고 있다. 본 논문에서는 OpenMP방식과 MPI 방식을 혼용하는 방식을 순수 MPI만 사용하는 방식과 비교하여 성능을 분석하였다. 성능 분석 결과, MPI만을 사용하는 방식의 성능이 효율적임을 보여주었다.
https://doi.org/10.3745/PKIPS.y2016m04a.67 인용 PDF

Interest-Information Monitoring System for Debugging of Parallel Programs (병렬 프로그램의 디버깅을 위한 관심정보 모니터링 시스템)

Park, Myeong-Chul
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2007.10a
- /
- pp.607-610
- /
- 2007
In this paper, proposes the monitoring system it will be able to trace the executed of each threads in OpenMP based a parallel program. The monitoring system of existing in uses each threads label information and the analysis technique which uses the access-history was most. This has the problem which raises the time and space complexity which is caused by with massive information creation. In this paper, only the thread which includes interest information it creates tracking information with the target. And it provides information which is intuitive to the user it provides the visualization system for to a same time. The visualization model is composed the images-information of a base. This does to be it will be able to understandable a program execute situation using an image processing technique. Therefore, this paper provides the parallel program an effective debugging environment.
PDF

An Empirical Comparison of Monitoring Filtering Techniques for Dynamic Data Race Detection in Parallel Programs with OpenMP Directives (OpenMP 디렉티브 병렬프로그램에서의 동적 자료경합 탐지를 위한 감시 필터링 기술의 실험적 비교)

Cho, Ahra;Ha, Ok-Kyoon
- Proceedings of the Korean Society of Computer Information Conference
- /
- 2016.07a
- /
- pp.1-2
- /
- 2016
다중 스레드 기반 병렬 프로그램에서의 자료경합 탐지는 동시에 수행되는 스레드 간의 비결정적인 상호작용 때문에 탐지하기 어려운 것으로 잘 알려져 있다. 동적 분석기술을 사용하여 자료경합을 탐지할 경우 프로그램 수행의 감시와 충돌하는 모든 메모리 연산의 분석을 위해 추가적인 오버헤드가 발생한다는 단점이 있다. 이러한 동적 분석의 추가적인 오버헤드를 줄이는 방법으로 감시 필터링 기술이 소개되고 있으며, 본 논문에서는 동적 자료경합 탐지를 위한 감시 필터링 기술 중 OpenMP 디렉티브 병렬 프로그램에 적용 가능한 두 기술을 대상으로 실용성과 효율성을 실험적으로 비교한다.
PDF

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.44 no.10
- /
- pp.23-32
- /
- 2007
This paper proposes a reconfigurable parallel processor architecture which can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture directly connects memories and processors so that memory access time and power consumption are reduced. It supports floating-point operations needed in the geometry stage of 3D graphics. It adopts partitioned SIMD to reduce hardware costs. Conditional execution of instructions is used for easy development of parallel algorithms.
PDF KSCI

Hybrid Parallelization for High Performance of CFD_NIMR Model (기상 모델 CFD_NIMR의 최적 성능을 위한 혼합형 병렬 프로그램 구현)

Kim, Min-Wook;Choi, Young-Jean;Kim, Young-Tae
- Atmosphere
- /
- v.22 no.1
- /
- pp.109-115
- /
- 2012
We parallelized the CFD_NIMR model, which is a numerical meteorological model, for best performance on both of distributed and shared memory parallel computers. This hybrid parallelization uses MPI (Message Passing Interface) to apply horizontal 2-dimensional sub-domain out of the 3-dimensional computing domain for distributed memory system, as well as uses OpenMP (Open Multi-Processing) to apply vertical 1-dimensional sub-domain for utilizing advantage of shared memory structure. We validated the parallel model with the original sequential model, and the parallel CFD_NIMR model shows efficient speedup on the distributed and shared memory system.
https://doi.org/10.14191/Atmos.2012.22.1.109 인용 PDF KSCI

Towards a Fair Comparison of Parallel Machines (병렬컴퓨터들의 비교를 위한 기법)

Kim, Yeong-Tae
- Journal of KIISE:Computer Systems and Theory
- /
- v.26 no.1
- /
- pp.43-52
- /
- 1999
이 논문은 다른 병렬컴퓨터들의 비교를 통한 예를 이용하여 다음의 3 질문엣 중점을 두었다. (ⅰ) 각각의 다른 효율의 기준들이 다르게 적용되었을 때 어떻게 비교할 수 있는가\ulcorner (ⅱ) 병렬 컴퓨터의 설계에 있어서 연산과 통신 등의 구조적인 균형이 어떻게 컴퓨터의 효능에 영향을 미치게 되는가\ulcorner(ⅲ) 작은수의 빠른 프로세서들을 가진 병렬 컴퓨터와 많은 수의 덜 빠른 프로세서들을 가진 병렬컴퓨터중 어떤 것이 더 나은가\ulcorner 이 논문에서는 병렬컴퓨터 MasPar 16K 프로세서 MP-1과 4K 프로세서 MP-2가 예로써 비교된다. MP-2는 MP-1보다 프로세서의 개수는 적지만, 프로세서의 연산속도는 MP-1 보다 4-5 배 빠르다. 3가지의 다른 잘 알려진 수치 알고리즘들을 이용한 연산, 통신, 메모리 접근 그리고 기타의 오버헤드의 분석을 통하여 위의 질문들이 연구된다.

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

Kim, Eung-Kon
- The Transactions of the Korea Information Processing Society
- /
- v.5 no.6
- /
- pp.1620-1632
- /
- 1998
This paper presents a fast parallel algorithm for volume rendering and its implementation using C language and MPI MasPar Programming Language) on the 4,096 processor MasPar MP-2 machine. This parallel algorithm is a parallelization hased on the Lacroute' s sequential shear - warp algorithm currently acknowledged to be the fastest sequential volume rendering algorithm. This algorithm reduces communication overheads by using the sheared space partition scheme and the load balancing technique using load estimates from the previous iteration, and the number of voxels to be processed by using the run-length encoded volume data structure.Actual performance is 3 to 4 frames/second on the human hrain scan dataset of $128\times128\times128$ voxels. Because of the scalability of this algorithm, performance of ]2-16 frames/sc.'cond is expected on the 16,384 processor MasPar MP-2 machine. It is expected that implementation on more current SIMD or MIMD architectures would provide 3O~60 frames/second on large volumes.
PDF

Detecting the First Race in OpenMP Program with Nested Parallelism (내포 병렬성을 가지는 OpenMP 프로그램의 최초 경합 탐지)

Chon, Byoung-Gyu;Woo, Jong-Jung;Jun, Yong-Kee
- The KIPS Transactions:PartA
- /
- v.8A no.3
- /
- pp.253-260
- /
- 2001
It is important to detect races for debugging shared-memoy parallel programs, because the races cause unintended nondeterministic program execution. Previous on-the-fly techniques to detect races can not guarantee the first race detection in nested parallel programs. Detecting the first race is important for debugging parallel programs, since the removal of the first race may make the next occurred races disappear. In this paper, we presents an on-the-fly detection technique to detect all of the first races through the reexecution of the debugged programs. We assume that the debugged parallel program may have one-way nested parallel programs. The number of reexecution is at the least the nesting depth of the program in the worst case. The space complexity is O(VT) and the time complexity to detect race in each access of access history is O(T), where V is number of shared variables and T is the maximum parallelism of the program. This efficiency of our technique in each execution is the same with the previous on-the-fly detection techniques. Therefore, this technique makes debugging parallel programs more effective and practical.
PDF

Performance Comparison of Two Parallel LU Decomposition Algorithms on MasPar Machines

Kim, Yong-Tae
- Journal of IKEEE
- /
- v.2 no.2 s.3
- /
- pp.247-254
- /
- 1998
This paper presents a performance study of two LU decomposition algorithms on two massively parallel SIMD machines: the 16K processor MasPar MP-1 and the 4K processor MasPar MP-2. The paper presents experimental results and an analysis of the algorithms to explain the results. While the blocked and the nonblocked algorithms for LU decomposition have been studied individually by others, we compare the two algorithms and identify the tradeoffs between them. Our analysis of the blocked algorithm shows how the block size affects the interprocessor communication cost and the memory read/write overhead. The analysis in this paper is useful to determine an optimum block size for the blocked algorithm.
PDF

Search Result 34, Processing Time 0.035 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)