Search | Korea Science

Optimizing Shared Memory Accesses for GPGPU Computations (GPGPU를 위한 공유 메모리 최적화)

Tran, Nhat-Phuong;Lee, Myungho;Hong, Sugwon
- Proceedings of the Korea Information Processing Society Conference
- /
- 2012.11a
- /
- pp.197-199
- /
- 2012
Recently, a lot of general-purpose application programs in addition to graphic applications have been parallelized for boosting their performance using Graphic Processing Unit (GPU)'s excellent floating-point performance. In order to maximize the application performance on GPUs, optimizing the memory hierarchy and the on-chip caches such as the shared memory is essential. In this paper, we propose techniques to optimize the shared memory, and verify its effectiveness using a pattern matching application program.
https://doi.org/10.3745/PKIPS.y2012m11a.197 인용 PDF

An Analysis of Race Detection Tool for OpenMP Programs (OpenMP 프로그램을 위한 경합탐지 도구의 분석)

김영주;강문혜;전용기
- Proceedings of the Korean Information Science Society Conference
- /
- 2003.10a
- /
- pp.478-480
- /
- 2003
공유메모리 기반의 OpenMP 프로그램에서 발생하는 경합은 의도하지 않은 비결정적 수행 결과를 초래하므로 효과적으로 경합을 탐지하는 도구가 필요하다. 본 연구는 OpenMP 프로그램의 경합탐지를 위한 Intel 사의 Thread Checker를 내포병렬성의 여부와 접근사건들의 분포 형태를 기준으로 개발한 커널프로그램 집합을 이용하여 분석한 결과로서, 스레드들을 순서적으로 수행하면서 내포된 스레드를 부모 스레드와 동일한 스레드로 간주하고 적어도 하나의 읽기와 쓰기 접근사건들을 유지하면서 수행중에 경합을 탐지하는 도구임을 보인다. 이 도구는 접근사건의 발생 시에 이전 접근사건들과의 경합 여부를 검사한 후에 그 접근사건의 유지 여부를 결정하므로, 논리적 병행성 관계를 반영하지 못하는 내포된 스레드가 존재하지 않으면 경합의 존재를 검증한다.
PDF

An Efficient Filtered Monitoring Technique for Dynamic Analysis of Parallel Programs (병렬 프로그램의 동적 분석을 위한 효율적인 감시 필터링 기술)

Park, Se-Won;Ha, Ok-Kyoon
- Proceedings of the Korean Society of Computer Information Conference
- /
- 2016.01a
- /
- pp.9-10
- /
- 2016
본 논문에서는 멀티 스레드 기반 병렬 프로그램의 동적 분석을 통한 자료경합 탐지를 위해 효율적인 감시 필터링 기술을 제시한다. 제시하는 감시 필터링 기술은 자료경합과 같은 동시성 오류를 탐지하기 위해 프로그램의 수행 중에 발생하는 공유 메모리에 대한 스레드들의 접근 분석 시에 동일한 코드영역을 단순 반복하는 스레드들을 감시대상에서 제외시킴으로써 동적 분석을 위한 추가적인 오버헤드를 최소화한다. 제시하는 필터링 기술을 수행 중 자료경합 탐지 도구에 적용한 실험적 평가를 통해 개선된 수행시간 오버헤드의 제공이 가능함을 보인다.
PDF

A Study on Parallel Processing by Multi-Microprocessors (마이크로프로세서복합에 의한 병렬처리에 관한 연구)

Chung, Yon-Tack;Song, Young-Jae
- Journal of the Korean Institute of Telematics and Electronics
- /
- v.17 no.5
- /
- pp.36-42
- /
- 1980
In this study, multi-microprocessors system in which slave microprocessor is conrlected with master microprocessor bus through the DMA controller is designed by the use of four 8085 CPU. A high degree of processing efficiency could be obtained by making this system work parallel processing. The result of measuring relat ions bet ween working microproressor and system throughput was 70-80 percents lower than ideal value Master microprocessor takes charge of resource allocation and scheduling, common memory assigns communication between microprocessors and a store of common data. The met hod of detecting Pa rallelism from source Program composed by series is also suggested.
PDF

Design and Implementation of a Java Package for Sharing Array Data by the DSM Interface on a Cluster of Workstations (워크스테이션 클러스터 상에서 분산공유메모리 인터페이스로 배열 데이터의 공유를 지원하는 Java 패키지의 설계와 구현)

Lim, Hae-Jung;Kim, Myung
- Journal of Korea Multimedia Society
- /
- v.2 no.3
- /
- pp.355-365
- /
- 1999
In this paper, we present JPAS(Java Package for Array Sharing) which is a Java Package for sharing arrays of data on a cluster of workstations. It allows us to divide an array of data into several pieces, and to place each piece on a different host. JPAS uses Java RMI so that the entire array can be accessed by a location transparent interface which is similar to that of a distributed shared memory system. JPAS is portable and easy to use since it is implemented using pure Java. In order to reduce network overhead, JPAS allows programmers to use their prior knowledge of the application. Data consistency can be maintained through the value updating methods defined for all the elements of an array. We developed parallel programs which use JPAS, and tested them on a cluster of workstations. The test results show that JPAS is a parallel programming tool with reasonably good performance.
PDF

Design Considerations on Large-scale Parallel Finite Element Code in Shared Memory Architecture with Multi-Core CPU (멀티코어 CPU를 갖는 공유 메모리 구조의 대규모 병렬 유한요소 코드에 대한 설계 고려 사항)

Cho, Jeong-Rae;Cho, Keunhee
- Journal of the Computational Structural Engineering Institute of Korea
- /
- v.30 no.2
- /
- pp.127-135
- /
- 2017
The computing environment has changed rapidly to enable large-scale finite element models to be analyzed at the PC or workstation level, such as multi-core CPU, optimal math kernel library implementing BLAS and LAPACK, and popularization of direct sparse solvers. In this paper, the design considerations on a parallel finite element code for shared memory based multi-core CPU system are proposed; (1) the use of optimized numerical libraries, (2) the use of latest direct sparse solvers, (3) parallelism using OpenMP for computing element stiffness matrices, and (4) assembly techniques using triplets, which is a type of sparse matrix storage. In addition, the parallelization effect is examined on the time-consuming works through a large scale finite element model.
https://doi.org/10.7734/COSEIK.2017.30.2.127 인용 PDF KSCI

On-the-fly Detection of the First Races for Reducing Bottlenecks by Summary Report Method (요약보고 방법에 의해 병목현상을 개선한 최초경합의 수행중 탐지기법)

Kim, Jeong-Si;Jeon, Yong-Gi
- Journal of KIISE:Computer Systems and Theory
- /
- v.26 no.9
- /
- pp.1042-1054
- /
- 1999
공유메모리 병렬프로그램의 오류수정에서 경합의 탐지는 중요하다. 왜냐하면 경합은 잘못된 수행 결과를 초래할 뿐만 아니라, 의도하지 않은 프로그램의 비결정적인 수행을 유발하여 오류수정을 어렵게 하기 때문이다. 특히 최초경합의 탐지는 더욱 중요하다. 그 이유는 최초경합을 제거함으로써 나머지 경합들을 방지할 수도 있기 때문이다. 기존의 수행중 경합 탐지기법들은 접근별 보고방식을 기반으로 하는데, 이 기법들은 임의 공유변수에 대한 병행 쓰레드들의 모든 접근사건들을 검사하기 위해서 접근역사라는 유일한 공유정보를 이용하므로 탐지과정에 심각한 병목현상을 유발시킨다. 그러나, 최초경합 탐지를 위한 경우 이러한 병목현상은 크게 개선될 수 있다. 본 논문에서는, 각 접근사건 검사를 위해 각 쓰레드에 공유되지 않는 독립적인 접근역사를 별개로 두고, 경합을 보고하는 시점인 쓰레드 합류시점에서만 공유되는 접근역사를 이용하도록 함으로써 병목현상을 개선하여 최초경합을 탐지할 수 있는 새로운 수행중 탐지기법을 제안한다. 그러므로 본 기법은 최초경합을 보다 효율적으로 탐지할 수 있기 때문에 수행중 경합 탐지를 더욱 효율적이고 실용적으로 할 수 있다. Abstract Detecting races is important for debugging shared-memory parallel programs, because the races lead to unintended nondeterministic executions of the programs as well as erroneous result and then make debugging programs difficult. Especially, detecting the first races is more important. The reason is that the removal of the first races can make other races disappear. Most existing on-the-fly techniques to detect the races are based on per- access reporting method incurring the serious central bottleneck, because the techniques use unique shared information called access history for checking all accesses of concurrent threads to a shared variable. Such bottleneck, however, can be improved considerably in case of detecting first races. This paper presents a new on-the-fly technique which detects the first races with reduced bottleneck through checking each accesses with private access histories and finally reporting races with shared access histories. Therefore, this technique makes on-the-fly race detection more efficient and practical.

A Software VIA based PC Cluster System on SCI Network (SCI 네트워크 상의 소프트웨어 VIA기반 PC글러스터 시스템)

Shin, Jeong-Hee;Chung, Sang-Hwa;Park, Se-Jin
- Journal of KIISE:Computer Systems and Theory
- /
- v.29 no.4
- /
- pp.192-200
- /
- 2002
The performance of a PC cluster system is limited by the use of traditional communication protocols, such as TCP/IP because these protocols are accompanied with significant software overheads. To overcome the problem, systems based on user-level interface for message passing without intervention of kernel have been developed. The VIA(Virtual Interface Architecture) is one of the representative user-level interfaces which provide low latency and high bandwidth. In this paper, a VIA system is implemented on an SCI(Scalable Coherent Interface) network based PC cluster. The system provides both message-passing and shared-memory programming environments and shows the maximum bandwidth of 84MB/s and the latency of $8{\mu}s$. The system also shows better performance in comparison with other comparable computer systems in carrying out parallel benchmark programs.
PDF KSCI

Design and Implementation of KDSM(KAIST Distributed Shared Memory) System (KDSM(KAIST Distributed Shared Memory) 시스템의 설계 및 구현)

Lee, Sang-Kwon;Yun, Hee-Chul;Lee, Joon-Won;Maeng, Seung-Ryoul
- Journal of KIISE:Computer Systems and Theory
- /
- v.29 no.5
- /
- pp.257-264
- /
- 2002
In this paper, we give a detailed description of KDSM(KAIST Distributed Shared Memory) system. KDSM is implemented as a user-level library running on Linux 2.2.13, and TCP/IP is used for communication. KDSM uses page-based invalidation protocol, multiple-writer protocol, and supports HLRC(Home-based Lazy Release Consistency) memory consistency model. To evaluate performance of KDSM, we executed 4 scientific applications and compared the result to JLAJLA. The results showed that performance of KDSM almost equal to JIAJIA for 2 applications and performance of KDSM is better than JIAJIA for 2 applications.
PDF KSCI

Design and Implementation of a High Performance Web Crawler (고성능 웹크롤러의 설계 및 구현)

권성호;이영탁;김영준;이용두
- Journal of Korea Society of Industrial Information Systems
- /
- v.8 no.4
- /
- pp.64-72
- /
- 2003
A Web crawler is an important Internet software technology used in a variety of Internet application software which includes search engines. As Internet continues to grow, implementations of high performance web crawlers are urgently demanded. In this paper, we study how to support dynamic scheduling for a multiprocess-based web crawler. For high peformance, web crawlers are usually based on multiprocess in their implementations. In these systems, crawl scheduling which manages the allocation of web pages to each process for loading is one of the important issues. In this paper, we identify issues which are important and challenging in the crawl scheduling. To address the issue, we propose a dynamic crawl scheduling framework and subsequently a system architecture for a web crawler with dynamic crawl scheduling support. And we analysed the behaviors of Web crawler. Based on the analysis result, we suggest the direction for the design of high performance Web crawler.
PDF

Search Result 44, Processing Time 0.027 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)