Search | Korea Science

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

Kim, Jinseong;Choi, Younchang;Kim, Jaehak;Chung, Yeonwoo;Chung, Yongwha;Park, Daihee;Kim, Hakjae
- KIPS Transactions on Computer and Communication Systems
- /
- v.6 no.10
- /
- pp.407-414
- /
- 2017
Pig monitoring applications consisting of many tasks can take advantage of inherent data parallelism and enable parallel processing using performance accelerators. In this paper, we propose a task distribution method for pig monitoring applications into a heterogenous computing platform consisting of a multicore-CPU and a manycore-GPU. That is, a parallel program written in OpenCL is developed, and then the most suitable processor is determined based on the measured execution time of each task. The proposed method is simple but very effective, and can be applied to parallelize other applications consisting of many tasks on a heterogeneous computing platform consisting of a CPU and a GPU. Experimental results show that the performance of the proposed task distribution method on three different heterogeneous computing platforms can improve the performance of the typical GPU-only method where every tasks are executed on a deviceGPU by a factor of 1.5, 8.7 and 2.7, respectively.
https://doi.org/10.3745/KTCCS.2017.6.10.407 인용 PDF KSCI

A architecture for parallel rendering processor with by effective memory organization (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 구조)

Kim, Kyung-Su;Yoon, Duk-Ki;Kim, Il-San;Park, Woo-Chan
- Journal of Korea Game Society
- /
- v.5 no.3
- /
- pp.39-47
- /
- 2005
Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simulaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. The experimental results show that proposed architecture achieves almost linear speedup at best case even in sixteen rasterizer
PDF

A Detection Tool of Message Races in Parallel Programs for Linux Cluster Systems (리눅스 클러스터 시스템을 위한 병렬프로그램의 메시지경합 탐지 도구)

Park, S.;Kim, Y.;Park, M.;Kim, S.;Lee, S.;Jun, Y.
- Annual Conference of KIPS
- /
- 2000.10a
- /
- pp.645-648
- /
- 2000
병렬 오류인 메시지경합을 가진 메시지전달 프로그램은 비결정적인 수행결과를 보이므로, 이를 탐지하고 수정하는 것이 어렵다. 기존의 메시지경합 탐지 도구들은 메시지경합에 관련된 간접적인 정보를 제공하는 수준이며, 메시지경합의 원인을 자동으로 탐지하지 못한다. 그리고 탐지과정 중에 부가적인 메시지전달 작업이 발생하며, 대상 프로그램을 수정해야 하는 부담이 있다. 본 논문에서 제안된 탐지 도구는 리눅스 클러스터 시스템을 위한 병렬 프로그램의 메시지경합을 자동으로 탐지하여 직접적인 경합 정보를 제공한다. 그리고 탐지 엔진 부분을 리눅스 커널에 설치함으로써 경합 탐지를 위한 부가적인 메시지전달의 필요성을 제거하고, 대상 프로그램의 수정없이 경합을 탐지할 수 있는 투명성을 제공한다.
PDF

Real-Time Monitoring of Resource for Distributed/Parallel Framework on the Web (웹 기반 분산/병렬 프레임워크상에서 실시간 자원 모니터링)

Kim, Su-Ja;Jeong, Jae-Hong;Song, Eun-Ha;Han, Sung-Kook;Joo, Su-Chong;Jeong, Young-Sik
- Annual Conference of KIPS
- /
- 2003.05a
- /
- pp.117-120
- /
- 2003
웹의 다양한 자원을 이용하여 고성능 작업 처리를 요구하는 분산/병렬 시스템은 균형적인 작업 할당을 위해 각 호스트의 성능 평가가 중요하다. 하지만 성능 평가에 대한 지속적인 신뢰하기가 어려우며 뿐만 아니라, 작업 도중 호스트의 성능 변화를 예측하기가 어렵다. 성능 변화에 따른 효율적인 작업 스케줄링이 필요하며, 자원 관리자는 작업을 수행중인 호스트에 대한 모니터가 요구된다. 본 논문에서는 자원 관리자와 시스템 관리자에게 효율적인 자원 정책을 제안하기 위해 각 호스트의 자원을 모니터하고, 분산/병렬 시스템의 작업 할당 메커니즘에 의해 각 호스트의 성능 평가 기준을 정한다 또한 관리자에게 실시간으로 호스트의 성능 변화에 따른 자원 정보를 관리하도록 다양한 시각화를 제공한다.
PDF

A Software Method for Improving the Performance of Real-time Rendering of 3D Games (3D 게임의 실시간 렌더링 속도 향상을 위한 소프트웨어적 기법)

Whang, Suk-Min;Sung, Mee-Young;You, Yong-Hee;Kim, Nam-Joong
- Journal of Korea Game Society
- /
- v.6 no.4
- /
- pp.55-61
- /
- 2006
Graphics rendering pipeline (application, geometry, and rasterizer) is the core of real-time graphics which is the most important functionality for computer games. Usually this rendering process is completed by both the CPU and the GPU, and a bottleneck can be located either in the CPU or the GPU. This paper focuses on reducing the bottleneck between the CPU and the GPU. We are proposing a method for improving the performance of parallel processing for real-time graphics rendering by separating the CPU operations (usually performed using a thread) into two parts: pure CPU operations and operations related to the GPU, and let them operate in parallel. This allows for maximizing the parallelism in processing the communication between the CPU and the GPU. Some experiments lead us to confirm that our method proposed in this paper can allow for faster graphics rendering. In addition to our method of using a dedicated thread for GPU related operations, we are also proposing an algorithm for balancing the graphics pipeline using the idle time due to the bottleneck. We have implemented the two methods proposed in this paper in our networked 3D game engine and verified that our methods are effective in real systems.
PDF

An Effective Parallel and Pipelined Algorithm with Minimum Delayed Time in VLIW System (VLIW 시스템에서의 최소 시간 지연을 갖는 효율적인 병렬 파이프라인 알고리즘)

Seo, Jang-Won;Song, Jin-Hui;Ryu, Cheon-Yeol;Jeon, Mun-Seok
- The Transactions of the Korea Information Processing Society
- /
- v.2 no.4
- /
- pp.466-476
- /
- 1995
This pater describes pipelining algorithm issues for a VLIW(Very Long Instruction Word) System and the effective pipelined processing method by occurrence in pipelined management of processor minimized to timing delay. The proposed algorithm is executed in pipeline and parallel processings, and by combining basic operations variable instruction set can be desinged for various applications. In this paper, we prove and analyze the efficiency of the proposed pipeline algorithm and compare with other processor pipeline algorithm in terms of time minimizing.
PDF

An Image Processing Learning System with An Actual Practice (현장실습이 가능한 영상처리 학습 시스템)

하석운;신현갑
- Journal of the Korea Computer Industry Society
- /
- v.4 no.10
- /
- pp.673-684
- /
- 2003
In order to understand the concepts of image processing with effect, a learning system with an actual practice is necessary. As most image processing learning materials have some inconvenient respects that it is difficult to understand the processing procedure because they simply show the processed results as figures in the contents, and also, a separate practice tool is needed to operate the source codes because they provide the program source codes as a part in the context. In this paper, in order to solve above inconvenient respects, an image processing learning system that be able to improve the learning effects as accomplishing the theory studying and the actual practice in parallel is proposed. As this system is composed on the bases of java, it is independent to the platforms and it is possible to implement on the Web.
PDF

Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing (병렬 연산을 이용한 방출 단층 영상의 재구성 속도향상 기초연구)

Park, Min-Jae;Lee, Jae-Sung;Kim, Soo-Mee;Kang, Ji-Yeon;Lee, Dong-Soo;Park, Kwang-Suk
- Nuclear Medicine and Molecular Imaging
- /
- v.43 no.5
- /
- pp.443-450
- /
- 2009
Purpose: Conventional image reconstruction uses simplified physical models of projection. However, real physics, for example 3D reconstruction, takes too long time to process all the data in clinic and is unable in a common reconstruction machine because of the large memory for complex physical models. We suggest the realistic distributed memory model of fast-reconstruction using parallel processing on personal computers to enable large-scale technologies. Materials and Methods: The preliminary tests for the possibility on virtual manchines and various performance test on commercial super computer, Tachyon were performed. Expectation maximization algorithm with common 2D projection and realistic 3D line of response were tested. Since the process time was getting slower (max 6 times) after a certain iteration, optimization for compiler was performed to maximize the efficiency of parallelization. Results: Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that differences between parallel processed image and single processed image at the same iterations were under the significant digits of floating point number, about 6 bit. Double processors showed good efficiency (1.96 times) of parallel computing. Delay phenomenon was solved by vectorization method using SSE. Conclusion: Through the study, realistic parallel computing system in clinic was established to be able to reconstruct by plenty of memory using the realistic physical models which was impossible to simplify.
PDF KSCI

Parallel Distributed Implementation of GHT on MPI-based PC Cluster (MPI 기반 PC 클러스터에서 GHT의 병렬 분산 구현)

Kim, Yeong-Soo;Kim, Jeong-Sahm;Choi, Heung-Moon
- Journal of the Institute of Electronics Engineers of Korea CI
- /
- v.44 no.3
- /
- pp.81-89
- /
- 2007
This paper presents a parallel distributed implementation of the GHT (generalized Hough transform) for the fast processing on the MPI-based PC cluster. We tried to achieve the higher speedup mainly by alleviating the communication overhead through the pipelined broadcast and accumulator array partition strategy and by time overlapping of the communication and the computation over entire process. Experimental results show that nearly linear speedup is reachable by the proposed method on the MPI-based PC clusters connected through 100Mbps Ethernet switch.
PDF KSCI

Low-area Bit-parallel Systolic Array for Multiplication and Square over Finite Fields

Kim, Keewon
- Journal of the Korea Society of Computer and Information
- /
- v.25 no.2
- /
- pp.41-48
- /
- 2020
In this paper, we derive a common computational part in an algorithm that can simultaneously perform multiplication and square over finite fields, and propose a low-area bit-parallel systolic array that reduces hardware through sequential processing. The proposed systolic array has less space and area-time (AT) complexity than the existing related arrays. In detail, the proposed systolic array saves about 48% and 44% of Choi-Lee and Kim-Kim's systolic arrays in terms of area complexity, and about 74% and 44% in AT complexity. Therefore, the proposed systolic array is suitable for VLSI implementation and can be applied as a basic component in hardware constrained environment such as IoT.
https://doi.org/10.9708/jksci.2020.25.02.041 인용 PDF KSCI

Search Result 652, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)