• Title/Summary/Keyword: parallel computer processing

Search Result 652, Processing Time 0.027 seconds

Probabilistic Power-saving Scheduling of a Real-time Parallel Task on Discrete DVFS-enabled Multi-core Processors (이산적 DVFS 멀티코어 프로세서 상에서 실시간 병렬 작업을 위한 확률적 저전력 스케쥴링)

  • Lee, Wan Yeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.2
    • /
    • pp.31-39
    • /
    • 2013
  • In this paper, we propose a power-efficient scheduling scheme that stochastically minimizes the power consumption of a real-time parallel task while meeting the deadline on multicore processors. The proposed scheme applies the parallel processing that executes a task on multiple cores concurrently, and activates a part of all available cores with unused cores powered off, in order to save power consumption. It is proved that the proposed scheme minimizes the mean power consumption of a real-time parallel task with probabilistic computation amount on DVFS-enabled multicore processors with a finite set of discrete clock frequencies. Evaluation shows that the proposed scheme saves up to 81% power consumption of the previous method.

(Task Creation and Allocation for Static Load Balancing in Parallel Spatial Join (병렬 공간 조인 시 정적 부하 균등화를 위한 작업 생성 및 할당 방법)

  • Park, Yun-Phil;Yeom, Keun-Hyuk
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.418-429
    • /
    • 2001
  • Recently, a GIS has been applicable to the most important computer applications such as urban information systems and transportation information systems. These applications require spatial operations for an efficient management of a large volume of data. In particular, a spatial join among basic operations has the property that its response time is increased exponentially according to the number of spatial objects included in the operation. Therefore, it is not proper to the systems demanding the fast response time. To satisfy these requirements, the efficient parallel processing of spatial joins has been required. In this paper, the efficient method for creating and allocating tasks to balance statically the load of each processor in a parallel spatial join is presented. A task graph is developed in which a vertex weight is calculated by the cost model I have proposed. Then, it is partitioned through a graph partitioning algorithm. According to the experiments in CC16 parallel machine, our method made an improvement in the static load balance by decreasing the variance of a task execution time on each processor.

  • PDF

Parallel Accessed Mirroring based on Stripping (스트라이핑 기반의 병렬 접근 미러링 기법)

  • Kang, Dong-Jae;Kim, Chang-Soo;Shin, Bum-Joo;Kim, Hag-Young
    • Annual Conference of KIPS
    • /
    • 2002.11a
    • /
    • pp.539-542
    • /
    • 2002
  • 멀티미디어와 인터넷의 대중화가 야기한 급격한 데이터의 증가는 테라(Tera)바이트 이상의 대용량 저장공간과 대용량 정보의 효율적인 공유를 지원하는 스토리지 시스템을 요구하고 있으며 이를 위하여 SAN 기반의 스토리지 클러스터링 시스템들이 많이 사용되고 있다. 이러한 환경에서 하드웨어 또는 소프트웨어 RAID(Redundant Array of Independent Disks)는 대용량 정보의 고성능의 입출력과 신뢰성을 위해서 필수적이 되었다. 범용적인 RAID로는 RAE-0, RAID-1, RAID-5가 주로 사용되고 있으며 각각의 레벨은 장단점을 갖는다. 본 논문에서는 RAID-0와 RAID-1이 갖는 문제점들의 보완을 위하여 변형된 RAID 레벨인 RAID-SM을 제안한다. RAID-SM은 기존의 RAID-1이 가지는 데이터의 가용성을 유지하면서 추가적인 비용 없이 RAID-0의 우수한 입출력 성능을 얻기 위한 RAID-1의 변형된 방식이다. RAID-SM의 구현을 위하여 디스크상의 데이터의 배치 및 데이터 맵핑 탕식을 정의하고 RAID-SM에서의 I/O방법을 기술한다. 제안하는 RAID-SM은 멀티미디어나 GIS 데이터와 같은 읽기 연산 집약적인 시스템을 대상으로 하는 안정적인 레이드 방식이며 RAID-SM의 장점 및 성능은 본 논문에서의 실험을 통한 결과로서 제시한다.

  • PDF

An Implementation of a Video-Equipped Real-Time Fire Detection Algorithm Using GPGPU (GPGPU를 이용한 비디오 기반 실시간 화재감지 알고리즘 구현)

  • Shon, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.8
    • /
    • pp.1-10
    • /
    • 2014
  • This paper proposes a parallel implementation of the video based 4-stage fire detection algorithm using a general-purpose graphics processing unit (GPGPU) to support real-time processing of the high computational algorithm. In addition, this paper compares the performance of the GPGPU based fire detection implementation with that of the CPU implementation to show the effectiveness of the proposed method. Experimental results using five fire included videos with an SXGA ($1400{\times}1050$) resolution, the proposed GPGPU implementation achieves 6.6x better performance that the CPU implementation, showing 30.53ms per frame which satisfies real-time processing (30 frames per second, 30fps) of the fire detection algorithm.

An Explicit Superconcentrator Construction for Parallel Interconnection Network (병렬 상호 연결망을 위한 초집중기의 구성)

  • Park, Byoung-Soo
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.1
    • /
    • pp.40-48
    • /
    • 1998
  • Linear size expanders have been studied in many fields for the practical use, which make it possible to connect large numbers of device chips in both parallel communication systems and parallel computers. One major limitation on the efficiency of parallel computer designs has been the highly cost of parallel communication between processors and memories. Linear order concentrators can be used to construct theoretically optimal interconnection network schemes. Existing explicitly defined constructions are based on expanders, which have large constant factors, thereby rendering them impractical for reasonable sized networks. For these objectives, we use the more detailed matching points in permutation functions, to find out the bigger expansion constant from an equation, $\mid\Gamma_x\mid\geq[1+d(1-\midX\mid/n)]\midX\mid$. This paper presents an improvement of expansion constant on constructing concentrators using expanders, which realizes the reduction of the size in a superconcentrator by a constant factor. As a result, this paper shows an explicit construction of (n, 5, $1-\sqrt{3/2}$) expander. Thus, superconcentrators with 209n edges can be obtained by applying to the expanders of Gabber and Galil's construction.

  • PDF

CUDA-based Object Oriented Programming Techniques for Efficient Parallel Visualization of 3D Content (3차원 콘텐츠의 효율적인 병렬 시각화를 위한 CUDA 환경 기반 객체 지향 프로그래밍 기법)

  • Park, Tae-Jung
    • Journal of Digital Contents Society
    • /
    • v.13 no.2
    • /
    • pp.169-176
    • /
    • 2012
  • This paper presents a parallel object-oriented programming (OOP) platform for efficient visualization of three-dimensional content in CUDA environments. For this purpose, this paper discusses the features and limitations in implementing C++ object-oriented codes using CUDA and proposes the solutions. Also, it presents how to implement a 3D parallel visualization platform based on the MVC (Model/View/Controller) design pattern. Also, it provides sample implementations for integral MLS (iMLS) and signed distance fields (SDFs) based on the Marching Cubes and Raytracing. The proposed approach enables GPU parallel processing only by implementing simple interfaces. Based on this, developers can expect general benefits that are common in general OOP techniques including abstractization and inheritance. Though I implemented only two specific samples in this paper, I expect my approach can be widely applied to general computer graphics problems.

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

  • Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
    • The KIPS Transactions:PartA
    • /
    • v.13A no.4 s.101
    • /
    • pp.305-316
    • /
    • 2006
  • Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simultaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. For the above two goals, effective memory organizations including a new pixel cache architecture are presented. The experimental results show that the proposed architecture achieves almost linear speedup at best case even in sixteen rasterizers.

Parallel Implementation and Performance Evaluation of the SIFT Algorithm Using a Many-Core Processor (매니코어 프로세서를 이용한 SIFT 알고리즘 병렬구현 및 성능분석)

  • Kim, Jae-Young;Son, Dong-Koo;Kim, Jong-Myon;Jun, Heesung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.9
    • /
    • pp.1-10
    • /
    • 2013
  • In this paper, we implement the SIFT(Scale-Invariant Feature Transform) algorithm for feature point extraction using a many-core processor, and analyze the performance, area efficiency, and system area efficiency of the many-core processor. In addition, we demonstrate the potential of the proposed many-core processor by comparing the performance of the many-core processor with that of high-performance CPU and GPU(Graphics Processing Unit). Experimental results indicate that the accuracy result of the SIFT algorithm using the many-core processor was same as that of OpenCV. In addition, the many-core processor outperforms CPU and GPU in terms of execution time. Moreover, this paper proposed an optimal model of the SIFT algorithm on the many-core processor by analyzing energy efficiency and area efficiency for different octave sizes.

A Study on High Speed Image Rotation Algorithm using CUDA (CUDA를 이용한 고속 영상 회전 알고리즘에 관한 연구)

  • Kwon, Hee-Choul;Cho, Hyung-Jin;Kwon, Hee-Yong
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.5
    • /
    • pp.1-6
    • /
    • 2016
  • Image rotation is one of main pre-processing step in image processing or image pattern recognition. It is implemented with rotation matrix multiplication. However it requires lots of floating point arithmetic operations and trigonometric function calculations, so it takes long execution time. We propose a new high speed image rotation algorithm without two major time-consuming operations. It use just 2 shear translation operations, so it is very fast. In addition, we apply a parallel computing technique with CUDA. CUDA is a massively parallel computing architecture using prevailed GPU recently. As GPU is a dedicated graphic processor, it is exellent for parallel processing of pixels. We compare the proposed algorithm with the conventional rotation one with various size images. Experimental results show that the proposed algorithm is superior to the conventional rotation ones.

A Study of designing Parallel File System for Massive Information Processing (대규모 정보처리를 위한 병렬 화일시스템 설계에 관한 연구)

  • Jang, Si-Ung;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.5
    • /
    • pp.1221-1230
    • /
    • 1997
  • In this study, the performance of a parallel file system(N-PFS), which is inplemented using conventional disks as disk arrays on a Workstation Cluster, is analyzed by using analytical method and adtual values in experiments.N-PFS can be used as high-performance file sever in small-scale server systems and effciently pro-cess massive data I/Os such as multimedia and scientifid data. In this paper, an analytical model was suggested and the correctness of the suggested was verified by analyzing the experimental values on a system.The result of the appropriate stping unit for processing massive data of the Workstation Cluster with 8 disks is 64-128Kbytes and the maximum throughput on it is 15.8 Mbytes/ses.In addition, the performance of parallel file system on massive data is bounded by the time required to copy data between buffers.

  • PDF