• Title/Summary/Keyword: parallel computer processing

The Development of a Parallel Media Streaming System and Performance Improvement (병렬 미디어 스트리밍 시스템 구현 및 성능 향상에 관한 연구)

  • Lee, Jun-Young;Pom, Jeong-Wung;Son, Seung-Chul;Kang, Mi-Young;Nam, Ji-Seung
    • Annual Conference of KIPS / 2005.11a / pp.233-236 / 2005
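  • In this paper, we implemented a Linux-based parallel media streaming system that provides real-time streaming services. A VoD system must offer high performance to handle the increasing user load efficiently and must secure enough storage space for massive amounts of data. Its quality of service must also be high: video must have satisfactory picture quality and must play without interruption. To satisfy these requirements and achieve maximum performance, we present a striping policy for stored files. Simulations show the performance improvement of the parallel media streaming system.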

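The abstract does not specify the striping policy itself. As a minimal sketch, assuming simple round-robin striping of fixed-size blocks across disks (the functions `stripe` and `disk_load` are hypothetical), the code below shows how striping spreads both stored blocks and concurrent-stream requests across disks:

```python
from collections import defaultdict


def stripe(num_blocks: int, num_disks: int, start_disk: int = 0) -> dict[int, list[int]]:
    """Assumed round-robin striping: block i of a file goes to disk (start_disk + i) % num_disks.

    Returns a mapping disk -> list of block indices stored on that disk.
    """
    layout: dict[int, list[int]] = defaultdict(list)
    for block in range(num_blocks):
        layout[(start_disk + block) % num_disks].append(block)
    return dict(layout)


def disk_load(streams: list[tuple[int, int]], num_disks: int, t: int) -> list[int]:
    """Count block requests per disk in time slot t.

    Each stream is (start_disk, first_slot) and reads block t - first_slot in slot t,
    i.e. one block per slot, as in continuous playback.
    """
    load = [0] * num_disks
    for start_disk, first_slot in streams:
        block = t - first_slot
        if block >= 0:  # stream has started
            load[(start_disk + block) % num_disks] += 1
    return load
```

With four streams whose files start on different disks, `disk_load([(0, 0), (1, 0), (2, 0), (3, 0)], 4, 5)` yields one request per disk, whereas two streams that both start on disk 0 hit the same disk in every slot. Staggering the starting disk is one simple way to avoid such hotspots under this assumed layout.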

A Scalable Parallel Labeling Algorithm on Mesh Connected SIMD Computers (메쉬 구조형 SIMD 컴퓨터 상에서 신축적인 병렬 레이블링 알고리즘)

  • 박은진;이갑섭;성효경;최흥문
    • Proceedings of the IEEK Conference / 1998.10a / pp.731-734 / 1998
  • A scalable parallel algorithm is proposed for efficient image component labeling with local operators on a mesh-connected SIMD computer. Unlike conventional parallel labeling algorithms, which assign a single pixel to each PE, the proposed algorithm is scalable and can assign an m$\times$m pixel set to each PE according to the input image size. Each assigned pixel set is converted into a single pixel with a representative value, which greatly reduces the required memory and processing time. For an N$\times$N image, if an m$\times$m pixel set is assigned to each PE of a P$\times$P mesh, where $P = N/m$, the communication time complexity of each PE and the computation complexity are reduced to $O(P \log P)$ and $O(P)$ bit operations, respectively, each 1/m of the corresponding cost in the conventional method. The method also reduces the memory in each PE to $O(P)$ and the number of PEs to $O(P^2) = \Theta(N^2/m^2)$, compared with $O(N^2)$ for the conventional method. Because the proposed parallel labeling algorithm is scalable, it can accommodate larger images without changing the hardware of a given mesh-connected SIMD computer.

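As a worked illustration of the scaling claims in this abstract (the values N = 1024 and m = 8 are hypothetical choices, not taken from the paper, and constants hidden by the O-notation are ignored):

```latex
\begin{aligned}
P &= \frac{N}{m}, \qquad \text{number of PEs} = P^2 = \frac{N^2}{m^2} \\
N = 1024,\ m = 8 &\;\Rightarrow\; P = 128,\quad P^2 = 16{,}384 \ \text{vs.}\ N^2 = 1{,}048{,}576 \quad (\text{ratio } 1/m^2 = 1/64) \\
\frac{P \log_2 P}{N \log_2 N} &= \frac{128 \cdot 7}{1024 \cdot 10} = \frac{896}{10{,}240} = 0.0875 \;\le\; \frac{1}{m} = 0.125
\end{aligned}
```

The logarithmic factor also shrinks when P replaces N, so the per-PE communication cost falls by at least the factor 1/m in this example.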

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

  • Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.10 / pp.23-32 / 2007
  • This paper proposes a reconfigurable parallel processor architecture which can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture directly connects memories and processors so that memory access time and power consumption are reduced. It supports floating-point operations needed in the geometry stage of 3D graphics. It adopts partitioned SIMD to reduce hardware costs. Conditional execution of instructions is used for easy development of parallel algorithms.
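The abstract names partitioned SIMD and conditional execution without describing the datapath. As a software illustration only (not the proposed processor's instruction set), the sketch below emulates a partitioned 4×8-bit add inside one 32-bit word, where carries are prevented from crossing lane boundaries, plus a mask-based select that mimics per-lane conditional execution. All names are hypothetical.

```python
LANE_MSB = 0x80808080  # top bit of each 8-bit lane
LANE_LOW = 0x7F7F7F7F  # low 7 bits of each lane
WORD = 0xFFFFFFFF      # 32-bit word mask


def padd_u8x4(a: int, b: int) -> int:
    """Add four packed unsigned 8-bit lanes of two 32-bit words, wrapping per lane.

    The low 7 bits of each lane are summed first (at most 0xFE, so no carry
    leaves the lane); each lane's top bit is then fixed up with XOR.
    """
    partial = (a & LANE_LOW) + (b & LANE_LOW)
    return partial ^ ((a ^ b) & LANE_MSB)


def lane_mask_u8x4(flags: list[bool]) -> int:
    """Per-lane mask (0xFF where the flag is True); lane 0 is the least significant byte."""
    mask = 0
    for i, flag in enumerate(flags):
        if flag:
            mask |= 0xFF << (8 * i)
    return mask


def pselect(mask: int, x: int, y: int) -> int:
    """Conditional execution per lane: take lanes of x where mask is set, otherwise lanes of y."""
    return (x & mask) | (y & ~mask & WORD)
```

For example, adding `0x01FF7F10` and `0x01010101` gives `0x02008011`: the `0xFF + 0x01` lane wraps to `0x00` without disturbing its neighbour, which is the property a partitioned datapath provides in hardware at little extra cost.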

Implementation of Fast Rasterizer processing using GPGPU based on SIMT structure (SIMT 구조 기반 GPGPU를 이용한 고속 Rasterizer 구현)

  • Kim, Chiyong
    • Journal of IKEEE / v.21 no.3 / pp.276-279 / 2017
  • In this paper, a GPGPU (General-Purpose computing on Graphics Processing Units) based on the SIMT structure is used to accelerate the rasterizer, which forms the image on the display device pixel by pixel. A GPU has a large number of ALUs and processes data very quickly because it operates in parallel. We therefore implemented a rasterizer that renders a 3D graphics model both on a CPU, which performs operations sequentially, and on a GPU, which performs them in parallel. The proposed GPU rasterizer was 1.45 times faster than the rasterizer running on an Intel CPU when generating one frame.
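The abstract does not describe the rasterization algorithm. A common approach, assumed here, is the edge-function coverage test: each pixel is tested independently against the triangle's three edges, which is what lets a SIMT GPU assign one thread per pixel. In this sketch the Python loops only stand in for that thread grid, and the function names are hypothetical.

```python
def edge(ax: float, ay: float, bx: float, by: float, px: float, py: float) -> float:
    """Edge function: > 0 if p lies left of the directed edge a->b (y-up coordinates)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers(x: int, y: int, tri) -> bool:
    """Work of one GPU thread: does the triangle cover the center of pixel (x, y)?"""
    (x0, y0), (x1, y1), (x2, y2) = tri
    px, py = x + 0.5, y + 0.5
    w0 = edge(x1, y1, x2, y2, px, py)
    w1 = edge(x2, y2, x0, y0, px, py)
    w2 = edge(x0, y0, x1, y1, px, py)
    # Accept either winding order; pixel centers exactly on an edge count as covered.
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def rasterize(tri, width: int, height: int) -> list[list[bool]]:
    """Coverage mask; each (x, y) is computed independently, like one thread per pixel."""
    return [[covers(x, y, tri) for x in range(width)] for y in range(height)]
```

Because no pixel depends on another, the per-pixel work has no ordering constraints, which is the property a SIMT GPU exploits; a sequential CPU must visit the pixels one after another.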

Analysis of Stator-Rotor Interactions by using Parallel Computer (정익-동익 상호작용의 병렬처리해석)

  • Lee J. J.;Choi J. M.;Lee D. H.
    • Korean Society for Computational Fluids Engineering (한국전산유체공학회): Conference Proceedings / 2004.10a / pp.111-114 / 2004
  • A CFD code that simulates stator-rotor interactions was developed using a parallel computing method. A Modified Multi-Block Grid System, which improves grid orthogonality and is well suited to parallel processing, is introduced, and a Patched Algorithm is applied at the sliding interface created by the rotor's movement. The experimental turbomachine model consists of 11 stators and 14 rotors. Two test cases, a one-stator/one-rotor model and a three-stator/four-rotor model, were analyzed, and the results of both were compared with experimental test data.

Performance Enhancement of Parallel Prime Sieving with Hybrid Programming and Pipeline Scheduling (혼합형 병렬처리 및 파이프라이닝을 활용한 소수 연산 알고리즘)

  • Ryu, Seung-yo;Kim, Dongseung
    • KIPS Transactions on Computer and Communication Systems / v.4 no.10 / pp.337-342 / 2015
  • We develop a new parallelization method for the Sieve of Eratosthenes that improves both computation speed and energy efficiency. After appropriate workload partitioning, pipeline scheduling is added for better load balancing. The resulting programs run on multicore CPUs using a hybrid parallel programming model that combines message passing and multithreading. Experiments on both small-scale clusters and a PC with a mobile processor show significant improvements in execution time and energy consumption.
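The workload partitioning mentioned above can be illustrated with a segmented sieve, in which each range of numbers is sieved independently using the base primes up to the square root of the limit. This is a simplified single-node sketch, not the authors' method: a thread pool stands in for both MPI processes and threads, the pipeline scheduling is not reproduced, and (because of CPython's GIL) threads show the decomposition rather than a real speedup. Function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def base_primes(limit: int) -> list[int]:
    """Plain Sieve of Eratosthenes for primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Clear every multiple of p starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def sieve_segment(lo: int, hi: int, primes: list[int]) -> list[int]:
    """Primes in [lo, hi); `primes` must contain every prime <= sqrt(hi - 1)."""
    flags = bytearray([1]) * (hi - lo)
    for p in primes:
        start = max(p * p, (lo + p - 1) // p * p)  # first multiple of p in range, at least p*p
        for multiple in range(start, hi, p):
            flags[multiple - lo] = 0
    return [lo + i for i, flag in enumerate(flags) if flag and lo + i >= 2]


def parallel_sieve(n: int, num_segments: int = 4, workers: int = 4) -> list[int]:
    """Primes below n; segments are independent, so each can be handed to a worker."""
    primes = base_primes(math.isqrt(max(n - 1, 0)))
    size = max(1, -(-n // num_segments))  # ceiling division
    bounds = [(lo, min(lo + size, n)) for lo in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: sieve_segment(b[0], b[1], primes), bounds)
    return [p for part in parts for p in part]  # map preserves segment order
```

Equal-width segments are the simplest partition; because low segments contain more multiples to clear per number, a real implementation may rebalance them, which is the kind of load imbalance the paper's pipeline scheduling targets.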

Proposition and Evaluation of Parallelism-Independent Scheduling Algorithms for DAGs of Tasks with Non-Uniform Execution Time

  • Kirilka Nikolova;Atusi Maeda;Sowa, Masa-Hiro
    • Proceedings of the IEEK Conference / 2000.07a / pp.289-293 / 2000
  • We propose two new algorithms for parallelism-independent scheduling. Machine code generated by a compiler that uses these algorithms in its scheduling phase is parallelism-independent: it executes in minimum time regardless of the number of processors in the parallel computer. The algorithms have three phases: finding the minimum number of processors on which the program can be executed in minimal time; scheduling with a heuristic algorithm for this predetermined number of processors; and serializing the parallel schedule according to the tasks' earliest start times. At run time, tasks are taken from the serialized schedule and assigned to the processor that allows the earliest start time for each task. The task order fixed at compile time is not changed at run time, whatever the number of available processors, so there is no out-of-order issue or execution. Scheduling is done predominantly at compile time, and dynamic scheduling is reduced to allocating tasks to processors. We evaluate the proposed algorithms by comparing their schedule lengths with those of the CP/MISF algorithm, using both randomly generated DAGs (directed acyclic graphs) and DAGs representing real applications. From a practical point of view, the proposed algorithms can be successfully used to schedule programs for in-order superscalar processors and shared-memory multiprocessor systems. Superscalar processors with any number of functional units can execute parallelism-independent code in minimum time without dynamic scheduling or out-of-order issue hardware. Our algorithms can therefore reduce both the hardware complexity of processors and the run-time overhead of dynamic scheduling.

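The run-time step described in this abstract (take tasks in the fixed serialized order and give each to the processor that allows its earliest start) can be sketched as follows. This is a minimal sketch under stated assumptions: the serialized order is already given (the compile-time phases, the heuristic, and CP/MISF are not reproduced), there is no communication cost, and each processor runs its tasks in the order it receives them. Names are hypothetical.

```python
def run_serialized(order, duration, preds, num_procs):
    """Assign tasks, in the fixed compile-time order, to the processor giving the earliest start.

    order:    task ids sorted by compile-time earliest start time (predecessors come first)
    duration: task -> execution time
    preds:    task -> list of predecessor tasks
    Returns (finish times, processor assignment, makespan).
    """
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    finish, placed = {}, {}
    for task in order:
        # Data-ready time: all predecessors must have finished.
        ready = max((finish[p] for p in preds.get(task, [])), default=0)
        # Earliest start on processor k is max(proc_free[k], ready); ties go to the lowest index.
        proc = min(range(num_procs), key=lambda k: max(proc_free[k], ready))
        start = max(proc_free[proc], ready)
        finish[task] = start + duration[task]
        proc_free[proc] = finish[task]
        placed[task] = proc
    return finish, placed, max(finish.values(), default=0)
```

For a diamond DAG A(2) → {B(3), C(1)} → D(2) with order `["A", "B", "C", "D"]`, two processors give a makespan of 7 and one processor gives 8; the same order is used in both cases, only the processor allocation changes.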

Parallel Computing on Intensity Offset Tracking Using Synthetic Aperture Radar for Retrieval of Glacier Velocity

  • Hong, Sang-Hoon
    • Korean Journal of Remote Sensing / v.35 no.1 / pp.29-37 / 2019
  • Synthetic Aperture Radar (SAR) observations are powerful tools for accurately monitoring surface displacement induced by earthquakes, volcanoes, ground subsidence, glacier movement, and similar processes. In particular, radar interferometry (InSAR), which uses phase information related to the distance from sensor to target, can generate line-of-sight displacement maps with an accuracy of a few centimeters or millimeters. However, decorrelation degrades coherence in InSAR applications and often prevents the construction of a differential interferogram. Offset tracking is an alternative approach that produces a two-dimensional displacement map from intensity information instead of phase, but it requires very intensive computing power and time. In this paper, the efficiency of parallel computing for estimating glacier velocity was investigated on a high-performance computer. Two TanDEM-X SAR observations acquired on September 15, 2013 and September 26, 2013 over Narsap Sermia in southwestern Greenland were used. A total of 56 logical processors (28 physical 2.4 GHz Intel Xeon processors with hyper-threading) running under Linux were utilized. Offset tracking was performed with the Gamma software, adjusting the number of processors used for OpenMP parallel computing. With a 256 × 256-pixel window patch, offset tracking took 26,344 s on a single core and 2,055 s on 56 cores, a reduction in processing time by a factor of about 13 (12.81). However, using all processors for parallel computing blocks other background operations and functions, so the optimum number of processors should be evaluated for computing efficiency while leaving resources for tasks other than offset tracking.
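Intensity offset tracking is commonly done by searching, for each image patch, the integer shift that maximizes normalized cross-correlation (NCC) between reference and secondary intensities. The sketch below assumes that generic NCC formulation; it is not the Gamma software's implementation, which also refines offsets to sub-pixel precision. Patches are independent, which is why the work parallelizes well; a thread pool stands in here for OpenMP loop parallelism. Function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def ncc(a: list[float], b: list[float]) -> float:
    """Normalized cross-correlation of two equal-size flattened patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # constant patches carry no signal


def patch(img, r: int, c: int, size: int) -> list[float]:
    """Flattened size x size patch with top-left corner (r, c)."""
    return [img[r + i][c + j] for i in range(size) for j in range(size)]


def track_offset(ref, sec, r: int, c: int, size: int, search: int) -> tuple[int, int]:
    """Integer (dy, dx) within +/- search maximizing NCC between ref at (r, c) and sec at (r+dy, c+dx)."""
    ref_patch = patch(ref, r, c, size)
    best = (-2.0, 0, 0)  # NCC is always >= -1
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rr, cc = r + dy, c + dx
            if rr >= 0 and cc >= 0 and rr + size <= len(sec) and cc + size <= len(sec[0]):
                best = max(best, (ncc(ref_patch, patch(sec, rr, cc, size)), dy, dx))
    return best[1], best[2]


def offset_field(ref, sec, corners, size: int, search: int, workers: int = 4) -> list[tuple[int, int]]:
    """Track every patch independently; results keep the order of `corners`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda rc: track_offset(ref, sec, rc[0], rc[1], size, search), corners))
```

Cost grows with the patch area times the number of candidate shifts for every patch, which is why window size (256 × 256 in the paper) dominates run time. Per the abstract's timings, the parallel efficiency at 56 cores was about 12.81 / 56 ≈ 0.23, consistent with its advice to tune the processor count.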

An Analysis of Existing Studies on Parallel and Distributed Processing of the Rete Algorithm (Rete 알고리즘의 병렬 및 분산 처리에 관한 기존 연구 분석)

  • Kim, Jaehoon
    • The Journal of Korean Institute of Information Technology / v.17 no.7 / pp.31-45 / 2019
  • The core technologies for intelligent services today are deep learning (that is, neural networks) and parallel and distributed processing technologies such as GPU parallel computing and big data. However, for future intelligent services and knowledge-sharing services built on globally shared ontologies, there is a technology better suited than neural networks to representing and reasoning over knowledge: IF-THEN rules expressed in RIF or SWRL, the standard rule languages of the Semantic Web, which can be inferred efficiently with the Rete algorithm. However, when a Rete implementation running on a single computer processes 100,000 rules, its performance becomes very poor, with inference taking several tens of minutes, which is an obvious limitation. In this paper, we therefore analyze past and current studies on parallel and distributed processing of the Rete algorithm and examine which aspects should be considered to implement an efficient Rete algorithm.
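The abstract does not detail the Rete algorithm. Its central idea, caching partial matches in memories so that a newly asserted fact is joined only against stored matches instead of re-evaluating every rule against the whole fact base, can be sketched for a single hypothetical two-condition rule. This is a deliberate simplification rather than a full Rete network: there is no rule compiler, fact retraction, or conflict resolution, and the class name is hypothetical.

```python
from collections import defaultdict


class GrandparentRule:
    """Rete-style join for one rule:
        IF (parent ?x ?y) AND (parent ?y ?z) THEN (grandparent ?x ?z)
    Two alpha memories index 'parent' facts by the join variable ?y.
    """

    def __init__(self) -> None:
        self.left = defaultdict(set)   # ?y -> {?x} for facts matching (parent ?x ?y)
        self.right = defaultdict(set)  # ?y -> {?z} for facts matching (parent ?y ?z)
        self.conclusions: set[tuple[str, str]] = set()

    def add_parent(self, p: str, c: str) -> set[tuple[str, str]]:
        """Assert (parent p c) and return only the newly derived (grandparent x z) facts."""
        # Store the fact in both memories first, so a self-join such as (parent a a) still fires.
        self.left[c].add(p)
        self.right[p].add(c)
        new = set()
        # Fact as the left condition (x=p, y=c): join with right-memory entries keyed by c.
        for z in self.right[c]:
            new.add((p, z))
        # Fact as the right condition (y=p, z=c): join with left-memory entries keyed by p.
        for x in self.left[p]:
            new.add((x, c))
        new -= self.conclusions
        self.conclusions |= new
        return new
```

Each assertion touches only the memory entries that share the join key, so the work per new fact stays small as the fact base grows. Partitioning such memories and join nodes across processors or machines is one of the design axes that parallel and distributed Rete implementations explore.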

A Road Region Extraction Using OpenCV CUDA To Advance The Processing Speed (처리 속도 향상을 위해 OpenCV CUDA를 활용한 도로 영역 검출)

  • Lee, Tae-Hee;Hwang, Bo-Hyun;Yun, Jong-Ho;Choi, Myung-Ryul
    • Journal of Digital Convergence / v.12 no.6 / pp.231-236 / 2014
  • In this paper, we propose improving processing speed by adding device-side (graphics card) parallel processing to road region extraction that was previously performed as serial processing on the host (PC). OpenCV CUDA provides many parallel-processing functions by integrating conventional OpenCV with CUDA, and when OpenCV is configured with CUDA, its functions are optimized for the specifications of the user's device (graphics card). Using OpenCV CUDA therefore makes it easy to verify algorithms and derive simulation results. Experiments using OpenCV CUDA and an NVIDIA GeForce GTX 560 Ti graphics card show that the proposed method processes about 3.09 times faster than the conventional method.