• Title/Summary/Keyword: Hybrid Parallel Processing

Parallel Connected Component Labeling Based on the Selective Four Directional Label Search Using CUDA

  • Soh, Young-Sung;Hong, Jung-Woo
    • Journal of the Institute of Convergence Signal Processing, v.16 no.3, pp.83-89, 2015
  • Connected component labeling (CCL) is a mandatory step in image segmentation, where objects are extracted and uniquely labeled. CCL is computationally expensive and is therefore often performed in a parallel processing framework to reduce execution time. Various parallel CCL methods have been proposed in the literature, among them the NSZ label equivalence (NSZ-LE) method, the modified 8 directional label selection (M8DLS) method, the HYBRID1 method, and the HYBRID2 method. Soh et al. showed that HYBRID2 outperforms the others and is the best so far. In this paper we propose a new hybrid parallel CCL algorithm, termed HYBRID3, that combines selective four directional label search (S4DLS) with label backtracking (LB). We show that the average percentage speedup of the proposed method over M8DLS is around 60% more than that of HYBRID2 over M8DLS for various kinds of images.
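
To make concrete what CCL computes, the following is a minimal sequential sketch in Python. It is not the HYBRID3 algorithm or any of the methods named in the abstract; it assumes 4-connectivity and uses plain minimum-label propagation, where every foreground pixel starts with its own label and repeatedly adopts the smallest label among its neighbors. The function name `label_components` is hypothetical. In a GPU version the per-pixel update would typically be assigned to one thread per pixel.

```python
def label_components(image):
    """Label 4-connected foreground regions by repeated minimum-label propagation.

    `image` is a list of rows containing 0 (background) or 1 (foreground).
    Returns a grid of the same shape; pixels of one component share a label.
    """
    rows, cols = len(image), len(image[0])
    # Initial labels: raster index + 1 for foreground pixels, 0 for background.
    labels = [
        [r * cols + c + 1 if image[r][c] else 0 for c in range(cols)]
        for r in range(rows)
    ]
    changed = True
    while changed:  # Sweep until no label changes any more.
        changed = False
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] == 0:
                    continue
                best = labels[r][c]
                # Look at the four direct neighbors (up, down, left, right).
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                        best = min(best, labels[nr][nc])
                if best < labels[r][c]:
                    labels[r][c] = best
                    changed = True
    return labels


if __name__ == "__main__":
    grid = [
        [1, 1, 0],
        [0, 0, 0],
        [0, 1, 1],
    ]
    for row in label_components(grid):
        print(row)
```

Each sweep only lowers labels, so the loop terminates once every component has settled on the smallest initial label it contains.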

An Improved Hybrid Approach to Parallel Connected Component Labeling using CUDA

  • Soh, Young-Sung;Ashraf, Hadi;Kim, In-Taek
    • Journal of the Institute of Convergence Signal Processing, v.16 no.1, pp.1-8, 2015
  • In many image processing tasks, connected component labeling (CCL) is performed to extract regions of interest. CCL was usually done sequentially when image resolution was relatively low and there were only a small number of input channels. As image resolution rises to HD or Full HD and the number of input channels increases, sequential CCL becomes too time-consuming for real-time applications. To cope with this situation, a parallel CCL framework was introduced in which multiple cores are utilized simultaneously. Several parallel CCL methods have been proposed in the literature, among them the NSZ label equivalence (NSZ-LE) method[1], the modified 8 directional label selection (M8DLS) method[2], and the HYBRID1 method[3]. Soh [3] showed that HYBRID1 outperforms NSZ-LE and M8DLS, and argued that HYBRID1 is by far the best. In this paper we propose an improved hybrid parallel CCL algorithm, termed HYBRID2, that hybridizes M8DLS with label backtracking (LB), and show that it runs around 20% faster than HYBRID1 for various kinds of images.

A Study on Hybrid Image Coder Using a Reconfigurable Multiprocessor System (Study I : H/W Implementation) (재구성 가능한 다중 프로세서 시스템을 이용한 혼합 영상 보호화기 구현에 관한 연구 (연구 I : H/W구현))

  • 최상훈;이광기;김제익;윤승철;박규태
    • Journal of the Korean Institute of Telematics and Electronics B, v.30B no.10, pp.1-12, 1993
  • A multiprocessor system for high-speed processing of hybrid image coding algorithms such as H.261, MPEG, or Digital HDTV is presented in this study. Using a combination of a highly parallel 32-bit microprocessor, a DCT (Discrete Cosine Transform) processor, and a motion detection processor, a new processing module is designed for the implementation of a high-performance coding system. The system is implemented to allow parallel processing, since a single module alone cannot perform hybrid coding algorithms at high speed, and a crossbar switch is used to realize various parallel processing architectures by altering the interconnections between processing modules within the system.

VDI Performance Optimization with Hybrid Parallel Processing in Thick Client System under Heterogeneous Multi-Core Environment (Heterogeneous 멀티 코어 환경의 Thick Client에서 VDI 성능 최적화를 위한 혼합 병렬 처리 기법 연구)

  • Kim, Myeong-Seob;Huh, Eui-Nam
    • The Journal of Korean Institute of Communications and Information Sciences, v.38B no.3, pp.163-171, 2013
  • Recently, the demand for processing High Definition (HD) video and 3D applications on low-end mobile devices has grown, and the volume of content data has increased as well. This is becoming a major issue in cloud computing, where a Virtual Desktop Infrastructure (VDI) service needs efficient data processing to provide Quality of Experience (QoE). In this paper, we propose three kinds of Thick-Thin VDI Service, which can share and delegate VDI service based on a Thick Client using the CPU and GPU. Furthermore, we propose and discuss a VDI Service Optimization Method for a mixed CPU and GPU heterogeneous environment, using OpenMP for CPU parallel processing and CUDA for GPU parallel processing.

Parallel Processing of k-Means Clustering Algorithm for Unsupervised Classification of Large Satellite Images: A Hybrid Method Using Multicores and a PC-Cluster (대용량 위성영상의 무감독 분류를 위한 k-Means Clustering 알고리즘의 병렬처리: 다중코어와 PC-Cluster를 이용한 Hybrid 방식)

  • Han, Soohee;Song, Jeong Heon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.37 no.6, pp.445-452, 2019
  • In this study, parallel processing codes for the k-means clustering algorithm were developed and implemented on a PC-cluster for unsupervised classification of large satellite images. We implemented intra-node code using the multiple cores of a CPU (Central Processing Unit) based on OpenMP (Open Multi-Processing), inter-node code using a PC-cluster based on the Message Passing Interface (MPI), and hybrid code using both. The PC-cluster consists of one master node and eight slave nodes, and each node is equipped with eight cores. Two operating systems, Microsoft Windows and Canonical Ubuntu, were installed on the PC-cluster in turn and tested to compare parallel processing performance. Two multispectral satellite images were tested: a medium-capacity LANDSAT 8 OLI (Operational Land Imager) image and a high-capacity Sentinel 2A image. To evaluate the performance of parallel processing, speedup and efficiency were measured. Overall, the speedup was over N/2 and the efficiency was over 0.5. In the comparison of the two operating systems, the Ubuntu system was two to three times faster. To confirm that the results of sequential and parallel processing coincide with each other, the center value of each band and the number of classified pixels were compared, and the result images were examined by pixel-to-pixel comparison. It was found that care should be taken to avoid false sharing in OpenMP in the intra-node implementation. To process large satellite images on a PC-cluster, code and hardware should be designed to reduce the performance degradation caused by file I/O. It was also found that performance can differ depending on the operating system installed on the PC-cluster.
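
The two metrics reported in this abstract are related by a simple division, which the short Python sketch below illustrates. It assumes the usual definitions (speedup is sequential time divided by parallel time, efficiency is speedup divided by the number of workers) and assumes that N denotes the number of cores in use, which the abstract does not define. The timings are hypothetical and are not taken from the paper.

```python
def speedup(t_sequential, t_parallel):
    """Speedup: how many times faster the parallel run is than the sequential run."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, n_workers):
    """Efficiency: speedup divided by the number of workers (1.0 is ideal)."""
    return speedup(t_sequential, t_parallel) / n_workers


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    t_seq, t_par, n = 64.0, 2.0, 64
    print("speedup   :", speedup(t_seq, t_par))
    print("efficiency:", efficiency(t_seq, t_par, n))
```

Under these definitions a speedup above N/2 and an efficiency above 0.5 are the same statement.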

A Genetic Algorithm for Scheduling Sequence-Dependant Jobs on Parallel Identical Machines (병렬의 동일기계에서 처리되는 순서의존적인 작업들의 스케쥴링을 위한 유전알고리즘)

  • Lee, Moon-Kyu;Lee, Seung-Joo
    • Journal of Korean Institute of Industrial Engineers, v.25 no.3, pp.360-368, 1999
  • We consider the problem of scheduling n jobs with sequence-dependent processing times on a set of parallel-identical machines. The processing time of each job consists of a pure processing time and a sequence-dependent setup time. The objective is to maximize the total remaining machine available time which can be used for other tasks. For the problem, a hybrid genetic algorithm is proposed. The algorithm combines a genetic algorithm for global search and a heuristic for local optimization to improve the speed of evolution convergence. The genetic operators are developed such that parallel machines can be handled in an efficient and effective way. For local optimization, the adjacent pairwise interchange method is used. The proposed hybrid genetic algorithm is compared with two heuristics, the nearest setup time method and the maximum penalty method. Computational results for a series of randomly generated problems demonstrate that the proposed algorithm outperforms the two heuristics.
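
The local optimization step named in this abstract, adjacent pairwise interchange, can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes a single machine, minimizes the total time of one job sequence rather than the paper's objective, and ignores any setup before the first job. The names `sequence_time`, `adjacent_pairwise_interchange`, `proc`, and `setup` are hypothetical.

```python
def sequence_time(seq, proc, setup):
    """Total time of a job sequence: processing times plus sequence-dependent setups.

    proc[j] is the pure processing time of job j; setup[i][j] is the setup
    time incurred when job j directly follows job i.
    """
    total = 0
    prev = None
    for job in seq:
        if prev is not None:
            total += setup[prev][job]
        total += proc[job]
        prev = job
    return total


def adjacent_pairwise_interchange(seq, proc, setup):
    """Swap neighboring jobs while any swap shortens the sequence."""
    seq = list(seq)
    best = sequence_time(seq, proc, setup)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq) - 1):
            seq[i], seq[i + 1] = seq[i + 1], seq[i]  # Try the swap.
            candidate = sequence_time(seq, proc, setup)
            if candidate < best:
                best = candidate
                improved = True
            else:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]  # Undo a non-improving swap.
    return seq, best


if __name__ == "__main__":
    proc = [2, 3, 1]
    setup = [
        [0, 5, 1],
        [5, 0, 1],
        [1, 1, 0],
    ]
    print(adjacent_pairwise_interchange([0, 1, 2], proc, setup))
```

In a hybrid genetic algorithm of the kind described, a routine like this would be applied to candidate solutions produced by the genetic operators.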

Pilot Symbol Assisted Hybrid Detection for OFDM-Based Spatial Multiplexing Systems

  • So, Yoon-Jae;Jeon, Hyoung-Goo;You, Young-Hwan;Baek, Myung-Sun;Song, Hyoung-Kyu
    • ETRI Journal, v.26 no.5, pp.397-404, 2004
  • In this paper, we provide a new detection scheme for a pilot symbol assisted interference nulling and cancellation operation to reduce unexpected effects owing to parallel transmission in orthogonal frequency division multiplexing (OFDM)-based spatial multiplexing systems. We show that the investigated OFDM vertical Bell Laboratories layered space-time (VBLAST) detection based on hybrid processing performs better than ordinary OFDM-VBLAST detection based on either serial processing or parallel processing.

Performance Enhancement of Parallel Prime Sieving with Hybrid Programming and Pipeline Scheduling (혼합형 병렬처리 및 파이프라이닝을 활용한 소수 연산 알고리즘)

  • Ryu, Seung-yo;Kim, Dongseung
    • KIPS Transactions on Computer and Communication Systems, v.4 no.10, pp.337-342, 2015
  • We develop a new parallelization method for the Sieve of Eratosthenes algorithm that enhances both computation speed and energy efficiency. Pipeline scheduling is included for better load balancing after proper workload partitioning. The programs run on multicore CPUs with a hybrid parallel programming model that uses both message passing and multithreading. Experiments performed on both small-scale clusters and a PC with a mobile processor show significant improvements in execution time and energy consumption.
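
The workload partitioning mentioned in this abstract can be illustrated with a segmented Sieve of Eratosthenes. The Python sketch below is sequential and is not the authors' method: it assumes a fixed segment size and simply loops over the segments, whereas in a hybrid setup each independent segment could be handed to a separate process or thread. The function names and the `segment_size` parameter are hypothetical.

```python
import math


def simple_sieve(limit):
    """Return all primes <= limit with the classic Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False] * min(2, limit + 1)  # 0 and 1 are not prime.
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def sieve_segment(lo, hi, primes):
    """Return the primes in [lo, hi), given every prime up to sqrt(hi - 1)."""
    flags = [True] * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break
        # First multiple of p inside the segment, but never below p * p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for multiple in range(start, hi, p):
            flags[multiple - lo] = False
    return [lo + i for i, flag in enumerate(flags) if flag and lo + i >= 2]


def segmented_primes(n, segment_size):
    """Return all primes <= n by sieving independent fixed-size segments."""
    primes = simple_sieve(math.isqrt(n))
    result = []
    for lo in range(2, n + 1, segment_size):
        hi = min(lo + segment_size, n + 1)
        # Each call depends only on `primes`, so segments could run in parallel.
        result.extend(sieve_segment(lo, hi, primes))
    return result


if __name__ == "__main__":
    print(segmented_primes(50, 10))
```

Because every segment needs only the small list of base primes, the segments share no mutable state, which is what makes this decomposition suitable for distribution.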

Performance Analysis of the Parallel CUPID Code for Various Parallel Programming Models in Symmetric Multi-Processing System (Symmetric Multi-Processing 시스템에서 다양한 병렬 기법 모델을 적용한 병렬 CUPID 코드의 성능분석)

  • Jeon, Byoung Jin;Lee, Jae Ryong;Yoon, Han Young;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B, v.38 no.1, pp.71-79, 2014
  • A parallelization of the bi-conjugate gradient solver for the pressure equation of the CUPID (component unstructured program for interfacial dynamics) code, which was developed for analyzing the components of a pressurized water-cooled reactor, was studied in a symmetric multi-processing system. The parallel performance was investigated for three typical parallel programming models (MPI, OpenMP, Hybrid) by solving incompressible backward-facing step flow at various grid resolutions. It was confirmed that parallel performance was low when problem size was small or the memory requirement for each thread was considerably higher than the cache memory. Furthermore, it was shown that MPI was better than OpenMP regardless of the problem size, and Hybrid was the best when the number of threads was relatively small.

Hybrid Flow Shop with Parallel Machines at the First Stage and Dedicated Machines at the Second Stage

  • Yang, Jaehwan
    • Industrial Engineering and Management Systems, v.14 no.1, pp.22-31, 2015
  • In this paper, a two-stage hybrid flow shop problem is considered. Specifically, there are identical parallel machines at stage 1 and two dedicated machines at stage 2, and the objective is to minimize the makespan. After being processed by any machine at stage 1, a job must be processed by a specific machine at stage 2 depending on the job type, and one type of job can have different processing times on each machine. First, we introduce the problem and establish the complexity of several variations of the problem. For some special cases, we develop optimal polynomial-time solution procedures. Then, we establish some simple lower bounds for the problem. In order to solve this NP-hard problem, three heuristics based on simple rules such as Johnson's rule and the LPT (Longest Processing Time first) rule are developed. For each of the heuristics, we provide some theoretical analysis and find a worst-case bound on the relative error. Finally, we empirically evaluate the heuristics.
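
As an illustration of the LPT rule mentioned in this abstract, the Python sketch below schedules jobs on identical parallel machines only. It is not one of the paper's three heuristics: it covers a single stage and ignores the dedicated second-stage machines entirely. The function name `lpt_schedule` and the sample processing times are hypothetical.

```python
import heapq


def lpt_schedule(processing_times, n_machines):
    """Longest Processing Time first on identical parallel machines.

    Jobs are taken in order of decreasing processing time and each is
    assigned to the machine that currently has the smallest load.
    Returns (assignment, makespan), where assignment[m] lists job indices.
    """
    # Min-heap of (current load, machine index).
    loads = [(0, m) for m in range(n_machines)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(n_machines)]
    order = sorted(range(len(processing_times)), key=lambda j: -processing_times[j])
    for job in order:
        load, machine = heapq.heappop(loads)
        assignment[machine].append(job)
        heapq.heappush(loads, (load + processing_times[job], machine))
    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    times = [7, 5, 4, 3, 3, 2]  # Hypothetical processing times.
    print(lpt_schedule(times, 2))
```

LPT is a greedy rule and does not guarantee an optimal makespan in general, which is why worst-case bounds of the kind the abstract mentions are of interest.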