• Title/Summary/Keyword: parallel computer processing


Design and Performance Analysis of the H/V-bus Parallel Computer (H/V-버스 병렬컴퓨터의 설계 및 성능 분석)

  • 김종현
    • Journal of the Korea Society for Simulation
    • /
    • v.3 no.1
    • /
    • pp.29-42
    • /
    • 1994
  • The architecture of a MIMD-type parallel computer system is specified; a simulator is developed to support the design and evaluation of systems based on the architecture; and experiments are conducted with the simulator to evaluate system performance. The horizontal/vertical-bus (H/V-bus) system architecture provides an NxN array of processing elements (PEs) which communicate with each other through a network of N horizontal buses and N vertical buses. The simulator, written in SLAM II and FORTRAN, is designed to provide high resolution in simulating the IPC mechanism. Parameters give the user independent control of system size, PE speed, and IPC mechanism speed. Results generated by the simulator include execution times, PE utilizations, queue lengths, and other data. The simulator is used to study system performance when a partial differential equation is solved by the parallel Gauss-Seidel method. For comparison, the benchmark is also executed on a single-bus system simulator derived from the H/V-bus system simulator, and it is also solved on a single PE to obtain data for computing speedups. An extensive analysis of the results is presented.

  • PDF
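The benchmark in this study is a partial differential equation solved by the parallel Gauss-Seidel method. As a point of reference only (not the paper's SLAM II/FORTRAN simulator), here is a minimal sequential Python sketch of Gauss-Seidel relaxation for Laplace's equation on a 2D grid; the grid size and boundary values are illustrative assumptions.

```python
import numpy as np

def gauss_seidel_laplace(u, tol=1e-6, max_iters=10000):
    """Solve Laplace's equation on a 2D grid with fixed boundary values
    using in-place Gauss-Seidel sweeps (sequential reference version)."""
    for it in range(max_iters):
        max_delta = 0.0
        for i in range(1, u.shape[0] - 1):
            for j in range(1, u.shape[1] - 1):
                new = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1])
                max_delta = max(max_delta, abs(new - u[i, j]))
                u[i, j] = new
        if max_delta < tol:
            break
    return u, it

# Example: 32x32 grid, top edge held at 1.0, other boundaries at 0.0
grid = np.zeros((32, 32))
grid[0, :] = 1.0
solution, iters = gauss_seidel_laplace(grid)
```

In the paper's setting, each PE would own a block of the grid and exchange boundary values over the H/V buses; the sketch above only shows the numerical kernel being parallelized.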

Vertex disjoint covering cycle set in hypercubes (하이퍼큐브에서의 정점을 공유하지 않는 커버링사이클 집합)

  • Park, Won;Lim, Hyeong-Seok
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.11-14
    • /
    • 2003
  • In interconnection networks for parallel processing, the cycle partitioning problem for parallel transmission in the presence of faulty vertices or edges is very important. In this paper, we assume that k ($\leq$ m-1) edges of an m-dimensional hypercube Q$_{m}$ do not share any vertices, and we show that it is possible to construct a cycle set which consists of k cycles covering all the vertices of the hypercube, each cycle including one of the given edges. This cycle set can be used for parallel transmission between two vertices joined by faulty edges.

  • PDF
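For background, the reflected Gray code gives a Hamiltonian cycle of the hypercube Q_m, which is the simplest example of a cycle covering all vertices. The Python sketch below illustrates only that basic fact; it is not the paper's construction of k vertex-disjoint covering cycles around given edges.

```python
def gray_code_cycle(m):
    """Return the vertices of an m-dimensional hypercube Q_m in reflected
    Gray-code order; consecutive vertices (and the last/first pair) differ
    in exactly one bit, so the order is a Hamiltonian cycle of Q_m."""
    return [i ^ (i >> 1) for i in range(2 ** m)]

cycle = gray_code_cycle(3)
# e.g. [0, 1, 3, 2, 6, 7, 5, 4]; each adjacent pair is an edge of Q_3
assert all(bin(a ^ b).count("1") == 1
           for a, b in zip(cycle, cycle[1:] + cycle[:1]))
```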

Time Complexity Measurement on CUDA-based GPU Parallel Architecture of Morphology Operation

  • Izmantoko, Yonny S.;Choi, Heung-Kook
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.4
    • /
    • pp.444-452
    • /
    • 2013
  • The operation time of a function or procedure always needs to be optimized, and parallelizing the operation is the general way to reduce it. One of the most powerful parallelization methods is using a GPU. In the image processing field, one of the most commonly used operations is the morphology operation. Three types of morphology operation kernels (naïve, global, and shared) are presented in this paper. All kernels are written in CUDA and run in parallel on the GPU. Four morphology operations (erosion, dilation, opening, and closing) using a square structuring element are tested on MRI images of different sizes to measure the speedup of the GPU implementation over the CPU implementation. The results show that the speedup of dilation is similar for all kernels; however, for erosion, opening, and closing, the shared kernel works faster than the other kernels.
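As a CPU reference for what each GPU thread computes per output pixel, the following Python/NumPy sketch implements grayscale erosion, dilation, opening, and closing with a square structuring element. The kernel size and edge padding are illustrative assumptions; the paper's CUDA kernels (naïve, global, shared) are not reproduced here.

```python
import numpy as np

def dilate(img, k=3):
    """Grayscale dilation with a k x k square structuring element:
    each output pixel is the maximum over its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i+k, j:j+k].max()
    return out

def erode(img, k=3):
    """Grayscale erosion: minimum over the k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i+k, j:j+k].min()
    return out

def opening(img, k=3):   # erosion followed by dilation
    return dilate(erode(img, k), k)

def closing(img, k=3):   # dilation followed by erosion
    return erode(dilate(img, k), k)
```

On a GPU, the two nested loops disappear: one thread is launched per output pixel, and the shared-memory kernel additionally stages the neighborhood tile in on-chip memory before taking the min/max.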

A Study on Task Allocation of Parallel Spatial Joins using Fixed Grids (고정 그리드를 이용한 병렬 공간 조인의 태스크 할당에 관한 연구)

  • Kim, Jin-Deok;Seo, Yeong-Deok;Hong, Bong-Hui
    • The KIPS Transactions:PartD
    • /
    • v.8D no.4
    • /
    • pp.347-360
    • /
    • 2001
  • The most expensive spatial operation in spatial databases is the spatial join, which computes a combined table whose tuples consist of pairs of tuples from the two input tables satisfying a spatial predicate. Although the execution time of sequential spatial join processing has been considerably improved, the response time still does not meet the requirements of interactive users. Parallel processing is usually an appropriate way to improve the performance of spatial join processing. However, as the number of processors increases, the efficiency of each processor decreases rapidly because of the disk bottleneck and the overhead of message passing. This paper proposes a task allocation method that eases the disk bottleneck caused by simultaneous access to the shared disk and minimizes message passing among processors. To evaluate the performance of the proposed method in terms of the number of disk accesses and the amount of message passing, we conduct experiments on two kinds of parallel spatial join algorithms. The experiments on a MIMD parallel machine with shared disks show that the proposed semi-dynamic task allocation method outperforms the static and dynamic task allocation methods.

  • PDF
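To illustrate the general idea of fixed-grid task decomposition for a spatial join (not the paper's semi-dynamic allocation scheme), the Python sketch below buckets bounding rectangles into fixed grid cells and joins them cell by cell; each cell corresponds to one task that could be assigned to a processor. The cell size and rectangle format are assumptions.

```python
from collections import defaultdict

def cell_of(x, y, cell_size):
    return (int(x // cell_size), int(y // cell_size))

def grid_partition(rects, cell_size):
    """Assign each rectangle (id, xmin, ymin, xmax, ymax) to every fixed
    grid cell its bounding box overlaps."""
    grid = defaultdict(list)
    for rid, xmin, ymin, xmax, ymax in rects:
        cx0, cy0 = cell_of(xmin, ymin, cell_size)
        cx1, cy1 = cell_of(xmax, ymax, cell_size)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                grid[(cx, cy)].append((rid, xmin, ymin, xmax, ymax))
    return grid

def overlaps(a, b):
    return a[1] <= b[3] and b[1] <= a[3] and a[2] <= b[4] and b[2] <= a[4]

def spatial_join(r_rects, s_rects, cell_size=10.0):
    """Join the two datasets cell by cell; each cell is one task that
    could be handed to a separate processor in a parallel setting."""
    r_grid = grid_partition(r_rects, cell_size)
    s_grid = grid_partition(s_rects, cell_size)
    result = set()
    for cell, r_list in r_grid.items():
        for r in r_list:
            for s in s_grid.get(cell, []):
                if overlaps(r, s):
                    result.add((r[0], s[0]))   # de-duplicate across cells
    return result
```

The allocation question the paper studies is how to assign these per-cell tasks to processors so that shared-disk accesses and inter-processor messages stay low.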

Vergence control of parallel stereoscopic camera using the binocular disparity information (시차정보를 이용한 수평이동방식 입체영상 카메라의 주시각제어)

  • Kwon, Ki-Chul;Kim, Nam
    • Korean Journal of Optics and Photonics
    • /
    • v.15 no.2
    • /
    • pp.123-129
    • /
    • 2004
  • This paper concerns automatic vergence control of a parallel stereoscopic camera based on geometrical analysis. For a parallel stereoscopic camera rig, we experimentally demonstrated a linear relationship between the distance of the key object and the amount of vergence control, and we propose a vergence control system for the stereoscopic camera that uses binocular disparity information. For real-time calculation of the disparity information, a hybrid cepstral filter algorithm, whose input is obtained from the vertical projection and down-sampled data of the source images, is proposed for precise and high-speed processing. With the disparity algorithm and the vergence control of the parallel stereoscopic camera system, the stereoscopic images become more like those seen by the human eye.
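For orientation, the sketch below shows the basic parallel-camera geometry such a system relies on: disparity d, focal length f, and baseline b relate to object distance Z through d = f*b/Z, and vergence in a horizontal-moving rig amounts to shifting each view by half the key object's disparity. All numeric parameters are hypothetical, and this is not the paper's hybrid cepstral filter algorithm.

```python
def object_distance(disparity_px, focal_px, baseline_m):
    """Distance Z of the key object from pixel disparity d, focal length f
    (in pixels), and camera baseline b, using d = f * b / Z."""
    return focal_px * baseline_m / disparity_px

def horizontal_shift(disparity_px):
    """For a parallel (horizontal-moving) stereoscopic camera, vergence is
    adjusted by shifting each sensor/image horizontally so the key object
    falls at zero disparity: half the measured disparity per camera."""
    return disparity_px / 2.0

d = 24.0            # measured disparity of the key object, in pixels (assumed)
Z = object_distance(d, focal_px=1200.0, baseline_m=0.065)
shift = horizontal_shift(d)   # pixels each view is shifted toward the other
```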

Two-Level Tries: A General Acceleration Structure for Parallel Routing Table Accesses

  • Mingche, Lai;Lei, Gao
    • Journal of Communications and Networks
    • /
    • v.13 no.4
    • /
    • pp.408-417
    • /
    • 2011
  • The stringent performance requirements for highly efficient routing protocols on the Internet can be satisfied by exploiting the threaded border gateway protocol (TBGP) on multi-core processors, but state-of-the-art TBGP performance is restricted by a large number of contentions when threads race to access the routing table. To this end, a highly efficient parallel access approach appears to be a promising way to achieve ultra-high route processing speed. This study proposes a general routing table structure consisting of two-level tries for fast parallel access, and it presents a heuristic-based divide-and-recombine algorithm to resolve these contentions, thereby accelerating parallel route updates across threads and boosting TBGP performance. For the proposed TBGP, this study also modifies the table operations, such as insert and lookup, and validates their correctness against the behavior of the traditional routing table. Our evaluations on a dual quad-core Xeon server show that parallel access contentions decrease sharply, by 92.5%, versus the traditional routing table, and that the maximal update time of a thread is reduced by 56.8% on average with little overhead. The convergence time of update messages is improved by 49.7%.
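As a rough illustration of why a two-level structure reduces contention, the Python sketch below indexes routes by their first K bits so that updates to different buckets touch disjoint subtries. It is a toy longest-prefix-match table under stated assumptions (prefixes longer than K bits, bit-string inputs), not the paper's TBGP data structure.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = {}      # bit ('0'/'1') -> TrieNode
        self.next_hop = None    # set if a prefix ends here

class TwoLevelTrie:
    """First level: a direct-indexed array on the first K bits of the
    prefix, so updates to different buckets never touch the same subtrie.
    Second level: a binary trie per bucket for the remaining bits."""
    def __init__(self, k=8):
        self.k = k
        self.buckets = [TrieNode() for _ in range(2 ** k)]

    def insert(self, prefix_bits, next_hop):
        # prefix_bits: string of '0'/'1', assumed longer than K bits here
        node = self.buckets[int(prefix_bits[:self.k], 2)]
        for bit in prefix_bits[self.k:]:
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        node = self.buckets[int(addr_bits[:self.k], 2)]
        best = node.next_hop
        for bit in addr_bits[self.k:]:
            node = node.children.get(bit)
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best
```

With one lock (or one owning thread) per first-level bucket, two threads updating prefixes that differ in their first K bits never contend, which is the intuition behind partitioning the table for parallel access.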

CUDA-based Parallel Bi-Conjugate Gradient Matrix Solver for BioFET Simulation (BioFET 시뮬레이션을 위한 CUDA 기반 병렬 Bi-CG 행렬 해법)

  • Park, Tae-Jung;Woo, Jun-Myung;Kim, Chang-Hun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.1
    • /
    • pp.90-100
    • /
    • 2011
  • We present a parallel bi-conjugate gradient (Bi-CG) matrix solver for large-scale BioFET simulations based on recent graphics processing units (GPUs), which can realize large-scale parallel processing at very low cost. The proposed method focuses on solving the Poisson equation in parallel, which requires massive computational resources not only in semiconductor simulation but also in various other fields, including computational fluid dynamics and heat transfer simulations. As a result, our solver is around 30 times faster than traditional methods based on single-core CPU systems when solving the Poisson equation in a 3D FDM (finite difference method) scheme. The proposed method is implemented and tested in NVIDIA's CUDA (Compute Unified Device Architecture) environment, which enables general-purpose parallel processing on GPUs. Unlike other similar GPU-based approaches, which usually apply 32-bit single-precision floating-point arithmetic, we use 64-bit double-precision operations for better convergence. Applications on the CUDA platform are rather easy to implement but very hard to optimize; in this regard, we also discuss the optimization strategy of the proposed method.
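For reference, a minimal CPU/NumPy sketch of the unpreconditioned Bi-CG iteration follows. It uses a dense matrix and a 1D Poisson test problem purely for illustration, whereas the paper applies the method to a sparse 3D FDM system on the GPU in double precision.

```python
import numpy as np

def bicg(A, b, x0=None, tol=1e-10, max_iters=1000):
    """Unpreconditioned bi-conjugate gradient for A x = b (dense A here;
    a real FDM solver would apply A as a sparse stencil)."""
    x = np.zeros(b.size) if x0 is None else x0.copy()
    r = b - A @ x
    r_hat = r.copy()                 # shadow residual
    p, p_hat = r.copy(), r_hat.copy()
    rho = r_hat @ r
    for _ in range(max_iters):
        Ap = A @ p
        alpha = rho / (p_hat @ Ap)
        x += alpha * p
        r -= alpha * Ap
        r_hat -= alpha * (A.T @ p_hat)
        if np.linalg.norm(r) < tol:
            break
        rho_new = r_hat @ r
        beta = rho_new / rho
        p = r + beta * p
        p_hat = r_hat + beta * p_hat
        rho = rho_new
    return x

# 1D Poisson test problem: tridiagonal system with Dirichlet boundaries
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = bicg(A, np.ones(n))
```

The matrix-vector products and dot products inside the loop are exactly the operations a CUDA implementation parallelizes across GPU threads.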

A CPU-GPU Hybrid System of Environment Perception and 3D Terrain Reconstruction for Unmanned Ground Vehicle

  • Song, Wei;Zou, Shuanghui;Tian, Yifei;Sun, Su;Fong, Simon;Cho, Kyungeun;Qiu, Lvyang
    • Journal of Information Processing Systems
    • /
    • v.14 no.6
    • /
    • pp.1445-1456
    • /
    • 2018
  • Environment perception and three-dimensional (3D) reconstruction tasks are used to provide an unmanned ground vehicle (UGV) with driving awareness interfaces. The speed of obstacle segmentation and surrounding terrain reconstruction crucially influences decision making in UGVs. To increase the processing speed of environment information analysis, we develop a CPU-GPU hybrid system for automatic environment perception and 3D terrain reconstruction based on the integration of multiple sensors. The system consists of three functional modules, namely, multi-sensor data collection and pre-processing, environment perception, and 3D reconstruction. To integrate the individual datasets collected from different sensors, the pre-processing function registers the sensed LiDAR (light detection and ranging) point clouds, video sequences, and motion information into a global terrain model after filtering out redundant and noisy data according to the redundancy removal principle. In the environment perception module, the registered discrete points are clustered into the ground surface and individual objects by using a ground segmentation method and a connected component labeling algorithm. The estimated ground surface and non-ground objects indicate the terrain to be traversed and the obstacles in the environment, thus creating driving awareness. The 3D reconstruction module calibrates the projection matrix between the mounted LiDAR and cameras to map the local point clouds onto the captured video images. Texture meshes and color particle models are used to reconstruct the ground surface and the objects of the 3D terrain model, respectively. To accelerate the proposed system, we apply GPU parallel computation to implement the computer graphics and image processing algorithms in parallel.
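As a toy CPU analogue of the perception stage described above (not the paper's implementation), the Python sketch below rasterizes a point cloud onto a 2D grid, marks low cells as ground, and groups the remaining occupied cells into objects with connected-component labeling. The cell size, height threshold, and synthetic data are assumptions.

```python
import numpy as np
from scipy import ndimage

def segment_point_cloud(points, cell=0.5, ground_h=0.2):
    """Rasterize LiDAR-like points (x, y, z) onto a 2D grid, call low cells
    'ground', and label the remaining occupied cells as objects."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                       # shift indices to start at 0
    shape = tuple(ij.max(axis=0) + 1)
    height = np.zeros(shape)
    np.maximum.at(height, (ij[:, 0], ij[:, 1]), points[:, 2])
    ground_mask = (height > 0) & (height <= ground_h)
    object_mask = height > ground_h
    labels, n_objects = ndimage.label(object_mask)  # connected components
    return ground_mask, labels, n_objects

pts = np.random.rand(1000, 3) * [20.0, 20.0, 2.0]   # synthetic x, y, z
ground, obj_labels, n = segment_point_cloud(pts)
```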

Holographic femtosecond laser processing

  • Hayasaki, Yoshio
    • Proceedings of the Optical Society of Korea Conference
    • /
    • 2008.07a
    • /
    • pp.61-63
    • /
    • 2008
  • Parallel femtosecond laser processing using a computer-generated hologram (CGH) displayed on a liquid crystal spatial light modulator (LCSLM) is demonstrated. The use of the LCSLM enables arbitrary and variable patterning. This holographic femtosecond laser processing has the advantages of high throughput and high light-use efficiency. A critical issue is precisely controlling the intensities of the diffraction peaks of the CGH; we demonstrate several methods for controlling them. We also demonstrate laser processing with two-dimensional and three-dimensional parallelism.

  • PDF
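One common way to generate a multi-spot phase-only CGH for an SLM is to superpose linear phase ramps and keep only the phase of the sum; the weights give a crude handle on relative diffraction-peak intensities. The Python sketch below illustrates that generic approach only, under assumed spot positions and weights; it is not the intensity-control method demonstrated in the paper.

```python
import numpy as np

def multi_spot_hologram(spots, shape=(512, 512)):
    """Phase-only CGH as a superposition of tilted plane waves: each target
    spot (fx, fy, weight) contributes a linear phase ramp, and the argument
    of the weighted sum is displayed on the LCSLM."""
    y, x = np.indices(shape)
    field = np.zeros(shape, dtype=complex)
    for fx, fy, w in spots:
        field += w * np.exp(2j * np.pi * (fx * x + fy * y) / shape[1])
    return np.angle(field)          # phase in [-pi, pi] for the SLM

# Three processing spots at assumed spatial frequencies (arbitrary units)
phase = multi_spot_hologram([(10, 0, 1.0), (0, 12, 1.0), (-8, 8, 0.8)])
```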

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho;Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.11
    • /
    • pp.57-63
    • /
    • 2017
  • Big data, which has recently grown rapidly, is characterized by continuous generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for handling such big data because of their limited processing speed and the significant cost of storage expansion. Thus, big data processing technologies, which are normally based on distributed file systems, distributed database management, and parallel processing, have arisen as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB by extending the information engineering methodology based on the E-R data model.
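As an illustration of the kind of document design such a methodology produces, the Python/pymongo sketch below embeds a 1:N relationship from an E-R model directly into a single MongoDB document. The database, collection, and field names are hypothetical and not taken from the paper.

```python
from pymongo import MongoClient

# A 1:N "customer places orders" relationship from an E-R model can be
# denormalized by embedding the order line items in the order document,
# trading storage for join-free reads on a large collection.
client = MongoClient("mongodb://localhost:27017")   # assumes a local instance
db = client["shop"]

order = {
    "order_id": 1001,
    "customer": {"customer_id": 7, "name": "J. Lee"},   # embedded entity
    "items": [                                          # embedded 1:N side
        {"sku": "A-100", "qty": 2, "price": 5.0},
        {"sku": "B-250", "qty": 1, "price": 12.5},
    ],
}
db["orders"].insert_one(order)

# Queries can then target a single collection without relational joins.
for doc in db["orders"].find({"customer.customer_id": 7}):
    print(doc["order_id"])
```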