• Title/Summary/Keyword: Parallel data processing


Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

  • Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems / v.5 no.12 / pp.439-446 / 2016
  • This paper proposes an efficient prediction accuracy enhancement strategy to improve the performance of prediction-based parallel event-driven gate-level timing simulation. The strategy adopts static double prediction and dynamic prediction for the input and output values of local simulations. Double prediction uses a second set of static prediction data for a secondary prediction once the first prediction fails, and dynamic prediction uses the simulation results accumulated during the actual parallel simulation run as prediction data. As a result, the communication and synchronization overheads, which are the main bottlenecks of parallel simulation, are reduced as far as possible. With the two proposed prediction enhancement techniques, we observed about a 5x simulation performance improvement over commercial parallel multi-core simulation for six test designs.

Domain Decomposition using Substructuring Method and Parallel Computation of the Rigid-Plastic Finite Element Analysis (부구조법에 의한 영역 분할 및 강소성 유한요소해석의 병렬 계산)

  • Park, Keun;Yang, Dong-Yol
    • Transactions of Materials Processing / v.7 no.5 / pp.474-480 / 1998
  • In the present study, a domain decomposition scheme using the substructuring method is developed to improve the computational efficiency of finite element analysis of metal forming processes. To avoid calculating an inverse matrix during the substructuring procedure, the modified Cholesky decomposition method is implemented. Because the substructuring method yields data independence, the program is easily parallelized using the Parallel Virtual Machine (PVM) library on a networked workstation cluster. A numerical example of a simple upsetting process is calculated, and the speed-up ratio is evaluated for various numbers of subdomains and processors. The efficiency of the parallel computation is discussed by comparing the results.


A Synchronous/Asynchronous Hybrid Parallel Power Iteration for Large Eigenvalue Problems by the MPMD Methodology (MPMD 방식의 동기/비동기 병렬 혼합 멱승법에 의한 거대 고유치 문제의 해법)

  • Park, Pil-Seong
    • The KIPS Transactions:PartA / v.11A no.1 / pp.67-74 / 2004
  • Most of today's parallel numerical schemes use synchronous algorithms, in which processors that finish their tasks earlier than others must wait at synchronization points to ensure correct computation. Hence the overall performance of the system depends on the speed of the slowest processor. In this paper, we devise a synchronous/asynchronous hybrid algorithm that accelerates convergence to the dominant eigenpair of a large matrix by reducing the idle time of faster processors using the MPMD programming methodology.
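
For orientation, the baseline that such a hybrid scheme accelerates can be sketched as a plain, serial power iteration. This is a minimal illustration only, not the paper's MPMD algorithm: the function name `power_iteration`, the tolerance, and the all-ones start vector are assumptions made for the example. In a synchronous parallel version, each processor would compute a slice of the matrix-vector product and then wait for the others before normalizing.

```python
import math


def power_iteration(matrix, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenpair of a square matrix (list of lists).

    Hypothetical helper for illustration; serial and synchronous.
    """
    n = len(matrix)
    x = [1.0] * n  # assumed start vector; must not be orthogonal to the dominant eigenvector
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Matrix-vector product: the step a parallel code would split by rows.
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        y = [v / norm for v in y]
        # Rayleigh quotient of the normalized iterate estimates the eigenvalue.
        ay = [sum(matrix[i][j] * y[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = sum(a * b for a, b in zip(y, ay))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        x, eigenvalue = y, new_eigenvalue
        if converged:
            break
    return eigenvalue, x


if __name__ == "__main__":
    value, vector = power_iteration([[2.0, 1.0], [1.0, 3.0]])
    print(value, vector)
```

The normalization after every product is the natural synchronization point: no processor can start the next product until all slices of the current one have arrived, which is the idle time the abstract refers to.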

Direct Methods for Linear System on Distributed Memory Parallel Computers

  • Nishimura, S.;Shigehara, T.;Mizoguchi, H.;Mishima, T.;Kobayashi, H.
    • Proceedings of the IEEK Conference / 2000.07a / pp.333-336 / 2000
  • We discuss direct methods (Gauss-Jordan and Gaussian elimination) for solving linear systems on distributed memory parallel computers. We show that the so-called row-cyclic storage gives the best performance among the three standard data storage schemes (row-cyclic, column-cyclic and cyclic-cyclic). We also show that Gauss-Jordan elimination, rather than Gaussian elimination, is highly efficient for the direct solution of linear systems in parallel processing, even though Gauss-Jordan elimination requires more arithmetic operations than Gaussian elimination. Numerical experiments are performed on the HITACHI SR12201 with the standard MPI and BLAS libraries.
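
The two ingredients named in this abstract can be sketched in a few lines. The code below is a serial illustration under stated assumptions, not the authors' MPI/BLAS implementation: `row_owner` and `gauss_jordan_solve` are hypothetical names, and partial pivoting is added here for numerical safety. In a row-cyclic layout, row `i` is stored on process `i mod p`. One plausible reading of the result is that Gauss-Jordan updates every non-pivot row at every step, so all processes stay busy until the end, whereas in Gaussian elimination the set of active rows shrinks as elimination proceeds.

```python
def row_owner(row, num_procs):
    """Row-cyclic storage: row i lives on process i mod p."""
    return row % num_procs


def gauss_jordan_solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the largest pivot in this column at or below the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the column from ALL other rows, above and below the pivot.
        # In a parallel code each process would update only the rows it owns.
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [rv - f * pv for rv, pv in zip(aug[r], aug[col])]
    # The left block is now the identity, so the last column is the solution.
    return [aug[i][n] for i in range(n)]


if __name__ == "__main__":
    print([row_owner(i, 3) for i in range(7)])
    print(gauss_jordan_solve([[2, 1], [1, 3]], [3, 5]))
```

Because the left block ends as the identity matrix, no back-substitution phase is needed, which removes a step that is inherently sequential in Gaussian elimination.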


An efficient architecture for motion estimation processor satisfying CCITT H.261 (CCITT H.261를 위한 효율적인 구조의 움직임 추정 프로세서 VLSI 설계)

  • 주락현;김영민
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.1 / pp.30-38 / 1995
  • In this paper, we propose an efficient architecture for a motion estimation processor, which performs one of the essential functions in moving picture coding algorithms. A simple control mechanism for data flow in the register array that stores pixel data, parallel processing of pixel data, and a pipelining scheme in the arithmetic unit allow this architecture to process a 352*288 pixel image at a frame rate of 30 frames/s, which is compatible with the CCITT H.261 standard.


DATA LINKS BETWEEN EMS AND SCADA SYSTEMS (EMS - SCADA 자료연계)

  • Lee, Kyung-Jae;Yu, Sung-Chul;Kim, Yeong-Han;Lee, Hyo-Sang
    • Proceedings of the KIEE Conference / 1987.11a / pp.109-113 / 1987
  • This paper briefly introduces a strategy for data links between EMS and SCADA systems for KEPCO's EMS project. For the data links, front end/communication processors are adopted that provide protocol conversion and communication processing in parallel with EMS/SCADA processing. The data link configuration for a communication channel uses point-to-point links. The EMS, in both hardware and software, is capable of handling links that use either the bisync protocol or the X.25 (Level 1 and 2) protocol as the data link protocol.


Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing (병렬 연산을 이용한 방출 단층 영상의 재구성 속도향상 기초연구)

  • Park, Min-Jae;Lee, Jae-Sung;Kim, Soo-Mee;Kang, Ji-Yeon;Lee, Dong-Soo;Park, Kwang-Suk
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.443-450 / 2009
  • Purpose: Conventional image reconstruction uses simplified physical models of projection. However, reconstruction with realistic physics, for example in 3D, takes too long to process all the data in the clinic and cannot run on a common reconstruction machine because complex physical models require large memory. We propose a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable large-scale techniques. Materials and Methods: Preliminary feasibility tests on virtual machines and various performance tests on the commercial supercomputer Tachyon were performed. The expectation maximization algorithm was tested with common 2D projection and realistic 3D lines of response. Since the processing time slowed down (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Results: Parallel processing of a program on multiple computers was possible on Linux with MPICH and NFS. We verified that the differences between the parallel-processed and single-processed images at the same iterations were below the significant digits of a floating-point number (about 6 bit). Two processors showed good parallel computing efficiency (1.96 times). The delay phenomenon was solved by vectorization using SSE. Conclusion: Through this study, a realistic parallel computing system for the clinic was established that, with plenty of memory, can reconstruct images using realistic physical models that could not be simplified.

Efficient Parallel Spatial Join Processing Method in a Shared-Nothing Database Cluster System (비공유 공간 클러스터 환경에서 효율적인 병렬 공간 조인 처리 기법)

  • Chung, Warn-Ill;Lee, Chung-Ho;Bae, Hae-Young
    • The KIPS Transactions:PartD / v.10D no.4 / pp.591-602 / 2003
  • Service delays and interruptions are caused by sudden increases in network traffic and resource consumption when Internet users concentrate excessively on a conventional single large database server. To solve these problems, spatial database clusters consisting of several single nodes on a high-speed network have emerged to offer high performance. However, little research has addressed the spatial join operation, which can degrade the performance of the whole system when it is processed at a single node. In this paper, we therefore propose an efficient parallel spatial join processing method for a spatial database cluster system that uses data partitioning and replication methods that consider the characteristics of spatial data. Because the proposed method needs neither the task creation step nor the task assignment step, and does not incur the additional message transmission between cluster nodes that appears in existing parallel spatial join methods, it improves performance by 23% over the conventional parallel R-tree spatial join for a shared-nothing architecture on expensive spatial join queries. It also minimizes the response time to the user because it removes redundant refinement operations at each cluster node.

Automatic Parameter Tuning for Simulated Annealing based on Threading Technique and its Application to Traveling Salesman Problem

  • Fangyan Dong;Iyoda, Eduardo-Masato;Kewei Chen;Hajime Nobuhara;Kaoru Hirota
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.439-442 / 2003
  • To overcome the difficulty of setting parameters in the simulated annealing (SA) algorithm, an improved practical SA algorithm is proposed that employs threading techniques, appropriate software structures, and dynamic adjustment of temperature parameters. Threads provide a mechanism for realizing parallel processing in a distributed environment by controlling the flow of internal information in an application. Thread services divide a process into multiple processes, leading to parallel processing of information that accesses common data. An efficient search is therefore achieved through multiple search processes, different initial conditions, and automatic temperature adjustment. The proposed method is evaluated on three types of Traveling Salesman Problem (TSP): random-tour, fractal-tour, and TSPLIB test data. The experimental results show that the computation time decreases by 5% compared with the conventional SA algorithm and that no manual parameter setting is needed. These results also demonstrate that the proposed method is applicable to real-world vehicle routing problems.
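
The general idea of running several annealing searches in threads with different initial conditions can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the names `anneal` and `parallel_anneal`, the geometric cooling schedule, the segment-reversal move, and the fixed list of start temperatures are all assumptions made for the example, and the paper's automatic temperature adjustment is not reproduced.

```python
import math
import random
import threading


def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def anneal(cities, seed, start_temp, cooling=0.995, steps=5000):
    """One SA search with its own random stream and start temperature."""
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)  # different initial tour per search
    current = tour_length(tour, cities)
    best, best_tour = current, tour[:]
    temp = start_temp
    for _ in range(steps):
        # Candidate move: reverse a randomly chosen segment of the tour.
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(candidate, cities)
        delta = cand_len - current
        # Metropolis rule: always accept improvements, sometimes accept worse tours.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            tour, current = candidate, cand_len
            if current < best:
                best, best_tour = current, tour[:]
        temp *= cooling  # assumed geometric cooling schedule
    return best, best_tour


def parallel_anneal(cities, start_temps, seed=0):
    """Run one search per start temperature in its own thread; keep the best."""
    results = [None] * len(start_temps)

    def worker(k, temp):
        results[k] = anneal(cities, seed + k, temp)

    threads = [
        threading.Thread(target=worker, args=(k, t))
        for k, t in enumerate(start_temps)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    print(parallel_anneal(square, [0.5, 1.0, 2.0]))
```

Note that in CPython the global interpreter lock prevents these CPU-bound threads from running truly in parallel, so the sketch shows the structure of a multi-search run rather than a speed-up; trying several start temperatures at once is what removes the need to hand-pick a single one.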


Design and Implementation of a DSP Chip for Portable Multimedia Applications (휴대 멀티미디어 응용을 위한 DSP 칩 설계 및 구현)

  • 윤성현;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.12 / pp.31-39 / 1998
  • This paper presents the design and implementation of a new multimedia fixed-point DSP (MDSP) core for portable multimedia applications. The MDSP instruction set is designed through the analysis of multimedia algorithms and DSP instruction sets. The MDSP architecture employs parallel processing techniques, such as SIMD and vector processing as well as DSP techniques. The instruction set can handle various data formats and MDSP can perform two MAC operations in parallel. The switching network and packing network can increase the performance by overlapping data rearrangement cycles with computation cycles. We have designed Verilog HDL models and the 0.6 μm Samsung KG75000 SOG library is used. The total gate count is 68,831 and the clock frequency is 30 MHz.
