• Title/Summary/Keyword: MP Parallel Computer

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems / v.12 no.4 / pp.724-740 / 2016
  • Screening large compound databases against multiple query compounds is a common performance concern in chemoinformatics applications. This study investigates the problem using group fusion as a pilot model and presents efficient parallel solutions for two platforms: the multi-core architecture of the CPU and the many-core architecture of the graphics processing unit (GPU). The paper presents a study of sequential group fusion and a proposed design for parallel CUDA group fusion. The design covers the two main stages of group fusion, similarity search and fusion (MAX rule), while addressing the embarrassingly parallel and parallel reduction models. Sequential, optimized sequential, and parallel OpenMP versions of group fusion were implemented and evaluated. The analysis of these three designs informed the design of the parallel CUDA version, which is optimized for high computational intensity. The proposed parallel CUDA version outperformed the sequential and parallel OpenMP versions in both execution time and speedup. It was 5-10x faster than both, as the similarity search and MAX fusion stages had been optimized for CUDA.
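
The two stages named in this abstract can be sketched sequentially as follows. This is a minimal illustration, not the authors' implementation: the Tanimoto coefficient, the set-based fingerprints, and the names `tanimoto`, `group_fusion_max`, and `cmpd_*` are assumptions, since the abstract does not say which similarity measure or data layout was used.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits
    (assumed similarity measure; the abstract does not name one)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_max(queries, database):
    """Rank database compounds by their best similarity to any query."""
    fused = {}
    for name, fingerprint in database.items():
        # Stage 1, similarity search: each (query, compound) score is
        # independent of the others, so this part is embarrassingly parallel.
        scores = [tanimoto(q, fingerprint) for q in queries]
        # Stage 2, fusion: the MAX rule reduces the scores to one value.
        fused[name] = max(scores)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two query compounds and three database compounds.
queries = [frozenset({1, 2, 3}), frozenset({3, 4})]
database = {
    "cmpd_a": frozenset({1, 2, 3, 4}),
    "cmpd_b": frozenset({4, 5}),
    "cmpd_c": frozenset({7, 8}),
}
ranking = group_fusion_max(queries, database)
```

In this toy run `cmpd_a` ranks first because it shares three of four bits with the first query; a parallel version would distribute the outer loop and replace `max` with a parallel reduction.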

Efficient Scientific Computation on MP Parallel Computer (MP 병렬컴퓨터에서 효과적인 과학계산의 수행)

  • 김선경
    • Journal of Korea Society of Industrial Information Systems / v.8 no.4 / pp.26-30 / 2003
  • The Lanczos algorithm is the most commonly used method for approximating a small number of extreme eigenvalues of large sparse symmetric matrices. On an MP (message passing) parallel computer, global communication reduces computation speed. In this paper, we introduce the s-step Lanczos method, which generates reduction matrices similar to those generated by the standard Lanczos method. One iteration of the s-step Lanczos algorithm corresponds to s iterations of the standard Lanczos algorithm. The s-step method minimizes global communication and has better parallel properties than the standard method. These algorithms are implemented on a Cray T3E, and performance results are presented.
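
To make the communication cost concrete, here is a minimal sketch of the standard Lanczos iteration (not the s-step variant the paper introduces) in plain Python. The function names and the dense list-of-lists matrix are illustrative assumptions. The comments mark the two inner products per iteration; on a message passing machine with distributed vectors, each of these would need a global reduction.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def lanczos(A, v0, steps):
    """Standard Lanczos iteration for a symmetric matrix A.

    Returns the diagonal (alphas) and off-diagonal (betas) entries of the
    tridiagonal reduction matrix.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    q_prev = [0.0] * len(v0)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(A, q)
        w = [wi - beta * pi for wi, pi in zip(w, q_prev)]
        alpha = dot(w, q)            # inner product 1 (a global reduction)
        w = [wi - alpha * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))  # inner product 2 (a global reduction)
        alphas.append(alpha)
        if beta < 1e-12:             # invariant subspace reached, stop early
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas


# Small symmetric test matrix (hypothetical example data).
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
alphas, betas = lanczos(A, [1.0, 0.0, 0.0], steps=3)
```

Each pass through the loop synchronizes twice; the s-step formulation described in the abstract groups s such iterations into one so that these reductions happen less often.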

Performance Analysis of a Parallel Mesh Smoothing Algorithm using Graph Coloring and OpenMP (그래프 컬러링과 OpenMP를 이용한 병렬 메쉬 스무딩 알고리즘의 성능 분석)

  • Shin, Myeonggyu;Kim, Jibum
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.6 / pp.80-87 / 2016
  • We propose a parallel mesh smoothing algorithm that uses graph coloring and the OpenMP library for shared-memory many-core computer architectures. The proposed algorithm partitions a mesh into independent sets and smooths the mesh in parallel using OpenMP. We study how various graph coloring and color reordering algorithms affect the efficiency of the proposed algorithm. We also investigate the influence of various OpenMP loop scheduling methods on parallel mesh smoothing efficiency.
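
The partition-then-smooth idea can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: greedy coloring and Laplacian smoothing (moving a vertex to the average of its neighbors) are stand-ins chosen for brevity, and all function names and the toy mesh are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by a colored neighbor."""
    colors = {}
    for v in sorted(adjacency):
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def independent_sets(colors):
    """Group vertices by color; no two vertices in a group share an edge."""
    groups = {}
    for v, c in colors.items():
        groups.setdefault(c, []).append(v)
    return [groups[c] for c in sorted(groups)]


def smooth(coords, adjacency, fixed, sweeps=1):
    """Laplacian smoothing (assumed scheme), one color class at a time."""
    groups = independent_sets(greedy_coloring(adjacency))
    for _ in range(sweeps):
        for group in groups:
            # Vertices in one group are mutually non-adjacent, so this loop
            # is the part a parallel version could split across threads.
            for v in group:
                if v in fixed:
                    continue
                nbrs = adjacency[v]
                coords[v] = (
                    sum(coords[u][0] for u in nbrs) / len(nbrs),
                    sum(coords[u][1] for u in nbrs) / len(nbrs),
                )
    return coords


# Toy mesh: a unit square (vertices 0-3, fixed) with one interior vertex 4.
adjacency = {0: [1, 3, 4], 1: [0, 2, 4], 2: [1, 3, 4], 3: [0, 2, 4], 4: [0, 1, 2, 3]}
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0), 4: (0.9, 0.2)}
colors = greedy_coloring(adjacency)
smoothed = smooth(dict(coords), adjacency, fixed={0, 1, 2, 3})
```

Processing one color at a time means no thread reads a neighbor position that another thread is writing in the same step, which is why the coloring quality and the order of the colors can affect parallel efficiency.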

Implementation and Translation of Major OpenMP Directives for Chip Multiprocessor without using OS (단일 칩 다중 프로세서상에서 운영체제를 사용하지 않은 OpenMP 구현 및 주요 디렉티브 변환)

  • Jeun, Woo-Chul;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory / v.34 no.4 / pp.145-157 / 2007
  • OpenMP is an attractive parallel programming model for a chip multiprocessor because there is no standard parallel programming method for chip multiprocessors and parallel programs are easy to write in OpenMP. However, chip multiprocessor systems can have various architectures depending on the target application programs, so OpenMP must be implemented differently for each system. In this paper, we propose an implementation and an effective translation of the major OpenMP directives for a chip multiprocessor without an OS, improving performance without special hardware and without extending the OpenMP directives. We present experimental results on our target platform, the CT3400.

Parallel Computation of Elliptic Partial Differential Equation on MP-2 (MP-2에서의 타원형 편미분 방정식 병렬계산)

  • Kim, Hyoung-Joong;Lee, Yong-Ho
    • Journal of Industrial Technology / v.14 / pp.19-28 / 1994
  • The finite difference approximation of the 2-D Poisson equation yields a tridiagonal block Toeplitz linear system. To exploit the structure of this system, we transform it into a Lyapunov equation and apply the DST (discrete sine transform) to obtain a Lyapunov equation based on diagonal matrices. The DST can be performed using the FFT, which enables high-speed computation. All computations are performed on an SIMD parallel computer, the MasPar MP-2 with 4,096 processing elements. In this paper, the parallel algorithm, the method for mapping the algorithm onto the MP-2, and timing results are presented.
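
The transformation described here can be sketched in a few lines. The sketch assumes the discretized problem is written as T X + X T = F, where T is the n-by-n tridiagonal matrix with 2 on the diagonal and -1 beside it, and F already includes the grid-spacing factor; this notation and the function names are assumptions, not taken from the paper. The sine-transform matrix S diagonalizes T, so the transformed equation is solved entry by entry. A dense matrix product stands in for the FFT-based DST to keep the example short.

```python
import math


def dst_matrix(n):
    """Orthonormal, symmetric DST matrix: S[j][k] ~ sin(pi*(j+1)*(k+1)/(n+1))."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (k + 1) / (n + 1))
             for k in range(n)] for j in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def solve_poisson_dst(F):
    """Solve T X + X T = F with T = tridiag(-1, 2, -1) via the DST."""
    n = len(F)
    S = dst_matrix(n)
    # Eigenvalues of T; S T S is the diagonal matrix with these entries.
    lam = [2.0 - 2.0 * math.cos(math.pi * (k + 1) / (n + 1)) for k in range(n)]
    F_hat = matmul(matmul(S, F), S)  # transform the right-hand side
    # Diagonal Lyapunov equation: (lam_j + lam_k) * X_hat[j][k] = F_hat[j][k]
    X_hat = [[F_hat[j][k] / (lam[j] + lam[k]) for k in range(n)]
             for j in range(n)]
    return matmul(matmul(S, X_hat), S)  # transform back


# Hypothetical right-hand side on a 4 x 4 interior grid.
n = 4
F = [[float(i + 2 * j + 1) for j in range(n)] for i in range(n)]
X = solve_poisson_dst(F)
```

Every entry of `X_hat` is computed independently of the others, which is the kind of structure that suits a SIMD machine with one grid point per processing element.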

Implementation of Underwater Simulation of a Net using OpenMP (OpenMP 병렬프로그램을 이용한 그물의 수중형상 시뮬레이션 구현)

  • Park, Myeong-Chul;Park, Seok-Gyu
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.11-17 / 2008
  • The shape of a net underwater is affected by various vectors. Calculating the effect of every vector on each particle of the net improves accuracy and realism, but the large amount of computation increases the time complexity. Previous techniques sacrificed physical realism and produced underwater virtual reality that enhanced only visual realism through simulation. In this paper, the particles are processed in parallel to implement a simulation that satisfies both physical realism and real-time requirements. The parallel processing uses OpenMP, and the realistic graphics are rendered with OpenGL. The proposed simulation can serve as basic data for model analysis or expert systems in the game and marine fields.

Performance Comparison of Parallel Programming Frameworks in Digital Image Transformation

  • Shin, Woochang
    • International Journal of Internet, Broadcasting and Communication / v.11 no.3 / pp.1-7 / 2019
  • Previously, parallel computing was mainly used in areas requiring high computing performance, but multicore CPUs and GPUs are now widespread, and the advantages of parallel programming can be obtained even in a PC environment. Various parallel programming frameworks for multicore CPUs, such as OpenMP and PPL, have been announced. Nvidia and AMD have developed parallel programming platforms and APIs that let program developers take advantage of the multicore GPUs on their graphics cards. In this paper, we develop digital image transformation programs that run on each of the major parallel programming frameworks and measure their execution times. We analyze the characteristics of each framework by comparing execution times. We also present a constant K that indicates the ratio of program execution time between different parallel computing environments. Using K, it is possible to predict a rough execution time without implementing a parallel program.

Implementation of augmented reality using parallel structure (병렬구조를 이용한 증강현실 구현)

  • Park, Tae-Ryong;Heo, Hoon;Kwak, Jae-Chang
    • Journal of IKEEE / v.17 no.3 / pp.371-377 / 2013
  • This paper proposes an efficient parallel structure for implementing augmented reality based on the FAST and BRIEF algorithms. The SURF algorithm, which is well known among object recognition algorithms, is robust, but it is poorly suited to real-time operation because its implementation requires a lot of computation. We therefore use the FAST and BRIEF algorithms for object recognition and improve the conventional parallel structure based on the OpenMP library. As a result, the method achieves a 70%~100% improvement in execution time on an embedded system.

PERFORMANCE OF A KNIGHT TOUR PARALLEL ALGORITHM ON MULTI-CORE SYSTEM USING OPENMP

  • VIJAYAKUMAR SANGAMESVARAPPA;VIDYAATHULASIRAMAN
    • Journal of applied mathematics & informatics / v.41 no.6 / pp.1317-1326 / 2023
  • Today's desktop and laptop computers are built with multi-core architectures. Developing and running serial programs on a multi-core architecture wastes resources and time. Parallel programming is the only solution for properly utilizing the resources available in modern computers. The major challenges in the multi-core environment are the design of parallel algorithms and their performance analysis. This paper describes the design and performance analysis of a parallel algorithm using the OpenMP interface, taking the Knight Tour problem as an example. The performance of the serial and parallel algorithms is compared, and the comparison shows that the proposed parallel algorithm achieves good performance compared to the serial algorithm.

Acceleration for Removing Sea-fog using Graphic Processors and Parallel Processing (그래픽 프로세서를 이용한 병렬연산 기반 해무 제거 고속화)

  • Kim, Young-doo;Kwak, Jae-min;Seo, Young-ho;Choi, Hyun-jun
    • Journal of Advanced Navigation Technology / v.21 no.5 / pp.485-490 / 2017
  • In this paper, we propose a technique for high-speed removal of sea-fog using graphics processors. The technique uses a host processor (CPU) and several graphics processors (GPUs) capable of parallel processing to remove sea-fog from the input image. In the sea-fog removal process, the dark channel extraction, the maximum brightness channel extraction, and the calculation of the transmission are performed by the host processor, while the refinement of the transmission by applying the bidirectional filter is performed in parallel on the graphics processors. To verify the proposed parallel processing method, a verification environment was built with three NVIDIA GTX 1070 GPUs. As a result, processing takes about 140 ms with one graphics processor and 26 ms with OpenMP and multiple GPGPUs. The proposed parallel processing algorithm based on graphics processors can be used for safe navigation, port control, and monitoring systems.