• Title/Summary/Keyword: MP Parallel Computer

Search Result 34, Processing Time 0.027 seconds

Parallelization of sheet forming analysis program using MPI (MPI를 이용한 판재성형해석 프로그램의 병렬화)

  • Kim, Eui-Joong;Suh, Yeong-Sung
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.22 no.1
    • /
    • pp.132-141
    • /
    • 1998
  • A parallel version of sheet forming analysis program was developed. This version is compatible with any parallel computers which support MPI that is one of the most recent and popular message passing libraries. For this purpose, SERI-SFA, a vector version which runs on Cray Y-MP C90, a sequential vector computer, was used as a source code. For the sake of the effectiveness of the work, the parallelization was focused on the selected part after checking the rank of CPU consumed from the exemplary calculation on Cray Y-MP C90. The subroutines associated with contact algorithm was selected as targe parts. For this work, MPI was used as a message passing library. For the performance verification, an oil pan and an S-rail forming simulation were carried out. The performance check was carried out by the kernel and total CPU time along with theoretical performance using Amdahl's Law. The results showed some performance improvement within the limit of the selective paralellization.

A Study of Performance Improvement of CFCS SW Using HPC (HPC를 활용한 지휘무장통제체계 SW 성능향상 연구)

  • Baek, Chi-Sun
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2017.07a
    • /
    • pp.1-2
    • /
    • 2017
  • 본 논문에서는 지휘무장통제체계(이하 CFCS) 소프트웨어의 성능 향상 기법으로 고성능 컴퓨팅(이하 HPC) 시스템 활용 기법을 제안한다. 이 기법으로 본 논문에서는 HPC 분야인 멀티코어 프로세서를 활용하는 방법을 제안한다. 복잡한 반복연산을 하는 작업이 많은 CFCS의 특정 SW모듈에 대해 멀티코어 프로세싱 아키텍처를 이용한 병렬처리를 적용하여 기존 순차처리 대비 작업실행시간을 단축함으로써 작업 응답시간을 상당히 줄일 수 있다. 본 논문에서는 CFCS 시험 환경의 일부 특정 SW모듈 상에서 기존의 순차처리 방식으로 수행한 연산 결과와 다중 처리 프로그래밍 API인 OpenMP를 적용하여 수행한 연산 결과를 비교하여 CFCS에서의 멀티코어 프로세싱이 체계 전반의 성능 향상 면에서 효율적으로 사용될 수 있음을 보인다.

  • PDF

Parallel Computing on Intensity Offset Tracking Using Synthetic Aperture Radar for Retrieval of Glacier Velocity

  • Hong, Sang-Hoon
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.1
    • /
    • pp.29-37
    • /
    • 2019
  • Synthetic Aperture Radar (SAR) observations are powerful tools to monitor surface's displacement very accurately, induced by earthquake, volcano, ground subsidence, glacier movement, etc. Especially, radar interferometry (InSAR) which utilizes phase information related to distance from sensor to target, can generate displacement map in line-of-sight direction with accuracy of a few cm or mm. Due to decorrelation effect, however, degradation of coherence in the InSAR application often prohibit from construction of differential interferogram. Offset tracking method is an alternative approach to make a two-dimensional displacement map using intensity information instead of the phase. However, there is limitation in that the offset tracking requires very intensive computation power and time. In this paper, efficiency of parallel computing has been investigated using high performance computer for estimation of glacier velocity. Two TanDEM-X SAR observations which were acquired on September 15, 2013 and September 26, 2013 over the Narsap Sermia in Southwestern Greenland were collected. Atotal of 56 of 2.4 GHz Intel Xeon processors(28 physical processors with hyperthreading) by operating with linux environment were utilized. The Gamma software was used for application of offset tracking by adjustment of the number of processors for the OpenMP parallel computing. The processing times of the offset tracking at the 256 by 256 pixels of window patch size at single and 56 cores are; 26,344 sec and 2,055 sec, respectively. It is impressive that the processing time could be reduced significantly about thirteen times (12.81) at the 56 cores usage. However, the parallel computing using all the processors prevent other background operations or functions. Except the offset tracking processing, optimum number of processors need to be evaluated for computing efficiency.

Improvement of Processing Speed for UAV Attitude Information Estimation Using ROI and Parallel Processing

  • Ha, Seok-Wun;Park, Myeong-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.1
    • /
    • pp.155-161
    • /
    • 2021
  • Recently, researches for military purposes such as precision tracking and mission completion using UAVs have been actively conducted. In particular, if the posture information of the leading UAV is estimated and the mission UAV uses this information to follow in stealth and complete its mission, the speed of the posture information estimation of the guide UAV must be processed in real time. Until recently, research has been conducted to accurately estimate the posture information of the leading UAV using image processing and Kalman filters, but there has been a problem in processing speed due to the sequential processing of the processing process. Therefore, in this study we propose a way to improve processing speed by applying methods that the image processing area is limited to the ROI area including the object, not the entire area, and the continuous processing is distributed to OpenMP-based multi-threads and processed in parallel with thread synchronization to estimate attitude information. Based on the experimental results, it was confirmed that real-time processing is possible by improving the processing speed by more than 45% compared to the basic processing, and thus the possibility of completing the mission can be increased by improving the tracking and estimating speed of the mission UAV.

Improved Disparity Map Computation on Stereoscopic Streaming Video with Multi-core Parallel Implementation

  • Kim, Cheong Ghil;Choi, Yong Soo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.2
    • /
    • pp.728-741
    • /
    • 2015
  • Stereo vision has become an important technical issue in the field of 3D imaging, machine vision, robotics, image analysis, and so on. The depth map extraction from stereo video is a key technology of stereoscopic 3D video requiring stereo correspondence algorithms. This is the matching process of the similarity measure for each disparity value, followed by an aggregation and optimization step. Since it requires a lot of computational power, there are significant speed-performance advantages when exploiting parallel processing available on processors. In this situation, multi-core CPU may allow many parallel programming technologies to be realized in users computing devices. This paper proposes parallel implementations for calculating disparity map using a shared memory programming and exploiting the streaming SIMD extension technology. By doing so, we can take advantage both of the hardware and software features of multi-core processor. For the performance evaluation, we implemented a parallel SAD algorithm with OpenMP and SSE2. Their processing speeds are compared with non parallel version on stereoscopic streaming video. The experimental results show that both technologies have a significant effect on the performance and achieve great improvements on processing speed.

High-resolution Urban Flood Modeling using Cellular Automata-based WCA2D in the Oncheon-cheon Catchment in Busan, South Korea (셀룰러 오토마타 기반 WCA2D 모형을 이용한 부산 온천천 유역 고해상도 도시 침수 해석)

  • Choi, Hyeonjin;Lee, Songhee;Woo, Hyuna;Noh, Seong Jin
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.5
    • /
    • pp.587-599
    • /
    • 2023
  • As climate change increasesthe frequency and risk of flooding in major cities around theworld, the importance ofsimulation technology that can quickly and accurately analyze high-resolution 2D flooding information in large-scale areasis emerging. The physically-based approaches based on the Shallow Water Equations (SWE) often requires huge computer resources hindering high-resolution flood prediction. This study investigated the theoretical background of Weighted Cellular Automata 2D (WCA2D), which simulates spatio-temporal changes offlooding using transition rules and weight-based system, and assessed feasibility to simulate pluvial flooding in the urbancatchment, theOncheon-cheon catchmentinBusan, SouthKorea.Inaddition,the computation performancewas compared by applying versions using OpenComputing Language (OpenCL) andOpenMulti-Processing (OpenMP) parallel computing techniques. Simulationresultsshowed that the maximuminundation depthmap by theWCA2Dmodel cansimilarly reproduce historical inundation maps. Also, it can precisely simulate spatio-temporal changes of flooding extent in the urban catchment with complex topographic characteristics. For computation efficiency, parallel computing schemes, theOpenCLandOpenMP, improved the computation by about 8~14 and 5~6 folds respectively, compared to the sequential computation.

The Effect of Mesh Reordering on Laplacian Smoothing for Nonuniform Memory Access Architecture-based High Performance Computing Systems (NUMA구조를 가진 고성능 컴퓨팅 시스템에서의 메쉬 재배열의 라플라시안 스무딩에 대한 효과)

  • Kim, Jbium
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.3
    • /
    • pp.82-88
    • /
    • 2014
  • We study the effect of mesh reordering on Laplacian smoothing for parallel high performance computing systems. Specifically, we use the Reverse-Cuthill McKee algorithm to reorder meshes and use Laplacian Smoothing to improve the mesh quality on Nonuniform memory access architecture-based parallel high performance computing systems. First, we investigate the effect of using mesh reordering on Laplacian smoothing for a single core system and extend the idea to NUMA-based high performance computing systems.

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.4
    • /
    • pp.2749-2756
    • /
    • 2015
  • Recently, Depending on the quality of data increases, the problem of time-consuming to process the image is raised by being required to accelerate the image processing algorithms, in a traditional CPU and CUDA(Compute Unified Device Architecture) based recognition system for computing speed and performance gains compared to OpenMP When character recognition has been learned by the system to measure the input by the character data matching is implemented in an environment that recognizes the region of the well, so that the font of the characters image learning English alphabet are each constant and standardized in size and character an image matching method for calculating the matching has also been implemented. GPGPU (General Purpose GPU) programming platform technology when using the CUDA computing techniques to recognize and use the four cores of Intel i5 2500 with OpenMP to deal quickly and efficiently an algorithm, than the performance of existing CPU does not produce the rate of four times due to the delay of the data of the partition and merge operation proposed a method of improving the rate of speed of about 3.2 times, and the parallel processing of the video card that processes a result, the sequential operation of the process compared to CPU-based who performed the performance gain is about 21 tiems improvement in was confirmed.

A Multi-Scale Parallel Convolutional Neural Network Based Intelligent Human Identification Using Face Information

  • Li, Chen;Liang, Mengti;Song, Wei;Xiao, Ke
    • Journal of Information Processing Systems
    • /
    • v.14 no.6
    • /
    • pp.1494-1507
    • /
    • 2018
  • Intelligent human identification using face information has been the research hotspot ranging from Internet of Things (IoT) application, intelligent self-service bank, intelligent surveillance to public safety and intelligent access control. Since 2D face images are usually captured from a long distance in an unconstrained environment, to fully exploit this advantage and make human recognition appropriate for wider intelligent applications with higher security and convenience, the key difficulties here include gray scale change caused by illumination variance, occlusion caused by glasses, hair or scarf, self-occlusion and deformation caused by pose or expression variation. To conquer these, many solutions have been proposed. However, most of them only improve recognition performance under one influence factor, which still cannot meet the real face recognition scenario. In this paper we propose a multi-scale parallel convolutional neural network architecture to extract deep robust facial features with high discriminative ability. Abundant experiments are conducted on CMU-PIE, extended FERET and AR database. And the experiment results show that the proposed algorithm exhibits excellent discriminative ability compared with other existing algorithms.

Heterogeneous Chain-mail Model for CPU-based Volume Deformation (CPU 기반의 볼륨 변형을 위한 다형질 Chainmail 모델)

  • Lee, Sein;Kye, Heewon
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.7
    • /
    • pp.759-769
    • /
    • 2019
  • Since a surgery simulation should be able to represent the internal structure of the human body, it is advantageous to adopt volume based techniques rather than polygon based techniques. However, the volume based techniques induce large computation to deform heterogeneous volume datasets such as bones and muscles. In this study, we propose a new method to deform volume data using multi-core CPUs. By improving previous studies, the proposed method minimizes unnecessary propagation operations. Moreover, we propose an efficient task-partitioning method for volume deformation using multi-core CPUs. As a result, we can simulate the deformation of heterogeneous volume data at an interactive speed without special hardware.