• Title/Summary/Keyword: Parallel Computing Efficiency

Search Result 123, Processing Time 0.02 seconds

A Study for Parallel Computing Efficiency Comparing Numerical Solutions of Battery Pack (배터리 팩 수치해석 해의 비교를 통한 병렬연산 효율성 연구)

  • Kim, Kwang Sun;Jang, Kyung Min
    • Journal of the Semiconductor & Display Technology
    • /
    • v.15 no.2
    • /
    • pp.20-25
    • /
    • 2016
  • The parallel computer cluster system has been known as the powerful tool to solve a complex physical phenomenon numerically. The numerical analysis of large size of Li-ion battery pack, which has a complex physical phenomenon, requires a large amount of computing time. In this study, the numerical analyses were conducted for comparing the computing efficiency between the single workstation and the parallel cluster system both with multicore CPUs'. The result shows that the parallel cluster system took the time 80 times faster than the single work station for the same battery pack model. The performance of cluster system was increased linearly with more CPU cores being increased.

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks
    • /
    • v.16 no.4
    • /
    • pp.436-446
    • /
    • 2014
  • The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.

A Performance Comparison of Parallel Programming Models on Edge Devices (엣지 디바이스에서의 병렬 프로그래밍 모델 성능 비교 연구)

  • Dukyun Nam
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.18 no.4
    • /
    • pp.165-172
    • /
    • 2023
  • Heterogeneous computing is a technology that utilizes different types of processors to perform parallel processing. It maximizes task processing and energy efficiency by leveraging various computing resources such as CPUs, GPUs, and FPGAs. On the other hand, edge computing has developed with IoT and 5G technologies. It is a distributed computing that utilizes computing resources close to clients, thereby offloading the central server. It has evolved to intelligent edge computing combined with artificial intelligence. Intelligent edge computing enables total data processing, such as context awareness, prediction, control, and simple processing for the data collected on the edge. If heterogeneous computing can be successfully applied in the edge, it is expected to maximize job processing efficiency while minimizing dependence on the central server. In this paper, experiments were conducted to verify the feasibility of various parallel programming models on high-end and low-end edge devices by using benchmark applications. We analyzed the performance of five parallel programming models on the Raspberry Pi 4 and Jetson Orin Nano as low-end and high-end devices, respectively. In the experiment, OpenACC showed the best performance on the low-end edge device and OpenSYCL on the high-end device due to the stability and optimization of system libraries.

COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

  • Kim, Jong Woon;Hong, Ser Gi;Lee, Young-Ouk
    • Nuclear Engineering and Technology
    • /
    • v.46 no.2
    • /
    • pp.263-272
    • /
    • 2014
  • Scattering source calculations using conventional spherical harmonic expansion may require lots of computation time to treat full-coupled three-dimensional photon-electron transport in a highly anisotropic scattering medium where their scattering cross sections should be expanded with very high order (e.g., $P_7$ or higher) Legendre expansions. In this paper, we introduce a modified scattering kernel approach to avoid the unnecessarily repeated calculations involved with the scattering source calculation, and used it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested for three-dimensional full-coupled photon-electron transport problems using our computer program which solves the multi-group discrete ordinates transport equation by using the discontinuous finite element method with unstructured tetrahedral meshes for complicated geometrical problems. The numerical tests show that we can improve speed up to 17~42 times for the elapsed time per iteration using the modified scattering kernel, not only in the single CPU calculation but also in the parallel computing with several CPUs.

A Parallel Computation of Finite Element Analysis on a Transputer System (트랜스퓨터를 이용한 유안영속해석의 병렬계산)

  • Kim, Keun-Hwan;Choi, Kyung;Jung, Hyun-Kyo;Lee, Ki-Sik;Hahn, Song-Yop
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.41 no.7
    • /
    • pp.735-741
    • /
    • 1992
  • This paper presents a parallel algorithm for the finite element analysis using relatively inexpensive transputer parallel system. The substructure method, which is highly parallel in nature, is used to improve the parallel computing efficiency by splitting up the whole structure into substructures. The proposed algorithm is applied to a simple two-dimensional magnetostatic problem. It is found that the more the number of transputer is increased, the more the total computation time is reduced. And the computational efficiency becomes better as the number of internal boundary nodes becomes smaller.

  • PDF

Efficient Parallel TLD on CPU-GPU Platform for Real-Time Tracking

  • Chen, Zhaoyun;Huang, Dafei;Luo, Lei;Wen, Mei;Zhang, Chunyuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.1
    • /
    • pp.201-220
    • /
    • 2020
  • Trackers, especially long-term (LT) trackers, now have a more complex structure and more intensive computation for nowadays' endless pursuit of high accuracy and robustness. However, computing efficiency of LT trackers cannot meet the real-time requirement in various real application scenarios. Considering heterogeneous CPU-GPU platforms have been more popular than ever, it is a challenge to exploit the computing capacity of heterogeneous platform to improve the efficiency of LT trackers for real-time requirement. This paper focuses on TLD, which is the first LT tracking framework, and proposes an efficient parallel implementation based on OpenCL. In this paper, we firstly make an analysis of the TLD tracker and then optimize the computing intensive kernels, including Fern Feature Extraction, Fern Classification, NCC Calculation, Overlaps Calculation, Positive and Negative Samples Extraction. Experimental results demonstrate that our efficient parallel TLD tracker outperforms the original TLD, achieving the 3.92 speedup on CPU and GPU. Moreover, the parallel TLD tracker can run 52.9 frames per second and meet the real-time requirement.

Parallel Computing on Intensity Offset Tracking Using Synthetic Aperture Radar for Retrieval of Glacier Velocity

  • Hong, Sang-Hoon
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.1
    • /
    • pp.29-37
    • /
    • 2019
  • Synthetic Aperture Radar (SAR) observations are powerful tools to monitor surface's displacement very accurately, induced by earthquake, volcano, ground subsidence, glacier movement, etc. Especially, radar interferometry (InSAR) which utilizes phase information related to distance from sensor to target, can generate displacement map in line-of-sight direction with accuracy of a few cm or mm. Due to decorrelation effect, however, degradation of coherence in the InSAR application often prohibit from construction of differential interferogram. Offset tracking method is an alternative approach to make a two-dimensional displacement map using intensity information instead of the phase. However, there is limitation in that the offset tracking requires very intensive computation power and time. In this paper, efficiency of parallel computing has been investigated using high performance computer for estimation of glacier velocity. Two TanDEM-X SAR observations which were acquired on September 15, 2013 and September 26, 2013 over the Narsap Sermia in Southwestern Greenland were collected. Atotal of 56 of 2.4 GHz Intel Xeon processors(28 physical processors with hyperthreading) by operating with linux environment were utilized. The Gamma software was used for application of offset tracking by adjustment of the number of processors for the OpenMP parallel computing. The processing times of the offset tracking at the 256 by 256 pixels of window patch size at single and 56 cores are; 26,344 sec and 2,055 sec, respectively. It is impressive that the processing time could be reduced significantly about thirteen times (12.81) at the 56 cores usage. However, the parallel computing using all the processors prevent other background operations or functions. Except the offset tracking processing, optimum number of processors need to be evaluated for computing efficiency.

Many-objective joint optimization for dependency-aware task offloading and service caching in mobile edge computing

  • Xiangyu Shi;Zhixia Zhang;Zhihua Cui;Xingjuan Cai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.5
    • /
    • pp.1238-1259
    • /
    • 2024
  • Previous studies on joint optimization of computation offloading and service caching policies in Mobile Edge Computing (MEC) have often neglected the impact of dependency-aware subtasks, edge server resource constraints, and multiple users on policy formulation. To remedy this deficiency, this paper proposes a many-objective joint optimization dependency-aware task offloading and service caching model (MaJDTOSC). MaJDTOSC considers the impact of dependencies between subtasks on the joint optimization problem of task offloading and service caching in multi-user, resource-constrained MEC scenarios, and takes the task completion time, energy consumption, subtask hit rate, load variability, and storage resource utilization as optimization objectives. Meanwhile, in order to better solve MaJDTOSC, a many-objective evolutionary algorithm TSMSNSGAIII based on a three-stage mating selection strategy is proposed. Simulation results show that TSMSNSGAIII exhibits an excellent and stable performance in solving MaJDTOSC with different number of users setting and can converge faster. Therefore, it is believed that TSMSNSGAIII can provide appropriate sub-task offloading and service caching strategies in multi-user and resource-constrained MEC scenarios, which can greatly improve the system offloading efficiency and enhance the user experience.

Calculation Effect of GPU Parallel Programing for Planar Multibody System Dynamics (평면 다물체 동역학 해석에서 GPU 병렬 프로그래밍의 계산효과)

  • Jun, C.W.;Sohn, J.H.
    • Journal of Power System Engineering
    • /
    • v.16 no.4
    • /
    • pp.12-16
    • /
    • 2012
  • In this paper, the equations of motions for planar multibody dynamics are established for considering the parallel programming based on GPU. Cartesian coordinates are used to formulate the equations of motion and implicit integration method called HHT-alpha is employed. Open chain multibody system is considered for computer simulation. CUDA toolkit is employed for establishing the GPU parallel programming. The exactness of the analysis is verified from the comparison with ADAMS. The results from parallel computing based on GPU are compared with the results from the sequential programming based on CPU in terms of calculation time. The multiple pendulum with bodies and joints is employed for the computer simulation. In the pendulum system that has 290 bodies, the parallel program indicates an improved efficiency of about 25.5 second(15.5% improvement). It is noted that the larger the size of system is, the time efficiency is better.

Evaluating Computational Efficiency of Spatial Analysis in Cloud Computing Platforms (클라우드 컴퓨팅 기반 공간분석의 연산 효율성 분석)

  • CHOI, Changlock;KIM, Yelin;HONG, Seong-Yun
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.21 no.4
    • /
    • pp.119-131
    • /
    • 2018
  • The increase of high-resolution spatial data and methodological developments in recent years has enabled a detailed analysis of individual experiences in space and over time. However, despite the increasing availability of data and technological advances, such individual-level analysis is not always possible in practice because of its computing requirements. To overcome this limitation, there has been a considerable amount of research on the use of high-performance, public cloud computing platforms for spatial analysis and simulation. The purpose of this paper is to empirically evaluate the efficiency and effectiveness of spatial analysis in cloud computing platforms. We compare the computing speed for calculating the measure of spatial autocorrelation and performing geographically weighted regression analysis between a local machine and spot instances on clouds. The results indicate that there could be significant improvements in terms of computing time when the analysis is performed parallel on clouds.