• Title/Summary/Keyword: computational scalability

Computational Methods for On-Node Performance Optimization and Inter-Node Scalability of HPC Applications

  • Kim, Byoung-Do; Rosales-Fernandez, Carlos; Kim, Sungho
    • Journal of Computing Science and Engineering, v.6 no.4, pp.294-309, 2012
  • In the age of multi-core processors and specialized accelerators in high performance computing (HPC) systems, it is critical to understand application characteristics and apply suitable optimizations in order to fully utilize advanced computing systems. Often, the process involves multiple stages of application performance diagnosis and a trial-and-error approach to optimization. In this study, a general guideline for performance optimization is demonstrated with two class-representative applications. The main focuses are node-level optimization and inter-node scalability improvement. While the number of optimization case studies in this paper is somewhat limited, the results provide insight into a systematic approach to performance engineering of HPC applications.
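
The diagnose-then-optimize workflow this abstract describes ultimately hinges on two metrics, speedup and parallel efficiency. As a minimal illustration (not code from the paper; the timings below are invented), a Python sketch that computes both from wall-clock measurements:

```python
# Illustrative sketch: strong-scaling speedup and parallel efficiency
# from wall-clock timings, the basic metrics behind an inter-node
# scalability study like the one described above.

def scaling_metrics(timings):
    """timings: dict mapping node count -> wall-clock seconds."""
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        # Efficiency normalizes speedup by the increase in resources.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, timings[nodes], speedup, efficiency))
    return rows

# Hypothetical timings for demonstration only.
for nodes, t, s, e in scaling_metrics({1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}):
    print(f"{nodes:>3} nodes: {t:7.1f} s  speedup {s:5.2f}  efficiency {e:5.1%}")
```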

Implementation and Performance Analysis of a Parallel SIMPLER Model Based on Domain Decomposition (영역 분할에 의한 SIMPLER 모델의 병렬화와 성능 분석)

  • Kwak, Ho Sang; Lee, Sangsan
    • Journal of Computational Fluids Engineering, v.3 no.1, pp.22-29, 1998
  • A parallel implementation of a SIMPLER finite volume model is presented. The parallelism is based on domain decomposition and explicit message passing using MPI and SHMEM. Two parallel solvers for the tridiagonal matrix equation are employed. The implementation is verified on the Cray T3E system for a benchmark problem of natural convection in a sidewall-heated cavity. The test results illustrate good scalability of the parallel models. Performance issues are examined in terms of convergence as well as conventional parallel overheads and single-processor performance. The effectiveness of a localized matrix solution algorithm is demonstrated.
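
For context, tridiagonal solves of the kind mentioned above are typically built on the Thomas algorithm, the serial kernel that parallel tridiagonal solvers decompose across subdomains. Below is a serial Python reference sketch, not the paper's implementation:

```python
# Serial Thomas algorithm for a tridiagonal system Ax = d.
# Reference sketch only; the paper's parallel solvers distribute
# this kind of kernel across decomposed subdomains.

def thomas(a, b, c, d):
    """a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):           # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: a small diffusion-like system; the solution is [1, 1, 1, 1].
print(thomas([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [1, 0, 0, 1]))
```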

Service ORiented Computing EnviRonment (SORCER) for deterministic global and stochastic aircraft design optimization: part 1

  • Raghunath, Chaitra; Watson, Layne T.; Jrad, Mohamed; Kapania, Rakesh K.; Kolonay, Raymond M.
    • Advances in Aircraft and Spacecraft Science, v.4 no.3, pp.297-316, 2017
  • With rapid growth in the complexity of large scale engineering systems, the application of multidisciplinary analysis and design optimization (MDO) in the engineering design process has garnered much attention. MDO addresses the challenge of integrating several different disciplines into the design process. Primary challenges of MDO include computational expense and poor scalability. The introduction of a distributed, collaborative computational environment results in better utilization of available computational resources, reduced time to solution, and enhanced scalability. SORCER, a Java-based network-centric computing platform, enables analyses and design studies in a distributed collaborative computing environment. Two different optimization algorithms widely used in multidisciplinary engineering design, VTDIRECT95 and QNSTOP, are implemented on a SORCER grid. VTDIRECT95, a Fortran 95 implementation of D. R. Jones' algorithm DIRECT, is a highly parallelizable derivative-free deterministic global optimization algorithm. QNSTOP is a parallel quasi-Newton algorithm for stochastic optimization problems. The purpose of integrating VTDIRECT95 and QNSTOP into the SORCER framework is to provide load balancing among computational resources, resulting in a dynamically scalable process. Further, the federated computing paradigm implemented by SORCER manages distributed services in real time, thereby significantly speeding up the design process. Part 1 covers SORCER and the algorithms; Part 2 presents results for aircraft panel design with curvilinear stiffeners.
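
The load-balancing claim above is concrete enough to sketch: a DIRECT-style optimizer emits a batch of candidate designs each iteration, and dynamic assignment keeps fast workers busy while slow evaluations finish. The following Python sketch uses a local process pool as a stand-in for a SORCER grid; the objective function and batch are invented:

```python
# Hedged sketch of dynamically load-balanced batch evaluation, the idea
# behind running a derivative-free optimizer on a pool of workers. Not
# SORCER itself (which is Java); the objective here is a placeholder.

from concurrent.futures import ProcessPoolExecutor, as_completed

def objective(x):
    # Placeholder; a real MDO run would invoke an analysis code here.
    return sum((xi - 0.3) ** 2 for xi in x)

def evaluate_batch(points, max_workers=4):
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(objective, p): p for p in points}
        for fut in as_completed(futures):  # dynamic, not static, assignment
            results[futures[fut]] = fut.result()
    return results

if __name__ == "__main__":
    batch = [(0.1 * i, 0.05 * i) for i in range(10)]
    scores = evaluate_batch(batch)
    print(min(scores, key=scores.get), min(scores.values()))
```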

Service ORiented Computing EnviRonment (SORCER) for deterministic global and stochastic aircraft design optimization: part 2

  • Raghunath, Chaitra; Watson, Layne T.; Jrad, Mohamed; Kapania, Rakesh K.; Kolonay, Raymond M.
    • Advances in Aircraft and Spacecraft Science, v.4 no.3, pp.317-334, 2017
  • With rapid growth in the complexity of large scale engineering systems, the application of multidisciplinary analysis and design optimization (MDO) in the engineering design process has garnered much attention. MDO addresses the challenge of integrating several different disciplines into the design process. Primary challenges of MDO include computational expense and poor scalability. The introduction of a distributed, collaborative computational environment results in better utilization of available computational resources, reduced time to solution, and enhanced scalability. SORCER, a Java-based network-centric computing platform, enables analyses and design studies in a distributed collaborative computing environment. Two different optimization algorithms widely used in multidisciplinary engineering design, VTDIRECT95 and QNSTOP, are implemented on a SORCER grid. VTDIRECT95, a Fortran 95 implementation of D. R. Jones' algorithm DIRECT, is a highly parallelizable derivative-free deterministic global optimization algorithm. QNSTOP is a parallel quasi-Newton algorithm for stochastic optimization problems. The purpose of integrating VTDIRECT95 and QNSTOP into the SORCER framework is to provide load balancing among computational resources, resulting in a dynamically scalable process. Further, the federated computing paradigm implemented by SORCER manages distributed services in real time, thereby significantly speeding up the design process. Part 1 covers SORCER and the algorithms; Part 2 presents results for aircraft panel design with curvilinear stiffeners.

The Review of JPEG2000 Algorithm using Optimal Rate Control (비율 제어 최적화를 이용한 JPEG2000 알고리즘 리뷰)

  • Chong, Hyun-Jin; Kim, Young-Seop
    • Journal of the Semiconductor & Display Technology, v.8 no.1, pp.19-25, 2009
  • JPEG2000 achieves quality scalability through the rate control method used in the encoding process, which embeds quality layers into the code-stream. This architecture has two drawbacks. First, once the coding process finishes, the number and bit-rates of the quality layers are fixed, so a code-stream encoded with a single quality layer or only a few layers lacks quality scalability. Second, in post-compression rate-distortion (PCRD) optimization, the bit streams after the truncation points are discarded, so the computational power spent producing them is wasted. Many studies have addressed these problems through bit-rate control. Each proposed algorithm targets a specific feature to improve, such as reduced computational power, and each has strengths and weaknesses. In this paper, these research results are reviewed and compared, and future research directions are proposed.
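
The PCRD step described above can be sketched compactly: each code-block contributes truncation candidates with cumulative rate and distortion, and a Lagrangian slope threshold, bisected until the byte budget is met, selects one candidate per block. A simplified Python sketch with invented numbers (not the JPEG2000 reference implementation):

```python
# Simplified PCRD-style rate control: for a Lagrange multiplier lam, each
# code-block keeps the truncation point minimizing D + lam * R; lam is
# bisected so the total rate fits the budget. Numbers are hypothetical.

def pick_truncations(blocks, lam):
    """blocks: list of lists of (rate, distortion) truncation candidates."""
    return [min(cands, key=lambda rd: rd[1] + lam * rd[0]) for cands in blocks]

def rate_control(blocks, budget, iters=50):
    lo, hi = 0.0, 1e9
    for _ in range(iters):            # bisection on the R-D slope
        lam = (lo + hi) / 2
        total_rate = sum(r for r, _ in pick_truncations(blocks, lam))
        if total_rate > budget:
            lo = lam                  # over budget: penalize rate harder
        else:
            hi = lam
    return pick_truncations(blocks, hi)

# Two hypothetical code-blocks; rates in bytes, distortions in arbitrary units.
blocks = [[(0, 100.0), (40, 35.0), (90, 12.0)],
          [(0, 80.0), (30, 30.0), (70, 9.0)]]
print(rate_control(blocks, budget=80))   # -> [(40, 35.0), (30, 30.0)]
```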

Lineage Tracing: Computational Reconstruction Goes Beyond the Limit of Imaging

  • Wu, Szu-Hsien (Sam); Lee, Ji-Hyun; Koo, Bon-Kyoung
    • Molecules and Cells, v.42 no.2, pp.104-112, 2019
  • Tracking the fate of individual cells and their progeny through lineage tracing has been widely used to investigate various biological processes including embryonic development, homeostatic tissue turnover, and stem cell function in regeneration and disease. Conventional lineage tracing involves marking cells either with dyes or nucleoside analogues, or genetically with fluorescent and/or colorimetric protein reporters. Both are imaging-based approaches that have played a crucial role in developmental biology as well as adult stem cell biology. However, imaging-based lineage tracing approaches are limited by their scalability and by the lack of molecular information underlying fate transitions. Recently, computational biology approaches have been combined with diverse tracing methods to overcome these limitations, providing high-order scalability and a wealth of molecular information. In this review, we introduce such novel computational methods, ranging from single-cell RNA sequencing-based lineage analysis to DNA barcoding and genetic scar analysis. These novel approaches complement conventional imaging-based approaches and enable us to study the lineage relationships of numerous cell types during vertebrate, and in particular human, development and disease.
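
As a toy illustration of the barcoding approaches mentioned above (all cells, barcodes, and cell types below are invented), the first computational step of clone reconstruction is simply grouping cells by a shared heritable barcode:

```python
# Toy sketch of barcode-based clone assignment: cells carrying the same
# heritable barcode are grouped into a clone, the starting point of
# DNA-barcode or genetic-scar lineage reconstruction. Data are made up.

from collections import defaultdict

cells = [  # (cell_id, detected_barcode, cell_type) -- hypothetical
    ("c1", "ACGT", "stem"), ("c2", "ACGT", "enterocyte"),
    ("c3", "TTAG", "stem"), ("c4", "ACGT", "goblet"),
    ("c5", "TTAG", "enterocyte"),
]

clones = defaultdict(list)
for cell_id, barcode, cell_type in cells:
    clones[barcode].append((cell_id, cell_type))

for barcode, members in clones.items():
    fates = sorted({t for _, t in members})
    print(f"clone {barcode}: {len(members)} cells, fates: {', '.join(fates)}")
```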

A Study on the Scalability of Multi-core-PC Cluster for Seismic Design of Reinforced-Concrete Structures based on Genetic Algorithm (유전알고리즘 기반 콘크리트 구조물의 최적화 설계를 위한 멀티코어 퍼스널 컴퓨터 클러스터의 확장 가능성 연구)

  • Park, Keunhyoung; Choi, Se Woon; Kim, Yousok; Park, Hyo Seon
    • Journal of the Computational Structural Engineering Institute of Korea, v.26 no.4, pp.275-281, 2013
  • In this paper, the scalability of a cluster composed of common personal computers is examined for the optimization of reinforced-concrete structures using a genetic algorithm. The goal of this research is to assess the potential of a multi-core PC cluster for optimizing the seismic design of reinforced-concrete structures. As the number of processor cores in the cluster increases, the computation time per generation of the genetic algorithm decreases. After classifying the components of a single personal computer, the expected bottleneck was estimated, and measured wall-clock times were compared with the prediction of Amdahl's law. The scalability of the cluster was observed to follow a complex tendency. To separate physical bottlenecks from algorithmic ones, genetic algorithm cases with different population sizes were tested. With 64 processor cores, the efficiency of the cluster is as low as 31.2% of the efficiency predicted by Amdahl's law.
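
For reference, the Amdahl's-law comparison in this abstract works as follows: the predicted speedup on n cores for a parallel fraction p is 1 / ((1 - p) + p / n), and the 31.2% figure is a ratio of measured to predicted performance. A small Python sketch with an assumed p and an invented measured speedup:

```python
# Worked sketch of an Amdahl's-law comparison. The parallel fraction p and
# the measured speedup are invented for illustration; only the formula
# itself is standard.

def amdahl_speedup(p, n):
    """Predicted speedup for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

p, n = 0.95, 64                     # assumed parallel fraction, core count
predicted = amdahl_speedup(p, n)    # about 15.4x
measured = 4.8                      # hypothetical measured speedup
print(f"Amdahl prediction on {n} cores: {predicted:.1f}x "
      f"(efficiency {predicted / n:.1%})")
print(f"measured: {measured:.1f}x -> {measured / predicted:.1%} of prediction")
```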

Fine Granular Scalable Coding using Matching Pursuit with Multi-Step Search (다단계 탐색 기반 Matching Pursuit을 이용한 미세 계층적 부호화 기법)

  • 최웅일
    • Journal of Broadcast Engineering, v.6 no.3, pp.225-233, 2001
  • Real-time video communication applications over the Internet should support functionality such as scalability because of the unpredictable and varying channel bandwidth between server and client. To accommodate a wide variety of channel bitrates, a new scalable coding tool, the Fine Granular Scalability (FGS) coding tool, has been adopted in the MPEG-4 video standard. This paper presents a new FGS algorithm based on matching pursuit that reduces the computational complexity of the ordinary matching-pursuit-based algorithm. The proposed coding algorithm can trade off picture quality against computational complexity. Our simulation results show that the proposed algorithm reduces the computational complexity to as little as 1/5 of that of the conventional FGS method while retaining similar picture quality.
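
The matching pursuit kernel referenced above is a greedy decomposition: pick the dictionary atom most correlated with the residual, subtract its contribution, repeat. The paper's multi-step search accelerates exactly that atom search; the sketch below is the generic full-search version in NumPy, with a synthetic dictionary and signal:

```python
# Generic matching pursuit (full atom search, not the paper's multi-step
# search variant). Dictionary columns are unit-norm atoms.

import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    residual = signal.astype(float).copy()
    coeffs = {}
    for _ in range(n_atoms):
        scores = dictionary.T @ residual        # correlation with each atom
        k = int(np.argmax(np.abs(scores)))      # best-matching atom
        coeffs[k] = coeffs.get(k, 0.0) + scores[k]
        residual -= scores[k] * dictionary[:, k]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0)                  # normalize atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 40]              # sparse synthetic signal
coeffs, res = matching_pursuit(x, D, n_atoms=5)
print(coeffs, float(np.linalg.norm(res)))       # atoms 3 and 40 dominate
```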

Fast Coding Mode Decision for MPEG-4 AVC|H.264 Scalable Extension (MPEG-4 AVC|H.264 Scalable Extension을 위한 고속 모드 결정 방법)

  • Lim, Sun-Hee; Yang, Jung-Youp; Jeon, Byeung-Woo
    • Journal of the Institute of Electronics Engineers of Korea SP, v.45 no.6, pp.95-107, 2008
  • In this paper, we propose a fast mode decision method for temporal and spatial scalability that reduces the computational complexity of mode decision, which is one of the most computationally intensive processes in MPEG-4 AVC|H.264 SE (Scalable Extension) encoding. For temporal scalability, we propose an early skip method and a mode history map (MHM) method. The early skip method confines the macroblock modes of backward and forward frames to a few selected candidates. The MHM method uses stored information from frames inside a GOP at lower levels to decide the MHM at a higher level. For spatial scalability, we propose a method that selects candidate modes according to the MHM method and adds the BL_mode as a candidate. The proposed scheme reduces the number of candidate modes, and thereby the computational complexity, of mode decision. It reduces total encoding time by about 52% for temporal scalability and 47% for spatial scalability without significant loss of RD performance.
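
The candidate-pruning idea above can be shown schematically: build a small candidate set from mode history (plus the base-layer mode for spatial scalability) and run the expensive rate-distortion test only on those candidates. The mode names and costs below are illustrative placeholders, not the paper's exact rules:

```python
# Schematic sketch of history-based candidate-mode pruning for scalable
# video mode decision. Mode names and RD costs are invented stand-ins.

ALL_MODES = ["SKIP", "16x16", "16x8", "8x16", "8x8", "INTRA4", "INTRA16"]

def candidate_modes(history, bl_mode=None):
    """history: modes chosen for this macroblock at lower temporal levels."""
    if not history:                 # no history yet: fall back to full search
        return list(ALL_MODES)
    cands = sorted(set(history), key=ALL_MODES.index)
    if bl_mode and bl_mode not in cands:
        cands.append(bl_mode)       # spatial scalability: add base-layer mode
    return cands

def rd_cost(mode):                  # stand-in for a real rate-distortion test
    return {"SKIP": 10, "16x16": 12, "8x8": 15}.get(mode, 20)

cands = candidate_modes(["SKIP", "16x16"], bl_mode="8x8")
best = min(cands, key=rd_cost)
print(cands, "->", best, f"({len(cands)}/{len(ALL_MODES)} modes tested)")
```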