• Title/Summary/Keyword: vector-parallel performance analysis

Search Result 18, Processing Time 0.029 seconds

Assessment of computational performance for a vector parallel implementation: 3D probabilistic model discrete cracking in concrete

  • Paz, Carmen N.M.;Alves, Jose L.D.;Ebecken, Nelson F.F.
    • Computers and Concrete
    • /
    • v.2 no.5
    • /
    • pp.345-366
    • /
    • 2005
  • This work presents an assessment of the computational performance of a vector-parallel implementation of probabilistic model for concrete cracking in 3D. This paper shows the continuing efforts towards code optimization as reported in earlier works Paz, et al. (2002a,b and 2003). The probabilistic crack approach is based on the direct Monte Carlo method. Cracking is accounted by means of 3D interface elements. This approach considers that all nonlinearities are restricted to interface elements modeling cracks. The heterogeneity governs the overall cracking behavior and related size effects on concrete fracture. Computational kernels in the implementation are the inexact Newton iterative driver to solve the non-linear problem and a preconditioned conjugate gradient (PCG) driver to solve linearized equations, using an element by element (EBE) strategy to compute matrix-vector products. In particular the paper analyzes code behavior using OpenMP directives in parallel vector processors (PVP), such as the CRAY SV1 and CRAY T94. The impact of the memory architecture on code performance, and also some strategies devised to circumvent this issue are addressed by numerical experiment.

Parallelization of sheet forming analysis program using MPI (MPI를 이용한 판재성형해석 프로그램의 병렬화)

  • Kim, Eui-Joong;Suh, Yeong-Sung
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.22 no.1
    • /
    • pp.132-141
    • /
    • 1998
  • A parallel version of sheet forming analysis program was developed. This version is compatible with any parallel computers which support MPI that is one of the most recent and popular message passing libraries. For this purpose, SERI-SFA, a vector version which runs on Cray Y-MP C90, a sequential vector computer, was used as a source code. For the sake of the effectiveness of the work, the parallelization was focused on the selected part after checking the rank of CPU consumed from the exemplary calculation on Cray Y-MP C90. The subroutines associated with contact algorithm was selected as targe parts. For this work, MPI was used as a message passing library. For the performance verification, an oil pan and an S-rail forming simulation were carried out. The performance check was carried out by the kernel and total CPU time along with theoretical performance using Amdahl's Law. The results showed some performance improvement within the limit of the selective paralellization.

The vector control performance analysis for driving the parallel connected induction motors (유도전동기 병렬 구동을 위한 벡터제어 제어성능분석)

  • Byun, Yeun-Sub;Bae, Chang-Han;Lee, Byung-Song;Kim, Young-Chol
    • Proceedings of the KIEE Conference
    • /
    • 2004.07d
    • /
    • pp.2281-2283
    • /
    • 2004
  • In this paper, we show the vector control performances for the parallel-connected motor drive system using the indirect vector control and the proposed vector control. The suggested estimation scheme of the rotor flux position is presented to reduce the sensitivity due to the load difference between the motors. To confirm the validity of the proposed control method, we compare the simulation results of the proposed control method with those of the conventional indirect vector control method. The simulation results show that the proposed control method is more effective for a change in the load torque.

  • PDF

Performance Analysis of a Vector DLL Based GPS Receiver

  • Lim, Deok Won;Choi, Heon Ho;Lee, Sang Jeong;Heo, Moon Beom
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.1 no.1
    • /
    • pp.1-6
    • /
    • 2012
  • For a Global Positioning System (GPS) receiver, it is known that a Vector Delay Locked Loop (DLL) in which the code signals of each satellite are tracked in parallel by using navigation results shows better performance in the aspect of the tracking accuracy and the robustness than that of a Scalar DLL. However, the quantitative analysis and the logical grounds for that performance enhancement of the Vector DLL are not sufficient. This paper, therefore, proposes the structure of the GPS receiver with the Vector DLL and analyzes the performance of it. The tracking and the positioning accuracy of the Vector DLL are theoretically analyzed and confirmed by simulation results. From the simulation results, it can be seen that the tracking and positioning accuracy has been improved about 30% in case that the receiver is static and the positioning is conducted for every Pre-detection Integration Time (PIT) while C/N0 is 45 dB-Hz.

A Massively Parallel Algorithm for Fuzzy Vector Quantization (퍼지 벡터 양자화를 위한 대규모 병렬 알고리즘)

  • Huynh, Luong Van;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.411-418
    • /
    • 2009
  • Vector quantization algorithm based on fuzzy clustering has been widely used in the field of data compression since the use of fuzzy clustering analysis in the early stages of a vector quantization process can make this process less sensitive to its initialization. However, the process of fuzzy clustering is computationally very intensive because of its complex framework for the quantitative formulation of the uncertainty involved in the training vector space. To overcome the computational burden of the process, this paper introduces an array architecture for the implementation of fuzzy vector quantization (FVQ). The arrayarchitecture, which consists of 4,096 processing elements (PEs), provides a computationally efficient solution by employing an effective vector assignment strategy during the clustering process. Experimental results indicatethat the proposed parallel implementation providessignificantly greater performance and efficiency than appropriately scaled alternative array systems. In addition, the proposed parallel implementation provides 1000x greater performance and 100x higher energy efficiency than other implementations using today's ARMand TI DSP processors in the same 130nm technology. These results demonstrate that the proposed parallel implementation shows the potential for improved performance and energy efficiency.

Fault Diagnosis of a Voltage-Fed PWM Inverter for a Three-parallel Power Conversion System in a Wind Turbine

  • Ko, Young-Jong;Lee, Kyo-Beum
    • Journal of Power Electronics
    • /
    • v.10 no.6
    • /
    • pp.686-693
    • /
    • 2010
  • In this paper, a fault diagnosis method based on fuzzy logic for the three-parallel power converter in a wind turbine system is presented. The method can not only detect both open and short faults but can also identify faulty switching devices without additional voltage sensors or an analysis modeling of the system. The location of a faulty switch can be indicated by six-patterns of a stator current vector and the fault switching device detection is achieved by analyzing the current vector. A fault tolerant algorithm is also presented to maintain proper performance under faulty conditions. The reliability of the proposed fault detection technique has been proven by simulations and experiments with a 10kW simulator.

Analyzing Errors in Bilingual Multi-word Lexicons Automatically Constructed through a Pivot Language

  • Seo, Hyeong-Won;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.39 no.2
    • /
    • pp.172-178
    • /
    • 2015
  • Constructing a bilingual multi-word lexicon is confronted with many difficulties such as an absence of a commonly accepted gold-standard dataset. Besides, in fact, there is no everybody's definition of what a multi-word unit is. In considering these problems, this paper evaluates and analyzes the context vector approach which is one of a novel alignment method of constructing bilingual lexicons from parallel corpora, by comparing with one of general methods. The approach builds context vectors for both source and target single-word units from two parallel corpora. To adapt the approach to multi-word units, we identify all multi-word candidates (namely noun phrases in this work) first, and then concatenate them into single-word units. As a result, therefore, we can use the context vector approach to satisfy our need for multi-word units. In our experimental results, the context vector approach has shown stronger performance over the other approach. The contribution of the paper is analyzing the various types of errors for the experimental results. For the future works, we will study the similarity measure that not only covers a multi-word unit itself but also covers its constituents.

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.3
    • /
    • pp.56-63
    • /
    • 2015
  • Several fields of science have demanded large-scale workflow support, which requires thousands of CPU cores or more. In order to support such large-scale scientific workflows, high capacity parallel systems such as supercomputers are widely used. In order to increase the utilization of these systems, most schedulers use backfilling policy: Small jobs are moved ahead to fill in holes in the schedule when large jobs do not delay. Since an estimate of the runtime is necessary for backfilling, most parallel systems use user's estimated runtime. However, it is found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, we propose a novel system for the runtime prediction based on workload-aware clustering with the goal of improving prediction performance. The proposed method for runtime prediction of parallel applications consists of three main phases. First, a feature selection based on factor analysis is performed to identify important input features. Then, it performs a clustering analysis of history data based on self-organizing map which is followed by hierarchical clustering for finding the clustering boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression with the clustered workload data. Multiple prediction models for each clustered data pattern can reduce the error rate compared with a single model for the whole data pattern. In the experiments, we use workload logs on parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Comparing with other techniques, experimental results show that the proposed method improves the accuracy up to 69.08%.

Effect of Representation Methods on Time Complexity of Genetic Algorithm based Task Scheduling for Heterogeneous Network Systems

  • Kim, Hwa-Sung
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • v.1 no.1
    • /
    • pp.35-53
    • /
    • 1997
  • This paper analyzes the time complexity of Genetic Algorithm based Task Scheduling (GATS) which is designed for the scheduling of parallel programs with diverse embedded parallelism types in a heterogeneous network systems. The analysis of time complexity is performed based on two representation methods (REIA, REIS) which are proposed in this paper to encode the scheduling information. And the heterogeneous network systems consist of a set of loosely coupled parallel and vector machines connected via a high-speed network. The objective of heterogeneous network computing is to solve computationally intensive problems that have several types of parallelism, on a suite of high performance and parallel machines in a manner that best utilizes the capabilities of each machine. Therefore, when scheduling in heterogeneous network systems, the matching of the parallelism characteristics between tasks and parallel machines should be carefully handled in order to obtain more speedup. This paper shows how the parallelism type matching affects the time complexity of GATS.

  • PDF

Design and Performance Analysis of a Parallel Cell-Based Filtering Scheme using Horizontally-Partitioned Technique (수평 분할 방식을 이용한 병렬 셀-기반 필터링 기법의 설계 및 성능 평가)

  • Chang, Jae-Woo;Kim, Young-Chang
    • The KIPS Transactions:PartD
    • /
    • v.10D no.3
    • /
    • pp.459-470
    • /
    • 2003
  • It is required to research on high-dimensional index structures for efficiently retrieving high-dimensional data because an attribute vector in data warehousing and a feature vector in multimedia database have a characteristic of high-dimensional data. For this, many high-dimensional index structures have been proposed, but they have so called ‘dimensional curse’ problem that retrieval performance is extremely decreased as the dimensionality is increased. To solve the problem, the cell-based filtering (CBF) scheme has been proposed. But the CBF scheme show a linear decreasing on performance as the dimensionality. To cope with the problem, it is necessary to make use of parallel processing techniques. In this paper, we propose a parallel CBF scheme which uses a horizontally-partitioned technique as declustering. In order to maximize the retrieval performance of the proposed parallel CBF scheme, we construct our parallel CBF scheme under a SN (Shared Nothing) cluster architecture. In addition, we present a data insertion algorithm, a rage query processing one, and a k-NN query processing one which are suitable for the SN cluster architecture. Finally, we show that our parallel CBF scheme achieves better retrieval performance in proportion to the number of servers in the SN cluster architecture, compared with the conventional CBF scheme.