• Title/Summary/Keyword: data parallelism

188 search results

Monitoring of Parallel Transfer Performance for MPTCP-based Globus Service (MPTCP기반 Globus 서비스 적용을 위한 병렬 전송성능 모니터링)

  • Hong, Wontaek
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.354-356 / 2021
  • For science applications that require rapid transfer and sharing of large-volume data, many efforts have been made to improve data transfer performance through concurrency, parallelism, and pipelining in data transfer applications such as Globus/GridFTP. In this paper, as a similar trial, experiments were conducted on the expected transfer throughput gains from increasing the number of network interfaces and the degree of parallelism in an MPTCP emulation environment, and the results are presented.
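
As a rough illustration of the stream-level parallelism such transfer tools build on (not the paper's MPTCP/Globus testbed), the sketch below copies disjoint byte ranges of a file on several threads at once; the stream count, buffer size, and file paths are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.concurrent.*;

public class ParallelCopy {
    public static void main(String[] args) throws Exception {
        Path src = Paths.get(args[0]);
        Path dst = Paths.get(args[1]);
        int streams = 4;                          // analogous to a "parallelism" setting
        long size = Files.size(src);
        long chunk = (size + streams - 1) / streams;

        ExecutorService pool = Executors.newFixedThreadPool(streams);
        for (int s = 0; s < streams; s++) {
            final long start = s * chunk;
            final long end = Math.min(size, start + chunk);
            pool.submit(() -> copyRange(src, dst, start, end));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Copies bytes [start, end) of src into the same offsets of dst.
    static Void copyRange(Path src, Path dst, long start, long end) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                                                     StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(1 << 20);   // 1 MiB per read/write
            long pos = start;
            while (pos < end) {
                buf.clear();
                buf.limit((int) Math.min(buf.capacity(), end - pos));
                int n = in.read(buf, pos);
                if (n < 0) break;
                buf.flip();
                out.write(buf, pos);
                pos += n;
            }
        }
        return null;
    }
}
```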

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management / v.48 no.4 / pp.553-566 / 2020
  • Purpose: The purpose of this study was to design a framework for generating a one-class classification algorithm based on Hyper-Rectangles (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised a one-class classifier based on H-RTGL that can be executed by distributed computing nodes, considering both model and data parallelism. We then designed facilitating components for the execution of distributed processing. Finally, we validated both the effectiveness and the efficiency of the classifier obtained from the proposed framework through a numerical experiment using data sets from the UCI machine learning repository. Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. It includes components implementing model and data parallelism, which enable distributed generation of the classifier. In the numerical experiment, we observed no significant change in classification performance, as assessed by a statistical test, while elapsed time was reduced by applying distributed processing to data sets of considerable size. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation preserves classification performance while improving the efficiency of the classification algorithm. We also discuss the limitations of our work and suggest directions for future research.
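
The interval-merging details of H-RTGL are not reproduced here, but the general shape of the data parallelism described above can be sketched with a simplified stand-in: each partition of the training data is reduced to a local hyper-rectangle independently, and the local models are merged. The class and method names below are hypothetical.

```java
import java.util.*;
import java.util.stream.IntStream;

public class DataParallelBox {
    // A hyper-rectangle is just a lower and an upper bound per feature.
    record Box(double[] lo, double[] hi) {
        boolean contains(double[] x) {
            for (int d = 0; d < lo.length; d++)
                if (x[d] < lo[d] || x[d] > hi[d]) return false;
            return true;
        }
        static Box merge(Box a, Box b) {
            double[] lo = new double[a.lo.length], hi = new double[a.hi.length];
            for (int d = 0; d < lo.length; d++) {
                lo[d] = Math.min(a.lo[d], b.lo[d]);
                hi[d] = Math.max(a.hi[d], b.hi[d]);
            }
            return new Box(lo, hi);
        }
        static Box of(double[] x) { return new Box(x.clone(), x.clone()); }
    }

    // Data parallelism: each partition is reduced to a local box independently;
    // the local boxes are then merged into the global one-class model.
    static Box fit(double[][] data, int partitions) {
        return IntStream.range(0, partitions).parallel()
            .mapToObj(p -> Arrays.stream(data, p * data.length / partitions,
                                               (p + 1) * data.length / partitions)
                                 .map(Box::of).reduce(Box::merge).orElseThrow())
            .reduce(Box::merge).orElseThrow();
    }

    public static void main(String[] args) {
        double[][] train = {{1, 2}, {2, 3}, {0.5, 2.5}, {1.5, 1.8}};
        Box model = fit(train, 2);
        System.out.println(model.contains(new double[]{1.2, 2.2}));  // true: inside the box
        System.out.println(model.contains(new double[]{5.0, 5.0}));  // false: outlier
    }
}
```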

Exploiting Implicit Parallelism for Single Loops in Java Programming Language (JAVA 프로그래밍 언어에서 단일루프구조의 무시적 병렬성 검출)

  • Kwon, Oh-Jin
    • Journal of Information Management / v.29 no.3 / pp.1-26 / 1998
  • Loops are fundamental to exploiting parallelism, since they account for a large portion of the execution time of a sequential Java program on a parallel machine. This paper proposes a method for exploiting implicit parallelism through data dependence analysis of single-loop structures in the Java programming language. A parallel code generation method using a restructuring compiler, and a method for translating Java source programs into multithreaded statements supported at the level of the Java programming language, are also proposed. The performance of programs translated into thread statements is tested using the loop trip count and the thread count as parameters. The restructuring compiler enables users to reduce overhead and exploit parallelism efficiently in Java programs.
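
A minimal sketch of the kind of loop-to-thread translation the paper describes (hand-written here, not the restructuring compiler's output), parameterized by the loop trip count and the thread count used in the performance test; the variable names are illustrative.

```java
public class LoopToThreads {
    public static void main(String[] args) throws InterruptedException {
        int trip = 1_000_000;          // loop trip count
        int threads = 4;               // thread count
        double[] a = new double[trip];

        // Original sequential loop:
        //   for (int i = 0; i < trip; i++) a[i] = Math.sqrt(i);

        // Parallel version: each thread handles a contiguous block of iterations,
        // which is legal because the iterations carry no data dependence.
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int lo = t * trip / threads;
            final int hi = (t + 1) * trip / threads;
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) a[i] = Math.sqrt(i);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();   // barrier: wait for all blocks to finish

        System.out.println(a[trip - 1]);
    }
}
```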

Performance Evaluation of Unimodular and Non-unimodular Transformation (Unimodular 및 Non-unimodular 변환의 성능평가)

  • Song Worl-Bong
    • Journal of the Korea Computer Industry Society / v.6 no.2 / pp.365-372 / 2005
  • Generally, the core part of an application program for parallel processing is a loop, and data dependencies exist between the array index variables of the loop. Data dependence relations between statements whose dependence distances are variable or constant are especially complex, so extracting parallelism from such statements at compile time is very difficult. This paper analyzes the unimodular and non-unimodular transformation methods among the proposed approaches to extracting parallelism and identifies their respective merits and drawbacks. These results should contribute to effectively extracting parallelism from loops.
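
As a small, hedged example of what a unimodular transformation buys (a textbook loop-skewing case, not the paper's benchmarks): the original nest below carries dependences in both loops, while the skewed wavefront form exposes parallelism within each wavefront.

```java
import java.util.stream.IntStream;

public class WavefrontSkewing {
    public static void main(String[] args) {
        int n = 32;
        double[][] a = new double[n][n];
        for (int k = 0; k < n; k++) { a[k][0] = 1; a[0][k] = 1; }

        // Original nest: a[i][j] depends on a[i-1][j] and a[i][j-1],
        // so neither the i-loop nor the j-loop can run in parallel as written.
        //   for (int i = 1; i < n; i++)
        //       for (int j = 1; j < n; j++)
        //           a[i][j] = a[i-1][j] + a[i][j-1];

        // After the unimodular (skewing) transformation t = i + j, all iterations
        // on the same wavefront t are mutually independent and can run in parallel;
        // only the outer wavefront loop stays sequential.
        for (int t = 2; t <= 2 * (n - 1); t++) {
            final int wave = t;
            IntStream.rangeClosed(Math.max(1, wave - (n - 1)), Math.min(n - 1, wave - 1))
                     .parallel()
                     .forEach(i -> a[i][wave - i] = a[i - 1][wave - i] + a[i][wave - i - 1]);
        }
        System.out.println(a[n - 1][n - 1]);
    }
}
```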

Parallelism point selection in nested parallelism situations with focus on the bandwidth selection problem (평활량 선택문제 측면에서 본 중첩병렬화 상황에서 병렬처리 포인트선택)

  • Cho, Gayoung;Noh, Hohsuk
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.383-396 / 2018
  • Various parallel processing R packages are used for fast processing and the analysis of big data. Parallel processing is used when the work can be decomposed into tasks that are non-interdependent. In some cases, each task decomposed for parallel processing can also be decomposed into non-interdependent subtasks. We have to choose whether to parallelize the decomposed tasks in the first step or to parallelize the subtasks in the second step when facing nested parallelism situations. This choice has a significant impact on the speed of computation; consequently, it is important to understand the nature of the work and decide where to do the parallel processing. In this paper, we provide an idea of how to apply parallel computing effectively to problems by illustrating how to select a parallelism point for the bandwidth selection of nonparametric regression.
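
A minimal sketch of the two candidate parallelism points in this setting (hypothetical cvError workload and bandwidth grid, not the paper's R code): the first level parallelizes over candidate bandwidths, the second level over cross-validation folds, and normally only one of the two is worth enabling.

```java
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

public class NestedParallelism {
    // Hypothetical stand-in for fitting a nonparametric regression without fold f
    // and measuring its prediction error on fold f, for bandwidth h.
    static double cvError(double h, int fold) {
        return Math.abs(Math.sin(h * (fold + 1)));   // dummy workload
    }

    static double cvScore(double h, int folds, boolean parallelFolds) {
        IntStream s = IntStream.range(0, folds);
        if (parallelFolds) s = s.parallel();          // second-level (inner) parallelism
        return s.mapToDouble(f -> cvError(h, f)).sum();
    }

    public static void main(String[] args) {
        double[] bandwidths = {0.05, 0.1, 0.2, 0.4, 0.8};
        int folds = 10;

        // First-level choice: parallelize over candidate bandwidths and keep the
        // folds sequential inside each task.
        double[] scores = DoubleStream.of(bandwidths).parallel()
                .map(h -> cvScore(h, folds, false)).toArray();

        int best = 0;
        for (int i = 1; i < scores.length; i++) if (scores[i] < scores[best]) best = i;
        System.out.println("selected bandwidth: " + bandwidths[best]);

        // The alternative is to iterate bandwidths sequentially and set
        // parallelFolds = true; which point is faster depends on how many tasks
        // each level produces and how costly each task is, which is exactly the
        // trade-off studied above.
    }
}
```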

Design of Parallel Processing of Lane Detection System Based on Multi-core Processor (멀티코어를 이용한 차선 검출 병렬화 시스템 설계)

  • Lee, Hyo-Chan;Moon, Dai-Tchul;Park, In-hag;Heo, Kang
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1778-1784 / 2016
  • We improved performance by parallelizing lane detection algorithms. Lane detection, as an intelligent driver-assistance function, sounds an alarm or corrects the steering when the vehicle departs from its lane. Four algorithms are applied in sequence: Gaussian filtering to remove noise, gray conversion to simplify the image, Sobel edge detection to find candidate lane regions, and the Hough transform to detect straight lines. Among the parallelization methods, data-level parallelism is easy to design but suffers from a bottleneck; a high-speed data-level parallelism scheme is suggested to reduce this bottleneck, resulting in a noticeable performance improvement. When the parallel algorithm is applied to actual black-box road video, a single core processes approximately 30 frames/sec, octa-core data-level parallelism reaches approximately 100 frames/sec, and the best case approaches 150 frames/sec.
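
Of the four stages, the Sobel step shows the data-level parallelism most directly: each output row depends only on the input frame, so rows can be assigned to different cores without synchronization. The sketch below is a simplified stand-in (dummy grayscale frame, no Gaussian or Hough stages), not the paper's implementation.

```java
import java.util.stream.IntStream;

public class ParallelSobel {
    // Data-level parallelism for one stage of a lane-detection pipeline:
    // every output row is written by exactly one worker and reads only the
    // (immutable) input frame, so no locking is needed.
    static int[][] sobel(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        int[][] edge = new int[h][w];
        IntStream.range(1, h - 1).parallel().forEach(y -> {
            for (int x = 1; x < w - 1; x++) {
                int gx = -gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1]
                         + gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1];
                int gy = -gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1]
                         + gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1];
                edge[y][x] = Math.min(255, Math.abs(gx) + Math.abs(gy));
            }
        });
        return edge;
    }

    public static void main(String[] args) {
        int[][] frame = new int[480][640];                  // dummy grayscale frame
        for (int y = 0; y < 480; y++) frame[y][300] = 255;  // a vertical "lane" line
        int[][] edges = sobel(frame);
        System.out.println(edges[240][299] + " " + edges[240][320]);  // strong edge vs. none
    }
}
```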

Deterministic Parallelism for Symbolic Execution Programs based on a Name-Freshness Monad Library

  • Ahn, Ki Yung
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.1-9 / 2021
  • In this paper, we extend a generic library framework based on the state monad to exploit deterministic parallelism in the purely functional language Haskell, and provide benchmarks for the extended features on a multicore machine. Although purely functional programs are known to be well suited to exploiting parallelism, unintended sequential data dependencies can prohibit effective parallelism. Symbolic execution programs usually implement fresh name generation to prevent confusion between variables in different scopes with the same name. Such implementations are often based on sequential state management, which works against parallelism. We provide reusable primitives to help develop parallel symbolic execution programs with unbound-generics, a generic name-binding library for Haskell, avoiding sequential dependencies in fresh name generation. Our parallel extension does not modify the internal implementation of the unbound-generics library, so it cannot degrade existing serial implementations of symbolic execution based on unbound-generics. Therefore, our extension can be applied only to the parts of the source code that need parallel speedup.
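
The Haskell specifics (state monad, unbound-generics) are not reproduced here, but the core idea of removing the sequential dependency in fresh-name generation can be sketched in a language-agnostic way: give each parallel task its own disjoint block of the name space instead of a shared counter. Everything below (block size, name format, class names) is an illustrative assumption, not the library's API.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class FreshNames {
    // A single shared counter would make generated names depend on thread
    // scheduling; instead, each task owns the disjoint block
    // [taskId * BLOCK, (taskId + 1) * BLOCK), so naming is local and deterministic.
    static final long BLOCK = 1_000_000L;     // assumed upper bound on names per task

    static long fresh(long taskId, long localCounter) {
        return taskId * BLOCK + localCounter;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = IntStream.range(0, 4)
            .mapToObj(task -> pool.submit(() -> {
                StringBuilder sb = new StringBuilder();
                for (long i = 0; i < 3; i++)               // e.g. symbolic variables x_<name>
                    sb.append("x_").append(fresh(task, i)).append(' ');
                return sb.toString();
            }))
            .toList();
        for (Future<String> r : results) System.out.println(r.get());
        pool.shutdown();
        // The printed names are identical on every run, independent of scheduling.
    }
}
```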

The Procedure Transformation using Data Dependency Elimination Methods (자료 종속성 제거 방법을 이용한 프로시저 변환)

  • Jang, Yu-Suk;Park, Du-Sun
    • The KIPS Transactions:PartA / v.9A no.1 / pp.37-44 / 2002
  • Most research on transforming sequential programs into parallel programs has been based on loop structure transformation. However, most programs also contain implicit interprocedural parallelism. This paper suggests a way of extracting parallelism from loops with procedure calls using a data dependency elimination method. Most parallelization of loops with procedure calls has targeted uniform code only; in this paper, we propose an interprocedural transformation that can be applied to both uniform and nonuniform code. We show examples of uniform, nonuniform, and complex code parallelization, and then evaluate the performance of the various transformation methods on the CRAY-T3E system. The comparison shows that the proposed algorithm outperforms conventional methods.
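
A minimal sketch of the situation the paper targets, with the dependence analysis assumed rather than performed: a loop whose body is a procedure call can be parallelized once interprocedural analysis shows (or transformation ensures) that the calls do not depend on each other. The update procedure here is a hypothetical example, not the paper's benchmark code.

```java
import java.util.stream.IntStream;

public class LoopWithCall {
    // The procedure called inside the loop: each call touches only a[i], so after
    // interprocedural dependence analysis the calls are known to be independent.
    static void update(double[] a, int i) {
        a[i] = Math.sqrt(i) + 1.0;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double[] a = new double[n];

        // Sequential loop with a procedure call:
        //   for (int i = 0; i < n; i++) update(a, i);

        // Once the cross-iteration dependence through the call is shown to be
        // absent (or eliminated), the iterations can be distributed across cores.
        IntStream.range(0, n).parallel().forEach(i -> update(a, i));

        System.out.println(a[n - 1]);
    }
}
```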

A Vectorization Technique at Object Code Level (목적 코드 레벨에서의 벡터화 기법)

  • Lee, Dong-Ho;Kim, Ki-Chang
    • The Transactions of the Korea Information Processing Society / v.5 no.5 / pp.1172-1184 / 1998
  • ILP (Instruction-Level Parallelism) processors use code reordering algorithms to expose parallelism in a given sequential program. When applied to a loop, such an algorithm produces a software-pipelined loop, in which each iteration contains a sequence of parallel instructions composed of data-independent instructions collected from several iterations. For vector loops, however, the software-pipelining technique cannot expose the maximum parallelism because it schedules the program based only on data dependencies. This paper proposes to schedule vector loops differently. We develop an algorithm to detect vector loops at the object code level and suggest a new vector scheduling algorithm for them. Our vector scheduling improves performance because it can schedule based not only on data dependencies but also on loop structure and iteration conditions at the object code level. We compare the performance of the resulting schedules with those produced by software-pipelining techniques.

The Parallelism Extraction in Loops with Procedure Calls (프로시저 호출을 가진 루프에서 병렬성 추출)

  • Jang, Yu-Suk;Park, Du-Sun
    • Journal of Korea Multimedia Society / v.4 no.3 / pp.270-279 / 2001
  • Since most program execution time is spent in loop structures, research has focused on extracting parallelism from sequential loop programs. However, most programs also have implicit interprocedural parallelism. This paper presents a generalized parallelism extraction method for loops with procedure calls. Most parallelization of loops with procedure calls focuses only on uniform code, in which the data dependence distance is constant; we present algorithms that can be applied to uniform, nonuniform, and complex code. The proposed loop extraction, loop embedding, and procedure cloning transformation methods are evaluated on the CRAY-T3E. The performance evaluation shows that the proposed algorithms are effective.
