• Title/Summary/Keyword: data parallelism

Search Result 188, Processing Time 0.021 seconds

An Efficient Multidimensional Index Structure for Parallel Environments

  • Bok Koung-Soo;Song Seok-Il;Yoo Jae-Soo
    • International Journal of Contents
    • /
    • v.1 no.1
    • /
    • pp.50-58
    • /
    • 2005
  • Generally, multidimensional data such as image and spatial data require large amount of storage space. There is a limit to store and manage those large amounts of data in single workstation. If we manage the data on parallel computing environment which is being actively researched these days, we can get highly improved performance. In this paper, we propose a parallel multidimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is nP(processor)-nxmD(disk) architecture which is the hybrid type of nP-nD and 1P-nD. Its node structure in-creases fan-out and reduces the height of an index. Also, a range search algorithm that maximizes I/O parallelism is devised, and it is applied to k-nearest neighbor queries. Through various experiments, it is shown that the proposed method outperforms other parallel index structures.

  • PDF

A Preliminary Architecture for a Data Flow Machine Model with Node Labelling (Node Label에 의한 기본적 Data Flow Machine 모델)

  • 김원섭;박희순
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.34 no.8
    • /
    • pp.301-307
    • /
    • 1985
  • The first four generations of computers are all based on a single basic design: the Von Neuman Processor, which is sequential and does one operation at a time. Efforts to develop concurrent or parallel computers have been carried on for many years. Data flow approach is significant in these efforts to make high speed parallel machines and expected a great deal of parallelism. In this paper we propose a preliminary data Flow Machine Model operating asynchronously on the base of Node Labelling. We introduce a concept of Node Labeling for this purpose which is relevant to the Data dependency and Parallelism. And we explain how the Node Tokens are fired in the proposed system.

  • PDF

Transaction Management Using Update Protocol in Fully Replicated Databases (완전중복 데이터베이스에서 갱신 규약을 적용한 거래 관리)

  • Choe, Hui-Yeong;Hwang, Bu-Hyeon
    • The KIPS Transactions:PartD
    • /
    • v.9D no.1
    • /
    • pp.11-20
    • /
    • 2002
  • We propose a new concurrency control algorithm for enhancing the degree of parallelism of the transactions in fully replicated databases designed to improve the availability and the reliability. The update transactions can be logically decomposed of a read-only transaction and a write-only transaction. In our algorithm, a set of read operations of an update transaction is treated as a read-only transaction and the read-only transaction reads data items in the site to which it is submitted. And a set of write operations of the update transaction is treated as a write-only transaction and it is submitted to all corresponding sites after the update transaction has been completed. By using the proposed update protocol, all write-only transactions can execute at all sites atomically. The proposed algorithm can have transactions execute concurrently at the site to which they are submitted and, after the completion of each transaction, the update protocol is performed for updating their data items and checking their serializability. Therefore, the degree of parallelism of the transactions can be improved. Especially, if the probability of conflict among transactions is low, we can expect the higher degree of their parallelism.

Transform Nested Loops into MultiThread in Java Programming Language for Parallel Processing (자바 프로그래밍에서 병렬처리를 위한 중첩 루프 구조의 다중스레드 변환)

  • Hwang, Deuk-Young;Choi, Young-Keun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.8
    • /
    • pp.1997-2012
    • /
    • 1998
  • It is necessary to find out the parallelism in tlle sequential Java program to execute it on the parallel machine. The loop is a fundamental source to exploit parallelism as it process a large portion of total execution time in sequential Java program on the parallel machine. However, a complete parallel execution can hardly be achieved due to data dependence. This paper proposes the method of exploiting the implicit parallelism by structuring a dependence graph through the analysis of data dependence in the existing Java programming language having a nested loop structure. The parallel code generation method through the restructuring compiler and also the translation method of Java source program into multithread statement. which is supported by the Java programming language itself, are proposed here. The perforance evaluatlun of the program translaed into the thread statement is conducted using the trip cunt of loop and the trip Count of luop and the thread count as parameters The resttucturing compiler provides efficient way of exploiting parallelism by reducing manual overhead conveliing sequential Java program into parallel code. The execution time for the Java program as a result can be reduced un the parallel machine.

  • PDF

Static forwardin: an approach to reduce data hazards in VLIW processor (정적 포워딩에 의한 VLIW 프로세서의 데이터 hazard 처리)

  • 박형준;김이섭
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.2
    • /
    • pp.1-9
    • /
    • 1998
  • To achieve high performance in VLIW processors, they must exploit the parallelism on application programs. Data dependency makes it difficult to find the instruction-level parallelism. Among the three kinds of data dependency, true dependency causes RAW(Read After Wirte) hazards that occur most frequently in VILW processors. Forwarding is a widely used technique to reduce the performance degradation caused by RAW hazards. However, forwarding requires too much area of the chip when it is applied to VLIW processors. In this paper, static forwarding is proposed to reduce the hardware cost of forwarding circuits. It needs an extended compiler to detect RAW hazards and control the proposed forwarding scheme via instruction. And it uses the modified register file to shrink the area of forwarding path. VLIW Processor Model is also designed to verify static forwarding. This paper describes the operation of static forwarding and the comparison with the conventional forwarding.

  • PDF

Hardware Implementation of High Speed CODEC for PACS (PACS를 위한 고속 CODEC의 하드웨어 구현)

  • 유선국;박성욱
    • Journal of Biomedical Engineering Research
    • /
    • v.15 no.4
    • /
    • pp.475-480
    • /
    • 1994
  • For the effective management of medical images, it becomes popular to use computing machines in medical practice, namely PACS. However, the amount of image data is so large that there is a lack of storage space. We usually use data compression techniques to save storage, but the process speed of machines is not fast enough to meet surgical requirement. So a special hardware system processing medical images faster is more important than ever. To meet the demand for high speed image processing, especially image compression and decompression, we designed and implemented the medical image CODEC (COder/DECoder) based on MISD (Multiple Instruction Single Data stream) architecture to adopt parallelism in it. Considering not being a standard scheme of medical image compression/decompression, the CODEC is designed programable and general. In this paper, we use JPEG (Joint Photographic Experts Group) algorithm to process images and evalutate the CODEC.

  • PDF

Parallelization of Recursive Functions for Recursive Data Structures (재귀적 자료구조에 대한 재귀 함수의 병렬화)

  • An, Jun-Seon;Han, Tae-Suk
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.12
    • /
    • pp.1542-1552
    • /
    • 1999
  • 자료 병렬성이란 자료 집합의 원소들에 대하여 동일한 작업을 동시에 수행하므로써 얻어지는 병렬성을 말한다. 함수형 언어에서 자료 집합에 대한 반복 수행은 재귀적 자료형에 대한 재귀 함수에 의하여 표현된다. 본 논문에서는 이러한 재귀 함수를 자료 병렬 프로그램으로 변환하기 위한 병렬화 방법을 제시한다. 생성되는 병렬 프로그램의 병렬 수행 구조로는 일반적인 형태의 재귀적 자료형에 대하여 정의되는 다형적인 자료 병렬 연산을 사용하여 트리, 리스트 등과 같은 일반적인 재귀적 자료 집합에 대한 자료 병렬 수행이 가능하도록 하였다. 재귀 함수의 병렬화를 위해서는, 함수를 이루는 각각의 계산들의 병렬성을 재귀 호출에 의해 존재하는 의존성에 기반하여 분류하고, 이에 기반하여 각각의 계산들에 대한 적절한 자료 병렬 연산을 사용하는 병렬 프로그램을 생성하였다.Abstract Data parallelism is obtained by applying the same operations to each element of a data collection. In functional languages, iterative computations on data collections are expressed by recursions on recursive data structures. We propose a parallelization method for data-parallel implementation of such recursive functions. We employ polytypic data-parallel primitives to represent the parallel execution structure of the object programs, which enables data parallel execution with general recursive data structures, such as trees and lists. To transform sequential programs to their parallelized versions, we propose a method to classify the types of parallelism in subexpressions, based on the dependencies of the recursive calls, and generate the data-parallel programs using data-parallel primitives appropriately.

GPU-Based Acceleration of Quantum-Inspired Evolutionary Algorithm (GPU를 이용한 Quantum-Inspired Evolutionary Algorithm 가속)

  • Ryoo, Ji-Hyun;Park, Han-Min;Choi, Ki-Young
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.8
    • /
    • pp.1-9
    • /
    • 2012
  • Quantum-Inspired Evolutionary Algorithm(QEA) contains sufficient data-level parallelism to be naturally accelerated on GPUs. For an efficient reduction of execution time, however, careful task-mapping should be done to properly reflect the characteristics of CPU and GPU. Furthermore, when deciding which part of the application should run on GPU, we need to consider the data transfer between CPU and GPU memory spaces as well as the data-level parallelism. In addition, the usage of zero-copy host memory, proper choice of the execution configuration, and thread organization considering memory coalescing is important to further reduce the execution time. With all these techniques, we could run QEA 3.69 times faster on average in comparison with the multi-threading CPU for the case of 0-1 knapsack problem with 30,000 items.

Design of a High-Speed RFID Filtering Engine and Cache Based Improvement (고속 RFID 필터링 엔진의 설계와 캐쉬 기반 성능 향상)

  • Park Hyun-Sung;Kim Jong-Deok
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.5A
    • /
    • pp.517-525
    • /
    • 2006
  • In this paper, we present a high-speed RFID data filtering engine designed to carry out filtering under the conditions of massive data and massive filters. We discovered that the high-speed RFID data filtering technique is very similar to the high-speed packet classification technique which is used in high-speed routers and firewall systems. Actually, our filtering engine is designed based on existing packet classification algorithms, Bit Parallelism and Aggregated Bit Vector(ABV). In addition, we also discovered that there are strong temporal relations and redundancy in the RFID data filtering operations. We incorporated two kinds of caches, tag and filter caches, to make use of this characteristic to improve the efficiency of the filtering engine. The performance of the proposed engine has been examined by implementing a prototype system and testing it. Compared to the basic sequential filter comparison approach, our engine shows much better performance, and it gets better as the number of filters increases.

Adaptive Load Balancing Scheme using a Combination of Hierarchical Data Structures and 3D Clustering for Parallel Volume Rendering on GPU Clusters (계층 자료구조의 결합과 3차원 클러스터링을 이용하여 적응적으로 부하 균형된 GPU-클러스터 기반 병렬 볼륨 렌더링)

  • Lee Won-Jong;Park Woo-Chan;Han Tack-Don
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.1_2
    • /
    • pp.1-14
    • /
    • 2006
  • Sort-last parallel rendering using a cluster of GPUs has been widely used as an efficient method for visualizing large- scale volume datasets. The performance of this method is constrained by load balancing when data parallelism is included. In previous works static partitioning could lead to self-balance when only task level parallelism is included. In this paper, we present a load balancing scheme that adapts to the characteristic of volume dataset when data parallelism is also employed. We effectively combine the hierarchical data structures (octree and BSP tree) in order to skip empty regions and distribute workload to corresponding rendering nodes. Moreover, we also exploit a 3D clustering method to determine visibility order and save the AGP bandwidths on each rendering node. Experimental results show that our scheme can achieve significant performance gains compared with traditional static load distribution schemes.