• Title/Summary/Keyword: Data Parallel

Search Result 2,372, Processing Time 0.027 seconds

A Design of An Optimizer For Conversion of Parallel Constructs of Data Parallel Language Programs (자료 병렬 언어 프로그램의 병렬 구조 변환을 위한 최적화기 설계)

  • Gu, Mi-Sun;Park, Myeong-Sun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.3
    • /
    • pp.792-803
    • /
    • 1999
  • Most data parallel language compilers are source-to-source translators. Most Compilers of HPF which is recognized as a standard data parallel language convert a parallel program in PHF in a Fortran 77 program inserted message passing primitives. By the way, they currently generate significant amount of ineffective codes in the course of the conversion. Especially, FORALL construct is converted into several DO loops, so loop overhead of these codes is very increased. In this paper, we define and use relation distance vector to keep necessary informations. Then we evaluate and analyze execution time for the codes converted by our method and by PARADIGM method for various array sizes.

  • PDF

Efficient Parallel Visualization of Large-scale Finite Element Analysis Data in Distributed Parallel Computing Environment (분산 병렬 계산환경에 적합한 초대형 유한요소 해석 결과의 효율적 병렬 가시화)

  • Kim, Chang-Sik;Song, You-Me;Kim, Ki-Ook;Cho, Jin-Yeon
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.32 no.10
    • /
    • pp.38-45
    • /
    • 2004
  • In this paper, a parallel visualization algorithm is proposed for efficient visualization of the massive data generated from large-scale parallel finite element analysis through investigating the characteristics of parallel rendering methods. The proposed parallel visualization algorithm is designed to be highly compatible with the characteristics of domain-wise computation in parallel finite element analysis by using the sort-last-sparse approach. In the proposed algorithm, the binary tree communication pattern is utilized to reduce the network communication time in image composition routine. Several benchmarking tests are carried out by using the developed in-house software, and the performance of the proposed algorithm is investigated.

Parallel Design and Implementation of Shot Boundary Detection Algorithm (샷 경계 탐지 알고리즘의 병렬 설계와 구현)

  • Lee, Joon-Goo;Kim, SeungHyun;You, Byoung-Moon;Hwang, DooSung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.2
    • /
    • pp.76-84
    • /
    • 2014
  • As the number of high-density videos increase, parallel processing approaches are necessary to process a large-scale of video data. When a processing method of video data requires thousands of simple operations, GPU-based parallel processing is preferred to CPU-based parallel processing by way of reducing the time and space complexities of a given computation problem. This paper studies the parallel design and implementation of a shot-boundary detection algorithm. The proposed shot-boundary detection algorithm uses pixel brightness comparisons and global histogram data among the blocks of frames, and the computation of these data is characterized with the high parallelism for the related operations. In order to maximize these operations in parallel, the computations of the pixel brightness and histogram are designed in parallel and implemented in NVIDIA GPU. The GPU-based shot detection method is tested with 10 videos from the set of videos in National Archive of Korea. In experiments, the detection rate is similar but the computation time is about 10 time faster to that of the CPU-based algorithm.

A Parallel Search Algorithm and Its Implementation for Digital k-Winners-Take-All Circuit

  • Yoon, Myungchul
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.4
    • /
    • pp.477-483
    • /
    • 2015
  • The k-Winners-Take-All (kWTA) is an operation to find the largest k (>1) inputs among N inputs. Parallel search algorithm of kWTA for digital inputs is not invented yet, so most of digital kWTA architectures have O(N) time complexity. A parallel search algorithm for digital kWTA operation and the circuits for its VLSI implementation are presented in this paper. The proposed kWTA architecture can compare all inputs simultaneously in parallel. The time complexity of the new architecture is O(logN), so that it is scalable to a large number of digital data. The high-speed kWTA operation and its O(logN) dependency of the new architecture are verified by simulations. It takes 290 ns in searching for 5 winners among 1024 of 32 bit data, which is more than thousands of times faster than existing digital kWTA circuits, as well as existing analog kWTA circuits.

A Parallel Programming Environment Implemented with Graphic User Interface (그래픽 사용자 인터페이스로 구현한 병렬 프로그래밍 환경)

  • Yoo, Jeong-Mok;Lee, Dong-Hee;Lee, Mann-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.8
    • /
    • pp.2388-2399
    • /
    • 2000
  • This paper describes a parallel programming environment to help programmers to write parallel programs. The parallel programming environment does lexical analysis and syntax analysis like front-end part of common compilers, data flow analysis and data dependence analysis for variables used in programs, and various program transformation methods for parallel programming. Especially, graphic user interface is provided for programmer to get parallel programs easily.

  • PDF

A Controllable Parallel CBC Block Cipher Mode of Operation

  • Ke Yuan;Keke Duanmu;Jian Ge;Bingcai Zhou;Chunfu Jia
    • Journal of Information Processing Systems
    • /
    • v.20 no.1
    • /
    • pp.24-37
    • /
    • 2024
  • To address the requirement for high-speed encryption of large amounts of data, this study improves the widely adopted cipher block chaining (CBC) mode and proposes a controllable parallel cipher block chaining (CPCBC) block cipher mode of operation. The mode consists of two phases: extension and parallel encryption. In the extension phase, the degree of parallelism n is determined as needed. In the parallel encryption phase, n cipher blocks generated in the expansion phase are used as the initialization vectors to open n parallel encryption chains for parallel encryption. The security analysis demonstrates that CPCBC mode can enhance the resistance to byte-flipping attacks and padding oracle attacks if parallelism n is kept secret. Security has been improved when compared to the traditional CBC mode. Performance analysis reveals that this scheme has an almost linear acceleration ratio in the case of encrypting a large amount of data. Compared with the conventional CBC mode, the encryption speed is significantly faster.

Diffusion of software innovation: a Petri Net theory perspective (Petri Net 이론 관점에서 본 소프트웨어 혁신의 확산)

  • Han, Jiyeon;Ahn, Jongchang;Lee, Ook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.2
    • /
    • pp.858-867
    • /
    • 2013
  • Hardware and software field are developed by environment of MPSOC. Also it is still working with economic world and academic world. This study focus on software side and try to classify from parallel programming design world. It can be divided by three; Data, Tasks, and Data flow model. Then we used Petri Net to CUDA and HOPES programmer and found how much they understand parallel programming for each side. We focus on two sides and what is different between their experience. Petri Net is easy to descript parallel program or parallel design pattern for Task, Data, and Hybird. This research can explain how they know and how much they know about parallel programming.

A Serial Input/Output Circuit with 8 bit and 16 bit Selection Modes

  • Yang, Yil-Suk;Kim, Jong-Dae;Roh, Tae-Moon;Lee, Dae-Woo;Koo, Jin-Gun;Kim, Sang-Gi;Park, Il-Yong;Yu, Byoung-Gon
    • ETRI Journal
    • /
    • v.24 no.6
    • /
    • pp.462-464
    • /
    • 2002
  • This paper presents a serial interface circuit that permits selection of the amount of data converted from serial-to-parallel and parallel-to-serial and overcomes the disadvantages of the conventional serial input/output interface. Based on the selected data length operating mode, 8 bit or 16 bit serial-to-parallel and 8 bit or 16 bit parallel-to-serial conversion takes place in data blocks of the selected data length.

  • PDF

Parallel Data Mining with Distributed Frequent Pattern Trees (분산형 FP트리를 활용한 병렬 데이터 마이닝)

  • 조두산;김동승
    • Proceedings of the IEEK Conference
    • /
    • 2003.07c
    • /
    • pp.2561-2564
    • /
    • 2003
  • Data mining is an effective method of the discovery of useful information such as rules and previously unknown patterns existing in large databases. The discovery of association rules is an important data mining problem. We have developed a new parallel mining called Distributed Frequent Pattern Tree (abbreviated by DFPT) algorithm on a distributed shared nothing parallel system to detect association rules. DFPT algorithm is devised for parallel execution of the FP-growth algorithm. It needs only two full disk data scanning of the database by eliminating the need for generating the candidate items. We have achieved good workload balancing throughout the mining process by distributing the work equally to all processors. We implemented the algorithm on a PC cluster system, and observed that the algorithm outperformed the Improved Count Distribution scheme.

  • PDF

Construction and Performance Test of a Supercomputing PC System using PC-clustering and Parallel Virtual Machine (PC-Clustering과 병렬가상장치에 의한 수치계산용 슈퍼컴퓨팅 PC 시스템 구축과 성능 테스트)

  • Hong, Woo-Pyo;Kim, Jong-Jae;Oh, Kwang-Sik
    • Journal of the Korean Data and Information Science Society
    • /
    • v.10 no.2
    • /
    • pp.473-483
    • /
    • 1999
  • We introduce a way to construct a supercomputing capable system with some networked PCs, running the Linux operating system and computing power comparable with expensive commercial workstations, and with the Parallel Virtual Machine (PVM) software which enables one to control the total CPUs and memories of the networked PCs. By benchmarking the system using a PVM parallel program, we find that the system's parallel efficiency is close to 90 %.

  • PDF