• Title/Summary/Keyword: parallel programming (병렬 프로그래밍)

Search Result 225, Processing Time 0.028 seconds

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
  • GPUs are multi-core stream processors that can process large data sets at high speed with high memory bandwidth. They are also less expensive than multi-core CPUs, and their use in general-purpose computing has recently become widespread. Nvidia's CUDA architecture is one effort to help developers use GPUs in their own application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To fit the CUDA programming model, we apply optimization techniques that adapt the skyline algorithm to the performance constraints of the CUDA architecture. In our experiments, these optimizations improve on the original skyline algorithm by 80%.
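
As a rough illustration of the "simple nested loop structure" mentioned in the abstract above, the following is a minimal sequential sketch in Python, not the authors' CUDA implementation. The names `dominates` and `skyline_nested_loop` are hypothetical, and the "smaller is better in every dimension" convention is an assumption. Each outer-loop iteration is independent of the others, which is the property that makes this kind of loop a candidate for one-thread-per-point parallelization.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: smaller is better).

    p dominates q when p is no worse in every dimension and strictly
    better in at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline_nested_loop(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    result = []
    for i, p in enumerate(points):
        # The inner loop compares p against every other point.
        dominated = any(dominates(q, p) for j, q in enumerate(points) if j != i)
        if not dominated:
            result.append(p)
    return result


if __name__ == "__main__":
    hotels = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]  # made-up (price, distance) pairs
    print(skyline_nested_loop(hotels))
```

In this made-up data set, `(3, 3)` and `(4, 4)` are both dominated by `(2, 2)`, so only the other three points remain in the skyline.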

Parallelizing 3D Frequency-domain Acoustic Wave Propagation Modeling using a Xeon Phi Coprocessor (제온 파이 보조 프로세서를 이용한 3차원 주파수 영역 음향파 파동 전파 모델링 병렬화)

  • Ryu, Donghyun;Jo, Sang Hoon;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.20 no.3 / pp.129-136 / 2017
  • 3D seismic data processing methods such as full waveform inversion and reverse-time migration require 3D wave propagation modeling and heavy computation. We compared the efficiency and accuracy of a Xeon Phi coprocessor with those of a high-end server CPU using 3D frequency-domain wave propagation modeling. We applied OpenMP parallel programming to the time-domain finite-difference algorithm, taking the characteristics of Xeon Phi coprocessors into account. We obtained the frequency-domain wavefield by applying the Fourier transform as a running integration. A numerical test on frequency-domain wavefield modeling was performed using the 3D SEG/EAGE salt velocity model. As a result, we obtained an accurate frequency-domain wavefield and attained a 1.44x speedup with the Xeon Phi coprocessor over the CPU.
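
The "Fourier transform using a running integration" in the abstract above can be pictured as accumulating the transform one time step at a time, so the full time-domain record never has to be stored. The sketch below is a minimal single-trace illustration in Python under stated assumptions: the function name `running_fourier`, the sign convention exp(-2πi f t), and the rectangle-rule integration are all choices made here, not details taken from the paper.

```python
import cmath
import math
from typing import Iterable, List, Sequence


def running_fourier(samples: Iterable[float], dt: float,
                    freqs: Sequence[float]) -> List[complex]:
    """Accumulate U(f) = sum_n u(t_n) * exp(-2*pi*i*f*t_n) * dt on the fly.

    `samples` may be a generator fed by a time-stepping loop, so each
    sample can be discarded once it has been added to the accumulators.
    """
    acc = [0j] * len(freqs)
    for n, u in enumerate(samples):
        t = n * dt
        for k, f in enumerate(freqs):
            acc[k] += u * cmath.exp(-2j * math.pi * f * t) * dt
    return acc


if __name__ == "__main__":
    dt = 0.001
    # A 5 Hz cosine sampled for exactly one second (five full periods).
    signal = (math.cos(2 * math.pi * 5.0 * n * dt) for n in range(1000))
    spectrum = running_fourier(signal, dt, [5.0, 10.0])
    print([abs(v) for v in spectrum])
```

For this test signal the accumulator at 5 Hz approaches T/2 = 0.5 and the one at 10 Hz approaches zero, which is a quick sanity check on the accumulation.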

InterCom : Design and Implementation of an Agent-based Internet Computing Environment (InterCom : 에이전트 기반 인터넷 컴퓨팅 환경 설계 및 구현)

  • Kim, Myung-Ho;Park, Kweon
    • The KIPS Transactions:PartA / v.8A no.3 / pp.235-244 / 2001
  • Advances in network and computer technology have led to many studies on using physically distributed computers as a single resource. These studies have generally focused on developing environments based on message passing, which are mainly used to solve scientific computing problems in parallel by exploiting the parallelism inherent in the problem. Such environments generally provide high parallelism, but they are difficult to program and use, and they require user accounts on the distributed computers. If a given problem can be divided into completely independent subproblems, a more efficient environment can be provided. Such problems arise in bioinformatics, 3D animation, graphics, and other fields, so developing a new environment for them is important. We therefore propose InterCom, a new environment based on proxy computing that can solve these problems efficiently, and describe its implementation. The environment consists of an agent, a server, and a client. Its merits are easy programming, no need for user accounts on the distributed computers, and ease of use through automatic compilation of distributed code.

Performance Comparison of Particle Simulation Using GPU Between OpenGL and Unity (OpenGL과 Unity간의 GPU를 이용한 Particle Simulation의 성능 비교)

  • Kim, Min Sang;Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.479-486 / 2017
  • Recently, GPGPU has raised the computing performance available on PCs, making it possible to run physically based real-time simulations that demand heavy computation. The physical calculations in a physics simulation can be parallelized and performed efficiently using the compute shader recently supported by OpenGL 4.3 and Unity 4.0. In this paper, we measure and compare real-time physics simulation performance in OpenGL, which runs on various platforms, and Unity, a content creation tool that supports various platforms. Particle simulation experiments show that the Unity implementation runs 136.04% faster than the OpenGL implementation. These results are expected to help developers choose better development tools for future multi-platform support.

Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs (NVIDIA GPU 상에서의 난수 생성을 위한 CUDA 병렬프로그램)

  • Kim, Youngtae;Hwang, Gyuhyeon
    • Journal of KIISE / v.42 no.12 / pp.1467-1473 / 2015
  • In this paper, we implemented a parallel random number generation program on GPUs, which are known for high-performance computing, using an LCG (Linear Congruential Generator). Random numbers are important in every field that relies on randomness, and the LCG is one of the most widely used methods for generating pseudo-random numbers. We describe the parallel program using the NVIDIA CUDA model and MPI (Message Passing Interface) and present its uniformity and performance results. We also used a Monte Carlo algorithm to calculate pi ($\pi$), comparing the parallel random number generator with cuRAND, the CUDA random number library, and showed that our program is much more efficient. Finally, we compared the performance results on multiple GPUs with ideal speedups.
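
To make the two ingredients named in the abstract above concrete, the following is a minimal sequential sketch in Python of an LCG feeding a Monte Carlo estimate of pi. It is not the paper's CUDA/MPI program and says nothing about how the authors parallelized the generator. The function names are hypothetical, and the constants (a = 1664525, c = 1013904223, m = 2^32) are the well-known Numerical Recipes parameters, chosen here as an assumption.

```python
from typing import Iterator


def lcg(seed: int, a: int = 1664525, c: int = 1013904223,
        m: int = 2 ** 32) -> Iterator[float]:
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m.

    Yields floats in [0, 1). The default constants are an assumption
    (Numerical Recipes values), not the ones used in the paper.
    """
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state / m


def estimate_pi(num_points: int, seed: int = 12345) -> float:
    """Monte Carlo estimate of pi from points in the unit square.

    The fraction of points falling inside the quarter circle of radius 1
    approximates pi / 4.
    """
    gen = lcg(seed)
    inside = 0
    for _ in range(num_points):
        x = next(gen)
        y = next(gen)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points


if __name__ == "__main__":
    print(estimate_pi(200_000))
```

The estimate converges slowly (the error shrinks roughly as one over the square root of the number of points), which is why this workload is a common benchmark for random number throughput.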

Alpha : Java Visualization Tool (Alpha : 자바 시각화 도구)

  • Kim, Cheol-Min
    • The Journal of Korean Association of Computer Education / v.7 no.3 / pp.45-56 / 2004
  • Java supports the Web, concurrent programming, safety, portability, and GUIs, so the number of Java users is steadily increasing. Java is based on object-oriented concepts such as classes, instances, encapsulation, inheritance, and polymorphism. However, the JVM (Java Virtual Machine) hides most of the phenomena related to these concepts, which is why most Java users have considerable difficulty learning and using the language. As a solution, I have developed a tool, Alpha, that visualizes the phenomena occurring in the JVM from the standpoint of these concepts; this paper describes its design and features. For practicality and extensibility, Alpha has an MVC (Model-View-Controller) architecture and visualizes phenomena such as object instantiations, method invocations, field accesses, cross-references among objects, and thread execution flows in various ways, according to the users' levels and purposes.

Parallel I/O DRAM BIST for Easy Redundancy Cell Programming (Redundancy Cell Programming이 용이한 병렬 I/O DRAM BIST)

  • 유재희;하창우
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.12 / pp.1022-1032 / 2002
  • A multibit DRAM BIST methodology that reduces redundancy programming overhead is proposed. It can count and locate faulty bits while the test is running. If the DRAM cells are organized into n blocks, the proposed BIST can detect the no-error state, the location of the faulty block when exactly one block has an error, and the existence of errors in two or more blocks, for n + 2 states in total, using only n comparators and a 3-state encoder. Based on the proposed methodology, a testing scheme that detects the number and locations of faulty bits even when two or more blocks contain errors can be easily implemented. Performance evaluation shows that the test and redundancy programming time of a 64MEG DRAM with 8 blocks is reduced to 1/750 with 0.115% circuit overhead.
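
The n + 2 count in the abstract above follows from one no-error state, n single-block states (one per block), and one "two or more blocks" state. The Python sketch below models that classification in software, purely as an illustration: the function names and state labels are hypothetical, and the actual encoder circuit in the paper is not described here.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

State = Tuple[str, Optional[int]]


def encode_bist_state(fail_flags: Sequence[bool]) -> State:
    """Collapse n per-block comparator outputs into one of n + 2 states.

    fail_flags[i] is True when the comparator for block i reports a mismatch.
    """
    failing = [i for i, failed in enumerate(fail_flags) if failed]
    if not failing:
        return ("no_error", None)
    if len(failing) == 1:
        # Exactly one faulty block: report its index for redundancy programming.
        return ("single_block", failing[0])
    return ("multi_block", None)


def count_states(n: int) -> int:
    """Enumerate all 2^n comparator patterns and count the distinct states."""
    patterns = product([False, True], repeat=n)
    return len({encode_bist_state(flags) for flags in patterns})


if __name__ == "__main__":
    print(encode_bist_state([False, False, True, False]))
    print(count_states(8))
```

Enumerating every comparator pattern for 8 blocks yields 10 distinct states, matching n + 2.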

3D Texture based Fast Volume Rendering using Vertex and Pixel Shaders (꼭지점 및 픽셀 쉐이더를 이용한 3D 텍스쳐 기반의 빠른 볼륨 렌더링 기법)

  • Lee, Joong-Youn
    • Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.1645-1648 / 2005
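  • With the rapid advance of PC graphics hardware, there have been continued attempts to perform real-time volume rendering, previously possible only with supercomputers or parallel/distributed processing across multiple computers, on a single ordinary PC. The vertex and pixel shaders of PC graphics hardware not only enable fast volume rendering through vector operations optimized for numerical computation, but also free users from the conventional fixed graphics pipeline, letting them intervene in and program the rendering process. In this paper, among these programmable features of graphics hardware, we use texture-coordinate manipulation to render various kinds of volume data quickly, and we use several pixel shader features to implement Phong shading, early depth testing, and octree textures, with the aim of obtaining high-quality images in real time.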

Numerical Computing on Graphics Hardware

  • 임인성
    • Proceedings of the Korean Society of Visualization Conference (한국가시화정보학회:학술대회논문집) / 2004.04a / pp.57-63 / 2004
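  • The graphics accelerators from ATI, NVIDIA, and others now installed in ordinary general-purpose PCs are incomparably faster than those of a few years ago. Along with this speedup, one of the rapid changes under way is the emergence of the programmable graphics pipeline, which, unlike the conventional fixed-function graphics pipeline, lets programmers freely program the accelerator's functionality. The GPU (Graphics Processing Unit) in these accelerators can be regarded as a simple form of SIMD processor. In particular, the pixel shader, one part of the GPU, has very high throughput, so attempts to parallelize existing numerical algorithms with it are actively under way. This talk briefly surveys attempts to solve various numerical computations using graphics accelerators.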

The Mixed Finite Element Analysis for Nearly Incompressible and Impermeable Porous Media Using Parallel Algorithm (병렬알고리즘 이용한 비압축, 비투과성 포화 다공질매체의 혼합유한요소해석)

  • Tak, Moon-Ho;Kang, Yoon-Sik;Park, Tae-Hyo
    • Journal of the Computational Structural Engineering Institute of Korea / v.23 no.4 / pp.361-368 / 2010
  • In this paper, a parallel algorithm using the MPI (Message-Passing Interface) library is introduced to improve the numerical efficiency of the staggered method for nearly incompressible and impermeable porous media introduced by Park and Tak (2010). The porous media theory and the staggered method are also briefly introduced. We then describe the MPI library's blocking, non-blocking, and collective communication, and propose combining the staggered method with blocking and non-blocking MPI communication. Next, we present how to allocate CPUs between the staggered method and the MPI library, which affects the numerical efficiency of solving for the unknown variables in nearly incompressible and impermeable porous media. Finally, serial and parallel solutions are compared and verified on a 2-dimensional saturated porous model for varying numbers of FEM meshes.