• Title/Summary/Keyword: GPU Parallel Processing


Non-Photorealistic Rendering Using CUDA-Based Image Segmentation (CUDA 기반 영상 분할을 사용한 비사실적 렌더링)

  • Yoon, Hyun-Cheol;Park, Jong-Seung
    • KIPS Transactions on Software and Data Engineering / v.4 no.11 / pp.529-536 / 2015
  • When three-dimensional objects and photo images are rendered together, the non-photorealistic rendering (NPR) results are visually discordant because the two kinds of content have independent color distributions. This paper proposes a non-photorealistic rendering technique that renders both three-dimensional objects and photo images in styles such as cartoons and sketches. The proposed technique computes the color distribution of the photo images and reduces the number of colors in both the photo images and the 3D objects. NPR is then performed based on the reduced colormaps and edge features. To make the scene look more natural, image region segmentation is preferred when extracting and applying colormaps. However, image segmentation is computationally expensive, so non-photorealistic rendering of large frames takes a long time. To speed up the time-consuming segmentation procedure, we use GPGPU computing to run it in parallel on the GPU. As a result, we significantly improve the execution speed of the algorithm.

Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing

  • Pei, Songwen;Zhang, Junge;Jiang, Linhua;Kim, Myoung-Seo;Gaudiot, Jean-Luc
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.7 / pp.3231-3244 / 2016
  • As processor design transitions from homogeneous to heterogeneous multicore processors, the traditional Amdahl's law no longer meets the new challenges posed by asymmetric multicore systems. To further investigate the factors that affect the Overhead of Data Preparation (ODP) in asymmetric multicore systems, we evaluate an asymmetric multicore system built from a CPU and a GPU by measuring the overheads of memory transfer, kernel computation, cache misses, and synchronization. This paper demonstrates that decreasing the overhead of data preparation is a promising approach to improving the overall performance of heterogeneous systems.
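
The effect of data preparation overhead can be illustrated with a small extension of Amdahl's law. The sketch below is not the model from the paper: it assumes the overhead is a fixed fraction of the original run time, and the names `amdahl_speedup` and `overhead_fraction` are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_units, overhead_fraction=0.0):
    """Speedup under Amdahl's law with an extra overhead term.

    overhead_fraction is an assumed model parameter: time spent on data
    preparation (transfers, synchronization), expressed as a fraction of
    the original single-unit run time.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units + overhead_fraction)


if __name__ == "__main__":
    # Compare the same workload with increasing overhead.
    for overhead in (0.0, 0.05, 0.10):
        print(overhead, round(amdahl_speedup(0.9, 16, overhead), 2))
```

In this toy model the speedup can never exceed 1 / (serial_fraction + overhead_fraction), however many processing units are added, which is one way to see why reducing the overhead matters.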

GPU-based Monte Carlo Photon Migration Algorithm with Path-partition Load Balancing

  • Jeon, Youngjin;Park, Jongha;Hahn, Joonku;Kim, Hwi
    • Current Optics and Photonics / v.5 no.6 / pp.617-626 / 2021
  • A parallel Monte Carlo photon migration algorithm for graphics processing units that implements an improved load-balancing strategy is presented. Conventional parallel Monte Carlo photon migration algorithms suffer from a computational bottleneck because they rely on a simple load-balancing strategy that does not take into account the differing mean free path lengths of the photons. In this paper, path-partition load balancing is proposed to eliminate this bottleneck, based on a mathematical formula that parallelizes the photon path tracing process, which has previously been considered non-parallelizable. The performance of the proposed algorithm is tested using three-dimensional photon migration simulations of a human skin model.
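
To see why free path lengths cause load imbalance, consider a minimal sketch of free path sampling. It uses the standard exponential sampling of step lengths and a deliberately simplified photon (no scattering direction, no absorption). It does not reproduce the path-partition formula of the paper, and the function names and parameter values are hypothetical.

```python
import math
import random


def sample_free_path(mu_t, rng):
    """Sample a photon step length from an exponential distribution.

    mu_t is the total interaction coefficient (1/length); the mean free
    path is 1 / mu_t.
    """
    xi = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return -math.log(xi) / mu_t


def steps_to_cross(thickness, mu_t, rng):
    """Count the steps one photon needs to cover a given path length."""
    travelled, steps = 0.0, 0
    while travelled < thickness:
        travelled += sample_free_path(mu_t, rng)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [steps_to_cross(1.0, 10.0, rng) for _ in range(1000)]
    # The spread between min and max is the imbalance between threads
    # if each thread traces one photon.
    print(min(counts), max(counts))
```

If each GPU thread traces one photon, threads whose photons need few steps sit idle while the others finish, which is the kind of bottleneck a load-balancing strategy has to remove.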

Face Detection using Skin Color Information and Parallel Processing Method on Multi-Core (멀티코어에서 피부색상 정보와 병렬처리 방법을 이용한 얼굴 검출)

  • Kim, Hong-Hee;Lee, Jae-Heung
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.219-222 / 2012
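  • Recent research on face detection ranges from hardware designs on FPGAs to software designs that run efficiently on DSPs, GPUs, and ARM cores. This study proposes a face detection method that is effective on multi-core processors. Face candidates are extracted using skin color, and the remaining background is removed to speed up processing. The face detection algorithm proposed by Viola and Jones was parallelized with POSIX threads, and its performance was measured on single-core and multi-core systems. There was no performance gain on a single core, but on a multi-core system the speed improved by about 1.8 times, with the detection success rate unchanged.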
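
The skin-color pre-filter and the thread-level decomposition can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the CbCr thresholds are commonly used values rather than ones taken from the paper, the image is a plain nested list of RGB tuples, and the function names are hypothetical. The Viola-Jones detector itself is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def is_skin(r, g, b):
    """Classify one RGB pixel as skin using assumed CbCr thresholds."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def mask_rows(rows):
    """Build a skin mask (True = face candidate) for a band of rows."""
    return [[is_skin(*pixel) for pixel in row] for row in rows]


def skin_mask_parallel(image, n_workers=4):
    """Split the image into row bands and mask each band in its own thread."""
    if not image:
        return []
    band = max(1, -(-len(image) // n_workers))  # ceiling division
    bands = [image[i:i + band] for i in range(0, len(image), band)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(mask_rows, bands))  # map keeps band order
    return [row for part in results for row in part]


if __name__ == "__main__":
    image = [[(220, 170, 140), (0, 0, 255)],
             [(0, 0, 0), (220, 170, 140)]]
    print(skin_mask_parallel(image))
```

Note that in CPython the global interpreter lock prevents this pure-Python loop from running faster on several cores. The sketch only shows how the work is divided; a native POSIX threads implementation, as used in the paper, is where the multi-core speedup would come from.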

OpenCL-based Efficient Parallel Processing in a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 이용한 효율적인 병렬처리)

  • Kim, Heegon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.111-114 / 2013
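  • As the use of accelerators such as GPUs grows in high-performance and mobile computing, various accelerator-based parallel processing methods have been introduced. However, for users who are new to accelerators or have little experience with accelerator-based parallel processing, using them effectively is not easy. This paper proposes a parallel processing method that uses the accelerator and the microprocessor simultaneously and is more efficient than accelerator-only parallel processing, and compares the performance of the proposed method with that of the accelerator-only approach. Experiments show that the proposed method achieves about a 40-fold performance improvement over sequential processing and a 25% improvement over the accelerator-only parallel method.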
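
One simple way to use the CPU and the accelerator at the same time is a static split of the work in proportion to each device's throughput. The sketch below assumes this strategy purely for illustration; the abstract does not say how the paper divides the work, and the function names and throughput figures are made up.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split n_items so that CPU and GPU finish at about the same time.

    cpu_rate and gpu_rate are throughputs in items per second
    (hypothetical values, to be measured on the target machine).
    Returns (n_cpu, n_gpu).
    """
    n_gpu = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - n_gpu, n_gpu


def elapsed(n_cpu, n_gpu, cpu_rate, gpu_rate):
    """Wall time when both devices process their share concurrently."""
    return max(n_cpu / cpu_rate, n_gpu / gpu_rate)


if __name__ == "__main__":
    cpu_rate, gpu_rate = 1.0, 4.0
    n_cpu, n_gpu = split_work(1000, cpu_rate, gpu_rate)
    print("GPU only:", elapsed(0, 1000, cpu_rate, gpu_rate))
    print("CPU + GPU:", elapsed(n_cpu, n_gpu, cpu_rate, gpu_rate))
```

The gain over accelerator-only execution depends entirely on how fast the CPU is relative to the accelerator, and it ignores transfer and synchronization costs.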

FLUID SIMULATION METHODS FOR COMPUTER GRAPHICS SPECIAL EFFECTS (컴퓨터 그래픽스 특수효과를 위한 유체시뮬레이션 기법들)

  • Jung, Moon-Ryul
    • 한국전산유체공학회:학술대회논문집 / 2009.11a / pp.1-1 / 2009
  • In this presentation, I talk about various fluid simulation methods that have been developed for computer graphics special effects since 1996. They are all based on CFD but sacrifice physical reality for visual plausibility and computation time. However, as computers become faster and the capability of the GPU (graphics processing unit) improves, methods that offer more physical realism have been tried. In this talk, I will focus on four aspects of fluid simulation methods for computer graphics: (1) particle level-set methods, (2) particle-based simulation, (3) methods for exact satisfaction of the incompressibility constraint, and (4) GPU-based simulation. (1) Particle level-set methods evolve the fluid surface by means of the zero-level set and a band of massless marker particles on both sides of it. The evolution of the zero-level set captures the surface approximately, the evolution of the marker particles captures the fine details of the surface, and the zero-level set is corrected based on the particle positions at each step. (2) The particle-based Lagrangian approach to fluid simulation has recently gained some popularity because it automatically respects mass conservation and because the difficulty of tracking the surface geometry has been partly addressed. (3) Until recently, fluid simulation algorithms were dominated by approximate fractional step methods. They split the Navier-Stokes equations into two parts: the first step solves the equations without considering the incompressibility constraint, and the second finds the pressure that satisfies the constraint. In this approach, the first step inevitably introduces error, producing numerical diffusion in the solution. Recently, however, exact fractional step methods without this error have been developed (by fluid mechanics scholars), and another method has been introduced (by computer graphics scholars) that satisfies the incompressibility constraint by formulating the fluid in terms of the vorticity field rather than the velocity field. (4) Finally, I want to mention GPU implementations of fluid simulation, which take advantage of the fact that the discrete fluid equations can be solved in parallel.
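
The two-step split described in (3) can be written out as follows. This is the standard textbook form of a fractional step (projection) scheme, not a formula from the talk; the symbols are assumptions, since the abstract gives no notation: velocity $\mathbf{u}$, pressure $p$, constant density $\rho$, kinematic viscosity $\nu$, body force $\mathbf{f}$, and time step $\Delta t$.

First, an intermediate velocity is computed without the incompressibility constraint:

$$
\mathbf{u}^{*} = \mathbf{u}^{n} + \Delta t \left( -(\mathbf{u}^{n} \cdot \nabla)\mathbf{u}^{n} + \nu \nabla^{2} \mathbf{u}^{n} + \mathbf{f} \right)
$$

Second, the pressure that enforces the constraint is found and used to correct the velocity:

$$
\nabla^{2} p^{n+1} = \frac{\rho}{\Delta t} \, \nabla \cdot \mathbf{u}^{*},
\qquad
\mathbf{u}^{n+1} = \mathbf{u}^{*} - \frac{\Delta t}{\rho} \, \nabla p^{n+1}
$$

Taking the divergence of the correction and substituting the pressure equation gives $\nabla \cdot \mathbf{u}^{n+1} = 0$, so the corrected velocity satisfies the incompressibility constraint.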


Acceleration of Anisotropic Elastic Reverse-time Migration with GPUs (GPU를 이용한 이방성 탄성 거꿀 참반사 보정의 계산가속)

  • Choi, Hyungwook;Seol, Soon Jee;Byun, Joongmoo
    • Geophysics and Geophysical Exploration / v.18 no.2 / pp.74-84 / 2015
  • To obtain physically meaningful images through elastic reverse-time migration, wavefield separation, which extracts P- and S-waves from vector wavefields reconstructed with the elastic wave equation, is a prerequisite. To extend elastic reverse-time migration to anisotropic media, both an anisotropic modelling algorithm and anisotropic wavefield separation are essential. Anisotropic wavefield separation uses pseudo-derivative filters determined by the vertical velocities and anisotropic parameters of the elastic medium, and it differs from the Helmholtz decomposition conventionally used for isotropic wavefield separation. Since applying these pseudo-derivative filters is computationally expensive, we have developed an efficient anisotropic wavefield separation algorithm that runs in parallel on GPUs (Graphics Processing Units). In addition, we have developed a highly efficient anisotropic elastic reverse-time migration algorithm that uses MPI (Message Passing Interface) and incorporates the GPU-based anisotropic wavefield separation algorithm. To verify the efficiency and validity of the developed algorithm, a VTI elastic model based on Marmousi-II was built, and a synthetic multicomponent seismic data set was created from it. The computational speed of migration was dramatically improved by using GPUs and MPI, and the accuracy of the image was also improved by adopting anisotropic wavefield separation.
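
For reference, the conventional isotropic separation mentioned in the abstract takes the divergence of the wavefield to isolate P-waves and the curl to isolate S-waves. The sketch below shows this baseline on a 2D grid with central differences. It does not implement the anisotropic pseudo-derivative filters of the paper, and the function name and grid layout are hypothetical.

```python
def divergence_and_curl(ux, uz, dx=1.0, dz=1.0):
    """Central-difference divergence and curl of a 2D vector field.

    ux[i][j] and uz[i][j] hold the x and z components at grid point
    (x = j * dx, z = i * dz). Only interior points are computed; the
    boundary is left at zero.
    Returns (div, curl): div isolates P-waves, curl isolates S-waves.
    """
    nz, nx = len(ux), len(ux[0])
    div = [[0.0] * nx for _ in range(nz)]
    curl = [[0.0] * nx for _ in range(nz)]
    for i in range(1, nz - 1):
        for j in range(1, nx - 1):
            dux_dx = (ux[i][j + 1] - ux[i][j - 1]) / (2 * dx)
            dux_dz = (ux[i + 1][j] - ux[i - 1][j]) / (2 * dz)
            duz_dx = (uz[i][j + 1] - uz[i][j - 1]) / (2 * dx)
            duz_dz = (uz[i + 1][j] - uz[i - 1][j]) / (2 * dz)
            div[i][j] = dux_dx + duz_dz
            curl[i][j] = dux_dz - duz_dx
    return div, curl


if __name__ == "__main__":
    n = 5
    # A purely expanding field u = (x, z): divergence 2, curl 0.
    ux = [[float(j) for j in range(n)] for i in range(n)]
    uz = [[float(i) for j in range(n)] for i in range(n)]
    div, curl = divergence_and_curl(ux, uz)
    print(div[2][2], curl[2][2])
```

Each grid point is computed independently of the others, which is why this kind of filtering maps naturally onto one GPU thread per grid point.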

3D Inspection by Registration of CT and Dual X-ray Images

  • Kim, Youngjun;Kim, Wontae;Lee, Deukhee
    • Journal of International Society for Simulation Surgery / v.3 no.1 / pp.16-21 / 2016
  • Computed tomography (CT) can completely digitize the interior and exterior of nearly any object without destroying it. Generally, the resolution of industrial CT is below a few microns. Industrial CT scanning is limited, however, by its long measurement and processing time, whereas 2D X-ray imaging is fast. In this paper, we propose a novel 3D non-destructive inspection technique that combines the advantages of micro-CT and dual X-ray images. After the master object's CT data are registered with the sample objects' dual X-ray images, 3D non-destructive inspection becomes possible by analyzing the matching results. The registration computation is accelerated by parallel computing on a graphics processing unit (GPU).

Redundant Parallel Hopfield Network Configurations: A New Approach to the Two-Dimensional Face Recognitions (병렬 다중 홉 필드 네트워크 구성으로 인한 2-차원적 얼굴인식 기법에 대한 새로운 제안)

  • Kim, Yong Taek;Deo, Kiatama
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.63-68 / 2018
  • Interest in face recognition has been increasing due to diverse emerging applications. Recognizing faces from a two-dimensional source can be challenging under varying conditions such as face orientation, illumination level, facial details (for example, with or without glasses), and expressions such as smiling or crying. Hopfield networks have been used especially for pattern recall, generalization, familiarity recognition, and error correction. Based on these abilities, this paper conducts an experiment that applies the Redundant Parallel Hopfield Network to a face recognition problem. According to the authors, the new design was experimentally confirmed and tested to be robust in practical situations.
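
The pattern recall and error correction abilities mentioned above can be illustrated with a single, plain Hopfield network. This sketch is not the Redundant Parallel Hopfield Network configuration of the paper; it uses textbook Hebbian training with synchronous updates, and the function names and the two 8-element patterns are hypothetical.

```python
def train(patterns):
    """Hebbian weight matrix for a list of +1/-1 patterns (zero diagonal)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_iters=10):
    """Synchronously update the state until it stops changing."""
    state = list(state)
    for _ in range(max_iters):
        new_state = [
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights
        ]
        if new_state == state:
            break
        state = new_state
    return state


if __name__ == "__main__":
    p1 = [1, 1, 1, 1, -1, -1, -1, -1]
    p2 = [1, -1, 1, -1, 1, -1, 1, -1]
    weights = train([p1, p2])
    noisy = [-1] + p1[1:]  # p1 with its first element flipped
    print(recall(weights, noisy) == p1)
```

Here the network restores the stored pattern from a copy with one corrupted element. In a face recognition setting, the patterns would be binarized image features rather than these toy vectors.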

An Analytical Evaluation of 2D Mesh-connected SIMD Architecture for Parallel Matrix Multiplication (2D Mesh SIMD 구조에서의 병렬 행렬 곱셈의 수치적 성능 분석)

  • Kim, Cheong-Ghil
    • Journal of The Institute of Information and Telecommunication Facilities Engineering / v.10 no.1 / pp.7-13 / 2011
  • Matrix multiplication is a fundamental operation of linear algebra and arises in many areas of science and engineering. This paper introduces an efficient parallel matrix multiplication scheme on an $N \times N$ mesh-connected SIMD array processor, called multiple hierarchical SIMD architecture (HMSA). The architectural characteristic of HMSA is its hierarchically structured control units, which consist of a global control unit, $N$ local control units configured diagonally, and $N^2$ processing elements (PEs) arranged in an $N \times N$ array. PEs communicate through local buses that connect four adjacent neighbor PEs in a mesh-torus network and through global buses that run across the rows and columns, called horizontal buses and vertical buses, respectively. This architecture gives HMSA diagonally indexed concurrent broadcast and alternate access to either the rows (row control mode) or the columns (column control mode) of the 2D PE array. For performance evaluation, an algorithmic mapping method is used to map matrix multiplication onto the proposed architecture. The asymptotic time complexity is evaluated, and the result shows that parallel matrix multiplication on HMSA can provide a significant performance improvement.
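
The benefit of row and column broadcasts on an $N \times N$ PE array can be seen in a sequential simulation of a generic broadcast-based scheme. This is an assumed, simplified mapping, not the HMSA algorithm of the paper (the abstract does not give it), and the function name is hypothetical.

```python
def mesh_matmul(a, b):
    """Simulate broadcast-based matrix multiplication on an N x N PE array.

    PE (i, j) owns c[i][j]. In step k, a[i][k] is broadcast along row i
    and b[k][j] along column j, and every PE does one multiply-add.
    Returns (c, steps), where steps counts the parallel time steps.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    steps = 0
    for k in range(n):          # N sequential broadcast steps
        for i in range(n):      # on real SIMD hardware these two loops
            for j in range(n):  # run in lockstep on the N^2 PEs
                c[i][j] += a[i][k] * b[k][j]
        steps += 1
    return c, steps


if __name__ == "__main__":
    product, steps = mesh_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, steps)
```

With $N^2$ PEs working in lockstep, this scheme needs $N$ parallel steps, compared with the $N^3$ multiply-add operations of the sequential triple loop.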
