• Title/Summary/Keyword: Graphics Processing Units

Search Result 86, Processing Time 0.029 seconds

Automatic Stereo Matching for Auto-stereoscopic 3D display (무안경식 3D 디스플레이를 위한 자동 스테레오 정합)

  • Choi, Ho Yeol;Park, Jiho;Kim, Y.H.
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2012.07a
    • /
    • pp.140-141
    • /
    • 2012
  • 최근 영상분야의 키워드는 초고품질화, 초실감화, 스마트화로 대표될 수 있다. 그 중에서도 무안경식 3D는 초실감화를 이루기 위한 핵심응용분야 중 하나이다. 하지만 무안경식 3D 단말기가 성공적으로 보급되기 위해서는 연구되어야 할 분야가 여전히 존재한다. 그 중에서도 본 논문에서는 고화질의 무안경식 3D 스마트 콘텐츠 제작에 필요한 자동 스테레오 정합 기법을 제안하였다. 이전까지 연구된 변이지도 추출을 위한 알고리즘은 전역적 최적화 방법을 사용할 시 영상의 해상도와 깊이 정도에 따른 연산량의 증가로 많은 수행시간이 요구되었다. 또한 좌/우 영상의 intensity 정보만으로는 정확한 변이지도 추출이 어렵다는 한계점이 존재하였다. 이러한 이유로 본 논문에서는 스트림 영상에서 프레임 간의 정보를 이용하여 신뢰지도와 경계정보를 생성하였으며 belief propagation 스테레오 정합 방법을 이용하여 고화질의 정확한 변이지도를 추출하였다. 또한, 알고리즘의 연산량에 대한 문제를 해결하기 위한 고속화 방안으로, 최근 많은 연구가 이루어지고 있는 GPU(graphics processing units) 를 이용한 병렬처리를 연구하였다. 마지막으로 연구결과의 신뢰성을 향상하기 위하여 다양한 데이터를 이용한 실험을 통해 고화질의 영상정보를 고속으로 추출할 수 있음을 확인하였다.

  • PDF

GPU-accelerated Lattice Boltzmann Simulation for the Prediction of Oil Slick Movement in Ocean Environment (GPU 가속 기술을 이용한 격자 볼츠만법 기반 원유 확산 과정 시뮬레이션)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Korean Journal of Computational Design and Engineering
    • /
    • v.18 no.6
    • /
    • pp.399-406
    • /
    • 2013
  • This paper describes a new simulation technique for advection-diffusion phenomena over the sea surface using the lattice Boltzmann method (LBM), capable of predicting oil dispersion from tankers. The LBM is used to solve the pollutant transport problem within the framework of the ocean environment. The sea space is represented by the lattices, where each lattice has the information on oil transportation. Since dispersed oils (i.e., oil droplets) at sea are transported by convection due to waves, buoyancy, and turbulent diffusion, the conservation of mass and many physical oil transport rules were used in the prediction model. Since the LBM is modeled using the uniform lattices and simple rules, it can be easily accelerated by the parallel mechanism, for example, GPU-accelerated method. The proposed model using the LBM is used to simulate a simple pollution event with the oil pollutants of 10,000 kL. The simulation results indicate that the LBM method accelerated with the GPU is 6 times faster than that without the GPU.

Workload Characteristics-based L1 Data Cache Switching-off Mechanism for GPUs

  • Do, Thuan Cong;Kim, Gwang Bok;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.10
    • /
    • pp.1-9
    • /
    • 2018
  • Modern graphics processing units (GPUs) have become one of the most attractive platforms in exploiting high thread level parallelism with the support of new programming tools such as CUDA and OpenCL. Recent GPUs has applied cache hierarchy to support irregular memory access patterns; however, L1 data cache (L1D) exhibits poor efficiency in the GPU. This paper shows that the L1D does not always positively affect the applications in terms of performance and energy efficiency for the GPU. The performance of the GPU is even harmed by using the L1D for lots of applications. Our proposed technique exploits the characteristics of the currently-executed applications to predict the performance impact of the L1D on the GPU and then decides whether to continuously use the cache for the application or not. Our experimental results show that the proposed technique improves the GPU performance by 9.4% and saves up to 52.1% of the power consumption in the L1D.

Building a Dynamic Analyzer for CUDA based System.

  • SALAH T. ALSHAMMARI
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.77-84
    • /
    • 2023
  • The utilization of GPUs on general-purpose computers is currently on the rise due to the increase in its programmability and performance requirements. The utility of tools like NVIDIA's CUDA have been designed to allow programmers to code algorithms by using C-like language for the execution process on the graphics processing units GPU. Unfortunately, many of the performance and correctness bugs will happen on parallel programs. The CUDA tool support for the parallel programs has not yet been actualized. The use of a dynamic analyzer to find performance and correctness bugs in CUDA programs facilitates the execution of sophisticated processes, especially in modern computing requirements. Any race conditions bug it will impact of program correctness and the share memory bank conflicts to improve the overall performance. The technique instruments the programs in a way that promotes accessibility of the memory locations accessed by different threads well as to check for any bugs in the code of a program. The instrumented source code will be used initiated directly in the device emulation code of CUDA to send report for the user about all errors. The current degree of automation helps programmers solve subtle bugs in highly complex programs or programs that cannot be analyzed manually.

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM FOR HIGH-RESOLUTION SOLAR IMAGES

  • Zheng, Yanfang;Li, Xuebao;Tian, Huifeng;Zhang, Qiliang;Su, Chong;Shi, Lingyi;Zhou, Ta
    • Journal of The Korean Astronomical Society
    • /
    • v.51 no.3
    • /
    • pp.65-71
    • /
    • 2018
  • The near real-time speckle masking reconstruction technique has been developed to accelerate the processing of solar images to achieve high resolutions for ground-based solar telescopes. However, the reconstruction of solar subimages in such a speckle reconstruction is very time-consuming. We design and implement a new parallel speckle masking reconstruction algorithm based on the Compute Unified Device Architecture (CUDA) on General Purpose Graphics Processing Units (GPGPU). Tests are performed to validate the correctness of our program on NVIDIA GPGPU. Details of several parallel reconstruction steps are presented, and the parallel implementation between various modules shows a significant speed increase compared to the previous serial implementations. In addition, we present a comparison of runtimes across serial programs, the OpenMP-based method, and the new parallel method. The new parallel method shows a clear advantage for large scale data processing, and a speedup of around 9 to 10 is achieved in reconstructing one solar subimage of $256{\times}256pixels$. The speedup performance of the new parallel method exceeds that of OpenMP-based method overall. We conclude that the new parallel method would be of value, and contribute to real-time reconstruction of an entire solar image.

A Study on the Underwater Channel Model based on a High-Order Finite Difference Method using GPUs (그래픽 프로세서를 이용한 고차 유한 차분식 기반 수중채널모델 연구)

  • Bae, Ho Seuk;Kim, Won-Ki;Son, Su-Uk;Ha, Wansoo
    • Journal of the Korea Society for Simulation
    • /
    • v.30 no.1
    • /
    • pp.11-20
    • /
    • 2021
  • As unmanned underwater systems have recently emerged, a high-speed underwater channel modeling technique, which is one of the most important techniques in the system, has received a lot of attention. In this paper, we proposed a high-speed sound propagation model and verified the applicability through quantitative performance analyses. We used a high-order finite difference method (FDM) for wave propagation modeling in the water, and a domain decomposition method was adopted using multiple general-purpose graphics processing units (GPUs) to increase the calculation efficiency. We compared the results of the model we proposed with the analytic solution in the half-infinite media and results of the Virtual Timeseries Experiment (VirTEX) model, which is based on the ray method. Finally, we analyzed the performance of the model quantitatively using numerical examples. Through quantitative analyses of the improvement in computational performance, we confirmed that the computational speed increases linearly as the number of GPUs increases. The computation times are increased by 2 times and 8 times, respectively, when the domain size of computation and the maximum frequency are doubled. We expect that the proposed high-speed underwater channel modeling technique is able to contribute to the enhancement of national defense as an underwater communication channel model and analysis tool to develop the underwater communication technique for the unmanned underwater system.

Multi-Scale Contact Analysis Between Net and Numerous Particles (그물망과 대량입자의 멀티 스케일 접촉해석)

  • Jun, Chul Woong;Sohn, Jeong Hyun
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.38 no.1
    • /
    • pp.17-23
    • /
    • 2014
  • Graphics processing units (GPUs) are ideal for solving problems involving parallel data computations. In this study, the GPU is used for effectively carrying out a multi-body dynamic simulation with particle dynamics. The Hilber-Hushes-Taylor (HHT) implicit integration algorithm is used to solve the integral equations. For detecting collisions among particles, the spatial subdivision algorithm and discrete-element methods (DEM) are employed. The developed program is verified by comparing its results with those of ADAMS. The numerical efficiencies of the serial program using the CPU and the parallel program using the GPU are compared in terms of the number of particles, and it is observed that when the number of particles is greater, more computing time is saved by using the GPU. In the present example, when the number of particles is 1,300, the computational speed of the parallel analysis program is about 5 times faster than that of the serial analysis program.

GPU-accelerated Reliability Analysis Method using Dynamic Reliability Block Diagram based on DEVS Formalism (DEVS 형식론 기반의 Dynamic Reliability Block Diagram과 GPU 가속 기술을 이용한 신뢰도 분석 방법)

  • Ha, Sol;Ku, Namkug;Roh, Myung-Il
    • Journal of the Korea Society for Simulation
    • /
    • v.22 no.4
    • /
    • pp.109-118
    • /
    • 2013
  • This paper adopts the system configuration to assess the reliability instead of making a fault tree (FT), which is a traditional method to analyze reliability of a certain system; this is the reliability block diagram (RBD) method. The RBD method is a graphical presentation of a system diagram connecting the subsystems of components according to their functions or reliability relationships. The equipment model for the reliability simulation is modeled based on the discrete event system specification (DEVS) formalism. In order to make various alternatives of target system, this paper also adopts the system entity structure (SES), an ontological framework that hierarchically represents the elements of a system and their relationships. To enhance the calculation time of reliability analysis, GPU-based accelerations are adopted to the reliability simulation.

AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
    • ETRI Journal
    • /
    • v.42 no.4
    • /
    • pp.491-504
    • /
    • 2020
  • We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with other various industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to that of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 ㎟. Delivery is expected later this year.

Real-Time Quad-Copter Tracking With Multi-Cameras and Ray-based Importance Sampling (복수카메라 및 Ray-based Importance Sampling을 이용한 실시간 비행체 추적)

  • Jin, Longhai;Jeong, Mun-Ho;Lee, Key-Seo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.6
    • /
    • pp.899-905
    • /
    • 2013
  • In this paper, we focus on how to calibrate multi-cameras easily and how to efficiently detect quad-copters with small-numbered particles. Each particle is a six dimensional vector that is composed of 3D position and 3D orientation of a quad-copter in the space. Due to curse of dimensionality, that leads to explosive computational costs with a large amount of high-dimensioned particles. To detect efficiently, we need to put more particles in very promising spaces and few particles in other spaces. Though computational cost is lowered by minimizing particles, in order to track a quad-copter with multiple cameras in real-time, multiple images from the cameras should be synchronized and analyzed. Therefore, lots of the computations still need to be done. Because of this, GPGPU(General-Purpose computing on Graphics Processing Units) is implemented for parallel computing. This method has been successfully tested and gives accurate results in practical situations.