• Title/Summary/Keyword: Cuda

Real-time Stereo Video Generation using Graphics Processing Unit (GPU를 이용한 실시간 양안식 영상 생성 방법)

  • Shin, In-Yong;Ho, Yo-Sung
    • Journal of Broadcast Engineering / v.16 no.4 / pp.596-601 / 2011
  • In this paper, we propose a fast depth-image-based rendering method that generates virtual view images in real time on a graphics processing unit (GPU) for a 3D broadcasting system. Before transmission, we encode the input 2D+depth video using the H.264 coding standard. At the receiver, we decode the received bitstream and generate a stereo video on the GPU, which can compute in parallel. We apply a simple and efficient hole filling method to reduce both decoder complexity and hole filling errors. In addition, we design a vertical parallel structure for the forward mapping process to take advantage of the single-instruction, multiple-thread structure of the GPU. We also use high-speed GPU memories to boost the computation speed. As a result, we can generate virtual view images 15 times faster than with CPU-based processing.
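
As a rough illustration of the forward mapping and hole filling steps named in the abstract above, the following Python sketch warps one image row by a per-pixel disparity and then fills the resulting holes. The disparity formula, the function names `warp_row` and `fill_holes`, and the "copy the nearest pixel on the right" fill rule are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
HOLE = None  # marker for target pixels that no source pixel mapped to


def warp_row(colors, depths, max_disparity=4):
    """Forward-map one row to a virtual view.

    Assumptions (not from the paper): depth is 8-bit with 255 nearest,
    disparity grows linearly with the depth value, and the nearer pixel
    wins when two source pixels land on the same target position.
    """
    width = len(colors)
    out = [HOLE] * width
    out_depth = [-1] * width
    for x in range(width):
        disparity = (depths[x] * max_disparity) // 255
        tx = x - disparity  # nearer pixels shift further to the left
        if 0 <= tx < width and depths[x] > out_depth[tx]:
            out[tx] = colors[x]
            out_depth[tx] = depths[x]
    return out


def fill_holes(row):
    """Fill holes with the nearest valid pixel on the right, else on the left."""
    filled = list(row)
    last_valid = HOLE
    # Sweep right to left so each hole copies the closest valid pixel to its right.
    for x in range(len(filled) - 1, -1, -1):
        if filled[x] is HOLE:
            filled[x] = last_valid
        else:
            last_valid = filled[x]
    # Holes touching the right border have no right neighbour; copy from the left.
    for x in range(1, len(filled)):
        if filled[x] is HOLE:
            filled[x] = filled[x - 1]
    return filled


colors = [10, 11, 12, 90, 91, 15, 16, 17]
depths = [0, 0, 0, 255, 255, 0, 0, 0]
warped = warp_row(colors, depths, max_disparity=2)
virtual_row = fill_holes(warped)
```

Each row is processed independently here, so on a GPU one thread could plausibly own one row (or column); how the paper's vertical parallel structure assigns work to threads is not stated in the abstract.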

APBT-JPEG Image Coding Based on GPU

  • Wang, Chengyou;Shan, Rongyang;Zhou, Xiao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.4 / pp.1457-1470 / 2015
  • In wireless multimedia sensor networks (WMSN), transmission latency is an increasingly serious problem. As resolution improves, image and video compression takes more and more time, which seriously affects the real-time performance of WMSN. The core of the JPEG system is the DCT, but DCT-JPEG is not the best choice: block-based DCT transform coding produces serious blocking artifacts when the image is highly compressed at low bit rates. This paper uses the all phase biorthogonal transform (APBT) to solve that problem, but APBT does not have a fast algorithm. We analyze the structure of JPEG and propose a parallel framework to speed up the JPEG algorithm on a GPU, replacing the discrete cosine transform (DCT) with APBT for better reconstructed image quality. Parallel APBT-JPEG is thus proposed to address both the real-time requirement of WMSN and the blocking artifacts of DCT-JPEG. We design the parallel APBT-JPEG algorithm with the CUDA toolkit released by NVIDIA. Experimental results show that the maximum speedup ratio of the parallel APBT-JPEG algorithm can exceed 100 times on a very low-end GPU, compared with conventional serial APBT-JPEG. The image reconstructed by the proposed algorithm also outperforms DCT-JPEG in both objective quality and subjective effect. The proposed GPU-based parallel APBT algorithm can also be used in image compression, video compression, edge detection, and other fields of image processing.

Parallel Algorithms for Finding δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets (정수문자열의 δ-근사주기와 γ-근사주기를 찾는 병렬알고리즘)

  • Kim, Youngho;Sim, Jeong Seop
    • Journal of KIISE / v.44 no.8 / pp.760-766 / 2017
  • Repetitive strings have been studied in diverse fields such as data compression and bioinformatics. Recently, two problems on approximate periods of strings over integer alphabets were introduced: finding minimum $\delta$-approximate periods and finding minimum $\gamma$-approximate periods. Both problems can be solved in $O(n^2)$ time, where $n$ is the length of the string. In this paper, we present two parallel algorithms that solve these problems in $O(n)$ time using $O(n^2)$ threads each. The experimental results show that our parallel algorithms for finding minimum $\delta$-approximate (resp. $\gamma$-approximate) periods run approximately 19.7 (resp. 40.08) times faster than the sequential algorithms when $n = 10{,}000$.
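
The abstract above does not restate its definitions, so the following Python sketch assumes the commonly used ones: two equal-length integer strings δ-match when every position differs by at most δ, and γ-match when the sum of the position-wise differences is at most γ. It further assumes that the text is cut into consecutive blocks of the candidate period's length, with a shorter final block compared against the matching prefix of the period. The function names are hypothetical, and the code is a sequential illustration rather than the parallel algorithms the paper presents.

```python
def delta_distance(x, y):
    """Largest position-wise difference between two integer strings."""
    return max((abs(a - b) for a, b in zip(x, y)), default=0)


def gamma_distance(x, y):
    """Sum of position-wise differences between two integer strings."""
    return sum(abs(a - b) for a, b in zip(x, y))


def min_thresholds(text, period):
    """Smallest (delta, gamma) for which `period` approximately covers `text`.

    Assumption: blocks are consecutive and have length len(period);
    a shorter last block is compared with the prefix of `period`.
    """
    if not text or not period:
        raise ValueError("text and period must be non-empty")
    m = len(period)
    blocks = [text[i:i + m] for i in range(0, len(text), m)]
    delta = max(delta_distance(block, period[:len(block)]) for block in blocks)
    gamma = max(gamma_distance(block, period[:len(block)]) for block in blocks)
    return delta, gamma


text = [1, 5, 9, 2, 4, 10, 1, 6]
period = [1, 5, 9]
thresholds = min_thresholds(text, period)
```

Under these assumptions, every block's distance to the period can be computed independently, which suggests why the problem lends itself to a many-thread formulation.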

Ultrahigh-Resolution Spectral Domain Optical Coherence Tomography Based on a Linear-Wavenumber Spectrometer

  • Lee, Sang-Won;Kang, Heesung;Park, Joo Hyun;Lee, Tae Geol;Lee, Eun Seong;Lee, Jae Yong
    • Journal of the Optical Society of Korea / v.19 no.1 / pp.55-62 / 2015
  • In this study we demonstrate ultrahigh-resolution spectral domain optical coherence tomography (UHR SD-OCT) with a linear-wavenumber (k) spectrometer, to accelerate signal processing and to display two-dimensional (2-D) images in real time. First, we performed a numerical simulation to find the optimal parameters for the linear-k spectrometer to achieve ultrahigh axial resolution, such as the number of grooves in a grating, the material for a dispersive prism, and the rotational angle between the grating and the dispersive prism. We found that a grating with 1200 grooves and an F2 equilateral prism at a rotational angle of $26.07^{\circ}$, in combination with a lens of focal length 85.1 mm, are suitable for UHR SD-OCT with the imaging depth range (limited by spectrometer resolution) set at 2.0 mm. As guided by the simulation results, we constructed the linear-k spectrometer needed to implement a UHR SD-OCT. The actual imaging depth range was measured to be approximately 2.1 mm, and axial resolution of $3.8\,\mu\mathrm{m}$ in air was achieved, corresponding to $2.8\,\mu\mathrm{m}$ in tissue ($n = 1.35$). The sensitivity was -91 dB with -10 dB roll-off at 1.5 mm depth. We demonstrated a 128.2 fps acquisition rate for OCT images with 800 lines/frame, by taking advantage of NVIDIA's compute unified device architecture (CUDA) technology, which allowed for real-time signal processing compatible with the speed of the spectrometer's data acquisition.

Simple Spectral Calibration Method and Its Application Using an Index Array for Swept Source Optical Coherence Tomography

  • Jung, Un-Sang;Cho, Nam-Hyun;Kim, Su-Hwan;Jeong, Hyo-Sang;Kim, Jee-Hyun;Ahn, Yeh-Chan
    • Journal of the Optical Society of Korea / v.15 no.4 / pp.386-393 / 2011
  • In this study, we report an effective k-domain linearization method with a pre-calibrated indexed look-up table. The method minimizes the k-domain nonlinearity of a swept source optical coherence tomography (SS-OCT) system by using two arrays: a sample position shift index and an intensity compensation array. The two arrays are generated from an interference pattern acquired by connecting a Fabry-Perot interferometer (FPI) and an optical spectrum analyzer (OSA) to the system. During real-time imaging, the two arrays are used to shift sample positions and compensate intensities so that the samples are linear in wavenumber. In an evaluation of point spread functions (PSFs), the signal-to-noise ratio (SNR) increased by 9.7 dB. When applied to infrared (IR) sensing card imaging, the SNR increased by 1.29 dB and the contrast-to-noise ratio (CNR) increased by 1.44. The linearization and intensity compensation take 30 ms with a multi-threaded method on a central processing unit (CPU), compared to 0.8 ms with compute unified device architecture (CUDA) processing on a graphics processing unit (GPU). We verified that our linearization method is suitable for real-time SS-OCT imaging.

Parallel Design and Implementation of Shot Boundary Detection Algorithm (샷 경계 탐지 알고리즘의 병렬 설계와 구현)

  • Lee, Joon-Goo;Kim, SeungHyun;You, Byoung-Moon;Hwang, DooSung
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.76-84 / 2014
  • As the number of high-density videos increases, parallel processing approaches are necessary to process large-scale video data. When a video processing method requires thousands of simple operations, GPU-based parallel processing is preferred to CPU-based parallel processing because it reduces the time and space complexities of the computation. This paper studies the parallel design and implementation of a shot-boundary detection algorithm. The proposed algorithm uses pixel brightness comparisons and global histogram data among the blocks of frames, and the related operations are highly parallel. To run as many of these operations in parallel as possible, the pixel brightness and histogram computations are designed in parallel and implemented on an NVIDIA GPU. The GPU-based shot detection method is tested with 10 videos from the National Archive of Korea collection. In experiments, the detection rate is similar, but the computation is about 10 times faster than that of the CPU-based algorithm.
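
To make the histogram part of the abstract above concrete, here is a minimal Python sketch of shot-boundary detection by comparing global brightness histograms of consecutive frames. It covers only the histogram comparison, not the block-wise pixel brightness comparison the paper also uses. The bin count, the normalised difference measure, the threshold, and the function names are assumptions for illustration.

```python
def histogram(frame, bins=4, max_value=256):
    """Count pixels per brightness bin for a frame given as rows of integers."""
    counts = [0] * bins
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
    return counts


def shot_boundaries(frames, threshold):
    """Return indices i at which frame i is assumed to start a new shot.

    Assumption: a boundary is declared when the normalised histogram
    difference between consecutive frames (0.0 to 1.0) exceeds `threshold`.
    """
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        pixels = sum(current)
        diff = sum(abs(a - b) for a, b in zip(previous, current)) / (2 * pixels)
        if diff > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]
cuts = shot_boundaries([dark, dark, bright, bright], threshold=0.5)
```

Per-pixel binning and per-frame-pair differencing are independent across pixels and frame pairs, which is the property the abstract points to when it describes the operations as highly parallel.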

GPU-ACCELERATED SPECKLE MASKING RECONSTRUCTION ALGORITHM FOR HIGH-RESOLUTION SOLAR IMAGES

  • Zheng, Yanfang;Li, Xuebao;Tian, Huifeng;Zhang, Qiliang;Su, Chong;Shi, Lingyi;Zhou, Ta
    • Journal of The Korean Astronomical Society / v.51 no.3 / pp.65-71 / 2018
  • The near real-time speckle masking reconstruction technique has been developed to accelerate the processing of solar images to achieve high resolutions for ground-based solar telescopes. However, the reconstruction of solar subimages in such a speckle reconstruction is very time-consuming. We design and implement a new parallel speckle masking reconstruction algorithm based on the Compute Unified Device Architecture (CUDA) on General Purpose Graphics Processing Units (GPGPU). Tests are performed to validate the correctness of our program on NVIDIA GPGPU. Details of several parallel reconstruction steps are presented, and the parallel implementation between various modules shows a significant speed increase compared to the previous serial implementations. In addition, we present a comparison of runtimes across serial programs, the OpenMP-based method, and the new parallel method. The new parallel method shows a clear advantage for large scale data processing, and a speedup of around 9 to 10 is achieved in reconstructing one solar subimage of $256 \times 256$ pixels. The speedup performance of the new parallel method exceeds that of OpenMP-based method overall. We conclude that the new parallel method would be of value, and contribute to real-time reconstruction of an entire solar image.

Measurement-based Face Rendering reflecting Positional Scattering Properties (위치별 산란특성을 반영한 측정기반 얼굴 렌더링)

  • Park, Sun-Yong;Oh, Kyoung-Su
    • Journal of Korea Game Society / v.9 no.5 / pp.137-144 / 2009
  • This paper predicts 6 facial regions that may have sharply different scattering properties and renders the face more realistically based on their diffusion profiles. The scattering properties are acquired in high dynamic range by photographing the pattern formed around a unit ray incident on facial skin. The acquired data are fitted to a linear combination of Gaussian functions, which approximates the original diffusion profile of skin well and has good characteristics as a filter. During fitting, to prevent the solution from converging to a local minimum, we use a genetic algorithm to set the initial value. Each Gaussian term is applied to the irradiance map as a filter to express the subsurface scattering effect. To handle up to 12 Gaussian filtering passes efficiently, we make use of the parallel capacity of CUDA.

Implementation of Viterbi Decoder on Massively Parallel GPU for DVB-T Receiver (DVB-T 수신기를 위한 대규모 병렬처리 GPU 기반의 비터비 복호기 구현)

  • Lee, KyuHyung;Lee, Ho-Kyoung;Heo, Seo Weon
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.9 / pp.3-11 / 2013
  • Recently, much research has used the massively parallel processing of GPUs to implement communication systems. In this paper, we reduce software simulation time by applying a GPU with a sliding block method to the Viterbi decoder in the DVB-T system, one of the European DTV standards. First, we implement the DVB-T system on a CPU and estimate the time the system takes to process one OFDM symbol. Second, we implement the Viterbi decoder in software on NVIDIA's massively parallel GPU. In our work, a stream processing method is applied to reduce the overhead of data transfer between the CPU and GPU, and a coalescing method is used to lower the global memory access time. In addition, a data structure design method is used to maximize shared memory usage. Consequently, our proposed method is approximately 11 times faster in 2K mode and 60 times faster in 8K mode for Viterbi decoding.

Acceleration of the Iterative Physical Optics Using Graphic Processing Unit (GPU를 이용한 반복적 물리 광학법의 가속화에 대한 연구)

  • Lee, Yong-Hee;Chin, Huicheol;Kim, Kyung-Tae
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.26 no.11 / pp.1012-1019 / 2015
  • This paper accelerates iterative physical optics (IPO) for radar cross section (RCS) analysis by using two techniques. To analyze multiple reflections in a cavity, IPO uses the near-field method, unlike the shooting and bouncing rays method, which uses geometric optics (GO). However, IPO is still far slower than physical optics (PO), so it needs to be accelerated for practical use. To address this problem, a graphics processing unit (GPU) can be applied to reduce calculation time, and the adaptive iterative physical optics-change rate (AIPO-CR) method can also be applied to optimize the number of iterations.