• Title/Summary/Keyword: GPU implementation

Status of ASTE Focal Plane Array Development

  • Lee, Jung-Won;Je, Do-Heung;Lee, Bangwon;Kang, Hyunwoo;Wagner, Jan;Kim, Jongsoo;Han, Seog-Tae;Asayama, Shin'ichiro;Kojima, Takafumi;Gonzalez, Alvaro;Kroug, Matthias;Shan, Wenrei;Iguchi, Satoru;Iono, Daisuke
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.41 no.2
    • /
    • pp.59.2-59.2
    • /
    • 2016
  • To increase the mapping speed of the current ALMA TP array, a focal plane array system operating over the ultra-wide frequency range of 275-500 GHz with GPU-based software spectrometers has been under development since 2015. Major progress in component development, including wideband DSB mixers, a profiled corrugated horn, receiver optics, the LO system, and the GPU-based spectrometer, is reviewed, with a brief introduction to the implications of ALMA 2030 for the technical implementation.

Implementation of DES Algorithm using CUDA (CUDA를 이용한 DES 구현)

  • Kim, Juho;Park, Neungsoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.1086-1087
    • /
    • 2012
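  • Research on parallel processing with GPUs is being actively pursued, and GPUs are already used in many places. In this paper, CUDA overlapping is used to implement the DES algorithm at high speed with CUDA, developed by NVIDIA. In this implementation, the GPU sends computation results to the host immediately while it continues computing, so that computation time and transfer time overlap and the total time is reduced further. As a result, a performance improvement of about 30% over the non-overlapped version was confirmed. In the future, we plan to apply the method not only to DES but also to other encryption algorithms such as 3DES, AES, and SEED.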
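
The overlap of computation and device-to-host transfer described in this abstract can be illustrated with a small timing model. This is only a sketch under assumptions: the work is split into chunks, one compute engine and one copy engine run concurrently, and the per-chunk times in the usage note are hypothetical, not measurements from the paper.

```python
def serial_time(chunks):
    """Total time when each chunk is computed and then transferred, one after another.

    chunks is a list of (compute_time, transfer_time) pairs (hypothetical units).
    """
    return sum(compute + transfer for compute, transfer in chunks)


def overlapped_time(chunks):
    """Total time when the transfer of chunk i overlaps the computation of chunk i+1.

    Assumes one compute engine and one copy engine, each handling chunks in order.
    """
    compute_done = 0.0   # time at which the compute engine finishes the current chunk
    transfer_done = 0.0  # time at which the copy engine finishes the current chunk
    for compute, transfer in chunks:
        compute_done += compute
        # A transfer starts once its data is ready and the copy engine is free.
        transfer_done = max(compute_done, transfer_done) + transfer
    return transfer_done
```

For four hypothetical chunks of `(3.0, 1.0)`, `serial_time` gives 16.0 and `overlapped_time` gives 13.0, because only the last transfer is left exposed.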

Parallel Implementation and Performance Evaluation of the SIFT Algorithm Using a Many-Core Processor (매니코어 프로세서를 이용한 SIFT 알고리즘 병렬구현 및 성능분석)

  • Kim, Jae-Young;Son, Dong-Koo;Kim, Jong-Myon;Jun, Heesung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.9
    • /
    • pp.1-10
    • /
    • 2013
  • In this paper, we implement the SIFT (Scale-Invariant Feature Transform) algorithm for feature point extraction on a many-core processor, and analyze the performance, area efficiency, and system area efficiency of the many-core processor. In addition, we demonstrate the potential of the proposed many-core processor by comparing its performance with that of a high-performance CPU and GPU (Graphics Processing Unit). Experimental results indicate that the accuracy of the SIFT algorithm on the many-core processor was the same as that of OpenCV. In addition, the many-core processor outperforms the CPU and GPU in terms of execution time. Moreover, this paper proposes an optimal model of the SIFT algorithm on the many-core processor by analyzing energy efficiency and area efficiency for different octave sizes.

Implementation of Scenario-based AI Voice Chatbot System for Museum Guidance (박물관 안내를 위한 시나리오 기반의 AI 음성 챗봇 시스템 구현)

  • Sun-Woo Jung;Eun-Sung Choi;Seon-Gyu An;Young-Jin Kang;Seok-Chan Jeong
    • The Journal of Bigdata
    • /
    • v.7 no.2
    • /
    • pp.91-102
    • /
    • 2022
  • As artificial intelligence develops, AI chatbot systems are being actively adopted. For example, public institutions are expanding the use of chatbots to work assistance and professional knowledge services in civil complaints and administration, and private companies are using chatbots for interactive customer response services. In this study, we propose a scenario-based AI voice chatbot system to reduce museum operating costs and provide interactive guidance services to visitors. The implemented voice chatbot system consists of a watcher object that detects the user's voice by monitoring a specific directory in real time, and an event handler object that, when a voice file is created, runs inference through the models sequentially and outputs the AI's voice response. In addition, a duplication-prevention function based on a thread and a deque ensures that GPU operations are not duplicated during inference in a single-GPU environment.
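
The duplication-prevention idea in this abstract (a thread plus a deque, so that inference jobs do not overlap on a single GPU) could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' code: `InferenceQueue`, `on_created`, and the `infer` callable are hypothetical names, and the directory watcher itself is left out.

```python
import threading
from collections import deque


class InferenceQueue:
    """Runs inference jobs one at a time on a single worker thread (hypothetical helper)."""

    def __init__(self, infer):
        self._infer = infer        # callable standing in for the model pipeline (assumption)
        self._pending = deque()    # file paths waiting for inference, in arrival order
        self._seen = set()         # paths already queued, used to drop duplicate events
        self._wakeup = threading.Condition(threading.Lock())
        self._closed = False
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_created(self, path):
        """Event handler a watcher would call when a voice file appears."""
        with self._wakeup:
            if path in self._seen:  # duplicate notification for the same file
                return False
            self._seen.add(path)
            self._pending.append(path)
            self._wakeup.notify()
            return True

    def close(self):
        """Let the worker drain the queue, then stop it."""
        with self._wakeup:
            self._closed = True
            self._wakeup.notify()
        self._worker.join()

    def _run(self):
        while True:
            with self._wakeup:
                while not self._pending and not self._closed:
                    self._wakeup.wait()
                if not self._pending:
                    return
                path = self._pending.popleft()
            # Inference runs outside the lock; with one worker thread, jobs never overlap.
            self.results.append(self._infer(path))
```

Because only the single worker thread calls `infer`, two file events arriving close together are processed back to back rather than concurrently, and a repeated event for the same path is ignored.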

A Study on FPGA utilization For PC-based Full-HD DVR System Implementation (Full-HD급 PC기반 DVR System 구현을 위한 FPGA 활용에 관한 연구)

  • Kim, Ki-Hwa
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.4
    • /
    • pp.2363-2369
    • /
    • 2014
  • A DVR system supports multiple cameras and should be able to receive images at 30 frames per channel in real time. Such systems therefore use a Full-HD-grade multiplexer and a hardware compression codec. This paper describes the design and implementation of a 4-channel Full-HD-grade PC-based DVR that uses an FPGA and the GPU inside the CPU, without a multiplexer or hardware codec. Existing Full-HD-grade DVR systems have the drawback of acquiring only about 20 frames per channel in real time. The part of the system that acquires multi-channel images in real time was designed using an FPGA. The software for the system was implemented using the Intel Media SDK. The performance evaluation showed that all required conditions were satisfied. The practicality of the system was confirmed by implementing it without hardware compression.

User-Guidable Abstract Line Drawing of 2D Images (사용자 제어가 용이한 이차원 영상의 추상화된 라인 드로잉 생성)

  • Son, Min-Jung;Lee, Yun-Jin;Kang, Hen-Ry;Lee, Seung-Yong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.2
    • /
    • pp.110-125
    • /
    • 2010
  • We present a novel scheme for generating line drawings from 2D images, aiming to facilitate effective visual communication. In contrast to conventional edge detectors, our technique imitates the human line drawing process to generate lines effectively and intuitively. Our technique consists of three parts: line extraction, line rendering, and user guidance. In line extraction, we extract lines by estimating a likelihood function to effectively find the genuine shape boundaries. In line rendering, we consider the feature scale and the blurriness of lines, which control the detail and the focus level of the lines. We also employ stroke textures to provide a variety of illustration styles. With user guidance, the shapes and positions of lines can be modified interactively, with immediate response provided by a GPU implementation of most line extraction operations. Experimental results demonstrate that our technique generates various kinds of line drawings from 2D images through control over detail, focus, and style.

Implementation of Parallel Processing Interpolation Algorithm for Multicore GPU (다중코어 GPU를 위한 병렬처리 보간 알고리즘 구현)

  • Lee, Kwang-Yeob;Kim, Chi-Yong
    • Journal of IKEEE
    • /
    • v.16 no.4
    • /
    • pp.304-309
    • /
    • 2012
  • As display resolutions continue to increase, the amount of data and calculation that graphics hardware needs to process is also increasing. In particular, the amount of data processed by the rasterizer is increasing rapidly. Instead of the bilinear algorithm[1] used in conventional interpolation, this paper uses an algorithm based on center-of-gravity (barycentric) coordinates and triangle areas, which makes parallel processing in the rasterizer easier. The designed rasterizer was implemented in an FPGA environment and verified by comparison with a conventional rasterizer. The rasterizer was shown to have approximately 50% higher performance than the conventional one.
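
Interpolation from center-of-gravity (barycentric) coordinates and triangle areas, as mentioned in this abstract, can be sketched as follows. This is the standard area-ratio formulation, not the paper's hardware design; the function names are hypothetical. Each pixel's weights depend only on that pixel's position, which is what makes the scheme easy to evaluate in parallel.

```python
def edge_area(ax, ay, bx, by, px, py):
    """Twice the signed area of the triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def barycentric_interpolate(tri, values, px, py):
    """Interpolate per-vertex values at point (px, py) using area ratios.

    tri is ((x0, y0), (x1, y1), (x2, y2)); values holds one number per vertex.
    Returns None for a degenerate triangle or a point outside the triangle.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    total = edge_area(x0, y0, x1, y1, x2, y2)
    if total == 0:
        return None  # zero-area triangle
    # Each weight is the sub-triangle area opposite a vertex divided by the full area.
    w0 = edge_area(x1, y1, x2, y2, px, py) / total
    w1 = edge_area(x2, y2, x0, y0, px, py) / total
    w2 = edge_area(x0, y0, x1, y1, px, py) / total
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None  # point lies outside the triangle
    return w0 * values[0] + w1 * values[1] + w2 * values[2]
```

For the triangle `((0, 0), (4, 0), (0, 4))` with vertex values `(0.0, 4.0, 8.0)`, the point `(1, 1)` has weights 0.5, 0.25, and 0.25 and interpolates to 3.0.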

A Parallel Implementation of JPEG2000 4K Ultra High Definition Image using OpenCL (OpenCL을 이용한 JPEG2000 4K 초고화질 영상처리의 병렬고속화 구현)

  • Park, Daeseung;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.10 no.1
    • /
    • pp.1-5
    • /
    • 2015
  • Driven by fast-growing multimedia technology and users' strong preference for large screens, the newest video coding standard for high-quality video compression, HEVC (High Efficiency Video Coding), has been introduced. Consequently, high-definition image services that are four times sharper than conventional HD video are becoming popular. JPEG 2000 is also stated to support 4K and 8K UHD. As a result, fast processing technology is required to read and write UHD images. This paper introduces a study on fast parallel processing technology for UHD images. For this purpose, JPEG 2000 is first reviewed, and a GPU-based parallel implementation is proposed for the color conversion preprocessing stage. The parallelized algorithm is implemented with OpenCL (Open Computing Language). The simulation results show that the proposed method achieves a 5-fold improvement in processing speed for 4K UHD over the method using threads.
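
The color conversion stage mentioned in this abstract is a per-pixel operation, which is why it parallelizes well. As an illustration only, the sketch below uses the reversible color transform (RCT) from JPEG 2000 Part 1; the abstract does not say whether the authors used the reversible or the irreversible transform, and the function names are hypothetical.

```python
def rct_forward(r, g, b):
    """Reversible color transform on integer RGB samples."""
    y = (r + 2 * g + b) >> 2  # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)  # >> floors for negative integers as well
    r = cr + g
    b = cb + g
    return r, g, b


def convert_image(pixels):
    """Apply the forward transform to every pixel independently.

    Each output depends on one input pixel only, so a GPU kernel could assign
    one work-item per pixel; here a plain loop stands in for that.
    """
    return [rct_forward(r, g, b) for r, g, b in pixels]
```

Because the transform uses only integer additions and shifts, `rct_inverse(*rct_forward(r, g, b))` returns the original `(r, g, b)` exactly.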

Implementation of FPGA-based Accelerator for GRU Inference with Structured Compression (구조적 압축을 통한 FPGA 기반 GRU 추론 가속기 설계)

  • Chae, Byeong-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.6
    • /
    • pp.850-858
    • /
    • 2022
  • To deploy Gated Recurrent Units (GRU) on resource-constrained embedded devices, this paper presents a reconfigurable FPGA-based GRU accelerator that enables structured compression. First, a dense GRU model is significantly reduced in size by hybrid quantization and structured top-k pruning. Second, the energy consumed by external memory access is greatly reduced by the proposed reuse computing pattern. Finally, the accelerator can handle a structured sparse model that benefits from the algorithm-hardware co-design workflow. Moreover, inference tasks can be flexibly performed using all functional dimensions, sequence lengths, and numbers of layers. Implemented on the Intel DE1-SoC FPGA, the proposed accelerator achieves 45.01 GOPS on a structured sparse GRU network without batching. Compared with CPU and GPU implementations, the low-cost FPGA accelerator achieves 57x and 30x improvements in latency and 300x and 23.44x improvements in energy efficiency, respectively. Thus, the proposed accelerator serves as an early study for real-time embedded applications, demonstrating the potential for further development.
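
The abstract does not define "structured top-k pruning". One common reading, assumed here, is that every row of a weight matrix keeps its k largest-magnitude entries and zeroes the rest, so all rows have the same number of nonzeros and the hardware workload is balanced. The function name below is hypothetical.

```python
def topk_prune_rows(weights, k):
    """Keep the k largest-magnitude weights in each row and zero the others.

    weights is a list of rows (lists of floats). Every output row has at most
    k nonzero entries, which gives the regular sparsity pattern assumed here.
    """
    pruned = []
    for row in weights:
        # Indices of the row sorted by descending magnitude; the first k survive.
        order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
        keep = set(order[:k])
        pruned.append([w if i in keep else 0.0 for i, w in enumerate(row)])
    return pruned
```

With `k = 2`, the row `[0.1, -0.9, 0.3, 0.5]` becomes `[0.0, -0.9, 0.0, 0.5]`.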

CUDA-based Fast DRR Generation for Analysis of Medical Images (의료영상 분석을 위한 CUDA 기반의 고속 DRR 생성 기법)

  • Yang, Sang-Wook;Choi, Young;Koo, Seung-Bum
    • Korean Journal of Computational Design and Engineering
    • /
    • v.16 no.4
    • /
    • pp.285-291
    • /
    • 2011
  • Pose estimation from medical images calculates the locations and orientations of objects obtained from Computed Tomography (CT) volume data, using X-ray images taken from two directions. In this process, digitally reconstructed radiograph (DRR) images of spatially transformed objects are generated and compared with the X-ray images repeatedly until reasonable transformation matrices for the objects are found. DRR generation and image comparison take the majority of the total time for this pose estimation. In this paper, a fast DRR generation technique based on GPU parallel computing is introduced. A volume ray-casting algorithm is explained with brief vector operations, and a technique for parallelizing the algorithm using Compute Unified Device Architecture (CUDA) is discussed. This paper also presents the implementation results and time measurements, compared with those from a pure-CPU implementation and an open-source toolkit.
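
The core of DRR generation by ray casting is a line integral of CT values along each ray, computed independently for every output pixel. The sketch below is a heavily simplified illustration under assumptions: rays are parallel and aligned with the volume's z axis, and samples fall exactly on voxels. A real DRR, including the one in the paper, would typically use diverging rays from an X-ray source and interpolated samples; the function name is hypothetical.

```python
def drr_parallel(volume, step=1.0):
    """Project a CT-like volume along the z axis by summing values along each ray.

    volume[z][y][x] holds attenuation-like values; step is the sample spacing
    along the ray. Returns image[y][x] with one line integral per pixel.
    """
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # One ray per pixel; rays share no state, so a GPU version could
            # assign one thread to each (x, y).
            image[y][x] = sum(volume[z][y][x] for z in range(depth)) * step
    return image
```

For a volume with two slices `[[1, 2]]` and `[[3, 4]]`, the projection is `[[4.0, 6.0]]`.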