• Title/Summary/Keyword: GPU optimization

Search Result 66, Processing Time 0.034 seconds

Performance Analysis and Optimization of mobile GPU in Android Phone (안드로이드 폰에서의 모바일 GPU 성능 분석 및 최적화)

  • Cho, Chang-Woo;Ashok, Sharma;Kim, Shin-Dug
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2013.07a
    • /
    • pp.1-4
    • /
    • 2013
  • 본 논문에서는 최신 안드로이드 기반 상용 스마트폰의 모바일 GPU 성능 향상을 위한 방법론을 제안한다. 동일 하드웨어를 가지고 스마트폰을 개발하더라도 제조사의 역량에 따라 소프트웨어 최적화의 정도가 달라서 그래픽 성능 차이가 날 수 있다. 그러므로 우리는 시스템 소프트웨어 레벨에서 그래픽 품질에 아무런 영향을 주지 않고 성능 향상을 이끌어낼 수 있는 기법에 대해 소개한다. 이를 위해 A사, B사 안드로이드 스마트폰을 대상으로 안드로이드 커널에 따른 분석을 수행하였고, GPU 디바이스 드라이버에 따른 분석을 수행하였으며, 마지막으로 타사 휴대폰과의 성능 비교를 통해 이 결과를 비교 분석하였다. 결과적으로 GPU 디바이스 드라이버 변경과 안드로이드 커널 변경을 시도함으로써 B사 대비 68%의 성능을 보인 A사 스마트폰의 성능을 96%까지 향상시킬 수 있었다.

  • PDF

Implementation of Particle Swarm Optimization Method Using CUDA (CUDA를 이용한 Particle Swarm Optimization 구현)

  • Kim, Jo-Hwan;Kim, Eun-Su;Kim, Jong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.1019-1024
    • /
    • 2009
  • In this paper, particle swarm optimization(PSO) is newly implemented by CUDA(Compute Unified Device Architecture) and is applied to function optimization with several benchmark functions. CUDA is not CPU but GPU(Graphic Processing Unit) that resolves complex computing problems using parallel processing capacities. In addition, CUDA helps one to develop GPU softwares conveniently. Compared with the optimization result of PSO executed on a general CPU, CUDA saves about 38% of PSO running time as average, which implies that CUDA is a promising frame for real-time optimization and control.

A QoS-Aware Energy Optimization Technique for Smartphone GPUs (QoS를 고려한 스마트폰 GPU의 에너지 최적화 기법)

  • Kim, Dohan;Song, Wook;Kim, HyungHoon;Kim, Jihong
    • Journal of KIISE
    • /
    • v.42 no.5
    • /
    • pp.566-572
    • /
    • 2015
  • We proposed a novel energy optimization technique for smartphone GPUs, more aggressively lowering the GPU frequency while obtaining higher energy efficiency with a negligible negative impact on the GPU performance. In order to achieve the Quality of Service (QoS) specified by the smartphone application, the proposed optimization technique employed the minimal acceptable GPU frequency based on average Frames per Second (FPS) for each GPU frequency level. Our experimental results on a smartphone development board showed that the proposed technique can reduce the GPU energy consumption by up to 23% over the default DVFS algorithm with only a 0.45 frame drop.

Filtering and GPU Optimization to Reliably Express the Exaggeration of 3D Triangular Meshes (3차원 삼각형 메쉬의 과장을 안정적으로 표현할 수 있는 필터링과 GPU 최적화)

  • SuBin Lee;Seong-Hyeok Moon;Jong-Hyun Kim
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.349-352
    • /
    • 2023
  • 본 논문에서는 법선벡터를 이용해 3D 삼각형 메쉬의 형태를 안정적으로 과장하고 GPU 기반으로 새롭게 설계하는 프레임워크를 제안한다. 우리는 High-boost 메쉬 필터링 알고리즘에서의 Aliasing 문제를 양방향 필터를 적용하여 노이지를 제거하고, GPU 기반에서 설계해 고속화한다.

  • PDF

Optimal Implementation of Lightweight Block Cipher PIPO on CUDA GPGPU (CUDA GPGPU 상에서 경량 블록 암호 PIPO의 최적 구현)

  • Kim, Hyun-Jun;Eum, Si-Woo;Seo, Hwa-Jeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.6
    • /
    • pp.1035-1043
    • /
    • 2022
  • With the spread of the Internet of Things (IoT), cloud computing, and big data, the need for high-speed encryption for applications is emerging. GPU optimization can be used to validate cryptographic analysis results or reduced versions theoretically obtained by the GPU in a reasonable time. In this paper, PIPO lightweight encryption implemented in various environments was implemented on GPU. Optimally implemented considering the brute force attack on PIPO. In particular, the optimization implementation applying the bit slicing technique and the GPU elements were used as much as possible. As a result, the implementation of the proposed method showed a throughput of about 19.5 billion per second in the RTX 3060 environment, achieving a throughput of about 122 times higher than that of the previous study.

Optimization of Color Format Conversion of WebCam Images Using the CUDA (CUDA를 이용한 웹캠 영상의 색상 형식 변환 최적화)

  • Kim, Jin-Woo;Jung, Yun-Hye;Park, Jin-Hong;Park, Yong-Jin;Han, Tack-Don
    • Journal of Korea Game Society
    • /
    • v.11 no.1
    • /
    • pp.147-157
    • /
    • 2011
  • Webcam doesn't perform memory-alignment in order to reduce the transmission time of image data. Memory-unaligned image data is unsuitable for the processing on GPU. Accordingly, we convert it to available color format for optimization in high speed image processing. In this paper, we propose a technique that accelerates webcam's color format conversion by using NVDIA CUDA. We propose an optimization which is about memory accesses and thread composition, also evaluate memory and computing performance for verifying a hypothesis which is the performance of the proposed architecture and optimizing degree on low-performance GPU. Following the optimization technique, we show performance improvements over maximum 68 percent.

VDI Performance Optimization with Hybrid Parallel Processing in Thick Client System under Heterogeneous Multi-Core Environment (Heterogeneous 멀티 코어 환경의 Thick Client에서 VDI 성능 최적화를 위한 혼합 병렬 처리 기법 연구)

  • Kim, Myeong-Seob;Huh, Eui-Nam
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38B no.3
    • /
    • pp.163-171
    • /
    • 2013
  • Recently, the requirement of processing High Definition (HD) video or 3D application on low, mobile devices has been expanded and content data has been increased as well. It is becoming a major issue in Cloud computing where a Virtual Desktop Infrastructure (VDI) Service needs efficient data processing ability to provide Quality of Experience (QoE) in Cloud computing. In this paper, we propose three kind of Thick-Thin VDI Service which can share and delegate VDI service based on Thick Client using CPU and GPU. Furthermore, we propose and discuss the VDI Service Optimization Method in mixed CPU and GPU Heterogeneous Environment using CPU Parallel Processing OpenMP and GPU Parallel Processing CUDA.

Implementation of PSO(Particle Swarm Optimization) Algorithm using Parallel Processing of GPU (GPU의 병렬 처리 기능을 이용한 PSO(Particle Swarm Optimization) 알고리듬 구현)

  • Kim, Eun-Su;Kim, Jo-Hwan;Kim, Jong-Wook
    • Proceedings of the KIEE Conference
    • /
    • 2008.10b
    • /
    • pp.181-182
    • /
    • 2008
  • 본 논문에서는 연산 최적화 알고리듬 중 PSO(Particle Swarm Optimization) 알고리듬을 NVIDIA사(社)에서 제공한 CUDA(Compute Unified Device Architecture)를 이용하여 새롭게 구현하였다. CUDA는 CPU가 아닌 GPU(Graphic Processing Unit)의 다양한 병렬 처리 능력을 사용해 복잡한 컴퓨팅 문제를 해결하는 소프트웨어 개발을 가능케 하는 기술이다. 이 기술을 연산 최적화 알고리듬 중 PSO에 적용함으로써 알고리듬의 수행 속도를 개선하였다. CUDA를 적용한 PSO 알고리듬의 검증을 위해 언어 기반으로 프로그래밍하고 다양한 Test Function을 통해 시뮬레이션 하였다. 그리고 기존의 PSO 알고리듬과 비교 분석하였다. 또한 알고리듬의 성능 향상으로 여러 가지 최적화 분야에 적용 할 수 있음을 보인다.

  • PDF

Optimization of Warp-wide CUDA Implementation for Parallel Shifted Sort Algorithm (병렬 Shifted Sort 알고리즘의 Warp 단위 CUDA 구현 최적화)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.18 no.4
    • /
    • pp.739-745
    • /
    • 2017
  • This paper presents and discusses an implementation of the GPU shifted sorting method to find approximate k nearest neighbors which executes within "warp", the minimum execution unit in GPU parallel architecture. Also, this paper presents the comparison results with other two common nearest neighbor searching methods, GPU-based kd-tree and ANN (Approximate Nearest Neighbor) library. The proposed implementation focuses on the cases when k is small, i.e. 2, 4, 8, and 16, which are handled efficiently within warp to consider it is very common for applications to handle small k's. Also, this paper discusses optimization ways to implementation by improving memory management in a loop for the CUB open library and adopting CUDA commands which are supported by GPU hardware. The proposed implementation shows more than 16-fold speed-up against GPU-based other methods in the tests, implying that the improvement would become higher for more larger input data.

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API (GCN 아키텍쳐 상에서의 OpenCL을 이용한 GPGPU 성능향상 기법 연구)

  • Woo, DongHee;Kim, YoonHo
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.1
    • /
    • pp.37-45
    • /
    • 2018
  • The current system upon which a variety of programs are in operation has continuously expanded its domain from conventional single-core and multi-core system to many-core and heterogeneous system. However, existing researches have focused mostly on parallelizing programs based CUDA framework and rarely on AMD based GCN-GPU optimization. In light of the aforementioned problems, our study focuses on the optimization techniques of the GCN architecture in a GPGPU environment and achieves a performance improvement. Specifically, by using performance techniques we propose, we have reduced more then 30% of the computation time of matrix multiplication and convolution algorithm in GPGPU. Also, we increase the kernel throughput by more then 40%.