Search | Korea Science

The Need of Cache Partitioning on Shared Cache of Integrated Graphics Processor between CPU and GPU (내장형 GPU 환경에서 CPU-GPU 간의 공유 캐시에서의 캐시 분할 방식의 필요성)

Sung, Hanul;Eom, Hyeonsang;Yeom, HeonYoung
- KIISE Transactions on Computing Practices
- /
- v.20 no.9
- /
- pp.507-512
- /
- 2014
Recently, Distributed computing processing begins using both CPU(Central processing unit) and GPU(Graphic processing unit) to improve the performance to overcome darksilicon problem which cannot use all of the transistors because of the electric power limitation. There is an integrated graphics processor that CPU and GPU share memory and Last level cache(LLC). But, There is no LLC access rules between CPU and GPU, so if GPU and CPU processes run together at the same time, performance of both processes gets worse because of the contention on the LLC. This Paper gives evidence to prove the need of the Cache Partitioning and is mentioned about the cache partitioning design using page coloring to allocate the L3 Cache space only for the GPU process to guarantee GPU process performance.
https://doi.org/10.5626/KTCP.2014.20.9.507 인용

Analysis of Implementing Mobile Heterogeneous Computing for Image Sequence Processing

BAEK, Aram;LEE, Kangwoon;KIM, Jae-Gon;CHOI, Haechul
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.11 no.10
- /
- pp.4948-4967
- /
- 2017
On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, the real-time processing of mobile devices is still a challenge because of constraints and demands for higher resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate the image sequence processing. This paper deals with various optimizing techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer object, and double buffering for parallel and/or distributed tasks. Using the optimizing techniques both individually and combined, several heterogeneous computing structures were implemented and their effectiveness were analyzed. The experimental results show that the heterogeneous computing facilitates executions up to 3.5 times faster than CPU-only processing.
https://doi.org/10.3837/tiis.2017.10.014 인용 PDF KSCI

Enhancement of H.264/AVC Encoding Speed and Reduction of CPU Load through Parallel Programming Based on CUDA (CUDA 기반의 병렬 프로그래밍을 통한 H.264/AVC 부호화 속도 향상 및 CPU 부하 경감)

Jang, Eun-Been;Ha, Yun-Su
- Journal of Advanced Marine Engineering and Technology
- /
- v.34 no.6
- /
- pp.858-863
- /
- 2010
In order to enhance encoding speed in dynamic image encoding using H.264/AVC, reducing the time for motion estimation which takes a large portion of the processing time is very important. An approach using graphics processing unit(GPU) as a coprocessor to assist the central processing unit(CPU) in computing massive data, will be a way to reduce the processing time. In this paper, we present an efficient block-level parallel algorithm for the motion estimation(ME) on a computer unified device architecture(CUDA) platform developed in general-purpose computation on GPU. Experiments are carried out to verify the effectiveness of the proposed algorithm.
https://doi.org/10.5916/jkosme.2010.34.6.858 인용 PDF KSCI

A study on application of GPU-accelerated kinematic wave rainfall-runoff model (GPU 가속 운동파 강우유출모형의 적용 연구)

Kim, Boram;Yun, Gwan Seon;Kim, Hyeong-Jun;Yoon, Kwang Seok
- Proceedings of the Korea Water Resources Association Conference
- /
- 2020.06a
- /
- pp.323-323
- /
- 2020
그래픽 처리 장치(Graphic Processing Unit: GPU)는 그래픽 처리 작업에 특화된 다수의 산술논리 장치(Arithmetic Logic Unit: ALU)로 구성되어 있어서 중앙 처리 장치(Central Processing Unit: CPU)보다 한 번에 더 많은 연산 수행이 가능하다. 본 연구는 GPU 가속 운동파모형을 실제 유역에 적용하여, GPU 가속 운동파 강우유출모형 결과에 대한 정확성과 연산 소요 시간에 대한 효율성을 확인하였다. GPU 가속 운동파모형은 분포형 강우유출모형의 수치모의 연산시간을 단축시키기 위해 CUDA 포트란을 이용하여 개발되었다. 분포형모형의 지배방정식은 운동파모형과 Green-Ampt모형으로 구성되었고, 운동파모형은 유한체적법을 이용하여 이산화 하였다. GPU 가속 운동파모형을 이용하여 금강의 미호천 유역에서 발생하는 강우유출현상을 모의 하였고, 동일한 유한체적법을 이용한 CPU(Central Processing Unit) 기반의 강우유출모형과 비교하였다. 그 결과 GPU 가속모형의 결과는 미호천 유역 하류단에서 관측한 결과와 유사한 결과를 나타냈다. 또한, 연산소요시간은 CPU 기반의 강우유출모형의 연산소요시간보다 단축되었으며, 본 연구에 사용된 장비를 기준으로 최대 100배 정도 단축되었다.
PDF

3D Holographic Image Recognition by Using Graphic Processing Unit

Lee, Jeong-A;Moon, In-Kyu;Liu, Hailing;Yi, Faliu
- Journal of the Optical Society of Korea
- /
- v.15 no.3
- /
- pp.264-271
- /
- 2011
In this paper we examine and compare the computational speeds of three-dimensional (3D) object recognition by use of digital holography based on central unit processing (CPU) and graphic processing unit (GPU) computing. The holographic fringe pattern of a 3D object is obtained using an in-line interferometry setup. The Fourier matched filters are applied to the complex image reconstructed from the holographic fringe pattern using a GPU chip for real-time 3D object recognition. It is shown that the computational speed of the 3D object recognition using GPU computing is significantly faster than that of the CPU computing. To the best of our knowledge, this is the first report on comparisons of the calculation time of the 3D object recognition based on the digital holography with CPU vs GPU computing.
https://doi.org/10.3807/JOSK.2011.15.3.264 인용 PDF KSCI

Development of Diffusive Wave Rainfall-Runoff Model Based on CUDA FORTRAN (CUDA FORTEAN기반 확산파 강우유출모형 개발)

Kim, Boram;Kim, Hyeong-Jun;Yoon, Kwang Seok
- Proceedings of the Korea Water Resources Association Conference
- /
- 2021.06a
- /
- pp.287-287
- /
- 2021
본 연구에서는 CUDA(Compute Unified Device Architecture) 포트란을 이용하여 확산파 강우 유출모형을 개발하였다. CUDA 포트란은 그래픽 처리 장치(Graphic Processing Unit: GPU)에서 수행하는 병렬 연산 알고리즘을 포트란 언어를 사용하여 작성할 수 있도록 하는 GPU상의 범용계산(General-Purpose Computing on Graphics Processing Units: GPGPU) 기술이다. GPU는 그래픽 처리 작업에 특화된 다수의 산술 논리 장치(Arithmetic Logic Unit: ALU)로 구성되어 있어서 중앙 처리 장치(Central Processing Unit: CPU)보다 한 번에 더 많은 연산 수행이 가능하다. 이에 따라, CUDA 포트란기반 확산파모형은 분포형 강우유출모형의 수치모의 연산시간을 단축시킬 수 있다. 분포형모형의 지배방정식은 확산파모형과 Green-Ampt모형으로 구성되었고, 확산파모형은 유한체적법을 이용하여 이산화 하였다. CUDA 포트란기반 확산파모형의 정확성은 기존 연구된 수리실험 결과 및 CPU기반 강우유출모형과 비교하였으며, 연산소요시간에 대한 효율성은 CPU기반 확산파모형과 비교하였다. 그 결과 CUDA 포트란기반 확산파모형의 결과는 수리실험 결과 및 CPU기반 강우유출모형의 결과와 유사한 결과를 나타냈다. 또한, 연산소요시간은 CPU 기반 확산파모형의 연산소요시간보다 단축되었으며, 본 연구에 사용된 장비를 기준으로 최대 100배 정도 단축되었다.
PDF

Accelerating Numerical Analysis of Reynolds Equation Using Graphic Processing Units (그래픽처리장치를 이용한 레이놀즈 방정식의 수치 해석 가속화)

Myung, Hun-Joo;Kang, Ji-Hoon;Oh, Kwang-Jin
- Tribology and Lubricants
- /
- v.28 no.4
- /
- pp.160-166
- /
- 2012
This paper presents a Reynolds equation solver for hydrostatic gas bearings, implemented to run on graphics processing units (GPUs). The original analysis code for the central processing unit (CPU) was modified for the GPU by using the compute unified device architecture (CUDA). The red-black Gauss-Seidel (RBGS) algorithm was employed instead of the original Gauss-Seidel algorithm for the iterative pressure solver, because the latter has data dependency between neighboring nodes. The implemented GPU program was tested on the nVidia GTX580 system and compared to the original CPU program on the AMD Llano system. In the iterative pressure calculation, the implemented GPU program showed 20-100 times faster performance than the original CPU codes. Comparison of the wall-clock times including all of pre/post processing codes showed that the GPU codes still delivered 4-12 times faster performance than the CPU code for our target problem.
https://doi.org/10.9725/kstle.2012.28.4.160 인용 PDF KSCI

Assessment of Parallel Computing Performance of Agisoft Metashape for Orthomosaic Generation (정사모자이크 제작을 위한 Agisoft Metashape의 병렬처리 성능 평가)

Han, Soohee;Hong, Chang-Ki
- Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
- /
- v.37 no.6
- /
- pp.427-434
- /
- 2019
In the present study, we assessed the parallel computing performance of Agisoft Metashape for orthomosaic generation, which can implement aerial triangulation, generate a three-dimensional point cloud, and make an orthomosaic based on SfM (Structure from Motion) technology. Due to the nature of SfM, most of the time is spent on Align photos, which runs as a relative orientation, and Build dense cloud, which generates a three-dimensional point cloud. Metashape can parallelize the two processes by using multi-cores of CPU (Central Processing Unit) and GPU (Graphics Processing Unit). An orthomosaic was created from large UAV (Unmanned Aerial Vehicle) images by six conditions combined by three parallel methods (CPU only, GPU only, and CPU + GPU) and two operating systems (Windows and Linux). To assess the consistency of the results of the conditions, RMSE (Root Mean Square Error) of aerial triangulation was measured using ground control points which were automatically detected on the images without human intervention. The results of orthomosaic generation from 521 UAV images of 42.2 million pixels showed that the combination of CPU and GPU showed the best performance using the present system, and Linux showed better performance than Windows in all conditions. However, the RMSE values of aerial triangulation revealed a slight difference within an error range among the combinations. Therefore, Metashape seems to leave things to be desired so that the consistency is obtained regardless of parallel methods and operating systems.
https://doi.org/10.7848/ksgpc.2019.37.6.427 인용 PDF KSCI

Real-time Ray-tracing Chip Architecture

Yoon, Hyung-Min;Lee, Byoung-Ok;Cheong, Cheol-Ho;Hur, Jin-Suk;Kim, Sang-Gon;Chung, Woo-Nam;Lee, Yong-Ho;Park, Woo-Chan
- IEIE Transactions on Smart Processing and Computing
- /
- v.4 no.2
- /
- pp.65-70
- /
- 2015
In this paper, we describe the world's first real-time ray-tracing chip architecture. Ray-tracing technology generates high-quality 3D graphics images better than current rasterization technology by providing four essential light effects: shadow, reflection, refraction and transmission. The real-time ray-tracing chip named RayChip includes a real-time ray-tracing graphics processing unit and an accelerating tree-building unit. An ARM Ltd. central processing unit (CPU) and other peripherals are also included to support all processes of 3D graphics applications. Using the accelerating tree-building unit named RayTree to minimize the CPU load, the chip uses a low-end CPU and decreases both silicon area and power consumption. The evaluation results with RayChip show appropriate performance to support real-time ray tracing in high-definition (HD) resolution, while the rendered images are scaled to full HD resolution. The chip also integrates the Linux operating system and the familiar OpenGL for Embedded Systems application programming interface for easy application development.
https://doi.org/10.5573/IEIESPC.2015.4.2.065 인용 PDF KSCI

Development of Real-Time Image Processing System Using GPU (GPU를 이용한 실시간 이미지 프로세싱 시스템)

Oh Jae-Hong;Kang Hoon;Lee Ja-Yong
- Journal of Institute of Control, Robotics and Systems
- /
- v.11 no.5
- /
- pp.393-397
- /
- 2005
When a real-time image processing application is implemented with a general-purpose computer, CPU (Central Processing Unit) is usually heavily loaded and in many cases that CPU alone cannot meet the real-time requirement at all. Most modern computers are equipped with powerful Graphics Processing Units (GPUs) to accelerate graphics operations. There is a trend that the power of GPU outgrows that of CPU. If we take advantage of the powerful GPU for more general operations other than pure graphics operations, the processing time can be reduced. In this study, we will present techniques that apply GPU to general operations such as image processing procedures. Our experiment results show that significant speed-up can be achieved by using GPU.
https://doi.org/10.5302/J.ICROS.2005.11.5.393 인용 PDF KSCI

Search Result 70, Processing Time 0.029 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)