• Title/Summary/Keyword: GPGPU


Performance Comparison of Join Operations Parallelization by using GPGPU (GPGPU 기반 조인 연산 병렬화 성능 비교)

  • Lee, Jong-Sub;Lee, Sang-Back;Lee, Kyu-Chul
    • Database Research
    • /
    • v.34 no.3
    • /
    • pp.28-44
    • /
    • 2018
  • In a database system, the join is the most expensive of the relational operations. CPU-based join operations typically use parallel processing with at most 1 to 16 cores, which does not significantly improve performance. In contrast, GPGPU (General-Purpose computing on Graphics Processing Units) allows parallel processing across thousands of processing units, greatly reducing the time required to perform joins. The parallelization in this work uses NVIDIA's CUDA SDK. In this paper, we implement GPGPU-based parallelization of the join operation and compare performance. The join operations used are Nested Loop Join (NLJ), Sort Merge Join (SMJ), and Hash Join (HJ), and the GPGPU devices are a TITAN Xp, a GTX 1080 Ti, and a GTX 1080. We measure and compare the performance of CPU-based and GPGPU-based joins, and also compare against a previous study on GPGPU-based joins. The experimental results show that the GPGPU-based joins are 6 to 328 times faster than the CPU-based ones.
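
The paper's source is not included here; as a rough illustration of how a nested loop join maps onto CUDA threads, the sketch below assigns one outer-relation tuple per thread and appends matches through an atomic counter. The kernel name, integer keys, and buffer layout (joinNLJ, d_R, d_S, d_out) are assumptions for illustration, not the paper's implementation.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: one thread per tuple of the outer relation R.
// Each thread scans the inner relation S and appends matching (i, j) index
// pairs to a global output buffer using an atomic counter.
__global__ void joinNLJ(const int* d_R, int nR,
                        const int* d_S, int nS,
                        int2* d_out, unsigned int* d_outCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nR) return;

    int rKey = d_R[i];
    for (int j = 0; j < nS; ++j) {
        if (d_S[j] == rKey) {
            unsigned int pos = atomicAdd(d_outCount, 1u);  // claim an output slot
            d_out[pos] = make_int2(i, j);                  // record matching pair
        }
    }
}
```

A launch such as `joinNLJ<<<(nR + 255) / 256, 256>>>(...)` gives one thread per outer tuple; SMJ and HJ can be parallelized in the same one-tuple-per-thread style, but probing sorted runs or a pre-built hash table instead of scanning S linearly.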

Analysis of the GPGPU Performance for Various Combinations of Workloads Executed Concurrently (동시에 실행되는 워크로드 조합에 따른 GPGPU 성능 분석)

  • Kim, Dongwhan;Eom, Hyeonsang
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.165-170
    • /
    • 2017
  • Many studies have utilized the GPGPU (General-Purpose Graphics Processing Unit) and its high computing power for complex tasks. GPGPU programs inherently require memory copies between host and device, and the resulting latency can degrade program performance, so optimizations that hide it are important. By executing multiple GPGPU programs simultaneously, memory-copy latency can be hidden by overlapping memory copies with computation on the GPGPU. This paper presents an analysis of this latency hiding effect for memory copy operations. Furthermore, we propose a performance prediction model and an algorithm that works around the limitations of pinned memory, and show that the proposed algorithm yields a 41% performance increase.
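
To make the copy/compute overlap concrete, here is a minimal, self-contained sketch (not the paper's code) that uses pinned host memory, two CUDA streams, and cudaMemcpyAsync so that the transfer of one chunk overlaps the kernel of another; the kernel, chunk sizes, and stream count are placeholders.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* d, int n) {             // placeholder compute kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int N = 1 << 22, CHUNK = 1 << 20, NSTREAMS = 2;
    float* h;
    cudaMallocHost(&h, N * sizeof(float));            // pinned memory: required for
                                                      // truly asynchronous copies
    float* d[NSTREAMS];
    cudaStream_t s[NSTREAMS];
    for (int i = 0; i < NSTREAMS; ++i) {
        cudaMalloc(&d[i], CHUNK * sizeof(float));
        cudaStreamCreate(&s[i]);
    }
    for (int off = 0, i = 0; off < N; off += CHUNK, i = (i + 1) % NSTREAMS) {
        // H2D copy, kernel, and D2H copy of one chunk are queued in one stream;
        // chunks in different streams overlap copy with compute.
        cudaMemcpyAsync(d[i], h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[i]);
        scale<<<(CHUNK + 255) / 256, 256, 0, s[i]>>>(d[i], CHUNK);
        cudaMemcpyAsync(h + off, d[i], CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < NSTREAMS; ++i) { cudaFree(d[i]); cudaStreamDestroy(s[i]); }
    cudaFreeHost(h);
    return 0;
}
```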

A Study of The GPGPU Performance (범용 그래픽 처리장치 (GPGPU)의 성능에 대한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.6
    • /
    • pp.201-206
    • /
    • 2018
  • With the recent development of artificial intelligence and big data technology, the importance of the GPGPU (general-purpose graphics processing unit) has grown. In addition, demand for mining equipment to obtain bitcoin, a blockchain application, has made GPGPUs scarce and sharply increased their prices. If a GPGPU can be simulated precisely, experiments on various GPGPU types and performance analyses can be conducted without purchasing expensive hardware. In this paper, we investigate the configuration of a GPGPU simulator and measure the performance of various benchmark programs using GPGPU-Sim.

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Son, Dong Oh;Choi, Hong Jun;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.2
    • /
    • pp.11-19
    • /
    • 2020
  • Modern GPUs can execute general-purpose computation and provide high performance by exploiting their many cores. Running the AES algorithm efficiently requires parallel computational resources; the CPU does not offer enough parallelism for cryptographic algorithms such as AES, whereas the GPU architecture provides massive parallel resources. Therefore, this paper reduces AES execution time by employing the parallel computational resources of the GPGPU. Unfortunately, AES does not fully utilize these resources because it is not well matched to the GPGPU architecture. In this paper, an IPC-based dynamic SM management technique is proposed to execute AES efficiently on the GPGPU. The technique increases and decreases the number of active SMs at run-time based on the measured IPC. According to simulation results, the proposed technique improves performance over the baseline GPGPU architecture by increasing resource utilization, speeding up AES by 41.2% on average.
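
The paper's contribution is a hardware scheduling change evaluated in a simulator, so it cannot be reproduced in CUDA source; the parallelism it tries to exploit, however, comes from AES blocks being independent. A rough sketch of that one-thread-per-block mapping follows, with the AES round function left as a hypothetical placeholder rather than a real implementation.

```cuda
// Sketch of the thread mapping only: each thread encrypts one independent
// 16-byte AES block (ECB/CTR-style parallelism). aes128_encrypt_block is a
// placeholder for the real 10-round AES implementation, which is not shown.
__device__ void aes128_encrypt_block(const unsigned char* in,
                                     unsigned char* out,
                                     const unsigned char* roundKeys)
{
    // ... SubBytes / ShiftRows / MixColumns / AddRoundKey rounds go here ...
    for (int i = 0; i < 16; ++i) out[i] = in[i] ^ roundKeys[i];  // stand-in only
}

__global__ void aesEncryptKernel(const unsigned char* d_in, unsigned char* d_out,
                                 const unsigned char* d_roundKeys, long numBlocks)
{
    long b = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (b >= numBlocks) return;
    aes128_encrypt_block(d_in + 16 * b, d_out + 16 * b, d_roundKeys);
}
```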

Method of extract eye zone using GPGPU (GPGPU를 이용한 눈 영역 검출 기법)

  • Park, Young-Jae;Kim, Gye-Young
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2011.01a
    • /
    • pp.269-272
    • /
    • 2011
  • This paper proposes an eye region detection method using GPGPU. Eye regions are detected with a relatively simple algorithm that compares the mean and variance of each mask window against the mean and variance of the whole image. In terms of accuracy, the method performed at a level similar to an existing method based on intensity contrast. In terms of computation speed, however, the proposed method, which enlarges the parallelized portion and runs it on the GPGPU, showed superior performance.
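
The abstract describes comparing each window's mean and variance against the global image statistics. A minimal sketch of that per-window test as a CUDA kernel is given below; the window size, thresholds, and acceptance criterion are illustrative assumptions, not the paper's exact rule.

```cuda
// One thread per candidate window position: compute the window's mean and
// variance and mark it as an eye candidate when both differ from the global
// image statistics by more than fixed thresholds (illustrative criterion).
__global__ void eyeCandidateKernel(const unsigned char* d_img, int width, int height,
                                   int win,                    // window size, e.g. 16
                                   float globalMean, float globalVar,
                                   float meanThr, float varThr,
                                   unsigned char* d_mask)      // 1 = candidate window
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x + win > width || y + win > height) return;

    float sum = 0.0f, sumSq = 0.0f;
    for (int dy = 0; dy < win; ++dy)
        for (int dx = 0; dx < win; ++dx) {
            float p = d_img[(y + dy) * width + (x + dx)];
            sum += p; sumSq += p * p;
        }
    float n = (float)(win * win);
    float mean = sum / n;
    float var  = sumSq / n - mean * mean;

    d_mask[y * width + x] =
        (fabsf(mean - globalMean) > meanThr && fabsf(var - globalVar) > varThr) ? 1 : 0;
}
```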


A CPU-GPGPU Based Multithread File Chunking System (CPU-GPGPU 를 기반으로 멀티스레드 파일청킹 시스템)

  • Tang, Zhi;Won, You-Jip
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06b
    • /
    • pp.336-337
    • /
    • 2011
  • The popularity of the general-purpose GPU (GPGPU) has made the CPU-GPGPU heterogeneous architecture commonplace, so trading off work between the CPU and the GPGPU becomes a way to improve program performance. In this work, we exploit the properties of the CPU-GPGPU heterogeneous architecture to accelerate the content-based chunking operation of deduplication. We built a prototype system that coordinates the CPU and the GPGPU to chunk files, and it is shown to perform better than using either the CPU or the GPGPU alone.
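
The abstract does not spell out the prototype's exact division of labor. One common way to coordinate the two processors for content-defined chunking is to let the GPU score every byte offset with a fingerprint in parallel and let the CPU walk the scores to pick boundaries subject to minimum/maximum chunk lengths. The sketch below follows that pattern with a simple polynomial hash; it is an illustration under those assumptions, not the paper's system.

```cuda
#include <vector>
#include <cuda_runtime.h>

// GPU pass: each thread hashes the WINDOW bytes ending at its offset and marks
// the offset as a candidate boundary when the hash matches a divisor rule.
#define WINDOW 48
__global__ void markBoundaries(const unsigned char* d_data, long n,
                               unsigned int divisor, unsigned char* d_cand)
{
    long i = blockIdx.x * (long)blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i < WINDOW) { d_cand[i] = 0; return; }
    unsigned int h = 0;
    for (int k = 0; k < WINDOW; ++k)                 // simple polynomial hash
        h = h * 31u + d_data[i - WINDOW + k];
    d_cand[i] = (h % divisor == 0) ? 1 : 0;
}

// CPU pass: scan candidates sequentially, enforcing min/max chunk length.
std::vector<long> pickChunks(const std::vector<unsigned char>& cand,
                             long minLen, long maxLen)
{
    std::vector<long> cuts;
    long last = 0;
    for (long i = 0; i < (long)cand.size(); ++i) {
        if ((cand[i] && i - last >= minLen) || i - last >= maxLen) {
            cuts.push_back(i);
            last = i;
        }
    }
    return cuts;
}
```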

Implementation and Performance Evaluation of the Faddev-Leverrier Algorithm using GPGPU (GPGPU를 이용한 파데브-레브리어 알고리즘 구현 및 성능 분석)

  • Park, Yong-Hun;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.3
    • /
    • pp.171-178
    • /
    • 2013
  • In this paper, we implement the Faddev-Leverrier algorithm using a GPGPU (General-Purpose Graphics Processing Unit) to accelerate singular value decomposition. In addition, we compare the performance of the algorithm on the CPU alone and on the CPU plus GPGPU for eleven $n{\times}n$ matrix sizes, where $n$ = 4, 8, 16, 32, 64, 128, 256, 512, 1,024, 2,048, and 4,096. Experimental results indicate that the CPU alone outperforms the CPU plus GPGPU for $n{\leq}64$ because of the large number of read and write operations between the CPU and the GPGPU, whereas for $n{\geq}64$ the CPU plus GPGPU outperforms the CPU alone by a rapidly growing margin in execution time.
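
For reference, the Faddev-Leverrier iteration (in its Faddeev form) computes the characteristic-polynomial coefficients via M_0 = 0, c_n = 1, M_k = A (M_{k-1} + c_{n-k+1} I), c_{n-k} = -tr(M_k)/k, and the only heavy step is the n x n multiply, which is what the GPU accelerates. The sketch below uses a naive CUDA matmul and keeps the cheap diagonal update and trace on the CPU; the host/device split, kernel, and names are assumptions, not the paper's implementation. The per-iteration host-device copies it incurs also illustrate why small n can run faster on the CPU alone, as the abstract reports.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Naive n x n matrix multiply on the GPU: C = A * B (row-major).
__global__ void matmul(const float* A, const float* B, float* C, int n)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= n || c >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < n; ++k) acc += A[r * n + k] * B[k * n + c];
    C[r * n + c] = acc;
}

// Host loop: M_k = A * (M_{k-1} + c_{n-k+1} I),  c_{n-k} = -trace(M_k) / k.
// Only the O(n^3) multiply is offloaded; the diagonal update and trace stay on
// the CPU, which forces a host-device round trip every iteration.
std::vector<double> charPolyCoeffs(const std::vector<float>& A, int n)
{
    std::vector<double> c(n + 1);
    c[n] = 1.0;
    std::vector<float> M(n * n, 0.0f), Mk(n * n);
    float *dA, *dM, *dMk;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dM, n * n * sizeof(float));
    cudaMalloc(&dMk, n * n * sizeof(float));
    cudaMemcpy(dA, A.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 blk(16, 16), grd((n + 15) / 16, (n + 15) / 16);
    for (int k = 1; k <= n; ++k) {
        for (int i = 0; i < n; ++i) M[i * n + i] += (float)c[n - k + 1];
        cudaMemcpy(dM, M.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
        matmul<<<grd, blk>>>(dA, dM, dMk, n);
        cudaMemcpy(Mk.data(), dMk, n * n * sizeof(float), cudaMemcpyDeviceToHost);
        double tr = 0.0;
        for (int i = 0; i < n; ++i) tr += Mk[i * n + i];
        c[n - k] = -tr / k;
        M = Mk;
    }
    cudaFree(dA); cudaFree(dM); cudaFree(dMk);
    return c;   // coefficients of det(lambda*I - A) = sum_k c[k] * lambda^k
}
```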

SimTBS: Simulator For GPGPU Thread Block Scheduling (SimTBS: GPGPU 스레드블록 스케줄링 시뮬레이터)

  • Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.87-92
    • /
    • 2020
  • Although a GPGPU (General-Purpose GPU) can maximize performance by parallelizing a task into tens of thousands of threads, those threads are internally grouped into thread blocks, the base unit of processing and resource allocation. The thread block scheduler is a specialized hardware unit that allocates thread blocks to the GPGPU processing hardware in a round-robin manner. However, round-robin is a simple sequential allocation policy and is not optimized for GPGPU resource utilization. In this paper, we propose a thread block scheduler model that can analyze and quantify performance under various thread block scheduling policies. Experimental results from the simulator implementing our model show that the legacy hardware thread block scheduling does not behave well when the workload becomes heavy.
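
As a toy illustration of what such a simulator models (not SimTBS itself), the sketch below dispatches thread blocks to SMs round-robin, lets each SM hold a fixed number of resident blocks, and reports the cycle at which the grid finishes; the SM count, residency limit, and per-block runtimes are made-up parameters.

```cuda
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

// Toy model: NUM_SM streaming multiprocessors, each able to hold MAX_RESIDENT
// thread blocks at once. Blocks are dispatched round-robin in grid order (the
// baseline policy described above); each block occupies its SM slot for a
// fixed number of cycles.
int main() {
    const int NUM_SM = 4, MAX_RESIDENT = 2, NUM_BLOCKS = 32;
    std::vector<int> blockCycles(NUM_BLOCKS);
    for (int b = 0; b < NUM_BLOCKS; ++b)
        blockCycles[b] = 100 + (b % 5) * 40;          // made-up per-block runtimes

    // finishTimes[sm]: completion cycles of the blocks resident on that SM.
    std::vector<std::priority_queue<long, std::vector<long>, std::greater<long>>>
        finishTimes(NUM_SM);
    long makespan = 0;
    for (int b = 0; b < NUM_BLOCKS; ++b) {
        int sm = b % NUM_SM;                          // round-robin SM choice
        long start = 0;
        if ((int)finishTimes[sm].size() >= MAX_RESIDENT) {
            start = finishTimes[sm].top();            // wait for a slot to free up
            finishTimes[sm].pop();
        }
        long end = start + blockCycles[b];
        finishTimes[sm].push(end);
        if (end > makespan) makespan = end;
    }
    std::printf("round-robin makespan: %ld cycles\n", makespan);
    return 0;
}
```

Swapping the `b % NUM_SM` line for another assignment rule is where alternative scheduling policies would plug into a model of this kind.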

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.10
    • /
    • pp.1919-1926
    • /
    • 2016
  • Because of its enormous processing capability, the General-Purpose Graphics Processing Unit (GPGPU) has been applied in many fields, including electronic design automation. The SAT algorithm is one of the core algorithms in many electronic design automation tools. There have been efforts to apply the GPGPU to the SAT algorithm, but the SAT algorithm is difficult to parallelize because of its characteristics. In this paper, the GPGPU is applied to the SAT algorithm by parallelizing the propagation routine, which is relatively well suited to parallel processing. Based on the similarity of the propagation routine to sparse matrix multiplication, a data structure for the SAT problem is constructed and the parallel propagation routine is described. To prevent data loss between parallel threads, atomic operations are used. Experimental results on benchmark SAT problems show that the proposed algorithm outperforms a previous GPGPU-based SAT solver.
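
A rough sketch of how unit propagation can be run one clause per thread on a GPU is shown below, using a CSR-like clause layout (similar in spirit to a sparse matrix) and an atomic counter to append implied literals without losing updates; the names and encoding are assumptions, not the paper's data structure, and conflict detection is omitted.

```cuda
// Assignment encoding: assign[v] = 0 unassigned, 1 true, -1 false.
// Clauses are stored CSR-style: literals[clauseStart[c] .. clauseStart[c+1])
// hold signed variable indices (+v means v, -v means NOT v), variables 1-based.
__global__ void unitPropagateKernel(const int* clauseStart, const int* literals,
                                    int numClauses, const signed char* assign,
                                    int* implied, unsigned int* impliedCount)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numClauses) return;

    int unassignedLit = 0, unassignedSeen = 0;
    for (int i = clauseStart[c]; i < clauseStart[c + 1]; ++i) {
        int lit = literals[i];
        int v = lit > 0 ? lit : -lit;
        int val = assign[v];                       // 0, 1, or -1
        if (val == 0) { ++unassignedSeen; unassignedLit = lit; }
        else if ((val > 0) == (lit > 0)) return;   // clause already satisfied
    }
    if (unassignedSeen == 1) {
        // Unit clause: every other literal is false, so this literal is implied.
        unsigned int pos = atomicAdd(impliedCount, 1u);
        implied[pos] = unassignedLit;
    }
}
```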

Design of a High-Performance Mobile GPGPU with SIMT Architecture based on a Small-size Warp Scheduler (작은 크기의 Warp 스케쥴러 기반 SIMT구조 고성능 모바일 GPGPU 설계)

  • Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.25 no.3
    • /
    • pp.479-484
    • /
    • 2021
  • This paper proposes and designs a structure that achieves high performance with a small number of cores in a SIMT-style GPGPU. A GPGPU for mobile devices requires a structure that raises performance relative to power consumption. To reduce power consumption the number of cores was decreased, and to preserve performance the warp size handled by the warp scheduler was set to 4, greatly reduced from the 32 of a typical GPGPU. Reducing the warp size reduces the number of idle cycles in the pipelines and helps hide memory latency, lowering the miss penalty when accessing cache memory. The computational performance of the designed GPGPU was measured with a test program that includes floating-point operations, and power consumption was evaluated for a 28 nm CMOS process, yielding 104.5 GFlops/Watt. This is about four times better performance per watt than Nvidia's Tegra K1.