IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

Son, Dong Oh; Choi, Hong Jun; Kim, Cheol Hong

doi:10.9708/jksci.2020.25.02.011

Journal of the Korea Society of Computer and Information

Vol. 25, No. 2 / pp. 11-19 / 2020 / pISSN 1598-849X / eISSN 2383-9945

Korean Society of Computer Information


Son, Dong Oh (SK Hynix Memory System R&D)
Choi, Hong Jun (The Attached Institute of ETRI)
Kim, Cheol Hong (School of Electronics and Computer Engineering, Chonnam National University)

Received: 2019.07.19
Reviewed: 2019.12.02
Published: 2020.02.28

https://doi.org/10.9708/jksci.2020.25.02.011

Abstract

Modern GPUs can perform general-purpose computation through GPGPU, and they deliver high computational throughput by exploiting their many cores. The AES algorithm involves a large number of parallel operations, but CPU architectures do not provide enough parallel computational resources to process it efficiently, whereas GPUs offer massive parallelism. This paper therefore reduces AES execution time by running AES on a GPGPU architecture. However, the baseline GPGPU architecture is not optimized for cryptographic algorithms such as AES, so AES cannot fully utilize its computational resources. To address this, the paper proposes a GPGPU architecture that can be reconfigured for AES: an IPC-based dynamic SM management technique that monitors the IPC (instructions per cycle) of the running AES workload at run time and increases or decreases the number of active SMs (streaming multiprocessors) to allocate the optimal number. Simulation results show that the proposed technique improves performance by using hardware resources more effectively than the baseline GPGPU architecture: compared with the baseline, AES encryption/decryption performance improves by 41.2% on average.
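The abstract describes the controller only at a high level: raise or lower the number of active SMs according to the IPC measured at run time. The Python sketch below shows one way such a loop could work, assuming an epoch-based hill-climbing policy. The class name IpcSmController, the epoch granularity, the one-SM step size, the 15-SM maximum, and the toy_ipc contention model are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


class IpcSmController:
    """HYPOTHETICAL epoch-based hill climber over the number of active SMs.

    At the end of every epoch the caller reports the IPC measured with the
    current number of active SMs; the controller then decides how many SMs
    to keep active in the next epoch.
    """

    def __init__(self, min_sms: int, max_sms: int, step: int = 1) -> None:
        self.min_sms = min_sms
        self.max_sms = max_sms
        self.step = step
        self.active_sms = max_sms   # start from the baseline: all SMs on
        self.direction = -1         # first probe: try switching SMs off
        self.prev_ipc: Optional[float] = None
        self.best_ipc: Optional[float] = None
        self.best_sms = max_sms

    def update(self, ipc: float) -> int:
        """Record the IPC of the finished epoch and return the next SM count."""
        # Remember the best configuration observed so far.
        if self.best_ipc is None or ipc > self.best_ipc:
            self.best_ipc = ipc
            self.best_sms = self.active_sms
        # If IPC dropped compared with the previous epoch, turn around.
        if self.prev_ipc is not None and ipc < self.prev_ipc:
            self.direction = -self.direction
        self.prev_ipc = ipc
        nxt = self.active_sms + self.direction * self.step
        nxt = max(self.min_sms, min(self.max_sms, nxt))
        if nxt == self.active_sms:
            # Clamped at a bound: probe the other way next time.
            self.direction = -self.direction
        self.active_sms = nxt
        return self.active_sms


def toy_ipc(active_sms: int) -> float:
    """HYPOTHETICAL aggregate-IPC model, not measured data.

    Throughput grows with more SMs until shared-resource contention
    dominates; this curve peaks at 6 active SMs.
    """
    return 1.2 * active_sms / (1.0 + active_sms ** 2 / 36.0)


def run(epochs: int = 40, max_sms: int = 15) -> Tuple[IpcSmController, List[int]]:
    """Drive the controller against the toy model and log the SM count per epoch."""
    ctrl = IpcSmController(min_sms=1, max_sms=max_sms)
    history: List[int] = []
    for _ in range(epochs):
        history.append(ctrl.active_sms)
        ctrl.update(toy_ipc(ctrl.active_sms))
    return ctrl, history


if __name__ == "__main__":
    ctrl, history = run()
    print("SM count per epoch:", history)
    print("best SM count observed:", ctrl.best_sms)
```

Run as a script, the controller walks down from 15 SMs, finds the peak of the toy curve at 6 SMs, and then keeps probing 5, 6 and 7. That residual oscillation is the usual cost of plain hill climbing; a practical design would likely add hysteresis or freeze the count once IPC stabilizes, but the abstract does not say how the proposed technique handles this.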


