IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Received: 2019.07.19
  • Reviewed: 2019.12.02
  • Published: 2020.02.28

Abstract

Modern GPUs can perform general-purpose computation through GPGPU programming, and their many on-chip cores deliver high computational throughput. The AES algorithm requires a large amount of parallel computation, which a CPU architecture cannot process efficiently. This paper therefore reduces AES execution time by running the algorithm on a GPGPU, which offers abundant parallel computational resources. However, the GPGPU architecture is not optimized for cryptographic algorithms such as AES, so AES cannot fully utilize its computational resources. To make the GPGPU reconfigurable for AES, this paper proposes an IPC (instructions per cycle) based dynamic SM (streaming multiprocessor) management technique. The technique monitors the IPC of AES running on the GPGPU at run time and increases or decreases the number of active SMs to allocate the optimal number. Simulation results show that the proposed technique improves performance by using hardware resources more effectively than the baseline GPGPU architecture: AES encryption/decryption performance improves by 41.2% on average.
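The abstract describes the controller only at a high level: it samples IPC at run time and raises or lowers the number of active SMs. A minimal Python sketch of such a feedback loop follows, assuming a simple hill-climbing rule, fixed-length sampling epochs, 15 SMs, and a made-up IPC curve (`toy_ipc`). The class `DynamicSMController` and all of its parameters are hypothetical illustrations, not the paper's actual policy or hardware mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicSMController:
    """Hypothetical IPC-driven controller for the number of active SMs.

    The hill-climbing rule, total_sms, min_sms and step are illustrative
    assumptions, not the policy or parameters used in the paper.
    """
    total_sms: int = 15      # assumed SM count (a GTX 480 has 15 SMs)
    min_sms: int = 1
    step: int = 1
    active_sms: int = 15
    direction: int = -1      # -1: try fewer SMs next, +1: try more SMs
    last_ipc: Optional[float] = None

    def update(self, instructions: int, cycles: int) -> int:
        """Record one epoch's counters and return the SM count for the next epoch."""
        if cycles <= 0:
            raise ValueError("cycles must be positive")
        ipc = instructions / cycles
        # If the previous adjustment lowered IPC, search in the other direction.
        if self.last_ipc is not None and ipc < self.last_ipc:
            self.direction = -self.direction
        self.last_ipc = ipc
        proposed = self.active_sms + self.direction * self.step
        if not self.min_sms <= proposed <= self.total_sms:
            # At a bound: turn around instead of getting stuck at the edge.
            self.direction = -self.direction
            proposed = self.active_sms + self.direction * self.step
        self.active_sms = max(self.min_sms, min(self.total_sms, proposed))
        return self.active_sms


def toy_ipc(n_sms: int) -> float:
    """Made-up workload: IPC scales up to 10 SMs, then contention lowers it."""
    return float(n_sms) if n_sms <= 10 else 10.0 - 0.5 * (n_sms - 10)


ctrl = DynamicSMController()
history = []
for _ in range(12):                      # 12 sampling epochs of 1000 cycles each
    n = ctrl.active_sms
    history.append(n)
    ctrl.update(int(toy_ipc(n) * 1000), 1000)
print(history)  # [15, 14, 13, 12, 11, 10, 9, 10, 11, 10, 9, 10]
```

Plain hill climbing like this keeps probing around the peak (here it oscillates between 9 and 11 SMs); a hardware implementation would more likely add hysteresis or a settling period, which the sketch omits.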
