Analysis of Performance Degradation Factors in Many-Core GPU Architectures and Recent Research Trends

  • Published: 2014.05.16
