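Analysis of Performance Degradation Factors in Many-Core GPU Architectures and Recent Research Trends (매니코어 GPU 구조의 성능 저하 요소 분석과 최신 연구 동향)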

O, Yun-Ho; Yun, Myeong-Guk; Park, Jong-Hyeon; No, Won-U

Communications of the Korean Institute of Information Scientists and Engineers (정보과학회지)

Volume 32, Issue 5 / pp. 22-33 / 2014 / pISSN 1229-6821

Korean Institute of Information Scientists and Engineers (한국정보과학회)

오윤호 (Yonsei University) ;
윤명국 (Yonsei University) ;
박종현 (Yonsei University) ;
노원우 (Yonsei University)

Published : 2014.05.16

References

NVIDIA Corporation. CUDA Programming Guide, V5.5
Sangpil Lee, Won Woo Ro, “Parallel GPU architecture simulation framework exploiting work allocation unit parallelism,” in 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 107-117, 21-23 April 2013.
Fung W.W.L., Sham I., Yuan G. and Aamodt T.M., “Dynamic warp formation and scheduling for efficient GPU control flow,” in Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, 2007.
Fung W.W.L. and Aamodt T.M., “Thread block compaction for efficient SIMT control flow,” in International Symposium on High Performance Computer Architecture, 2011.
Lashgar A. and Baniasadi A., “Performance in GPU architectures: potentials and distances,” in 9th Annual Workshop on Duplicating, Deconstructing, and Debunking, 2011.
Lindholm E., Nickolls J., Oberman S. and Montrym J., “NVIDIA Tesla: A unified graphics and computing architecture,” IEEE Micro, Vol. 28, No. 2, pp. 39-55, 2008. https://doi.org/10.1109/MM.2008.31
Rhu M. and Erez M., “Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation,” in Proceedings of the 40th Annual International Symposium on Computer Architecture, New York, NY, USA, 2013.
Vaidya A.S., Shayesteh A., Woo D.H., Saharoy R. and Azimi M., “SIMD divergence optimization through intra-warp compaction,” in Proceedings of the 40th Annual International Symposium on Computer Architecture, New York, NY, USA, 2013.
Rhu M., Sullivan M., Leng J. and Erez M., “A locality-aware memory hierarchy for energy-efficient GPU architectures,” Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 2013.
Jia W., Shaw K.A. and Martonosi M., “Characterizing and improving the use of demand-fetched caches in GPUs,” Proceedings of the 26th ACM International Conference on Supercomputing. ACM, 2012.
Rogers T.G., O'Connor M. and Aamodt T.M., “Cache-conscious wavefront scheduling,” Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012.
Rogers T.G., O'Connor M. and Aamodt T.M., “Divergence-aware warp scheduling,” Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 2013.
Jog A., Kayiran O., Mishra A.K., Kandemir M.T., Mutlu O., Iyer R. and Das C.R., “Orchestrated scheduling and prefetching for GPGPUs,” Proceedings of the 40th Annual International Symposium on Computer Architecture. ACM, 2013.
Naifeng Jing, Yao Shen, Yao Lu, Shrikanth Ganapathy, Zhigang Mao, Minyi Guo, Ramon Canal, and Xiaoyao Liang. 2013. An energy-efficient and scalable eDRAM-based register file architecture for GPGPU. SIGARCH Comput. Archit. News 41, 3 (June 2013), 344-355. https://doi.org/10.1145/2508148.2485952
Syed Zohaib Gilani, Nam Sung Kim, and Michael J. Schulte. 2013. Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46).
Mohammad Abdel-Majeed, Daniel Wong, and Murali Annavaram. 2013. Warped gates: gating aware scheduling and power gating for GPGPUs. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46).
Gilani S.Z., Kim N.S. and Schulte M.J., “Power-efficient computing for compute-intensive GPGPU applications,” in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA 2013), pp. 330-341, 23-27 Feb. 2013.
Goswami N., Cao B. and Li T., “Power-performance co-optimization of throughput core architecture using resistive memory,” in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA 2013), pp. 342-353, 23-27 Feb. 2013.
Rhu M. and Erez M., “The dual-path execution model for efficient GPU control flow,” in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA 2013), pp. 591-602, 23-27 Feb. 2013.
Jingwen Leng, Tayler Hetherington, Ahmed ElTantawy, Syed Gilani, Nam Sung Kim, Tor M. Aamodt, and Vijay Janapa Reddi. 2013. “GPUWattch: enabling energy optimizations in GPGPUs,” SIGARCH Comput. Archit. News 41, 3 (June 2013), 487-498. https://doi.org/10.1145/2508148.2485964
Onur Kayıran, Adwait Jog, Mahmut Taylan Kandemir, and Chita Ranjan Das. 2013. “Neither more nor less: optimizing thread-level parallelism for GPGPUs,” In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT '13).
Adwait Jog, Onur Kayiran, Nachiappan Chidambaram Nachiappan, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das. 2013. “OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance,” In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13).
