References
- W. Jia, K. Shaw and M. Martonosi, "MRPB: Memory Request Prioritization for Massively Parallel Processors," in the IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 272-283, 2014.
- NVIDIA, "Whitepaper: NVIDIA's Next Generation CUDA Compute and Graphics Architecture: Fermi," 2009.
- Y. Torres and A. Escribano, "Understanding the Impact of CUDA Tuning Techniques for Fermi," in High Performance Computing and Simulation (HPCS), pp. 631-639, 2011.
- A. Jog, O. Kayiran, N. Nachiappan, A. Mishra, M. Kandemir, O. Mutlu, R. Iyer and C. Das, "OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance," in the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 395-406, 2013.
- S. Lee, A. Arunkumar and C. Wu, "CAWA: Coordinated Warp Scheduling and Cache Prioritization for Critical Warp Acceleration of GPGPU Workloads," in the International Symposium on Computer Architecture (ISCA), pp. 515-527, 2015.
- M. Lee, G. Kim, J. Kim, W. Seo, Y. Cho and S. Ryu, "iPAWS: Instruction-Issue Pattern-based Adaptive Warp Scheduling for GPGPUs," in the IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 370-381, 2016.
- V. Narasiman, M. Shebanow, C. Lee, R. Miftakhutdinov, O. Mutlu and Y. Patt, "Improving GPU Performance via Large Warps and Two-Level Warp Scheduling," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 308-317, 2011.
- M. Gebhart, R. Johnson, D. Tarjan, S. Keckler, W. Dally, E. Lindholm and K. Skadron, "Energy-efficient Mechanisms for Managing Thread Context in Throughput Processors," in the International Symposium on Computer Architecture (ISCA), pp. 235-246, 2011.
- W. Fung, I. Sham, G. Yuan and T. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 407-420, 2007.
- M. Qureshi, A. Jaleel, Y. Patt, S. Steely and J. Emer, "Adaptive Insertion Policies for High Performance Caching," in the International Symposium on Computer Architecture (ISCA), pp. 381-391, 2007.
- C. T. Do, H. J. Choi, J. M. Kim and C. H. Kim, "A New Cache Replacement Algorithm for Last-Level Caches by Exploiting Tag-Distance Correlation of Cache Lines," in Microprocessors and Microsystems, 39(4), pp. 286-295, 2015. https://doi.org/10.1016/j.micpro.2015.05.005
- A. S. Leon, B. Langley and J. L. Shin, "The UltraSPARC T1 Processor: CMT Reliability," in the Custom Integrated Circuits Conference, pp. 555-562, 2006.
- T. Rogers, M. O'Connor and T. Aamodt, "Cache-conscious Wavefront Scheduling," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 72-83, 2012.
- NVIDIA, "NVIDIA Tegra Multiprocessor Architecture," 2010.
- Y. Wu, R. Rakvic, L. Chen, C. Miao, G. Chrysos and J. Fang, "Compiler Managed Micro-cache Bypassing for High Performance EPIC Processors," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 134-145, 2002.
- T. L. Johnson and W.-M. W. Hwu, "Run-time Adaptive Cache Hierarchy Management via Reference Analysis," in the International Symposium on Computer Architecture (ISCA), pp. 315-326, 1997.
- M. Kharbutli and D. Solihin, "Counter-based Cache Replacement and Bypassing Algorithms," in IEEE Transactions on Computers, 57(4), pp. 433-447, 2008. https://doi.org/10.1109/TC.2007.70816
- A. Bakhoda, G. Yuan, W. Fung, H. Wong and T. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," in the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, 2009.
- H. Liu, M. Ferdman, J. Huh and D. Burger, "Cache Bursts: A New Approach for Eliminating Dead Blocks and Increasing Cache Efficiency," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 222-233, 2008.
- D. Kirk and W. Hwu, "Programming Massively Parallel Processors," 2010.
- C. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. Steely Jr. and J. Emer, "SHiP: Signature-based Hit Predictor for High Performance Caching," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 430-441, 2011.
- NVIDIA, "CUDA SDK," http://developer.nvidia.com/gpu-computing-sdk.
- X. Chen, L. Chang, C. Rodrigues, J. Lv, Z. Wang and W. Hwu, "Adaptive Cache Management for Energy-Efficient GPU Computing," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 343-355, 2014.
- N. Duong, D. Zhao, T. Kim, R. Cammarota, M. Valero and A. Veidenbaum, "Improving Cache Management Policies Using Dynamic Reuse Distances," in the IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 389-400, 2012.
- X. Xie, Y. Liang, Y. Wang, G. Sun and T. Wang, "Coordinated Static and Dynamic Cache Bypassing for GPUs," in the IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 76-88, 2015.
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee and K. Skadron, "Rodinia: A Benchmark Suite for Heterogeneous Computing," in the IEEE International Symposium on Workload Characterization (IISWC), pp. 44-54, 2009.
- S. Hong and H. Kim, "An Integrated GPU Power and Performance Model," in the International Symposium on Computer Architecture (ISCA), pp. 280-289, 2010.
- J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. Kim, T. Aamodt and V. Reddi, "GPUWattch: Enabling Energy Optimizations in GPGPUs," in the International Symposium on Computer Architecture (ISCA), pp. 487-498, 2013.