References
- Che, Shuai, et al. "A performance study of general-purpose applications on graphics processors using CUDA." Journal of Parallel and Distributed Computing 68.10 (2008): 1370-1380. https://doi.org/10.1016/j.jpdc.2008.05.014
- Clarke, Edmund, et al. "Progress on the state explosion problem in model checking." Informatics. Springer, Berlin, Heidelberg, 2001.
- Fang, Jianbin, Ana Lucia Varbanescu, and Henk Sips. "A comprehensive performance comparison of CUDA and OpenCL." Parallel Processing (ICPP), 2011 International Conference on. IEEE, 2011.
- Farooqui, Naila, et al. "A framework for dynamically instrumenting GPU compute applications within GPU Ocelot." Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units. ACM, 2011.
- Kerr, Andrew, Gregory Diamos, and Sudhakar Yalamanchili. "A characterization and analysis of PTX kernels." Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on. IEEE, 2009.
- Li, Peng, Guodong Li, and Ganesh Gopalakrishnan. "Parametric flows: automated behavior equivalencing for symbolic analysis of races in CUDA programs." Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society Press, 2012.
- Sanders, Jason, and Edward Kandrot. CUDA by Example: An Introduction to General-Purpose GPU Programming, Portable Documents. Addison-Wesley Professional, 2010.
- Yang, Yi, and Huiyang Zhou. "CUDA-NP: realizing nested thread-level parallelism in GPGPU applications." ACM SIGPLAN Notices. Vol. 49. No. 8. ACM, 2014.
- Yang, Zhiyi, Yating Zhu, and Yong Pu. "Parallel image processing based on CUDA." Computer Science and Software Engineering, 2008 International Conference on. Vol. 3. IEEE, 2008.
- Zheng, Mai, et al. "GRace: a low-overhead mechanism for detecting data races in GPU programs." ACM SIGPLAN Notices. Vol. 46. No. 8. ACM, 2011.
- Harris, Mark. "Optimizing CUDA." SC07: High Performance Computing With CUDA, 2007.
- Ryoo, Shane, et al. "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA." Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 2008.
- Ruetsch, Greg, and Paulius Micikevicius. "Optimizing matrix transpose in CUDA." Nvidia CUDA SDK Application Note 18 (2009).
- Iwai, Keisuke, Takakazu Kurokawa, and Naoki Nisikawa. "AES encryption implementation on CUDA GPU and its analysis." Networking and Computing (ICNC), 2010 First International Conference on. IEEE, 2010.
- Savage, Stefan, et al. "Eraser: A dynamic data race detector for multithreaded programs." ACM Transactions on Computer Systems 15.4 (1997): 391-411. https://doi.org/10.1145/265924.265927
- Boyer, Michael, Kevin Skadron, and Westley Weimer. "Automated dynamic analysis of CUDA programs."
- Ghanbari, Ali, Samuel Benton, and Lingming Zhang. "Practical program repair via bytecode mutation." International Symposium on Software Testing and Analysis. ACM, 2019: 19-30.