References
- Swaine, Michael. "New Chip from Intel Gives High-Quality Displays." InfoWorld, March 14, 1983, p. 16.
- NVIDIA, GeForce 256, http://www.nvidia.com/page/geforce256.html
- NVIDIA, CUDA, http://www.nvidia.com/cuda
- Khronos Group, OpenCL, http://www.khronos.org/opencl/
- NVIDIA. "NVIDIA Kepler GK110 Architecture Whitepaper." 2012.
- NVIDIA. "Dynamic Parallelism in CUDA." Tech Brief, 2012.
- Harris, Mark. "Unified Memory in CUDA 6." NVIDIA Developer Blog, 18 Nov. 2013. (Online) http://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/
- Jablin, Thomas B., et al. "Automatic CPU-GPU communication management and optimization." ACM SIGPLAN Notices 46.6 (2011): 142-151.
- Guihot, Herve. "RenderScript." Pro Android Apps Performance Optimization. Apress, 2012. 231-263.
- OpenACC, http://www.openacc-standard.org/
- Gregory, K. "Overview and C++ AMP Approach." Technical report. Microsoft, Providence, 2011.
- Klöckner, Andreas, et al. "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation." Parallel Computing 38.3 (2012): 157-174. https://doi.org/10.1016/j.parco.2011.09.001
- Han, Tianyi David, and Tarek S. Abdelrahman. "hiCUDA: High-level GPGPU programming." Parallel and Distributed Systems, IEEE Transactions on 22.1 (2011): 78-90. https://doi.org/10.1109/TPDS.2010.62
- Yan, Yonghong, Max Grossman, and Vivek Sarkar. "JCUDA: A programmer-friendly interface for accelerating Java programs with CUDA." Euro-Par 2009 Parallel Processing. Springer Berlin Heidelberg, 2009. 887-899.
- Stratton, John A., Sam S. Stone, and Wen-mei W. Hwu. "MCUDA: An efficient implementation of CUDA kernels for multicore CPUs." Languages and Compilers for Parallel Computing. Springer Berlin Heidelberg, 2008. 16-30.
- Hong, Chuntao, et al. "MapCG: Writing parallel program portably between CPU and GPU." Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques. ACM, 2010.
- Lee, Seyong, Seung-Jai Min, and Rudolf Eigenmann. "OpenMP to GPGPU: a compiler framework for automatic translation and optimization." ACM SIGPLAN Notices 44.4 (2009): 101-110. https://doi.org/10.1145/1594835.1504194
- Ohshima, Satoshi, Shoichi Hirasawa, and Hiroki Honda. "OMPCUDA: OpenMP execution framework for CUDA based on omni OpenMP compiler." Beyond loop level parallelism in OpenMP: accelerators, tasking and more. Springer Berlin Heidelberg, 2010. 161-173.
- Jacob, Ferosh, et al. "CUDACL: A tool for CUDA and OpenCL programmers." High Performance Computing (HiPC), 2010 International Conference on. IEEE, 2010.
- Bell, Nathan, and Jared Hoberock. "Thrust: A productivity-oriented library for CUDA." GPU Computing Gems 7 (2011).
- Owens, John D., et al. "A Survey of General-Purpose Computation on Graphics Hardware." Computer graphics forum. Vol. 26. No. 1. Blackwell Publishing Ltd, 2007.
- Williams, Lance. "Pyramidal parametrics." ACM SIGGRAPH Computer Graphics. Vol. 17. No. 3. ACM, 1983.
- Hensley, Justin, et al. "Fast Summed-Area Table Generation and its Applications." Computer Graphics Forum. Vol. 24. No. 3. Blackwell Publishing, Inc, 2005.
- Harris, Mark, Shubhabrata Sengupta, and John D. Owens. "Parallel prefix sum (scan) with CUDA." GPU gems 3.39 (2007): 851-876.
- Batcher, Kenneth E. "Sorting networks and their applications." Proceedings of the April 30-May 2, 1968, spring joint computer conference. ACM, 1968.
- Purcell, Timothy J., et al. "Photon mapping on programmable graphics hardware." Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware. Eurographics Association, 2003.