Effective Programming Methods for Intel's Next-Generation Processors Using AVX-512

  • Published: 2018.01.31

Abstract

Keywords
