A Study on Machine Learning Compiler and Modulo Scheduler

  • Doosan Cho (조두산), Dept. of Electronics Engineering, Sunchon National Univ.
  • Received : 2024.01.09
  • Accepted : 2024.01.23
  • Published : 2024.02.28

Abstract

This study examines modulo scheduling algorithms for multicore processors in machine learning applications. Machine learning algorithms perform large numbers of vector and matrix operations in order to process large data streams quickly. To support this volume of computation, processor architectures for applications such as artificial intelligence, neural networks, and machine learning are designed for parallel processing, typically as multicore systems. Various compiler techniques are being used and studied to utilize these multicore hardware resources effectively. Among these techniques, we analyze the modulo scheduler, which is especially important for the computation pipeline of a single core. We compare the iterative modulo scheduler and the swing modulo scheduler, the two most widely used and studied variants. Both schedulers delivered similar performance; when register pressure was measured as an indicator, the swing modulo scheduler performed slightly better. We also propose a technique that divides recurrence edges to improve the minimum initiation interval of modulo schedulers.
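
For readers unfamiliar with the term, the minimum initiation interval (MII) is commonly defined in the modulo scheduling literature as the larger of a resource-constrained bound (ResMII) and a recurrence-constrained bound (RecMII). The sketch below illustrates that standard definition only; it is not the technique proposed in this paper. The function names, the input layout, and the simplifying assumption that every operation occupies one functional unit for one cycle are all hypothetical choices made for illustration.

```python
def _ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def res_mii(op_counts, unit_counts):
    """Resource-constrained bound (hypothetical helper).

    op_counts:   operations per loop iteration, keyed by resource class
    unit_counts: available functional units, keyed by resource class
    Assumes each operation occupies one unit for exactly one cycle.
    """
    return max(_ceil_div(op_counts[r], unit_counts[r]) for r in op_counts)


def rec_mii(cycles):
    """Recurrence-constrained bound (hypothetical helper).

    cycles: one (total_latency, total_distance) pair per dependence cycle,
    where total_distance is the number of iterations the cycle spans (>= 1).
    Returns 0 when the loop has no recurrence.
    """
    return max((_ceil_div(lat, dist) for lat, dist in cycles), default=0)


def min_ii(op_counts, unit_counts, cycles):
    """MII is the larger of the two lower bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))


# Illustrative numbers only (not taken from the paper).
ops = {"alu": 4, "mem": 2}
units = {"alu": 2, "mem": 1}
recurrences = [(5, 1), (6, 2)]  # (latency, distance) per dependence cycle

print(min_ii(ops, units, recurrences))
```

In this made-up example the recurrence with latency 5 and distance 1 dominates, so the recurrence bound, not the resource bound, determines the MII. A loop of that shape is the case where shortening the critical recurrence can lower the bound.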
