Bounding Worst-Case Performance for Multi-Core Processors with Shared L2 Instruction Caches

  • Yan, Jun (MathWorks)
  • Zhang, Wei (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Received : 2011.01.05
  • Accepted : 2011.02.23
  • Published : 2011.03.31

Abstract

As a first step toward real-time multi-core computing, this paper presents a novel approach to bounding the worst-case performance of threads running on multi-core processors with shared L2 instruction caches. The idea is to compute the worst-case instruction access interference between threads from the control flow information of each thread, which can be analyzed statically. Our experiments indicate that the proposed approach can reasonably estimate the worst-case number of shared L2 instruction cache misses by accounting for inter-thread instruction conflicts. In addition, the worst-case execution time (WCET) that our approach estimates for applications running on multi-core processors is much tighter than the estimate obtained by simply assuming that all L2 instruction accesses are misses.
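
The contrast between a conflict-aware bound and the all-miss assumption can be illustrated with a small sketch. This is not the paper's analysis, which is not reproduced on this page; it is a simplified illustration under stated assumptions: a direct-mapped shared L2, two threads, and a per-access classification (`hit_alone`) assumed to come from a single-thread static analysis. All names (`L2Access`, `worst_case_l2_misses`, `all_miss_bound`) and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class L2Access:
    block: int        # memory block number of the instruction line (hypothetical)
    count: int        # worst-case number of times this access reaches the L2
    hit_alone: bool   # assumed L2 hit when the thread runs without co-runners


def cache_set(block: int, num_sets: int) -> int:
    """Set index in a direct-mapped cache (assumption: one line per set)."""
    return block % num_sets


def worst_case_l2_misses(thread: List[L2Access],
                         other: List[L2Access],
                         num_sets: int) -> int:
    """Conservative miss bound: an access stays a hit only if it hits when
    the thread runs alone AND no block of the other thread maps to its set."""
    other_sets = {cache_set(a.block, num_sets) for a in other}
    misses = 0
    for a in thread:
        conflict = cache_set(a.block, num_sets) in other_sets
        if not a.hit_alone or conflict:
            misses += a.count
    return misses


def all_miss_bound(thread: List[L2Access]) -> int:
    """Baseline: every L2 instruction access is assumed to miss."""
    return sum(a.count for a in thread)


# Hypothetical example data: a shared L2 with 4 sets.
thread_a = [L2Access(0, 10, True), L2Access(1, 5, True), L2Access(6, 1, False)]
thread_b = [L2Access(4, 3, True)]  # block 4 maps to set 0, like block 0 of thread_a

conflict_aware = worst_case_l2_misses(thread_a, thread_b, num_sets=4)
baseline = all_miss_bound(thread_a)
print(conflict_aware, baseline)
```

In this example only the accesses to the shared set (and the access that already misses in isolation) are charged as misses, so the conflict-aware count is lower than the all-miss baseline. A set-associative cache or more than two threads would need a finer conflict model than this sketch uses.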

Cited by

  1. A Technique for Fast Process Creation Based on Creation Location vol.5, pp.4, 2011, https://doi.org/10.5626/JCSE.2011.5.4.283
  2. Demand-based schedulability analysis for real-time multi-core scheduling vol.89, 2014, https://doi.org/10.1016/j.jss.2013.09.029
  3. Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches vol.7, pp.4, 2013, https://doi.org/10.5626/JCSE.2013.7.4.285