
Counter-Based Approaches for Efficient WCET Analysis of Multicore Processors with Shared Caches

  • Ding, Yiqiang (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Zhang, Wei (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Received : 2013.08.21
  • Accepted : 2013.10.28
  • Published : 2013.12.30

Abstract

To enable hard real-time systems to take advantage of multicore processors, it is crucial to obtain the worst-case execution time (WCET) of programs running on them. However, this is challenging because of inter-thread interference on the shared resources of a multicore processor. Recent research used the combined cache conflict graph (CCCG) to model and compute the worst-case inter-thread interference on a shared L2 cache in a multicore processor; in this paper we refer to this as the CCCG-based approach. Although it computes the WCET safely and accurately, its computational complexity is exponential and becomes prohibitive for a large number of cores. In this paper, we propose three counter-based approaches that significantly reduce the complexity of multicore WCET analysis while guaranteeing safety and achieving tightness close to that of the CCCG-based approach. The basic counter-based approach counts the worst-case number of cache blocks that all concurrent threads map to each set of the shared L2 cache, and compares this count with the associativity of the set to determine the worst-case cache behavior. The enhanced counter-based approach uses additional techniques to compute these counters more accurately. The hybrid counter-based approach combines the enhanced counter-based approach with the CCCG-based approach to further tighten the analysis without significantly increasing its complexity. Our experiments on a 4-core processor indicate that the enhanced counter-based approach overestimates the WCET by 14% on average compared with the CCCG-based approach, while its average running time is less than 1/380 of that of the CCCG-based approach. The hybrid approach reduces the overestimation to only 2.65%, while its average running time is less than 1/150 of that of the CCCG-based approach.
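The core idea of the basic counter-based approach can be illustrated with a small sketch. It assumes a set-associative shared L2 with modulo set mapping, an LRU-like policy, a set that starts empty, and concurrent threads that share no memory blocks, so that per-thread block counts simply add up. The input format, the function names `set_counters` and `classify`, and the "hit"/"miss" labels are hypothetical illustrations; the sketch does not model the enhanced or hybrid refinements or the paper's exact classification rules.

```python
from collections import defaultdict


def set_counters(threads, num_sets, line_size):
    """Count, per shared-L2 set, the blocks mapped to it by all threads.

    Hypothetical input: `threads` is a list whose entries hold the byte
    addresses a thread may reference in the shared L2.
    """
    counters = defaultdict(int)
    for addresses in threads:
        # Distinct blocks of this thread, grouped by the set they map to.
        per_set = defaultdict(set)
        for addr in addresses:
            block = addr // line_size           # memory block number
            per_set[block % num_sets].add(block)  # modulo set mapping
        # Assumption: threads share no blocks, so their counts add up.
        for s, blocks in per_set.items():
            counters[s] += len(blocks)
    return dict(counters)


def classify(threads, num_sets, line_size, associativity):
    """Compare each set's counter with the associativity (illustrative rule)."""
    counters = set_counters(threads, num_sets, line_size)
    # If every competing block fits in the set's ways, no block is ever
    # evicted, so only cold misses occur; otherwise, conservatively treat
    # accesses to that set as misses.
    return {s: ("hit" if n <= associativity else "miss")
            for s, n in counters.items()}
```

For example, with 4 sets, 32-byte lines and 2 ways, `classify([[0, 32, 128], [256, 64, 72]], 4, 32, 2)` returns `{0: 'miss', 1: 'hit', 2: 'hit'}`: set 0 receives blocks 0 and 4 from the first thread and block 8 from the second, which exceeds its two ways. Counting blocks per set in this way takes time linear in the number of references, which reflects why counter-based analysis avoids the exponential cost of enumerating interference combinations.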
