Bounding Worst-Case DRAM Performance on Multicore Processors

  • Ding, Yiqiang (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Wu, Lan (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Zhang, Wei (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Received: 2013.02.12
  • Accepted: 2013.03.01
  • Published: 2013.03.30

Abstract

Bounding the worst-case DRAM performance for a real-time application is a challenging problem that is critical for computing the worst-case execution time (WCET), especially on multicore processors, where the DRAM is usually shared by all of the cores. Typically, the DRAM commands issued by consecutive DRAM accesses can be pipelined on the DRAM devices, depending on the spatial locality of the data they fetch. By considering the effect of DRAM command pipelining, we propose a basic approach to bounding the worst-case DRAM performance. We also propose an enhanced approach that reduces the overestimation caused by invalid DRAM access sequences by checking the timing order of the co-running applications on a dual-core processor. Compared with the conservative approach, which assumes that no DRAM command pipelining exists, our experimental results show that the basic approach bounds the WCET more tightly, by 15.73% on average. The experimental results also indicate that the enhanced approach further improves the tightness of the WCET estimate by 4.23% on average compared with the basic approach.
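To make the contrast between a conservative bound and a pipelining-aware bound concrete, the following is a minimal illustrative sketch and not the model used in the paper. The timing values, the three per-access cost rules, and all function names are assumptions chosen for illustration. It charges every access the full latency in the conservative case, charges a reduced cost when consecutive accesses can overlap in the pipelining-aware case, and takes the maximum over all interleavings of two per-core access sequences.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Access = Tuple[int, int]  # (bank, row); a hypothetical simplified address

# Hypothetical timing parameters in DRAM clock cycles (illustrative only).
T_RP = 10     # precharge
T_RCD = 10    # activate (row-to-column delay)
T_CL = 10     # column access latency
T_BURST = 4   # data burst

FULL = T_RP + T_RCD + T_CL + T_BURST  # cost of a non-overlapped access


def conservative_latency(seq: List[Access]) -> int:
    """Assume no command pipelining: every access pays the full latency."""
    return FULL * len(seq)


def pipelined_latency(seq: List[Access]) -> int:
    """Toy pipelining-aware cost of one concrete access sequence."""
    if not seq:
        return 0
    total = FULL  # assumption: the first access pays the full latency
    for prev, cur in zip(seq, seq[1:]):
        if cur[0] != prev[0]:
            # Different bank: assume precharge/activate overlap with the
            # previous access, leaving column access plus burst.
            total += T_CL + T_BURST
        elif cur[1] == prev[1]:
            # Same bank, same row: assume only the burst is serialized.
            total += T_BURST
        else:
            # Same bank, different row: assume no overlap at all.
            total += FULL
    return total


def interleavings(a: List[Access], b: List[Access]) -> Iterator[List[Access]]:
    """Yield every merge of a and b that preserves each core's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def worst_case_pipelined(a: List[Access], b: List[Access]) -> int:
    """Maximum toy pipelined latency over all interleavings of two cores."""
    return max(pipelined_latency(s) for s in interleavings(a, b))


if __name__ == "__main__":
    core0 = [(0, 1), (0, 1)]  # two accesses to the same row of bank 0
    core1 = [(1, 2)]          # one access to bank 1
    print("conservative:", conservative_latency(core0 + core1))
    print("pipelining-aware worst case:", worst_case_pipelined(core0, core1))
```

This sketch enumerates every interleaving, so it does not attempt the pruning of invalid access sequences that the enhanced approach performs. Exhaustive enumeration also grows combinatorially and is practical only for very short sequences.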
