WARP: Memory Subsystem Effective for Wrapping Bursts of a Cache

  • Jang, Wooyoung (Department of Electronics and Electrical Engineering, Dankook University)
  • Received : 2016.10.02
  • Accepted : 2017.02.14
  • Published : 2017.06.01

Abstract

State-of-the-art processors require increasingly complicated memory services to achieve high performance and low power consumption. In particular, they request the transfers within a burst in wrap-around order to minimize the cache miss penalty. However, synchronous dynamic random access memories (SDRAMs) do not always generate transfers in the wrap-around order that the processors require. A memory subsystem therefore rearranges the SDRAM transfers into wrap-around order, but this rearrangement may increase memory latency and waste the bandwidth of on-chip interconnects. In this paper, we present a memory subsystem that is effective for the wrapping bursts of a cache. The proposed memory subsystem makes SDRAMs generate transfers in an intermediate order from which they can be rearranged into wrap-around order with minimal penalties. The transfers are then delivered with a priority that depends on the spatial locality of the program. Experimental results show that the proposed memory subsystem minimizes the memory performance loss resulting from wrapping bursts and thus improves program execution time.
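
To make the ordering mismatch concrete, the following minimal sketch compares the wrap-around order a cache expects with one ordering an SDRAM might produce. It assumes a burst of eight transfers. The function names are hypothetical, and the nibble-wrapped ordering is an illustrative assumption modeled on DDR3-style sequential bursts; it is not the intermediate order proposed in the paper.

```python
def wrap_order(start, burst_len=8):
    """Wrap-around order: begin at the requested (critical) transfer
    and wrap modulo the burst length."""
    return [(start + i) % burst_len for i in range(burst_len)]


def nibble_wrap_order(start):
    """Assumed SDRAM-side order for an 8-transfer burst: the low two
    index bits wrap within a group of four, and the other group of
    four follows afterwards (modeled on DDR3-style sequential bursts)."""
    return [((start ^ i) & 4) | ((start + i) & 3) for i in range(8)]


def first_mismatch(start):
    """Position of the first transfer that arrives out of wrap-around
    order, or None when the two orders already agree."""
    pairs = zip(wrap_order(start), nibble_wrap_order(start))
    for position, (wanted, produced) in enumerate(pairs):
        if wanted != produced:
            return position
    return None


if __name__ == "__main__":
    for start in range(8):
        print(start, wrap_order(start), nibble_wrap_order(start),
              first_mismatch(start))
```

Under these assumptions the two orders agree only when the burst starts on a group boundary (index 0 or 4). For any other starting index, a transfer arrives out of order partway through the burst, and the memory subsystem must buffer and rearrange it before forwarding.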
