WARP: Memory Subsystem Effective for Wrapping Bursts of a Cache

Jang, Wooyoung

doi:10.4218/etrij.17.0116.0710

ETRI Journal, Vol. 39, No. 3, pp. 428-436, 2017
pISSN 1225-6463 / eISSN 2233-7326

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Jang, Wooyoung (Department of Electronics and Electrical Engineering, Dankook University)

Received : 2016.10.02
Accepted : 2017.02.14
Published : 2017.06.01

https://doi.org/10.4218/etrij.17.0116.0710

Abstract

State-of-the-art processors require increasingly sophisticated memory services to achieve high performance and low power consumption. In particular, they request the transfers within a burst in wrap-around order (critical word first) to minimize the cache miss penalty. However, synchronous dynamic random access memories (SDRAMs) do not always generate transfers in the wrap-around order that the processors require. A memory subsystem must therefore rearrange the SDRAM transfers into wrap-around order, but this rearrangement can increase memory latency and waste on-chip interconnect bandwidth. In this paper, we present a memory subsystem that is effective for the wrapping bursts of a cache. The proposed memory subsystem makes the SDRAMs generate transfers in an intermediate order, from which the transfers can be rearranged into wrap-around order with minimal penalty. The transfers are then delivered with priorities that depend on the spatial locality of the program. Experimental results show that the proposed memory subsystem minimizes the memory performance loss caused by wrapping bursts and thereby reduces program execution time.
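To make the latency cost of rearrangement concrete, the following is a minimal Python sketch under a deliberately simplified, hypothetical model. It is not the paper's memory subsystem; the intermediate order and priority scheme described above are not reproduced here. The model assumes an 8-beat burst per cache line, one SDRAM beat per cycle, and strictly in-order forwarding to the cache. All function names are illustrative. The order encoded in `ddr3_bl8_sequential_order` is our reading of the DDR3 "sequential" burst type for BL8 reads and should be checked against the JEDEC standard.

```python
# Simplified, hypothetical model (not the paper's design): a cache line is one
# burst of `beats` transfers, the SDRAM returns one beat per cycle, and the
# memory subsystem forwards beats to the cache strictly in the required order.


def wrap_order(critical: int, beats: int = 8) -> list[int]:
    """Wrap-around (critical-word-first) order within an aligned burst."""
    return [(critical + i) % beats for i in range(beats)]


def ddr3_bl8_sequential_order(start: int) -> list[int]:
    """Assumed DDR3 BL8 'sequential' read order: the burst wraps within the
    4-beat half that holds `start`, then visits the other half at the same
    offset (our reading of the JEDEC DDR3 standard; verify before relying on it)."""
    half, offset = start >> 2, start & 3
    return [((half ^ (i >> 2)) << 2) | ((offset + i) & 3) for i in range(8)]


def delivery_times(arrival_order: list[int], required_order: list[int]) -> list[int]:
    """Cycle at which each beat of `required_order` is forwarded, given that
    arrival_order[k] arrives at cycle k and at most one beat is forwarded per
    cycle, in order (a beat cannot be forwarded before it has arrived)."""
    arrival = {beat: cycle for cycle, beat in enumerate(arrival_order)}
    times, t = [], -1
    for beat in required_order:
        t = max(t + 1, arrival[beat])
        times.append(t)
    return times


if __name__ == "__main__":
    want = wrap_order(5)  # critical word is beat 5
    for name, got in [("linear", list(range(8))),
                      ("ddr3 seq", ddr3_bl8_sequential_order(5)),
                      ("wrap", want)]:
        t = delivery_times(got, want)
        print(f"{name:8s} critical word at cycle {t[0]}, line done at cycle {t[-1]}")
```

Under this toy model, a linear SDRAM burst delays both the critical word (cycle 5) and the end of the line (cycle 12) by five cycles compared with a native wrap-around burst (cycles 0 and 7). The assumed DDR3 order delivers the critical word immediately but stalls the tail of the line until cycle 11. This illustrates why the order in which the SDRAM produces transfers matters, not only whether the first beat is the critical one.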
