A Study on Design and Cache Replacement Policy for Cascaded Cache Based on Non-Volatile Memories

A Study on a Low-Power Cascaded Cache Structure and an Optimized Cache Replacement Policy for Non-Volatile Memory Systems

  • Juhee Choi (Dept. of Smart Information Communication Engineering, Sangmyung University)
  • Received : 2023.09.08
  • Accepted : 2023.09.15
  • Published : 2023.09.30

Abstract

Load-to-use latency has grown in importance as state-of-the-art computing cores adopt deep pipelines and high clock frequencies. The cascaded cache was recently proposed to reduce the access cycles of the L1 cache by exploiting latency differences among the banks of the cache structure. However, that proposal assumes the cache is built from SRAM, so it cannot be applied directly to non-volatile memory-based systems. This paper proposes a novel mechanism and structure that lower dynamic energy consumption in such systems. Monitoring logic is inserted to keep track of swap operations and write counts. If the ratio of swap operations to total write counts surpasses a set threshold, the cache controller skips the swap of cache blocks, which reduces the number of write operations. To validate this approach, experiments are conducted on a non-volatile memory-based cascaded cache. The results show that write operations are reduced by 16.7% on average, with a negligible increase in latency.
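
The swap-skipping rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `SwapMonitor`, the method names, the default threshold of 0.25, and the assumption that each swap costs two block writes are all hypothetical choices made only for illustration. The abstract states only that swaps and writes are counted and that the swap is skipped once their ratio surpasses a set threshold.

```python
from dataclasses import dataclass


@dataclass
class SwapMonitor:
    """Hypothetical monitoring logic that counts swaps and total writes."""

    threshold: float = 0.25  # assumed value; the abstract only says "a set threshold"
    swap_count: int = 0
    write_count: int = 0

    def record_write(self, n: int = 1) -> None:
        """Count n ordinary (non-swap) writes to the cache."""
        self.write_count += n

    def swap_ratio(self) -> float:
        """Ratio of swap operations to total writes (0.0 before any write)."""
        if self.write_count == 0:
            return 0.0
        return self.swap_count / self.write_count

    def request_swap(self) -> bool:
        """Decide whether a requested block swap is performed.

        Returns True if the swap goes ahead, False if it is skipped
        because the ratio already exceeds the threshold.
        """
        if self.swap_ratio() > self.threshold:
            return False  # skip the swap, avoiding its write operations
        self.swap_count += 1
        # Assumption: a swap rewrites both exchanged blocks, i.e. two writes.
        self.write_count += 2
        return True


monitor = SwapMonitor(threshold=0.25)
monitor.record_write(2)
decisions = [monitor.request_swap() for _ in range(3)]
print(decisions, monitor.swap_count, monitor.write_count)
```

In this sketch the third request is skipped because two swaps out of six writes already exceed the threshold; once further ordinary writes bring the ratio back down, swaps resume.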

Acknowledgement

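This research was supported by the Basic Research Program of the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT in 2021 (NRF-2021R1G1A1004340).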
