A Hardware Cache Prefetching Scheme for Multimedia Data with Intermittently Irregular Strides

A Hardware Cache Prefetching Method for Multimedia Data with Intermittently Irregular Address Strides (translated from the Korean title)

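  • 전영숙 (Department of Computer Science, Chungbuk National University)
  • 문현주 (Division of Information Science, Korea Nazarene University)
  • 전중남 (School of Electrical and Computer Engineering, Chungbuk National University)
  • 김석일 (Department of Computer Science, Chungbuk National University)
  • Published: 2004.12.01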

Abstract

Multimedia applications must process large amounts of data at high speed in real time. Memory reference instructions such as loads and stores are the main factor limiting the execution speed of a processor. To speed up memory references, cache prefetching schemes fetch data that is expected to be referenced in the near future into the cache in advance, thereby reducing the cache miss ratio and the total execution time. In this study, we present a data cache prefetching scheme that improves on the conventional RPT (reference prediction table) based scheme. We take the cache line size into account when calculating the address stride of references made by the same instruction, and we enhance the prefetching algorithm so that the benefit of prefetching is maintained even if an irregular address stride appears within a series of uniform strides. In experiments on multimedia benchmark programs, the cache miss ratio improved by 29% on average compared with the conventional RPT scheme, while bus usage increased by only about 0.03%.

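Multimedia application programs must process vast amounts of data at high speed in real time. Memory reference instructions such as loads and stores are a major obstacle to high-speed processor execution. To improve memory reference speed, cache prefetching methods are used; they reduce the cache miss ratio and the total execution time by fetching data that is expected to be referenced next into the cache in advance. This study presents a data cache prefetching method that improves on the existing method based on a reference prediction table (RPT). When computing the address stride of the data referenced by the same instruction, the method measures the stride in units of the cache line size, and the prefetching algorithm is improved so that the prefetching effect is maintained even when a single irregular stride occurs among regular strides. In experiments on commonly used multimedia programs, bus usage increased by about 0.03% compared with the existing RPT method, while the cache miss ratio improved by about 29% on average.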
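
The abstract does not give the table layout or the state transitions, so the Python sketch below is only one possible reading of the two ideas it names: strides measured in cache-line units, and a learned stride that survives a single irregular stride. The names `Entry`, `StridePrefetcher` and `LINE_SIZE`, the 32-byte line, the choice to ignore references that stay within the same line, and the rule of relearning after two consecutive irregular strides are all assumptions made for illustration; none of them is taken from the paper.

```python
from dataclasses import dataclass

LINE_SIZE = 32  # bytes per cache line (assumed value, not from the paper)


@dataclass
class Entry:
    """Hypothetical table entry, indexed by the load/store instruction address."""
    last_line: int        # cache-line number of the previous reference
    stride: int = 0       # learned stride, in cache lines rather than bytes
    steady: bool = False  # True once the same line stride repeated
    misses: int = 0       # consecutive strides that broke the pattern


class StridePrefetcher:
    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.table = {}

    def access(self, pc, addr):
        """Record one reference; return the line number to prefetch, or None."""
        line = addr // self.line_size
        e = self.table.get(pc)
        if e is None:
            self.table[pc] = Entry(last_line=line)
            return None
        observed = line - e.last_line
        e.last_line = line
        if observed == 0:
            # Still inside the same cache line: nothing to fetch (assumption).
            return None
        if observed == e.stride:
            e.steady = True
            e.misses = 0
        elif e.steady and e.misses == 0:
            # First irregular stride: keep the learned stride and keep prefetching.
            e.misses = 1
        else:
            # Pattern not yet established, or broken twice in a row: relearn.
            e.stride = observed
            e.steady = False
            e.misses = 0
        return line + e.stride if e.steady else None


# Byte stride of 64 (two lines), with one irregular jump from address 192 to 640.
p = StridePrefetcher()
print([p.access(0x400, a) for a in (0, 64, 128, 192, 640, 704)])
# → [None, None, 6, 8, 22, 24]
```

In this trace the jump to address 640 does not reset the entry, so line 22 is still prefetched and the following reference to address 704 finds the pattern intact. A table that relearned on every mismatch would instead issue no prefetch until the stride had been confirmed again.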
