Empirical Modeling for Cache Miss Rates in Multiprocessors

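  • Kang-Woo Lee (Department of Information and Communication Engineering, Dongguk University)
  • Ki-Ju Yang (Department of Information and Communication Engineering, Dongguk University)
  • Chun-Sik Park (Department of Information and Communication Engineering, Dongguk University)
  • Published: 2006.02.01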

Abstract

This paper proposes an empirical modeling technique that derives models by applying statistical estimation to samples collected from a small number of small-scale simulations. Using this technique, we built two types of models for cache miss rates in symmetric multiprocessor (SMP) systems: one describes how the miss rate changes with the input data set size when the target system specification is fixed, and the other describes how it changes with the number of processors when the input data set size is fixed. To improve accuracy, we first built an individual model for each type of cache miss on each shared data structure in a program and then combined these into the final model. We also used least-squares estimation together with robust estimation to minimize the distortion caused by outliers. Empirical modeling yields highly accurate models without requiring analysis of the sample data. Because only small-scale simulations are needed, the method can be applied in any research area where a series of samples can be collected experimentally. In 17 of 24 prediction trials, the empirical models showed prediction errors below $1\%$, and accuracy remained high in the remaining cases. The models performed well even for programs with irregular execution behavior and when the number of samples was barely sufficient.
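The fitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the model forms or say how the two estimators are combined, so the sketch assumes a straight-line model per component and a Huber-weighted iteratively reweighted least-squares fit that starts from the ordinary least-squares solution. The sample values, the component names `replacement` and `coherence`, and all function names are hypothetical.

```python
from statistics import median

def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def least_squares_fit(xs, ys):
    """Ordinary least squares: every sample gets the same weight."""
    return weighted_line_fit(xs, ys, [1.0] * len(xs))

def robust_fit(xs, ys, iterations=50, k=1.345):
    """Huber-weighted IRLS, started from the least-squares fit (assumed scheme)."""
    a, b = least_squares_fit(xs, ys)
    for _ in range(iterations):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale estimate from the median absolute residual.
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break  # at least half of the samples already lie on the line
        # Samples with large residuals (likely outliers) get weights below 1.
        ws = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
              for r in residuals]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b

def predict_total(models, x):
    """Sum the per-component predictions a + b*x into one overall estimate."""
    return sum(a + b * x for a, b in models.values())

# Hypothetical samples: misses per 1000 references at small processor counts.
processors = [2, 4, 6, 8, 10, 12]
samples = {
    "replacement": [0.8, 0.9, 1.0, 1.1, 1.2, 1.3],
    "coherence": [1.0, 1.5, 2.0, 6.0, 3.0, 3.5],  # the value at 8 is an outlier
}

ls_models = {name: least_squares_fit(processors, ys) for name, ys in samples.items()}
robust_models = {name: robust_fit(processors, ys) for name, ys in samples.items()}

# Extrapolate both sets of models to a larger, unsimulated configuration.
print("least squares, 32 processors:", predict_total(ls_models, 32))
print("robust,        32 processors:", predict_total(robust_models, 32))
```

In this made-up data set, the single outlier pulls the least-squares slope for the `coherence` component upward, and the error grows when the model is extrapolated to 32 processors. The robust fit down-weights that sample and stays close to the trend of the other five points.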
