Runtime Prediction Based on Workload-Aware Clustering

A Study on a Job Runtime Prediction Model Based on Clustering of Parallel Program Logs

  • Kim, Eunhye (IT Convergence Technology Research Laboratory, ETRI)
  • Park, Ju-Won (Supercomputing Division, Korea Institute of Science and Technology Information, KISTI)
  • Received : 2015.06.05
  • Accepted : 2015.09.04
  • Published : 2015.09.30

Abstract

Many fields of science demand support for large-scale workflows that require thousands of CPU cores or more. High-capacity parallel systems such as supercomputers are widely used to run these workflows. To increase the utilization of such systems, most schedulers apply a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as they do not delay large jobs. Backfilling requires an estimate of each job's runtime, and most parallel systems rely on the estimate supplied by the user. These estimates have been found to be extremely inaccurate because users tend to overestimate the runtime of their jobs. In this paper, we therefore propose a novel runtime prediction system based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for predicting the runtime of parallel applications consists of three main phases. First, feature selection based on factor analysis identifies the important input features. Second, the historical data are clustered with a self-organizing map, followed by hierarchical clustering of the weight vectors to find the cluster boundaries. Finally, prediction models are constructed using support vector regression on the clustered workload data. Building a separate prediction model for each cluster can reduce the error rate compared with a single model for the whole data set. In the experiments, we use workload logs from parallel systems (iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, the experimental results show that the proposed method improves accuracy by up to 69.08%.
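
To make the role of the runtime estimate in backfilling concrete, the following is a minimal sketch of an EASY-style backfilling test. The abstract does not say which backfilling variant the target schedulers use, so the EASY rule is an assumption here, and the names `Job`, `can_backfill`, `shadow_time`, and `extra_cores` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int          # number of cores the job requests
    est_runtime: float  # runtime estimate in seconds (user-supplied or predicted)


def can_backfill(job, now, free_cores, shadow_time, extra_cores):
    """Return True if `job` may start now without delaying the queue head.

    Assumed EASY-style rule (not taken from the paper):
      shadow_time  earliest time at which the job at the head of the queue
                   can start, i.e. its reservation
      extra_cores  cores that stay free at shadow_time even after the head
                   job has started
    """
    if job.cores > free_cores:
        return False  # the job does not fit into the current hole
    finishes_in_time = now + job.est_runtime <= shadow_time
    uses_only_spare = job.cores <= extra_cores
    return finishes_in_time or uses_only_spare


# Illustrative numbers only: the same 16-core job with two different estimates.
overestimated = Job("small", cores=16, est_runtime=7200.0)
predicted = Job("small", cores=16, est_runtime=2000.0)

print(can_backfill(overestimated, now=0.0, free_cores=32,
                   shadow_time=3600.0, extra_cores=0))
print(can_backfill(predicted, now=0.0, free_cores=32,
                   shadow_time=3600.0, extra_cores=0))
```

Under these assumed numbers the overestimated job is rejected and the job with the tighter estimate is accepted, which is why a more accurate runtime prediction can let the scheduler fill more holes.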
