Design and Implementation of the Central Queue Based Loop Scheduling Method

  • Published: 2001.09.25

Abstract

In this paper, we present CDSS (Carried-Dependence Self-Scheduling), a new central-queue-based scheduling method for efficiently executing loops whose iterations have loop-carried dependences, and we implement it in Java on a shared-memory system. We also study how existing self-scheduling methods for parallel loops, which are based on a central task queue, can be modified so that they apply to loops with loop-carried dependences. CDSS is a self-scheduling method that assigns loop iterations in three stages, taking into account the synchronization points determined by the loop's dependence distance. To make the proposed method and the modified methods portable to various platforms, including uniprocessor systems, we implement them with threads. Comparisons with the modified methods under varying application and system parameters show that CDSS is more efficient, reducing the overall loop execution time including scheduling overhead. CDSS improves performance over the modified SS, Factoring, GSS, and CSS by about 0.02%, 40.5%, 46.1%, and 53.6%, respectively. Moreover, with CDSS the best performance on various application programs is obtained using only a few threads, equal in number to the dependence distance.
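The schemes CDSS is compared against (SS, CSS, GSS, and Factoring) differ mainly in how many iterations a thread takes from the central queue per request. As background, the Java sketch below computes the chunk sequences that the commonly cited standard forms of these rules produce for an independent (DOALL) loop of N iterations on P threads: SS takes one iteration, CSS a fixed chunk of k, GSS ceil(R/P) of the R remaining iterations, and Factoring hands out batches of P equal chunks of ceil(R/(2P)). These are the textbook rules for parallel loops, not the dependence-aware modifications evaluated in the paper; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the chunk-size rules of classic central-queue
 * self-scheduling schemes for independent loops. NOT the modified,
 * dependence-aware variants studied in the paper.
 */
public class ChunkRules {

    /** Pure self-scheduling (SS): one iteration per request. */
    static List<Integer> ss(int n) {
        List<Integer> chunks = new ArrayList<>();
        for (int i = 0; i < n; i++) chunks.add(1);
        return chunks;
    }

    /** Chunk self-scheduling (CSS): a fixed chunk of k iterations per request. */
    static List<Integer> css(int n, int k) {
        List<Integer> chunks = new ArrayList<>();
        int remaining = n;
        while (remaining > 0) {
            int c = Math.min(k, remaining); // last chunk may be smaller
            chunks.add(c);
            remaining -= c;
        }
        return chunks;
    }

    /** Guided self-scheduling (GSS): ceil(R / P) of the R remaining iterations. */
    static List<Integer> gss(int n, int p) {
        List<Integer> chunks = new ArrayList<>();
        int remaining = n;
        while (remaining > 0) {
            int c = (remaining + p - 1) / p; // integer ceil(R / P)
            chunks.add(c);
            remaining -= c;
        }
        return chunks;
    }

    /** Factoring: each batch hands out P chunks of ceil(R / (2P)). */
    static List<Integer> factoring(int n, int p) {
        List<Integer> chunks = new ArrayList<>();
        int remaining = n;
        while (remaining > 0) {
            int c = (remaining + 2 * p - 1) / (2 * p); // chunk size fixed per batch
            for (int j = 0; j < p && remaining > 0; j++) {
                int take = Math.min(c, remaining);
                chunks.add(take);
                remaining -= take;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println("SS        " + ss(10).size() + " chunks");
        System.out.println("CSS(k=4)  " + css(10, 4));
        System.out.println("GSS       " + gss(20, 4));
        System.out.println("Factoring " + factoring(20, 4));
    }
}
```

For N = 20 and P = 4, GSS yields chunks of 5, 4, 3, 2, 2, 1, 1, 1, 1, while Factoring yields four chunks of 3 followed by eight chunks of 1. Larger early chunks reduce the number of accesses to the central queue, which is the scheduling overhead these schemes trade against load balance.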
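A loop with a loop-carried dependence of distance d, such as a[i] = a[i - d] + i, can still be self-scheduled from a central queue if each iteration waits until the iteration d positions earlier has finished. The Java sketch below illustrates only that synchronization point, under stated assumptions: one iteration per request (an SS-style policy), a shared AtomicInteger standing in for the central queue, one CountDownLatch per iteration, and a thread count equal to the dependence distance. It is a hypothetical simplification, not the paper's three-stage CDSS assignment, whose details are not given here; the loop body, class, and method names are assumptions.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: self-scheduling a loop with a loop-carried dependence
 * of distance d (a[i] = a[i - d] + i) from a shared central counter.
 * NOT the paper's three-stage CDSS; it only shows the synchronization
 * point that the dependence distance imposes.
 */
public class DependenceDistanceLoop {

    /** Executes the loop with the given number of worker threads. */
    static long[] run(int n, int d, int threads) {
        long[] a = new long[n];
        // done[i] is released once iteration i has written a[i].
        CountDownLatch[] done = new CountDownLatch[n];
        for (int i = 0; i < n; i++) done[i] = new CountDownLatch(1);
        // Central queue: the next unassigned iteration index.
        AtomicInteger next = new AtomicInteger(0);

        Runnable worker = () -> {
            int i;
            // Each request takes exactly one iteration (SS-style policy).
            while ((i = next.getAndIncrement()) < n) {
                if (i >= d) {
                    try {
                        done[i - d].await(); // synchronization point for distance d
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    a[i] = a[i - d] + i;
                } else {
                    a[i] = i; // the first d iterations have no predecessor
                }
                done[i].countDown(); // publishes a[i] to any waiting iteration
            }
        };

        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(worker);
            pool[t].start();
        }
        try {
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining workers", e);
        }
        return a;
    }

    /** Sequential reference version of the same loop. */
    static long[] sequential(int n, int d) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) a[i] = (i >= d) ? a[i - d] + i : i;
        return a;
    }

    public static void main(String[] args) {
        int n = 1000, d = 4;
        // Thread count equal to the dependence distance.
        long[] parallel = run(n, d, d);
        System.out.println(Arrays.equals(parallel, sequential(n, d))); // prints true
    }
}
```

Because iterations leave the central counter in increasing order and each one waits only on a smaller index that some thread has already claimed (and is either still executing or has finished), the workers cannot deadlock. CountDownLatch also guarantees that the write to a[i - d] is visible to the thread that returns from await().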

본 논문에서는 루프의 반복들간에 종속 관계가 존재하는 루프의 효율적 수행을 위한 중앙 큐 기반의 새로운 할당 기법 CDSS(Carreid-Dependence Self Scheduling)를 제안하며, 이를 공유 메모리 환경에서 Java 언어로 구현하였다. 또한, 중앙 작업 큐 기반의 병렬 루프를 위한 셀프 스케쥴링(self-scheduling) 기법들을 루프 캐리 종속성(loop-carried dependence)을 가진 루프의 할당에 적용하기 위한 그들의 변형에 대해 알아본다. 제안된 기법은 종속 거리에 따른 동기화 시점을 고려하여 루프를 세 단계별로 할당하는 셀프 스케쥴링 기법이다. 단일처리기 시스템을 포함한 여러 플랫폼에 적용하기 위해 제안된 방법과 변형된 기법들을 스레드 레벨로 구현하였다. 응용 프로그램과 시스템 파라메터 값을 다양하게 하여 변형된 기법들과 비교 분석한 결과, 제안된 기법은 변형된 다른 기법들에 비해 스케쥴링 오버헤드를 포함한 전체 루프의 수행 시간을 줄여 효율적이다. 변형된 SS, Factoring, GSS, CSS에 대해 각각 0.02, 40.5, 46.1, 53.6%의 성능 향상을 보였다. 그리고, CDSS 기법으로 다양한 응용 프로그램에 대해 종속 거리에 해당하는 적은 수의 스레드를 사용하여 최대의 성능을 얻을 수 있다.
