Performance Improvement Through Aggressive Instruction Packing

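  • Sunghyun Jee (Department of Computer Information, Cheonan College of Foreign Studies)
  • Sukil Kim (Department of Computer Science, Chungbuk National University)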
  • Published : 2002.06.01

Abstract

This paper proposes balancing scheduling effort more evenly between the compiler and the processor by introducing independently scheduled VLIW instructions. The Aggressively Packed VLIW (APVLIW) processor is aimed specifically at independently scheduling Very Long Instruction Word (VLIW) instructions that carry dependency information. The APVLIW processor independently schedules each instruction within a long instruction using pairs of functional units and dynamic schedulers. Every dynamic scheduler checks for data dependencies and resource collisions while scheduling each instruction. This scheduling is especially effective in applications containing loops. We simulate the architecture and show that the APVLIW processor performs significantly better than the VLIW processor for a wide range of cache sizes and across various numerical benchmark applications.

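This paper proposes a processor architecture that distributes scheduling work more evenly between the compiler and the processor by introducing VLIW instructions that can be scheduled independently. The goal of the proposed APVLIW (Aggressively Packed VLIW) processor is to independently schedule VLIW instructions that carry data dependency information. The APVLIW processor removes most NOP (No Operation) and LNOP (Long NOP) instructions from conventional VLIW code, producing compressed groups of long instructions. Using multiple pairs of functional units and dynamic schedulers together with the data dependency information, the proposed APVLIW processor can independently schedule each instruction within a long instruction. This scheduling technique is especially effective when executing programs that contain loops. Experimental results confirm that the APVLIW processor outperforms the VLIW processor regardless of cache size and benchmark program.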
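
The idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator; it is a minimal illustration under several assumptions. The names `Op`, `pack`, `run_packed`, and `loop_body` are hypothetical. Every operation writes a uniquely named result, so register reuse across loop iterations is ignored, and a resource collision is modeled only as "one operation per functional unit per cycle". Packing drops NOP slots and all-NOP long instructions (LNOPs) and leaves one in-order queue per functional unit. Each unit then issues the head of its queue as soon as the operations it depends on have produced their results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str                      # unique label of the result (assumption: no register reuse)
    deps: frozenset = frozenset()  # names of the ops whose results this op needs
    latency: int = 1               # cycles from issue until the result is usable

NOP = None  # an empty slot in a long instruction

def pack(vliw_code):
    """Drop NOP slots and LNOP rows; return one in-order op queue per functional unit."""
    queues = [[] for _ in vliw_code[0]]
    for long_instr in vliw_code:
        for unit, op in enumerate(long_instr):
            if op is not NOP:
                queues[unit].append(op)
    return queues

def run_packed(queues):
    """Issue each unit's queue independently; return the number of issue cycles."""
    ops = [op for q in queues for op in q]
    limit = len(ops) + sum(op.latency for op in ops)  # guard against invalid input
    ready_at = {}                 # op name -> first cycle at which its result is usable
    heads = [0] * len(queues)     # index of the next op to issue on each unit
    cycle = 0
    while any(h < len(q) for h, q in zip(heads, queues)):
        if cycle > limit:
            raise RuntimeError("dependencies can never be satisfied")
        for unit, q in enumerate(queues):
            if heads[unit] == len(q):
                continue          # this unit has finished its queue
            op = q[heads[unit]]
            # The unit's scheduler checks data dependencies before issuing.
            if all(ready_at.get(d, float("inf")) <= cycle for d in op.deps):
                ready_at[op.name] = cycle + op.latency
                heads[unit] += 1
        cycle += 1
    return cycle

def loop_body(i):
    """One iteration: a 2-cycle load on unit 0, then a dependent add on unit 1."""
    load = Op(f"load{i}", latency=2)
    add = Op(f"add{i}", deps=frozenset({f"load{i}"}))
    return [[load, NOP], [NOP, NOP], [NOP, add]]  # the middle row is an LNOP

vliw_code = [row for i in range(3) for row in loop_body(i)]
queues = pack(vliw_code)
print(len(vliw_code), run_packed(queues))  # 9 5
```

In this example the lockstep VLIW schedule needs 9 long instructions (18 slots) for three iterations, while the packed form keeps only the 6 real operations and finishes issuing after 5 cycles, because unit 0 can start the next iteration's load while unit 1 is still waiting on the previous one. The numbers describe only this toy loop and say nothing about the speedups measured in the paper.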
