A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices

Design of a Parallel-Architecture Processor for the Multiplication of Homogeneous Transformation Matrices

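  • 권두올 (School of Electrical and Electronics Engineering, College of Engineering, Chung-Ang University) ;
  • 정태상 (School of Electrical and Electronics Engineering, College of Engineering, Chung-Ang University)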
  • Published : 2005.12.01

Abstract

The $4 \times 4$ homogeneous transformation matrix is a compact representation of the orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished by successively multiplying homogeneous matrices, each of which represents the orientation and position of the corresponding link. Real-time control in robotics and animation in computer graphics therefore demand fast multiplication of homogeneous matrices. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For computational accuracy in real applications, its operands are floating-point numbers based on IEEE Standard 754. For parallelism and reduced hardware redundancy, it takes the column vectors of homogeneous matrices as its unit of multiplication. To further improve throughput, the processor structure and its control are pipelined. Because the processor can serve as a special-purpose coprocessor in robotics and computer graphics, its instruction set includes, in addition to the special matrix/matrix and matrix/vector multiplications, several other instructions useful for various transformation algorithms, which widens the applicability of the design. The suggested instruction set is proposed as a standard for future processor designs in robotics and computer graphics. The design is verified with an FPGA implementation. Its performance is also compared with that of a uniprocessor approach to assess its suitability for real-time applications.
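As an illustration of the column-vector formulation mentioned in the abstract, the following minimal software sketch computes the product of two homogeneous matrices one column at a time and chains such products along a sequence of links. It is not the processor design itself: the function names (`mat_vec`, `mat_mul_by_columns`, `rot_z`, `chain`) are hypothetical, the matrices are plain lists of rows, and Python floats are IEEE 754 double precision, which may differ from the precision used in the hardware.

```python
import math

def mat_vec(a, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-element column vector."""
    return [sum(a[r][k] * v[k] for k in range(4)) for r in range(4)]

def column(m, j):
    """Extract column j of a 4x4 matrix as a list."""
    return [m[r][j] for r in range(4)]

def mat_mul_by_columns(a, b):
    """Compute a*b column by column: column j of the result is a * (column j of b).

    The four matrix/vector products are independent of one another,
    which is what makes this formulation a natural fit for parallel hardware.
    """
    cols = [mat_vec(a, column(b, j)) for j in range(4)]
    # Transpose the list of result columns back into a list of rows.
    return [[cols[j][r] for j in range(4)] for r in range(4)]

def rot_z(theta, tx=0.0, ty=0.0, tz=0.0):
    """Hypothetical helper: rotation about z by theta, then translation (tx, ty, tz)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def chain(transforms):
    """Multiply a sequence of link transforms left to right, starting from identity."""
    result = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for t in transforms:
        result = mat_mul_by_columns(result, t)
    return result

# Two links: the first rotates 90 degrees about z and offsets by 1 along x;
# the second offsets by 1 along its own (rotated) x axis.
end_pose = chain([rot_z(math.pi / 2, tx=1.0), rot_z(0.0, tx=1.0)])
position = [end_pose[r][3] for r in range(3)]
```

Here `position` holds the translation part (fourth column) of the combined transform, and the bottom row of `end_pose` stays at `[0, 0, 0, 1]`, as it does for any product of homogeneous transformation matrices.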
