Efficient Matrix Multiplication Algorithms and its Application to Development of a High Performance Embedded System

  • Received : 2018.05.29
  • Accepted : 2018.12.04
  • Published : 2019.01.01

Abstract

The aerospace and defence industries increasingly require small, low-cost embedded systems. This paper first presents the development of an embedded system based on a high-speed digital signal processor (DSP). To reduce the computation time of high-precision algorithms such as flight control, it then proposes and implements two algorithms for matrix multiplication. Validation on the developed embedded system shows that, compared with the existing $2{\times}2$ unit method, proposed method 1 improves performance when the matrices are small, while proposed method 2 generally outperforms the $2{\times}2$ unit method.
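The abstract does not define the $2{\times}2$ unit method used as the baseline. One plausible reading, assumed here, is a matrix product that computes the output one $2{\times}2$ tile at a time, so that each loaded element of the inputs is reused for two multiply-accumulates. The sketch below illustrates that idea only; it is not the paper's proposed method 1 or 2, the function names are hypothetical, and Python is used for readability rather than as DSP code.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of an n-by-k and a k-by-m matrix."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def matmul_2x2_unit(a, b):
    """Compute C = A*B one 2x2 tile of C at a time.

    Assumption (not from the paper): the row count of A and the column
    count of B are even, so C splits exactly into 2x2 tiles.
    """
    n, k, m = len(a), len(b), len(b[0])
    if n % 2 or m % 2:
        raise ValueError("rows of A and columns of B must be even")
    c = [[0.0] * m for _ in range(n)]
    for i in range(0, n, 2):
        for j in range(0, m, 2):
            # Four accumulators hold one 2x2 tile of the result.
            c00 = c01 = c10 = c11 = 0.0
            for p in range(k):
                # Two elements of A and two of B feed four products.
                a0, a1 = a[i][p], a[i + 1][p]
                b0, b1 = b[p][j], b[p][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            c[i][j], c[i][j + 1] = c00, c01
            c[i + 1][j], c[i + 1][j + 1] = c10, c11
    return c
```

Under this reading, the tiled version performs the same number of multiplications as the naive loop but fewer loads per multiplication, which is the usual motivation for tiling on a DSP. Odd-sized matrices would need padding or a separate edge case, which this sketch leaves out.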
