A New Overlap Save Algorithm for Fast Convolution

고속 컨벌루션을 위한 새로운 중첩보류기법

  • Kuk, Jung-Gap (Department of Electrical Engineering and Computer Science, Seoul National University, Institute of New Media & Communications) ;
  • Cho, Nam-Ik (Department of Electrical Engineering and Computer Science, Seoul National University, Institute of New Media & Communications)
  • 국중갑 (서울대학교 전기컴퓨터공학부, 뉴미디어 통신공동연구소) ;
  • 조남익 (서울대학교 전기컴퓨터공학부, 뉴미디어 통신공동연구소)
  • Published : 2009.09.30

Abstract

The most widely used block convolution method is the overlap save algorithm (OSA), in which a block of M input samples to be convolved with a filter is concatenated with the previous block, and a 2M-point FFT and frequency-domain multiplications are performed on this overlapped block. Discarding half of the circular convolution results yields the linear convolution results. This paper proposes a new transform that reduces the block size for block convolution to only M. The proposed transform can be implemented as M multiplications followed by an M-point FFT. Hence, existing efficient FFT libraries and hardware can be used to implement the proposed method. Since the required transform size is half that of the conventional method, the overall computational complexity is reduced. The smaller transform size also reduces data access time and the cache miss ratio, and thus the overall CPU time. Experiments show that the proposed method requires less computation time than the conventional OSA.
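For orientation, the conventional OSA described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the baseline method only, not the paper's proposed algorithm. The function name `overlap_save` and the restriction to real-valued data with at most M + 1 filter taps are assumptions made for the example.

```python
import numpy as np

def overlap_save(x, h, M):
    """Conventional overlap-save: M new samples per block, 2M-point FFT.

    Assumes real-valued x and h, with len(h) <= M + 1 so that the last M
    outputs of each 2M-point circular convolution are free of wrap-around.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    if len(h) > M + 1:
        raise ValueError("filter must have at most M + 1 taps")

    H = np.fft.rfft(h, 2 * M)                 # filter spectrum, zero-padded to 2M
    n_blocks = -(-len(x) // M)                # ceil(len(x) / M)
    x_pad = np.concatenate([x, np.zeros(n_blocks * M - len(x))])

    prev = np.zeros(M)                        # "previous block" before the first one
    y = np.empty(n_blocks * M)
    for b in range(n_blocks):
        cur = x_pad[b * M:(b + 1) * M]
        block = np.concatenate([prev, cur])   # overlapped 2M-sample block
        circ = np.fft.irfft(np.fft.rfft(block) * H, 2 * M)
        y[b * M:(b + 1) * M] = circ[M:]       # discard the first half, keep the last M
        prev = cur
    return y[:len(x)]
```

Under these assumptions the result should agree with the first `len(x)` samples of `np.convolve(x, h)`, which is a convenient way to check the block logic.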

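In the overlap save algorithm, the most widely used transform-domain convolution algorithm, processing data in units of M samples requires concatenating the current M samples with the preceding M samples, performing a length-2M FFT and frequency-domain multiplication on the resulting 2M samples, and keeping the last M samples to obtain the linear convolution result. This paper presents a new transform and uses it to introduce a new convolution algorithm that needs only a length-M transform of the M samples. The proposed transform consists of M multiplications and a fast Fourier transform, so existing FFT libraries or hardware can be used as they are. Compared with the conventional overlap save algorithm, it also gives a slight reduction in computation, and because the data length to be handled is halved, there are gains in data movement and other processing, so the overall CPU time is reduced.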
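The abstract does not define the proposed transform, so the following sketch should not be read as the paper's method. It only illustrates that a transform with the stated structure (M multiplications followed by an M-point FFT) can recover a linear convolution of real data from length-M transforms. The example uses the known "right-angle" circular convolution, in which each sample is pre-multiplied by a twiddle factor so that wrapped-around terms land in the imaginary part. The function name `twisted_fft_convolve` and the choice of twiddle are assumptions made for this illustration.

```python
import numpy as np

def twisted_fft_convolve(x, h):
    """Linear convolution of two real length-M sequences using M-point FFTs.

    Illustrative assumption, not taken from the paper: pre-multiplying by
    tw[n] = exp(-j*pi*n/(2M)) turns the M-point circular convolution into a
    product modulo (u^M + j), so outputs that would wrap around appear,
    negated, in the imaginary part instead of corrupting the real part.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    M = len(x)
    if len(h) != M:
        raise ValueError("x and h must have the same length M")

    tw = np.exp(-0.5j * np.pi * np.arange(M) / M)   # M twiddle multiplications
    X = np.fft.fft(x * tw)                          # M-point FFT
    H = np.fft.fft(h * tw)
    g = np.fft.ifft(X * H) * np.conj(tw)            # undo the twiddle

    # Real part: first M linear-convolution outputs.
    # Negated imaginary part: the remaining outputs (the tail).
    return np.concatenate([g.real, -g.imag])[:2 * M - 1]
```

For real inputs the output should match `np.convolve(x, h)` to within rounding error. In a block-processing setting, the tail from one block would be combined with the head of the next, but how the paper arranges this is not stated in the abstract.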
