Design of Vector Register Architecture in DSP Processor for Efficient Multimedia Processing

  • Wu, Chou-Pin (Dept. of Electrical Engineering, National Tsing Hua University) ;
  • Wu, Jen-Ming (Dept. of Electrical Engineering, National Tsing Hua University)
  • Published : 2007.12.31


In this paper, we present an efficient instruction set architecture that uses vector register file hardware to accelerate general matrix-vector operations in a DSP microprocessor. The technique enables in-situ row access as well as column access to the register file, which can significantly reduce the number of memory accesses. It is especially useful for block-based video signal processing kernels such as FFT/IFFT, DCT/IDCT, and two-dimensional filtering. We have applied the new instruction set architecture to the in-loop deblocking filter in an H.264 decoder. Performance comparisons show that the load/store operations required for the in-loop deblocking filter can be reduced by about 42%. The architecture would substantially improve processing speed and code density in DSP microprocessors, especially for video signal processing.
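
The idea of row and column access to one register block can be illustrated with a small software model. The sketch below is not the authors' instruction set: the class `VectorRegisterFile`, the helper `smooth`, and the 4 x 4 block size are hypothetical names and values chosen for illustration, and the access count is a property of this toy model only, not a reproduction of the reported 42% figure. It shows a separable two-dimensional filter that loads a block once, filters along rows and then along columns inside the register block, and stores the block once, with no transpose through memory in between.

```python
class VectorRegisterFile:
    """Toy model (assumption): an n x n register block readable by row or column."""

    def __init__(self, n=4):
        self.n = n
        self.cells = [[0] * n for _ in range(n)]

    def write_row(self, i, values):
        self.cells[i] = list(values)

    def read_row(self, i):
        return list(self.cells[i])

    def write_col(self, j, values):
        for i, v in enumerate(values):
            self.cells[i][j] = v

    def read_col(self, j):
        return [self.cells[i][j] for i in range(self.n)]


def smooth(vec):
    """Placeholder 1-D kernel: 3-tap [1, 2, 1] / 4 average with edge replication."""
    last = len(vec) - 1
    return [
        (vec[max(k - 1, 0)] + 2 * vec[k] + vec[min(k + 1, last)]) // 4
        for k in range(len(vec))
    ]


def filter_2d_in_place(memory):
    """Filter a square block along rows, then columns, inside the register block.

    Returns the filtered block and the number of vector memory accesses
    (one load per row going in, one store per row coming out).
    """
    vrf = VectorRegisterFile(len(memory))

    loads = 0
    for i, row in enumerate(memory):      # one vector load per row
        vrf.write_row(i, row)
        loads += 1

    for i in range(vrf.n):                # horizontal pass: row access
        vrf.write_row(i, smooth(vrf.read_row(i)))

    for j in range(vrf.n):                # vertical pass: column access,
        vrf.write_col(j, smooth(vrf.read_col(j)))  # no memory round trip

    stores = 0
    result = []
    for i in range(vrf.n):                # one vector store per row
        result.append(vrf.read_row(i))
        stores += 1

    return result, loads + stores


if __name__ == "__main__":
    block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    filtered, accesses = filter_2d_in_place(block)
    print(filtered, accesses)
```

In a register file with row access only, the vertical pass in this model would need the block to be written back and reloaded in transposed order (or gathered element by element), which is the extra load/store traffic the column access mode is meant to avoid.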


