• Title/Summary/Keyword: 5-stage pipeline

Search Result 73, Processing Time 0.022 seconds

A VLSI Design of High Performance H.264 CAVLC Decoder Using Pipeline Stage Optimization (파이프라인 최적화를 통한 고성능 H.264 CAVLC 복호기의 VLSI 설계)

  • Lee, Byung-Yup;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.12
    • /
    • pp.50-57
    • /
    • 2009
  • This paper proposes a VLSI architecture of CAVLC hardware decoder which is a tool eliminating statistical redundancy in H.264/AVC video compression. The previous CAVLC hardware decoder used four stages to decode five code symbols. The previous CAVLC hardware architectures decreased decoding performance because there was an unnecessary idle cycle in between state transitions. Likewise, the computation of valid bit length includes an unnecessary idle cycle. This paper proposes hardware architecture to eliminate the idle cycle efficiently. Two methods are applied to the architecture. One is a method which eliminates an unnecessary things of buffers storing decoded codes and then makes efficient pipeline architecture. The other one is a shifter control to simplify operations and controls in the process of calculating valid bit length. The experimental result shows that the proposed architecture needs only 89 cycle in average for one macroblock decoding. This architecture improves the performance by about 29% than previous designs. The synthesis result shows that the design achieves the maximum operating frequency at 140Mhz and the hardware cost is about 11.5K under a 0.18um CMOS process. Comparing with the previous design, it can achieve low-power operation because this design is implemented with high throughputs and low gate count.

A 10-bit 40-MS/s Low-Power CMOS Pipelined A/D Converter Design (10-bit 40-MS/s 저전력 CMOS 파이프라인 A/D 변환기 설계)

  • Lee, Sea-Young;Yu, Sang-Dae
    • Journal of Sensor Science and Technology
    • /
    • v.6 no.2
    • /
    • pp.137-144
    • /
    • 1997
  • In this paper, the design of a 10-bit 40-MS/s pipelined A/D converter is implemented to achieve low static power dissipation of 70 mW at the ${\pm}2.5\;V$ or +5 V power supply environment for high speed applications. A 1.5 b/stage pipeline architecture in the proposed ADC is used to allow large correction range for comparator offset and perform the fast interstage signal processing. According to necessity of high-performance op amps for design of the ADC, the new op amp with gain boosting based on a typical folded-cascode architecture is designed by using SAPICE that is an automatic design tool of op amps based on circuit simulation. A dynamic comparator with a capacitive reference voltage divider that consumes nearly no static power for this low power ADC was adopted. The ADC implemented using a $1.0{\mu}m$ n-well CMOS technology exhibits a DNL of ${\pm}0.6$ LSB, INL of +1/-0.75 LSB and SNDR of 56.3 dB for 9.97 MHz input while sampling at 40 MHz.

  • PDF

Design and Implementation of Stream Cipher based on SHACAL-2 Superior in the Confidentiality and Integrity (기밀성과 무결성이 우수한 SHACAL-2 기반 스트림 암호 설계 및 구현)

  • Kim, Gil Ho;Cho, Gyeong Yeon
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.12
    • /
    • pp.1427-1438
    • /
    • 2013
  • We have developed a 128-bit stream cipher algorithm composed of the 5-stage pipeline, capable of real-time processing, confidentiality and integrity. The developed stream cipher is a stream cipher algorithm that makes the final 128-bit ciphers through a whitening process after making the ASR 277 bit and SHACAL-2 and applying them to the CFB mode. We have verified the hardware performance of the proposed stream cipher algorithm with Modelsim 6.5d and Quartus II 12.0, and the result shows that the hardware runs at 33.34Mhz(4.27Gbps) at worst case. According to the result, the new cipher algorithm has fully satisfied the speed requirement of wireless Internet and sensor networks, and DRM environment. Therefore, the proposed algorithm with satisfaction of both confidentiality and integrity provides a very useful ideas.

Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing (고성능 멀티미디어 처리용 병렬프로세서 하드웨어 설계 및 구현)

  • Kim, Yong-Min;Hwang, Chul-Hee;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.5
    • /
    • pp.1-11
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates on a 3-stage pipelining. Experimental results indicated that the proposed parallel processor outperforms conventional parallel processors in terms of performance. In addition, our proposed parallel processor outperforms commercial high-performance TI C6416 DSP in terms of performance (1.4-31.4x better) and energy efficiency (5.9-8.1x better) with same 130nm technology and 720 clock frequency. The proposed parallel processor was developed with verilog HDL and verified with a FPGA prototype system.

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.6
    • /
    • pp.145-155
    • /
    • 2013
  • In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Fractional Motion Estimation in 8K UHD($7680{\times}4320$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $10{\times}10$ reference data for interpolation, we design 2D cache buffer which consists of the $10{\times}10$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The gate count is 436.5Kgates. The proposed H.264/AVC Fractional Motion Estimation can support 8K UHD at 30 frames per second by running at 187MHz.

A Design of 4×4 Block Parallel Interpolation Motion Compensation Architecture for 4K UHD H.264/AVC Decoder (4K UHD급 H.264/AVC 복호화기를 위한 4×4 블록 병렬 보간 움직임보상기 아키텍처 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.5
    • /
    • pp.102-111
    • /
    • 2013
  • In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Motion Compensation in 4K UHD($3840{\times}2160$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $9{\times}9$ reference data for interpolation, we design 2D cache buffer which consists of the $9{\times}9$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The maximum operation frequency is 150MHz. The gate count is 161Kgates. The proposed H.264/AVC Motion Compensation can support 4K UHD at 72 frames per second by running at 150MHz.

Optimized Hardware Design of Deblocking Filter for H.264/AVC (H.264/AVC를 위한 디블록킹 필터의 최적화된 하드웨어 설계)

  • Jung, Youn-Jin;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.1
    • /
    • pp.20-27
    • /
    • 2010
  • This paper describes a design of 5-stage pipelined de-blocking filter with power reduction scheme and proposes a efficient memory architecture and filter order for high performance H.264/AVC Decoder. Generally the de-blocking filter removes block boundary artifacts and enhances image quality. Nevertheless filter has a few disadvantage that it requires a number of memory access and iterated operations because of filter operation for 4 time to one edge. So this paper proposes a optimized filter ordering and efficient hardware architecture for the reduction of memory access and total filter cycles. In proposed filter parallel processing is available because of structured 5-stage pipeline consisted of memory read, threshold decider, pre-calculation, filter operation and write back. Also it can reduce power consumption because it uses a clock gating scheme which disable unnecessary clock switching. Besides total number of filtering cycle is decreased by new filter order. The proposed filter is designed with Verilog-HDL and functionally verified with the whole H.264/AVC decoder using the Modelsim 6.2g simulator. Input vectors are QCIF images generated by JM9.4 standard encoder software. As a result of experiment, it shows that the filter can make about 20% total filter cycles reduction and it requires small transposition buffer size.

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

Implementation of a Parallel Viterbi Decoder for High Speed Multimedia Communications (멀티미디어 통신용 병렬 아키텍쳐 고속 비터비 복호기 설계)

  • Lee, Byeong-Cheol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.2
    • /
    • pp.78-84
    • /
    • 2000
  • The Viterbi decoders can be classified into serial Viterbi decoders and parallel Viterbi decoders. Parallel Viterbi decoders can handle higher data rates than serial Viterbl decoders. This paper designs and implements a fully parallel Viterbi decoder for high speed multimedia communications. For high speed operations, the ACS (Add-Compare-Select) module consisting of 64 PEs (Processing Elements) can compute one stage in a clock. In addition, the systolic away structure with 32 pipeline stages is developed for the TB (traceback) module. The implemented Viterbi decoder can support code rates 1/2, 2/3, 3/4, 5/6 and 7/8 using punctured codes. We have developed Verilog HDL models and performed logic synthesis. The 0.6 ${\mu}{\textrm}{m}$ SAMSUNG KG75000 SOG cell library has been used. The implemented Viterbi decoder has about 100,400 gates, and is running at 70 MHz in the worst case simulation.

  • PDF

ASIC Design of OpenRISC-based Multimedia SoC Platform (OpenRISC 기반 멀티미디어 SoC 플랫폼의 ASIC 설계)

  • Kim, Sun-Chul;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2008.10a
    • /
    • pp.281-284
    • /
    • 2008
  • This paper describes ASIC design of multimedia SoC Platform. The implemented Platform consists of 32-bit OpenRISC1200 Microprocessor, WISHBONE on-chip bus, VGA Controller, Debug Interface, SRAM Interface and UART. The 32-bit OpenRISC1200 processor has 5 stage pipeline and Harvard architecture with separated instruction/data bus. The VGA Controller can display RCB data on a CRT or LCD monitor. The Debug Interface supports a debugging function for the Platform. The SRAM Interface supports 18-bit address bus and 32-bit data bus. The UART provides RS232 protocol, which supports serial communication function. The Platform is design and verified on a Xilinx VERTEX-4 XC4VLX80 FPGA board. Test code is generated by a cross compiler' and JTAG utility software and gdb are used to download the test code to the FPGA board through parallel cable. Finally, the Platform is implemented into a single ASIC chip using Chatered 0.18um process and it can operate at 100MHz clock frequency.

  • PDF