• Title/Summary/Keyword: Pipelined architecture

Search Result 176, Processing Time 0.021 seconds

Design of Pipelined Floating-Point Arithmetic Unit for Mobile 3D Graphics Applications

  • Choi, Byeong-Yoon;Ha, Chang-Soo;Lee, Jong-Hyoung;Salclc, Zoran;Lee, Duck-Myung
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.6
    • /
    • pp.816-827
    • /
    • 2008
  • In this paper, two-stage pipelined floating-point arithmetic unit (FP-AU) is designed. The FP-AU processor supports seventeen operations to apply 3D graphics processor and has area-efficient and low-latency architecture that makes use of modified dual-path computation scheme, new normalization circuit, and modified compound adder based on flagged prefix adder. The FP-AU has about 4-ns delay time at logic synthesis condition using $0.18{\mu}m$ CMOS standard cell library and consists of about 5,930 gates. Because it has 250 MFLOPS execution rate and supports saturated arithmetic including a number of graphics-oriented operations, it is applicable to mobile 3D graphics accelerator efficiently.

  • PDF

Hardware Implementation for SEED Cipher Processor of Pipeline Architecture (Pipeline 구조의 SEED 암호화 프로세서 구현 및 설계)

  • 채봉수;김기용;조용범
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.125-128
    • /
    • 2002
  • This paper designed a cipher process, which used SEED-Algorithm that is totally domestic technique. This cipher processor is implemented by using SEED-cipher-Algorithm and pipeline scheduling architecture. The cipher is 16-round Feistel architecture but we show just 16-round Feistel architecture for brevity in this thesis. Of course, we can get the result of the 16-round processing by addition of control part simply. Furthermore, it has pipelined architecture, so the speed of cipher process is the faster than others when we performed a cipher a lot of data. The schedule-function can performed the two-cipher process simultaneously, such as using two-cipher processors.

  • PDF

A Study on a VLSI Architecture for Reed-Solomon Decoder Based on the Berlekamp Algorithm (Berlekamp 알고리즘을 이용한 Reed-Solomon 복호기의 VLSI 구조에 관한 연구)

  • 김용환;정영모;이상욱
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.30B no.11
    • /
    • pp.17-26
    • /
    • 1993
  • In this paper, a VlSI architecture for Reed-Solomon (RS) decoder based on the Berlekamp algorithm is proposed. The proposed decoder provided both erasure and error correcting capability. In order to reduc the chip area, we reformulate the Berlekamp algorithm. The proposed algorithm possesses a recursive structure so that the number of cells for computing the errata locator polynomial can be reduced. Moreover, in our approach, only one finite field multiplication per clock cycle is required for implementation, provided an improvement in the decoding speed, and the overall architecture features parallel and pipelined structure, making a real time decoding possible. From the performance evaluation, it is concluded that the proposed VLSI architecture is more efficient in terms of VLSI implementation than the rcursive architecture based on the Euclid algorithm.

  • PDF

VLSI Architecture for Computer-Generated Hologram (컴퓨터 생성 홀로그램을 위한 VLSI 구조)

  • Seo, Young-Ho;Choi, Hyun-Jun;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.7C
    • /
    • pp.540-547
    • /
    • 2008
  • In this paper, we proposed a new VLSI architecture which can generate computer-generated hologram (CGH) in real-time and implemented to hardware. The modified algorithm for high-performance CGH was introduced and re-analyzed (or designing hardware. from both numerical and visual analysis, the infernal number system of hardware was decided. CGH algorithm and precision analysis enabled to propose a new cell architecture for CGH. The operational sequence was analyzed with the architecture of CGH cell and the characteristics of the modified CGH algorithm, and finally the pipelined architecture and the operational timing were proposed.

Design of a Pipelined Deblocking Filter with efficient memory management for high performance H.264 decoders (효율적인 메모리 관리 구조를 갖는 H.264용 고성능 디블록킹 필터 설계)

  • Yu, Yong-Hoon;Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.1
    • /
    • pp.64-70
    • /
    • 2008
  • The H.264 standard is widely used due to the high compression rate and quality. The deblocking filter of the H.264 standard improves the quality of images by eliminating blocking artifacts of pictures, and it requires a lot of computation. We propose a new hardware architecture for the deblocking filter with pipelined architecture, 1-D filters which support both horizontal and vertical filtering and efficient memory management. Four memory blocks are configured for the efficient storage and access of the current macroblock and adjacent referenced sub-macroblocks, and the pixel data from the motion compensation unit can be transferred without waiting during the computation cycles of the deblocking filter. The number of computation cycles and the hardware area are reduced using the proposed architecture, and the performance of the H.264 decoder is improved. We design the deblocking filter using Verilog-HDL and implement using an FPGA. The designed deblocking filter can be used for decoding HD quality images at 77 MHz.

A Design of Pipelined-parallel CABAC Decoder Adaptive to HEVC Syntax Elements (HEVC 구문요소에 적응적인 파이프라인-병렬 CABAC 복호화기 설계)

  • Bae, Bong-Hee;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.5
    • /
    • pp.155-164
    • /
    • 2015
  • This paper describes a design and implementation of CABAC decoder, which would handle HEVC syntax elements in adaptively pipelined-parallel computation manner. Even though CABAC offers the high compression rate, it is limited in decoding performance due to context-based sequential computation, and strong data dependency between context models, as well as decoding procedure bin by bin. In order to enhance the decoding computation of HEVC CABAC, the flag-type syntax elements are adaptively pipelined by precomputing consecutive flag-type ones; and multi-bin syntax elements are decoded by processing bins in parallel up to three. Further, in order to accelerate Binary Arithmetic Decoder by reducing the critical path delay, the update and renormalization of context modeling are precomputed parallel for the cases of LPS as well as MPS, and then the context modeling renewal is selected by the precedent decoding result. It is simulated that the new HEVC CABAC architecture could achieve the max. performance of 1.01 bins/cycle, which is two times faster with respect to the conventional approach. In ASIC design with 65nm library, the CABAC architecture would handle 224 Mbins/sec, which could decode QFHD HEVC video data in real time.

A real-time high speed full search block matching motion estimation processor (고속 실시간 처리 full search block matching 움직임 추정 프로세서)

  • 유재희;김준호
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.12
    • /
    • pp.110-119
    • /
    • 1996
  • A novel high speed VLSI architecture and its VLSI realization methodologies for a motion estimation processor based on full search block matching algorithm are presentd. The presented architecture is designed in order to be suitable for highly parallel and pipelined processing with identical PE's and adjustable in performance and hardware amount according to various application areas. Also, the throughput is maximized by enhancing PE utilization up to 100% and the chip pin count is reduced by reusing image data with embedded image memories. Also, the uniform and identical data processing structure of PE's eases VLSI implementation and the clock rate of external I/O data can be made slower compared to internal clock rate to resolve I/O bottleneck problem. The logic and spice simulation results of the proposed architecture are presented. The performances of the proposed architecture are evaluated and compared with other architectures. Finally, the chip layout is shown.

  • PDF

Pipelined VLSI Architectures for the Hierarchical Block-Matching Algorithm (계층적 블록매칭 알고리즘을 위한 파이프라인식 VLSI 아키텍쳐)

  • Kim, Hyeong-Cheol;Maeng, Seung-Ryeol
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.7
    • /
    • pp.1691-1716
    • /
    • 1998
  • 본 논문에서는 계층적 블록매칭 알고리즘(HBMA)을 위한 두 가지 병렬 VLSI 아키텍쳐를 제안한다. HBMA는 계층에 따른 반복수행과 공간 인터폴레이션을 기반으로 수행되며, 이러한 수행 특성은 병렬처리의 장애요소인 데이터 종속성을 내재하고 있다. 제안된 아키텍쳐는 HBMA의 계층간 데이터 종속성을 해결하기 위하여 기본적으로 파이프라인 구조를 채택하고 있으며, HBMA에서 주어진 매개변수에 따라 세 단계의 스테이지로 구성된다. 제안된 아키텍쳐는 입력 프레임 데이터의 흐름을 제어하는 방식에 따라 두 가지 종류로 구분된다. U-Architecture는 단방향 스캔 순서를 따르도록 설계되었으며, B-Architecture는 양방향 스캔 수서를 따르도록 설계되었다. 각 아키텍쳐의 내부 메모리와 인터폴레이션 모듈은 해당 스캔 순서에 따라 동기적으로 동작할 수 있는 구조를 가진다. 성능분석의 결과로서 본 논문에서 제안한 두 가지 아키텍쳐가 모두 방송용 비디오 포맷을 실시간으로 처리할 수 있음을 보이고, HDTV 포맷은 가까운 장래의 VLSI 기술로 실시간 성능을 얻을 수 있음을 보였다. 또한, B-Architecture는 공간 연결성 내부 메모리 구조를 채택함으로써 입력 데이터의 재활용도를 높이고, 이에 따라 Q-Architecture에 비해서 데이터 입출력 핀의 개수를 약 반정도 줄일 수 있는 특성을 보이고 있다.

  • PDF

Implementation of H.264/AVC Deblocking Filter on 1-D CGRA (1-D CGRA에서의 H.264/AVC 디블록킹 필터 구현)

  • Song, Sehyun;Kim, Kichul
    • Journal of IKEEE
    • /
    • v.17 no.4
    • /
    • pp.418-427
    • /
    • 2013
  • In this paper, we propose a parallel deblocking filter algorithm for H.264/AVC video standard. The deblocking filter has different filter processes according to boundary strength (BS) and each filter process requires various conditional calculations. The order of filtering makes it difficult to parallelize deblocking filter calculations. The proposed deblocking filter algorithm is performed on PRAGRAM which is a 1-D coarse grained reconfigurable architecture (CGRA). Each filter calculation is accelerated using uni-directional pipelined architecture of PRAGRAM. The filter selection and the conditional calculations are efficiently performed using dynamic reconfiguration and conditional reconfiguration. The parallel deblocking filter algorithm uses 225 cycles to process a macroblock and it can process a full HD image at 150 MHz.

Design of Advanced Multiplicative Inverse Operation Circuit for AES Encryption (AES 암호화를 위한 개선된 곱셈 역원 연산기 설계)

  • Kim, Jong-Won;Kang, Min-Sup
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.1-6
    • /
    • 2020
  • This paper proposes the design of an advanced S-Box for calculating multiplicative inverse in AES encryption process. In this approach, advanced S-box module is first designed based on composite field, and then the performance evaluation is performed for S-box with multi-stage pipelining architecture. In the proposed S-Box architecture, each module for multiplicative inverse is constructed using combinational logic for realizing both small-area and high-speed. Through logic synthesis result, the designed 3-stage pipelined S-Box shows speed improvement of about 28% compared to the conventional method. The proposed advanced AES S-Box is performed modelling at the mixed level using Verilog-HDL, and logic synthesis is also performed on Spartan 3s1500l FPGA using Xilinx ISE 14.7 tool.