• Title/Summary/Keyword: Pipelined architecture

A Design of High-speed Phase Calculator for 3D Depth Image Extraction from TOF Sensor Data (TOF 센서용 3차원 Depth Image 추출을 위한 고속 위상 연산기 설계)

  • Koo, Jung-Youn;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.2 / pp.355-362 / 2013
  • A hardware implementation of a phase calculator for extracting 3D depth images from TOF (Time-Of-Flight) sensor data is described. The phase calculator adopts a pipelined architecture to improve throughput and performs the arctangent operation using the vectoring mode of the CORDIC algorithm. Fixed-point MATLAB modeling and simulations are carried out to determine the optimized bit-widths and number of iterations. The design is verified by FPGA-in-the-loop verification using MATLAB/Simulink and synthesized with a TSMC 0.18-µm CMOS cell library. It has 16,000 gates, and the estimated throughput is about 9.6 Gbps at 200 MHz and 1.8 V.
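
The vectoring-mode CORDIC arctangent mentioned in this abstract can be sketched in software as follows. This is a minimal floating-point illustration, not the authors' fixed-point design: the function name `cordic_atan2`, the quadrant pre-rotation, and the default of 16 iterations are assumptions made for the example. In a pipelined hardware version, each loop iteration would typically map to one pipeline stage.

```python
import math

def cordic_atan2(y, x, iterations=16):
    """Vectoring-mode CORDIC: drive y toward 0 and accumulate the rotation angle."""
    angle = 0.0
    # Pre-rotate by +/-90 degrees so the vector starts in the right half-plane
    # (assumed here; plain CORDIC only converges for roughly +/-99 degrees).
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    for i in range(iterations):
        shift = 2.0 ** -i            # a right shift by i bits in hardware
        if y > 0:                    # vector above the x-axis: rotate clockwise
            x, y = x + y * shift, y - x * shift
            angle += math.atan(shift)
        else:                        # vector on or below the x-axis: rotate counter-clockwise
            x, y = x - y * shift, y + x * shift
            angle -= math.atan(shift)
    return angle
```

The `math.atan(shift)` values would normally come from a small lookup table, and the number of iterations trades accuracy against pipeline depth, which is the kind of trade-off the fixed-point simulations in the abstract are meant to settle.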

FPGA Implementation of Differential CORDIC-based high-speed phase calculator for 3D Depth Image Extraction (3차원 Depth Image 추출용 Differential CORDIC 기반 고속 위상 연산기의 FPGA 구현)

  • Koo, Jung-youn;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.10a / pp.350-353 / 2013
  • In this paper, a hardware implementation of a phase calculator for extracting 3D depth images from a TOF (Time-Of-Flight) sensor is proposed. The phase calculator adopts a redundant binary number system and a pipelined architecture to improve throughput and speed, and performs the arctangent operation using the vectoring mode of the DCORDIC (Differential CORDIC) algorithm. Fixed-point MATLAB simulations are carried out to determine the optimized bit-widths and number of iterations. The design is verified by emulating the restoration of virtual 3D data using MATLAB/Simulink and FPGA-in-the-loop verification, and the estimated performance is about 7.5 Gbps at a clock frequency of 469 MHz.

An FPGA-based Parallel Hardware Architecture for Real-time Eye Detection

  • Kim, Dong-Kyun;Jung, Jun-Hee;Nguyen, Thuy Tuong;Kim, Dai-Jin;Kim, Mun-Sang;Kwon, Key-Ho;Jeon, Jae-Wook
    • JSTS:Journal of Semiconductor Technology and Science / v.12 no.2 / pp.150-161 / 2012
  • Eye detection is widely used in applications such as face recognition, driver behavior analysis, and human-computer interaction. However, it is difficult to achieve real-time performance with software-based eye detection in an embedded environment. In this paper, we propose a parallel hardware architecture for real-time eye detection. We use the AdaBoost algorithm with the modified census transform (MCT) to detect eyes in a face image. We parallelize part of the algorithm to speed up processing. Several downscaled pyramid images of the eye candidate region are generated in parallel from the input face image, and these downscaled images allow the left and right eyes to be detected simultaneously. The sequential data processing bottleneck caused by repetitive operations is removed by employing a pipelined parallel architecture. The proposed architecture is designed in Verilog HDL and implemented on a Virtex-5 FPGA for prototyping and evaluation. The proposed system can detect eyes within 0.15 ms in a VGA image.
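
As an illustration of the modified census transform named in this abstract, the sketch below computes the 9-bit MCT index of a single 3×3 window by comparing each pixel with the window mean. The function name `mct_3x3` and the bit ordering (row-major, most significant bit first) are assumptions for the example; the abstract does not specify how the hardware orders the bits.

```python
def mct_3x3(window):
    """Return the 9-bit modified census transform index of a 3x3 window."""
    pixels = [p for row in window for p in row]
    total = sum(pixels)
    index = 0
    for p in pixels:
        # p > mean is tested as 9*p > total, which avoids a division
        # (a convenient form for hardware as well).
        index = (index << 1) | (1 if 9 * p > total else 0)
    return index

# A bright center pixel on a flat background sets only the middle bit.
print(mct_3x3([[10, 10, 10], [10, 100, 10], [10, 10, 10]]))  # prints 16
```

Because each window depends only on its own nine pixels, many windows can be transformed in parallel, which fits the pipelined parallel structure described above.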

A 10-bit 40-MS/s Low-Power CMOS Pipelined A/D Converter Design (10-bit 40-MS/s 저전력 CMOS 파이프라인 A/D 변환기 설계)

  • Lee, Sea-Young;Yu, Sang-Dae
    • Journal of Sensor Science and Technology / v.6 no.2 / pp.137-144 / 1997
  • This paper presents the design of a 10-bit 40-MS/s pipelined A/D converter for high-speed applications that achieves a low static power dissipation of 70 mW with a ±2.5 V or +5 V power supply. A 1.5 b/stage pipeline architecture is used to allow a large correction range for comparator offset and to perform fast interstage signal processing. Because the ADC requires high-performance op amps, a new gain-boosted op amp based on a typical folded-cascode architecture is designed using SAPICE, an automatic op amp design tool based on circuit simulation. A dynamic comparator with a capacitive reference voltage divider, which consumes nearly no static power, is adopted for this low-power ADC. The ADC, implemented in a 1.0-µm n-well CMOS technology, exhibits a DNL of ±0.6 LSB, an INL of +1/-0.75 LSB, and an SNDR of 56.3 dB for a 9.97 MHz input while sampling at 40 MHz.
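
The offset tolerance of a 1.5 b/stage pipeline can be illustrated with a small behavioral model. This is a simplified sketch under stated assumptions, not the circuit from the paper: it uses ten identical 1.5-bit stages with ideal gain-of-2 residue amplifiers, applies one shared offset per stage to both comparators, and the names `pipeline_adc`, `vref`, and `offsets` are hypothetical.

```python
def pipeline_adc(v, vref=1.0, stages=10, offsets=None):
    """Behavioral 1.5 b/stage pipeline; returns a signed code close to v / vref * 2**stages."""
    offsets = offsets or [0.0] * stages
    code = 0
    for i in range(stages):
        hi = vref / 4 + offsets[i]       # upper comparator threshold (with offset)
        lo = -vref / 4 + offsets[i]      # lower comparator threshold (with offset)
        d = 1 if v > hi else (-1 if v < lo else 0)
        code = 2 * code + d              # shift-and-add combines the overlapping stage decisions
        v = 2 * v - d * vref             # residue, amplified by 2 for the next stage
    return code

ideal = pipeline_adc(0.3)
shifted = pipeline_adc(0.3, offsets=[0.2] * 10)
print(ideal, shifted)  # both stay within 1 LSB of 0.3 * 1024 = 307.2
```

As long as each comparator offset stays below `vref / 4`, the residue remains inside the range ±`vref`, so a wrong decision in one stage is absorbed by the redundancy of the following stages. That is the "large correction range" the abstract refers to.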

The VLSI implementation of RS Decoder using the Modified Euclidean Algorithm (변형 유클리디안 알고리즘을 이용한 리드 - 솔로몬 디코더의 VLSI 구현)

  • 최광석;김수원
    • Proceedings of the IEEK Conference / 1998.10a / pp.679-682 / 1998
  • This paper presents the VLSI implementation of an RS (Reed-Solomon) decoder using the Modified Euclidean Algorithm (hereafter MEA) for DVD (Digital Versatile Disc) and CD (Compact Disc). The decoder can correct 8 errors or 16 erasures for DVD and 2 errors or 4 erasures for CD. Polynomial evaluation is used to realize the syndrome calculation, and a polynomial expansion circuit is developed to calculate the Forney syndrome polynomial and the erasure locator polynomial. Because the system has buffer memory, the MEA architecture can use a recursive structure in which the number of basic operating cells is reduced to one. We also propose five criteria for identifying an uncorrectable codeword when using the MEA. The overall architecture is simple and regular and has a 4-stage pipelined structure.
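
Syndrome calculation by polynomial evaluation, as mentioned in this abstract, can be sketched as follows. The example assumes GF(2^8) with the field polynomial 0x11D, the primitive element α = 2, and syndromes evaluated at α^0, α^1, and so on; these parameters and the names `gf_mul` and `syndromes` are assumptions for illustration and are not taken from the paper.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements: carry-less multiply reduced by the field polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:        # degree reached 8: reduce modulo the field polynomial
            a ^= poly
        b >>= 1
    return result

def syndromes(received, num_syndromes, alpha=2):
    """Evaluate the received polynomial at alpha^0 .. alpha^(num_syndromes-1) by Horner's rule."""
    out = []
    root = 1
    for _ in range(num_syndromes):
        s = 0
        for symbol in received:          # highest-degree coefficient first
            s = gf_mul(s, root) ^ symbol
        out.append(s)
        root = gf_mul(root, alpha)
    return out

# x^2 + 3x + 2 = (x + 1)(x + 2) is a codeword of a toy code with roots alpha^0 and alpha^1.
print(syndromes([1, 3, 2], 2))  # prints [0, 0]
print(syndromes([1, 3, 3], 2))  # prints [1, 1]: one error of value 1 at position 0
```

Each syndrome is an independent Horner recurrence with one multiply and one XOR per received symbol, so in hardware all syndromes can be updated in parallel as symbols arrive.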

A Design of RS Decoder for MB-OFDM UWB (MB-OFDM UWB 를 위한 RS 복호기 설계)

  • Choi, Sung-Woo;Shin, Cheol-Ho;Choi, Sang-Sung
    • Proceedings of the Korea Electromagnetic Engineering Society Conference / 2005.11a / pp.131-136 / 2005
  • UWB is one of the most closely watched wireless technologies; it transmits data at very high rates using low power over a wide frequency band. UWB technology makes it possible to transmit data at rates over 100 Mbps within 10 meters. To protect important header information, MB-OFDM UWB adopts a Reed-Solomon (23,17) code. In the receiver, the RS decoder must achieve high speed and low latency with efficient hardware. In this paper, we suggest an architecture for an RS decoder for MB-OFDM UWB. We adopt the Modified Euclidean algorithm for the key equation solver block, which is the most complex block in terms of area. We suggest a pipelined processing cell for this block and show the detailed architecture of the syndrome, Chien search, and Forney algorithm blocks. Finally, we show the hardware implementation results of the RS decoder for ASIC implementation.
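
The Chien search block named in this abstract tests every symbol position to see whether it is a root of the error locator polynomial. The sketch below is a straightforward software version under assumed parameters: GF(2^8) with field polynomial 0x11D, α = 2, and a locator given lowest-degree coefficient first. The helper names are hypothetical, and the paper's actual field parameters are not stated in the abstract.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements (assumed field polynomial 0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise a GF(2^8) element to a non-negative integer power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def chien_search(locator, n, alpha=2):
    """Return the positions i in 0..n-1 for which locator(alpha^-i) == 0."""
    positions = []
    for i in range(n):
        x = gf_pow(alpha, (255 - i) % 255)   # alpha^-i, since alpha^255 == 1
        acc = 0
        for coeff in reversed(locator):      # Horner's rule, highest degree first
            acc = gf_mul(acc, x) ^ coeff
        if acc == 0:
            positions.append(i)
    return positions

# Locator 1 + alpha^5 * x marks a single error at position 5 of a length-23 codeword.
print(chien_search([1, gf_pow(2, 5)], 23))  # prints [5]
```

A hardware Chien search usually replaces `gf_pow` with one constant multiplier per locator coefficient that is updated every clock cycle, which is what makes the block easy to pipeline.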

A Continuous Versatile Reed-Solomon Decoder with Variable Code Rate and Block Length (가변 부호율과 블록 길이를 갖는 연속 가변형 리드솔로몬 복호기)

  • 공민한;송문규
    • Proceedings of the IEEK Conference / 2003.07a / pp.549-552 / 2003
  • In this paper, an efficient architecture for a versatile Reed-Solomon (RS) decoder is designed, in which both the message length k and the block length n can vary. The decoder permits 3-step pipelined processing based on the modified Euclid's algorithm (MEA). A new architecture for the MEA is designed for variable values of the error-correcting capability t. To maintain the throughput rate with less circuitry, the MEA block uses both recursive and overclocking techniques. The decoder can decode a codeword received not only in burst mode but also in continuous mode, and its versatility makes it usable in a wide range of applications. A versatile RS decoder over GF(2^8) with an error-correcting capability of up to 10 has been designed in VHDL and successfully synthesized in an FPGA chip.

Design of an Area-efficient DCME Algorithm for High-speed Reed-Solomon Decoder (고속 Reed-Solomon 복호기를 위한 면적 효율적인 DCME 알고리즘 설계)

  • Kang, Sung Jin
    • Journal of the Semiconductor & Display Technology / v.13 no.4 / pp.7-13 / 2014
  • In this paper, an area-efficient degree-computationless modified Euclidean (DCME) algorithm is presented and implemented for a high-speed Reed-Solomon (RS) decoder. The DCME algorithm can be used to solve the key equation in an RS decoder to obtain the error locator polynomial and the error value polynomial. A pipelined recursive structure is adopted to reduce the area of the key equation solver (KES) block at the cost of some decoding latency. For comparison, a KES block for an RS(255,239,8) decoder with the proposed architecture is implemented in Verilog HDL and synthesized using a Synopsys design tool and a 65 nm CMOS technology. The synthesis results show that the proposed architecture can be implemented with a lower gate count than other existing DCME architectures.

Design and Implementation of Depth Image Based Real-Time Human Detection

  • Lee, SangJun;Nguyen, Duc Dung;Jeon, Jae Wook
    • JSTS:Journal of Semiconductor Technology and Science / v.14 no.2 / pp.212-226 / 2014
  • This paper presents the design and implementation of a pipelined architecture and a method for real-time human detection using depth images from a Time-of-Flight (ToF) camera. In the proposed method, we use the Euclidean Distance Transform (EDT) to extract the human body location, and we then use 1D and 2D scanning windows to extract human joint locations. The EDT-based human extraction method is robust against noise. In addition, the 1D and 2D scanning windows make it easy to extract human joint locations from a distance image. The proposed method is designed in Verilog HDL (Hardware Description Language) as dedicated hardware based on a pipeline architecture. We implement the dedicated hardware on a Xilinx Virtex6 LX750 Field Programmable Gate Array (FPGA). The FPGA implementation runs at a maximum operating frequency of 80 MHz and achieves over 60 fps on QVGA (320×240) depth images.
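
To make the Euclidean Distance Transform concrete, here is a brute-force reference version on a small binary mask. It is only a sketch of what the transform computes: the abstract does not say how the hardware evaluates the EDT, and the function name `brute_force_edt` and the convention that 1 marks foreground and 0 marks background are assumptions for the example.

```python
import math

def brute_force_edt(mask):
    """For every pixel, return the Euclidean distance to the nearest background (0) pixel."""
    background = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if v == 0]
    result = []
    for r, row in enumerate(mask):
        out_row = []
        for c, v in enumerate(row):
            if v == 0:
                out_row.append(0.0)              # background pixels have distance 0
            elif not background:
                out_row.append(math.inf)         # no background anywhere in the mask
            else:
                out_row.append(min(math.hypot(r - br, c - bc)
                                   for br, bc in background))
        result.append(out_row)
    return result

mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
edt = brute_force_edt(mask)
print(edt[2][2], edt[1][1])  # prints 2.0 1.0
```

The largest EDT value lies deep inside the foreground blob, which is one plausible reason the transform is useful for locating a body region while ignoring small noisy specks. A real-time design would use a scan-based algorithm rather than this quadratic search.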

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing / v.3 no.1 / pp.1-9 / 2014
  • This paper proposes a low-latency Sample Adaptive Offset (SAO) filter architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property that enables efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results show that the proposed low-latency SAO architecture, with significantly less memory usage, produces a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.
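
For readers unfamiliar with SAO, the sketch below shows the edge-offset classification for a single row using the horizontal direction. It is a scalar illustration only, not the paper's SIMD or CTU-based scheme: the function name `sao_edge_offset_row`, the dictionary of per-category offsets, and leaving the two border samples unfiltered are assumptions made to keep the example short.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def sao_edge_offset_row(row, offsets, bit_depth=8):
    """Apply horizontal SAO edge offsets; offsets maps categories 1..4 to signed values."""
    out = list(row)
    max_val = (1 << bit_depth) - 1
    category_of = {0: 1, 1: 2, 2: 0, 3: 3, 4: 4}   # edge index -> category (0 means no offset)
    for i in range(1, len(row) - 1):
        # Neighbors are read from the unfiltered input row, not from `out`.
        edge_idx = 2 + sign(row[i] - row[i - 1]) + sign(row[i] - row[i + 1])
        category = category_of[edge_idx]
        if category:
            out[i] = min(max(row[i] + offsets[category], 0), max_val)  # clip to sample range
    return out

print(sao_edge_offset_row([10, 5, 10, 10, 20, 10], {1: 2, 2: 1, 3: -1, 4: -2}))
# prints [10, 7, 9, 11, 18, 10]
```

Because each output sample depends only on three input samples and a small table, the comparisons and the offset addition can be done on many samples at once, which is what makes the filter a natural target for SIMD.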