• Title/Summary/Keyword: Pipelined architecture

Search Result 176, Processing Time 0.027 seconds

VLSI Design of a 2048 Point FFT/IFFT by Sequential Data Processing for Digital Audio Broadcasting System (순차적 데이터 처리방식을 이용한 디지틀 오디오 방송용 2048 Point FFT/IFFT의 VLSI 설계)

  • Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.5 / pp.65-73 / 2002
  • In this paper, we propose and verify an implementation method for a single-chip 2048-point complex FFT/IFFT based on sequential data processing. Sequential processing of 2048 complex data requires buffers to store the input data, so a DRAM-like pipelined commutator architecture is used as the buffer. The proposed structure reduces the chip size by 60% compared with the conventional approach. The 16-point FFT is the basic building block of the entire FFT chip, and the 2048-point FFT consists of cascaded blocks with five radix-4 stages and one radix-2 stage. Since each stage requires rounding of the resulting bits while maintaining a proper S/N ratio, the convergent block floating point (CBFP) algorithm is used for effective internal bit rounding, and this method contributed to a single-chip design of a digital audio broadcasting system.
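
The abstract does not describe the internals of CBFP, so the following is only a minimal sketch of plain block floating-point scaling, the general idea that CBFP builds on: all values in a block share one exponent, chosen so that the largest magnitude fits the word. The function name `block_scale` and the 12-bit word width are assumptions for illustration, not details from the paper.

```python
def block_scale(block, word_bits=12):
    """Scale a block of integers so every value fits a signed word.

    Returns (scaled, shift); each original value is approximately
    scaled[i] * 2**shift. `word_bits` is an illustrative assumption.
    """
    limit = (1 << (word_bits - 1)) - 1        # largest positive word value
    peak = max(abs(v) for v in block)

    # Find the smallest shared right shift that brings the peak in range.
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    if shift == 0:
        return list(block), 0

    # Round to nearest, then clamp in case rounding pushed a value past limit.
    half = 1 << (shift - 1)
    scaled = [min(limit, (v + half) >> shift) for v in block]
    return scaled, shift


if __name__ == "__main__":
    print(block_scale([4095, -4096, 10]))
```

Sharing one exponent per block keeps the datapath as narrow as a fixed-point design while avoiding overflow between stages, which is why block floating-point schemes are common in pipelined FFTs.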

Efficient Parallel IP Address Lookup Architecture with Smart Distributor (스마트 분배기를 이용한 효율적인 병렬 IP 주소 검색 구조)

  • Kim, Junghwan;Kim, Jinsoo
    • The Journal of the Korea Contents Association / v.13 no.2 / pp.44-51 / 2013
  • Routers must perform fast IP address lookup for the Internet to provide high-speed service. In this paper, we present a hybrid parallel IP address lookup structure organized as a four-stage pipeline. It achieves parallelism at low cost by using multiple SRAMs in stage 2 and partitioned TCAMs in stage 3, and improves performance through pipelining. The smart distributor in stage 1 does not forward an IP address that is identical to the previous one to the next stage; instead, it reuses the result of the previous lookup. This improves lookup throughput through a caching effect and also reduces access conflicts on the TCAM banks in stage 3. In the last stage, the reorder buffer rearranges the completed IP addresses according to their input order. We evaluate the performance of our parallel pipelined IP lookup structure by comparing it with a previous hybrid structure, using a real routing table and traffic distributions generated by Zipf's law.
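
The behaviour of the smart distributor described above can be sketched in software as follows. This is only an illustration under stated assumptions: the class name `SmartDistributor`, the linear longest-prefix-match loop, and the sample routing table are hypothetical, and the sketch does not model the SRAM/TCAM stages or the reorder buffer.

```python
import ipaddress


class SmartDistributor:
    """Reuses the previous result when the same address arrives twice in a row."""

    def __init__(self, routes):
        # routes: dict mapping CIDR prefix -> next hop (hypothetical table)
        self.routes = [(ipaddress.ip_network(p), hop) for p, hop in routes.items()]
        self.prev_addr = None
        self.prev_hop = None
        self.full_lookups = 0     # how often the later stages were needed

    def _longest_prefix_match(self, addr):
        # Stand-in for stages 2 and 3: a plain linear scan over all prefixes.
        self.full_lookups += 1
        best = None
        for net, hop in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop)
        return best[1] if best else None

    def lookup(self, addr_str):
        addr = ipaddress.ip_address(addr_str)
        if addr == self.prev_addr:
            return self.prev_hop              # no transfer to the next stage
        hop = self._longest_prefix_match(addr)
        self.prev_addr, self.prev_hop = addr, hop
        return hop


if __name__ == "__main__":
    d = SmartDistributor({"10.0.0.0/8": "A", "10.1.0.0/16": "B", "0.0.0.0/0": "D"})
    print([d.lookup(a) for a in ["10.1.2.3", "10.1.2.3", "10.2.0.1"]], d.full_lookups)
```

Because consecutive packets often belong to the same flow, even this one-entry comparison removes a share of the lookups that would otherwise contend for the later stages.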

An 8b 200 MHz 0.18 um CMOS ADC with 500 MHz Input Bandwidth (500 MHz의 입력 대역폭을 갖는 8b 200 MHz 0.18 um CMOS A/D 변환기)

  • 조영재;배우진;박희원;김세원;이승훈
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.5 / pp.312-320 / 2003
  • This work describes an 8b 200 MHz 0.18 um CMOS analog-to-digital converter (ADC) based on a pipelined architecture for flat panel display applications. The proposed ADC employs an improved bootstrapping technique to obtain an input bandwidth wider than the sampling rate of 200 MHz. The bootstrapping technique improves the accuracy of the input sample-and-hold amplifier (SHA), and fast Fourier transform (FFT) analysis of the SHA outputs shows an effective number of bits (ENOB) of 7.2 with a sinusoidal input frequency of 500 MHz and a sampling clock of 200 MHz at a 1.7 V supply voltage. The merged-capacitor switching (MCS) technique increases the sampling rate of the ADC by reducing the number of capacitors required in conventional ADCs by 50% and simultaneously minimizes chip area. The simulated ADC in a 0.18 um n-well single-poly quad-metal CMOS technology shows 8b resolution and 73 mW power dissipation at a 200 MHz sampling clock and a 1.7 V supply voltage.
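
As a reading aid for the ENOB figure, the sketch below applies the standard relation ENOB = (SINAD - 1.76 dB) / 6.02 dB. The abstract reports only the ENOB, so the SINAD value computed here is inferred from that formula and is not a number from the paper; the function names are chosen for illustration.

```python
def enob_from_sinad(sinad_db):
    """Effective number of bits from SINAD in dB (standard sine-wave relation)."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(enob):
    """Inverse relation: the SINAD in dB implied by a given ENOB."""
    return 6.02 * enob + 1.76


if __name__ == "__main__":
    # The 7.2-bit ENOB quoted in the abstract corresponds to roughly 45.1 dB.
    print(round(sinad_from_enob(7.2), 1))
```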

Design of Transformation Engine for Mobile 3D Graphics (모바일 3차원 그래픽을 위한 기하변환 엔진 설계)

  • Kim, Dae-Kyoung;Lee, Jee-Myong;Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.10 / pp.49-54 / 2007
  • As digital content based on 3D graphics grows, so does the demand for low-power 3D graphics hardware for mobile devices. We design and propose a simplified transformation engine for a mobile 3D graphics processor. The area of the transformation engine is reduced by merging the mapping transformation unit into the projective transformation unit and by replacing the clipping unit with a selection unit. The engine consists of a viewing transformation unit, a projective transformation unit, a divide-by-w unit, and a selection unit. It can process the 32-bit floating-point format of the IEEE-754 standard or a reduced 24-bit floating-point format. It has a pipelined architecture so that one vertex is processed every 4 cycles, apart from the initial latency. The RTL code is verified using an FPGA.
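
The unit sequence named in the abstract (viewing transformation, projective transformation, divide by w, selection) can be sketched in software as below. This is a sketch under assumptions: the abstract does not define what the selection unit tests, so here it is modelled as a simple in-volume flag, and the function names and sample matrices are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-element vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def transform_vertex(view, proj, vertex):
    """Run one homogeneous vertex through the four units in order."""
    eye = mat_vec(view, vertex)               # viewing transformation
    clip = mat_vec(proj, eye)                 # projective transformation
    w = clip[3]
    if w == 0:
        return None, False                    # cannot divide; reject the vertex
    ndc = [clip[i] / w for i in range(3)]     # divide by w
    # Assumed selection rule: keep vertices inside the [-1, 1] cube.
    selected = all(-1.0 <= c <= 1.0 for c in ndc)
    return ndc, selected


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    simple_proj = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]  # w = z
    print(transform_vertex(identity, simple_proj, [1, 2, 4, 1]))
```

Each 4x4 matrix-vector product needs 16 multiplications, so a hardware unit with four multipliers per output row would naturally take a few cycles per vertex; the 4-cycle figure in the abstract is the paper's own result and is not derived from this sketch.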

Differential CORDIC-based High-speed Phase Calculator for 3D Depth Image Extraction from TOF Sensor (TOF 센서용 3차원 깊이 영상 추출을 위한 차동 CORDIC 기반 고속 위상 연산기)

  • Koo, Jung-Youn;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.3 / pp.643-650 / 2014
  • A hardware implementation of a phase calculator for extracting 3D depth images from a TOF (Time-Of-Flight) sensor is described. The designed phase calculator adopts a redundant binary number system and a pipelined architecture to improve throughput and speed. It performs the arctangent operation using the vectoring mode of the DCORDIC (Differential COordinate Rotation DIgital Computer) algorithm. Fixed-point MATLAB simulations are carried out to determine the optimal bit-widths and number of iterations. The phase calculator has been verified by FPGA-in-the-loop verification using MATLAB/Simulink. A test chip has been fabricated using a TSMC 0.18-$\mu$m CMOS process, and test results show that the chip functions correctly. It has 82,000 gates, and the estimated throughput is 400 MS/s at 400 MHz and 1.8 V.
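
To illustrate the vectoring mode mentioned above, here is a minimal sketch of conventional CORDIC, not the differential (DCORDIC) variant with redundant binary arithmetic that the paper implements. It uses floating point for readability, whereas hardware would use fixed-point shifts and a precomputed arctangent table; the function name and the 16-iteration default are assumptions.

```python
import math


def cordic_atan(y, x, iterations=16):
    """Approximate atan(y / x) for x > 0 using vectoring-mode CORDIC."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    angle = 0.0
    for i in range(iterations):
        step = 2.0 ** -i                      # a right shift by i in hardware
        if y > 0:
            # Rotate clockwise to drive y toward zero; accumulate the angle.
            x, y = x + y * step, y - x * step
            angle += math.atan(step)
        else:
            # Rotate counter-clockwise.
            x, y = x - y * step, y + x * step
            angle -= math.atan(step)
    return angle


if __name__ == "__main__":
    print(cordic_atan(1.0, 1.0), math.pi / 4)
```

Each iteration is independent in structure and uses only shifts and adds, which is what makes the algorithm a natural fit for one pipeline stage per iteration.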

A full-Hardwired Low-Power MPEG4@SP Video Encoder for Mobile Applications (모바일 향 저전력 동영상 압축을 위한 고집적 MPEG4@SP 동영상 압축기)

  • Shin, Sun Young;Park, Hyun Sang
    • Journal of Broadcast Engineering / v.10 no.3 / pp.392-400 / 2005
  • A highly integrated MPEG-4@SP video compression engine, VideoCore, is proposed for mobile applications. The primary components of video compression require high memory bandwidth because they access the external memory frequently. They include motion estimation, motion compensation, quantization, discrete cosine transform, variable length coding, and so on. The motion estimation processor adopted in VideoCore uses small local memories so that the video compression system accesses external memory as infrequently as possible. The entire video compression system is divided into two distinct sub-systems, the integer-unit motion estimation part and the remaining functions, and both operate concurrently in a pipelined architecture. Thus VideoCore enables real-time, high-quality video compression at a relatively low operating frequency.

Design and Evaluation of 32-Bit RISC-V Processor Using FPGA (FPGA를 이용한 32-Bit RISC-V 프로세서 설계 및 평가)

  • Jang, Sungyeong;Park, Sangwoo;Kwon, Guyun;Suh, Taeweon
    • KIPS Transactions on Computer and Communication Systems / v.11 no.1 / pp.1-8 / 2022
  • RISC-V is an open-source instruction set architecture that has a simple base structure and can be extended depending on the purpose. In this paper, we designed a small, low-power 32-bit RISC-V processor to establish a base for research on RISC-V embedded systems. We designed a 2-stage pipelined processor that supports the RISC-V base integer instruction set except for the FENCE and EBREAK instructions. The processor also supports the privileged ISA for trap handling. It used 1895 LUTs and 1195 flip-flops, and consumed 0.001 W on a Xilinx Zynq-7000 FPGA when synthesized using the Vivado Design Suite. GPIO, UART, and timer peripherals are additionally used to compose the system. We verified the operation of the processor on the FPGA with FreeRTOS at 16 MHz. We used the Dhrystone and CoreMark benchmarks to measure the performance of the processor. This study aims to provide a low-power, high-efficiency microprocessor for future extension.

Trace-Back Viterbi Decoder with Sequential State Transition Control (순서적 역방향 상태천이 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea TC / v.40 no.11 / pp.51-62 / 2003
  • This paper presents novel survivor memory management and decoding techniques with sequential backward state transition control in the trace-back Viterbi decoder. The Viterbi algorithm is a maximum likelihood decoding scheme that estimates the likelihood of the encoder state for channel error detection and correction. This scheme is applied to a broad range of digital communication tasks such as intersymbol interference removal and channel equalization. To achieve an area-efficient VLSI chip design with high throughput in the Viterbi decoder, in which recursive operation is implied, more research is required to obtain a simple, systematic parallel ACS architecture and survivor memory management. As a solution to this problem, this paper addresses a progressive decoding algorithm with sequential backward state transition control in the trace-back Viterbi decoder. Compared with conventional trace-back decoding techniques, the total memory required can be greatly reduced in the proposed method. Furthermore, the proposed method can be implemented with a simple pipelined structure with a systolic-array-type architecture. Peripheral logic circuitry for controlling memory access is not required, and memory access bandwidth can be reduced. Therefore, the proposed method offers high area efficiency and low power consumption with high throughput. Finally, examples of decoding results for received data with channel noise and application results are provided to evaluate the efficiency of the proposed method.

A Versatile Reed-Solomon Decoder for Continuous Decoding of Variable Block-Length Codewords (가변 블록 길이 부호어의 연속 복호를 위한 가변형 Reed-Solomon 복호기)

  • 송문규;공민한
    • Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.3 / pp.187-187 / 2004
  • In this paper, we present an efficient architecture for a versatile Reed-Solomon (RS) decoder that can be programmed to decode RS codes continuously with any message length k as well as any block length n. This unique feature eliminates the need to insert zeros when decoding shortened RS codes. Also, the values of the parameters n and k, and hence the error-correcting capability t, can be altered at every codeword block. The decoder permits 3-step pipelined processing based on the modified Euclid's algorithm (MEA). Since each step can be driven by a separate clock, the decoder can operate as a 2-step pipeline by employing a faster clock in step 2 and/or step 3. The decoder can also be used when the input clock differs from the output clock. Each step is designed to have a structure suitable for decoding RS codes with varying block length. A new architecture for the MEA is designed for variable values of t. The operating length of the shift registers in the MEA block is shortened by one, and it can be varied according to the value of t. To maintain the throughput rate with less circuitry, the MEA block uses both the recursive technique and the over-clocking technique. The decoder can decode codewords received not only in a burst mode, but also in a continuous mode. It can be used in a wide range of applications because of its versatility. The adaptive RS decoder over GF($2^8$) with an error-correcting capability of up to 10 has been designed in VHDL and successfully synthesized in an FPGA chip.
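
The relation between n, k, and t that the abstract relies on is the standard one for RS codes, t = floor((n - k) / 2), and a code over GF($2^8$) has a block length of at most 255 symbols. The sketch below only checks these parameters; the function name `rs_parameters` and the returned dictionary layout are assumptions for illustration, while the limit of t = 10 is the one quoted in the abstract.

```python
def rs_parameters(n, k, symbol_bits=8, max_t=10):
    """Return the error-correcting capability of an (n, k) RS code.

    Also reports how many symbols the code is shortened by relative to
    the full-length code over GF(2**symbol_bits).
    """
    n_max = (1 << symbol_bits) - 1            # 255 for GF(2^8)
    if not (0 < k < n <= n_max):
        raise ValueError("need 0 < k < n <= %d" % n_max)
    t = (n - k) // 2                          # correctable symbol errors
    if t > max_t:
        raise ValueError("t = %d exceeds the supported maximum %d" % (t, max_t))
    return {"t": t, "shortened_by": n_max - n}


if __name__ == "__main__":
    print(rs_parameters(255, 235))
    print(rs_parameters(204, 188))
```

A conventional fixed-length decoder would pad a shortened codeword such as (204, 188) with 51 zero symbols before decoding, which is the overhead the described architecture avoids.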

A Versatile Reed-Solomon Decoder for Continuous Decoding of Variable Block-Length Codewords (가변 블록 길이 부호어의 연속 복호를 위한 가변형 Reed-Solomon 복호기)

  • 송문규;공민한
    • Journal of the Institute of Electronics Engineers of Korea TC / v.41 no.3 / pp.29-38 / 2004
  • In this paper, we present an efficient architecture for a versatile Reed-Solomon (RS) decoder that can be programmed to decode RS codes continuously with any message length k as well as any block length n. This unique feature eliminates the need to insert zeros when decoding shortened RS codes. Also, the values of the parameters n and k, and hence the error-correcting capability t, can be altered at every codeword block. The decoder permits 3-step pipelined processing based on the modified Euclid's algorithm (MEA). Since each step can be driven by a separate clock, the decoder can operate as a 2-step pipeline by employing a faster clock in step 2 and/or step 3. The decoder can also be used when the input clock differs from the output clock. Each step is designed to have a structure suitable for decoding RS codes with varying block length. A new architecture for the MEA is designed for variable values of t. The operating length of the shift registers in the MEA block is shortened by one, and it can be varied according to the value of t. To maintain the throughput rate with less circuitry, the MEA block uses both the recursive technique and the over-clocking technique. The decoder can decode codewords received not only in a burst mode, but also in a continuous mode. It can be used in a wide range of applications because of its versatility. The adaptive RS decoder over GF($2^8$) with an error-correcting capability of up to 10 has been designed in VHDL and successfully synthesized in an FPGA chip.