• Title/Summary/Keyword: pipelined parallel architecture

Search Result 38, Processing Time 0.023 seconds

Design of A Low-Voltage and High-Speed Pipelined A/D Converter Using Current-Mode Signals (저전압 고속 전류형 Pipelined A/D 변환기의 설계)

  • 박승균;이희덕;한철희
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.31A no.3
    • /
    • pp.18-27
    • /
    • 1994
  • An 8-bit 2-stage pipelined current mode A/D converter is designed with a new architecture, where the wideband track-and-hold amplifiers which have 2 integrators in parallel sample input signal twice per clock cycle. The conversion speed of the A-D converter is two times faster than that of conventional pipelined method. The converter is designed to be operated at the power supply voltage of 3.3V with the input dynamic range of 0-256$\mu$A. HSPICE simulation results show the performance of up to 55Msamples/s and power consumption of 150mW with the parameters of ISRC $1.5\mu$m BICMOS process. The chip area is 3${\times}4mm^{2}$.

  • PDF

3D graphics processor architecture based on multistreaming (다중스트리밍을 이용한 3차원 그래픽 프로세서 구조)

  • 박용진;이동호
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.9
    • /
    • pp.10-21
    • /
    • 1997
  • In this paper, we propose multiple instruction issuable multi-streaming as a processor architecture for 3D graphics processor. Multistreaming can eliminate inteferences within concurrently executing instructions inthe pipelined processor to allow enough parallelism for parallel processing. Through cycle level simulation study, we show that the proposed architecture outperforms a conventional RISC processor, MIPS R3000 by three times with reasonable resource overheads. Multiple instruction issuable multistreaming processor will be a bood architecture for instruction processor when a large number of threads are guaranteed.

  • PDF

A Design of Pipelined-parallel CABAC Decoder Adaptive to HEVC Syntax Elements (HEVC 구문요소에 적응적인 파이프라인-병렬 CABAC 복호화기 설계)

  • Bae, Bong-Hee;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.5
    • /
    • pp.155-164
    • /
    • 2015
  • This paper describes a design and implementation of CABAC decoder, which would handle HEVC syntax elements in adaptively pipelined-parallel computation manner. Even though CABAC offers the high compression rate, it is limited in decoding performance due to context-based sequential computation, and strong data dependency between context models, as well as decoding procedure bin by bin. In order to enhance the decoding computation of HEVC CABAC, the flag-type syntax elements are adaptively pipelined by precomputing consecutive flag-type ones; and multi-bin syntax elements are decoded by processing bins in parallel up to three. Further, in order to accelerate Binary Arithmetic Decoder by reducing the critical path delay, the update and renormalization of context modeling are precomputed parallel for the cases of LPS as well as MPS, and then the context modeling renewal is selected by the precedent decoding result. It is simulated that the new HEVC CABAC architecture could achieve the max. performance of 1.01 bins/cycle, which is two times faster with respect to the conventional approach. In ASIC design with 65nm library, the CABAC architecture would handle 224 Mbins/sec, which could decode QFHD HEVC video data in real time.

An FPGA-based Parallel Hardware Architecture for Real-time Eye Detection

  • Kim, Dong-Kyun;Jung, Jun-Hee;Nguyen, Thuy Tuong;Kim, Dai-Jin;Kim, Mun-Sang;Kwon, Key-Ho;Jeon, Jae-Wook
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.12 no.2
    • /
    • pp.150-161
    • /
    • 2012
  • Eye detection is widely used in applications, such as face recognition, driver behavior analysis, and human-computer interaction. However, it is difficult to achieve real-time performance with software-based eye detection in an embedded environment. In this paper, we propose a parallel hardware architecture for real-time eye detection. We use the AdaBoost algorithm with modified census transform(MCT) to detect eyes on a face image. We parallelize part of the algorithm to speed up processing. Several downscaled pyramid images of the eye candidate region are generated in parallel using the input face image. We can detect the left and the right eye simultaneously using these downscaled images. The sequential data processing bottleneck caused by repetitive operation is removed by employing a pipelined parallel architecture. The proposed architecture is designed using Verilog HDL and implemented on a Virtex-5 FPGA for prototyping and evaluation. The proposed system can detect eyes within 0.15 ms in a VGA image.

Performance Optimization of Parallel Algorithms

  • Hudik, Martin;Hodon, Michal
    • Journal of Communications and Networks
    • /
    • v.16 no.4
    • /
    • pp.436-446
    • /
    • 2014
  • The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.

Implementation of H.264/AVC Deblocking Filter on 1-D CGRA (1-D CGRA에서의 H.264/AVC 디블록킹 필터 구현)

  • Song, Sehyun;Kim, Kichul
    • Journal of IKEEE
    • /
    • v.17 no.4
    • /
    • pp.418-427
    • /
    • 2013
  • In this paper, we propose a parallel deblocking filter algorithm for H.264/AVC video standard. The deblocking filter has different filter processes according to boundary strength (BS) and each filter process requires various conditional calculations. The order of filtering makes it difficult to parallelize deblocking filter calculations. The proposed deblocking filter algorithm is performed on PRAGRAM which is a 1-D coarse grained reconfigurable architecture (CGRA). Each filter calculation is accelerated using uni-directional pipelined architecture of PRAGRAM. The filter selection and the conditional calculations are efficiently performed using dynamic reconfiguration and conditional reconfiguration. The parallel deblocking filter algorithm uses 225 cycles to process a macroblock and it can process a full HD image at 150 MHz.

A Study on a VLSI Architecture for Reed-Solomon Decoder Based on the Berlekamp Algorithm (Berlekamp 알고리즘을 이용한 Reed-Solomon 복호기의 VLSI 구조에 관한 연구)

  • 김용환;정영모;이상욱
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.30B no.11
    • /
    • pp.17-26
    • /
    • 1993
  • In this paper, a VlSI architecture for Reed-Solomon (RS) decoder based on the Berlekamp algorithm is proposed. The proposed decoder provided both erasure and error correcting capability. In order to reduc the chip area, we reformulate the Berlekamp algorithm. The proposed algorithm possesses a recursive structure so that the number of cells for computing the errata locator polynomial can be reduced. Moreover, in our approach, only one finite field multiplication per clock cycle is required for implementation, provided an improvement in the decoding speed, and the overall architecture features parallel and pipelined structure, making a real time decoding possible. From the performance evaluation, it is concluded that the proposed VLSI architecture is more efficient in terms of VLSI implementation than the rcursive architecture based on the Euclid algorithm.

  • PDF

A real-time high speed full search block matching motion estimation processor (고속 실시간 처리 full search block matching 움직임 추정 프로세서)

  • 유재희;김준호
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.33A no.12
    • /
    • pp.110-119
    • /
    • 1996
  • A novel high speed VLSI architecture and its VLSI realization methodologies for a motion estimation processor based on full search block matching algorithm are presentd. The presented architecture is designed in order to be suitable for highly parallel and pipelined processing with identical PE's and adjustable in performance and hardware amount according to various application areas. Also, the throughput is maximized by enhancing PE utilization up to 100% and the chip pin count is reduced by reusing image data with embedded image memories. Also, the uniform and identical data processing structure of PE's eases VLSI implementation and the clock rate of external I/O data can be made slower compared to internal clock rate to resolve I/O bottleneck problem. The logic and spice simulation results of the proposed architecture are presented. The performances of the proposed architecture are evaluated and compared with other architectures. Finally, the chip layout is shown.

  • PDF

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

Energy Efficient and Low-Cost Server Architecture for Hadoop Storage Appliance

  • Choi, Do Young;Oh, Jung Hwan;Kim, Ji Kwang;Lee, Seung Eun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.12
    • /
    • pp.4648-4663
    • /
    • 2020
  • This paper proposes the Lempel-Ziv 4(LZ4) compression accelerator optimized for scale-out servers in data centers. In order to reduce CPU loads caused by compression, we propose an accelerator solution and implement the accelerator on an Field Programmable Gate Array(FPGA) as heterogeneous computing. The LZ4 compression hardware accelerator is a fully pipelined architecture and applies 16 dictionaries to enhance the parallelism for high throughput compressor. Our hardware accelerator is based on the 20-stage pipeline and dictionary architecture, highly customized to LZ4 compression algorithm and parallel hardware implementation. Proposing dictionary architecture allows achieving high throughput by comparing input sequences in multiple dictionaries simultaneously compared to a single dictionary. The experimental results provide the high throughput with intensively optimized in the FPGA. Additionally, we compare our implementation to CPU implementation results of LZ4 to provide insights on FPGA-based data centers. The proposed accelerator achieves the compression throughput of 639MB/s with fine parallelism to be deployed into scale-out servers. This approach enables the low power Intel Atom processor to realize the Hadoop storage along with the compression accelerator.