• Title/Summary/Keyword: pipelined parallel architecture

Search Result 38, Processing Time 0.02 seconds

High-Performance Low-Power FFT Cores

  • Han, Wei;Erdogan, Ahmet T.;Arslan, Tughrul;Hasan, Mohd.
    • ETRI Journal
    • /
    • v.30 no.3
    • /
    • pp.451-460
    • /
    • 2008
  • Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are popularly employed in both traditional research fields, such as satellite communications, and thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high throughput and power efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, such as multiplierless units which replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that up to 38% and 55% power savings can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs respectively, compared to the conventional pipelined FFT processor architectures.

  • PDF

Pipelined Parallel Processing System for Image Processing (영상처리를 위한 Pipelined 병렬처리 시스템)

  • Lee, Hyung;Kim, Jong-Bae;Choi, Sung-Hyk;Park, Jong-Won
    • Journal of IKEEE
    • /
    • v.4 no.2 s.7
    • /
    • pp.212-224
    • /
    • 2000
  • In this paper, a parallel processing system is proposed for improving the processing speed of image related applications. The proposed parallel processing system is fully synchronous SIMD computer with pipelined architecture and consists of processing elements and a multi-access memory system. The multi-access memory system is made up of memory modules and a memory controller, which consists of memory module selection module, data routing module, and address calculating and routing module, to perform parallel memory accesses with the variety of types: block, horizontal, and vertical access way. Morphological filter had been applied to verify the parallel processing system and resulted in faithful processing speed.

  • PDF

Design of a Parallel Pipelined Processor Architecture (병렬 파이프라인 프로세서 아키덱처의 설계)

  • 이상정;김광준
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.3
    • /
    • pp.11-23
    • /
    • 1995
  • In this paper, a parallel pipelined processor model which acts as a small VLIW processor architecture and a scheduling algorithm for extracting instruction-level parallelism on this architecture are proposed. The proposed model has a dual-instruction mode which has maximum 4 basic operations being executed in parallel. By combining these basic operations, variable instruction set can be designed for various applications. The scheduling algorithm schedules basic operations for parallel execution and removes pipeline hazards by examining data dependency and resource conflict relations. In order to examine operation and evaluate the performance,a C compiler and a simulator are developed. By simulating various test programs with the compiler and the simulator, the characteristics and the performance result of the proposed architecture are measured.

  • PDF

Implementation of High-Speed Reed-Solomon Decoder Using the Modified Euclid's Algorithm (개선된 수정 유클리드 알고리듬을 이용한 고속의 Reed-Solomon 복호기의 설계)

  • 김동선;최종찬;정덕진
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.7
    • /
    • pp.909-915
    • /
    • 1999
  • In this paper, we propose an efficient VLSI architecture of Reed-Solomon(RS) decoder. To improve the speed. we develope an architecture featuring parallel and pipelined processing. To implement the parallel and pipelined processing architecture, we analyze the RS decoding algorithm and the honor's algorithm for parallel processing and we also modified the Euclid's algorithm to apply the efficient parallel structure in RS decoder. To show the proposed architecture, the performance of the proposed RS decoder is compared to Shao's and we obtain the 10 % efficiency in area and three times faster in speed when it's compared to Shao's time domain decoder. In addition, we implemented the proposed RS decoder with Altera FPGA Flex10K-50.

  • PDF

A Pipelined Parallel Optimized Design for Convolution-based Non-Cascaded Architecture of JPEG2000 DWT (JPEG2000 이산웨이블릿변환의 컨볼루션기반 non-cascaded 아키텍처를 위한 pipelined parallel 최적화 설계)

  • Lee, Seung-Kwon;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.7
    • /
    • pp.29-38
    • /
    • 2009
  • In this paper, a high performance pipelined computing design of parallel multiplier-temporal buffer-parallel accumulator is present for the convolution-based non-cascaded architecture aiming at the real time Discrete Wavelet Transform(DWT) processing. The convolved multiplication of DWT would be reduced upto 1/4 by utilizing the filter coefficients symmetry and the up/down sampling; and it could be dealt with 3-5 times faster computation by LUT-based DA multiplication of multiple filter coefficients parallelized for product terms with an image data. Further, the reutilization of computed product terms could be achieved by storing in the temporal buffer, which yields the saving of computation as well as dynamic power by 50%. The convolved product terms of image data and filter coefficients are realigned and stored in the temporal buffer for the accumulated addition. Then, the buffer management of parallel aligned storage is carried out for the high speed sequential retrieval of parallel accumulations. The convolved computation is pipelined with parallel multiplier-temporal buffer-parallel accumulation in which the parallelization of temporal buffer and accumulator is optimize, with respect to the performance of parallel DA multiplier, to improve the pipelining performance. The proposed architecture is back-end designed with 0.18um library, which verifies the 30fps throughput of SVGA(800$\times$600) images at 90MHz.

Design of an Area-Efficient Reed-Solomon Decoder using Pipelined Recursive Technique (파이프라인 재귀적인 기술을 이용한 면적 효율적인 Reed-Solomon 복호기의 설계)

  • Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.7 s.337
    • /
    • pp.27-36
    • /
    • 2005
  • This paper presents an area-efficient architecture to implement the high-speed Reed-Solomon(RS) decoder, which is used in a variety of communication systems such as wireless and very high-speed optical communications. We present the new pipelined-recursive Modified Euclidean(PrME) architecture to achieve high-throughput rate and reducing hardware-complexity using folding technique. The proposed pipelined recursive architecture can reduce the hardware complexity about 80$\%$ compared to the conventional systolic-array and fully-parallel architecture. The proposed RS decoder has been designed and implemented with the 0.13um CMOS technology in a supply voltage of 1.2 V. The result show that total number of gate is 393 K and it has a data processing rate of S Gbits/s at clock frequency of 625 MHz. The proposed area-efficient architecture can be readily applied to the next generation FEC devices for high-speed optical communications as well as wireless communications.

Implementation of Reed-Solomon Decoder Using the efficient Modified Euclid Module (효율적 구조의 수정 유클리드 구조를 이용한 Reed-Solomon 복호기의 설계)

  • Kim, Dong-Sun;Chung, Duck-Jin
    • Proceedings of the KIEE Conference
    • /
    • 1998.11b
    • /
    • pp.575-578
    • /
    • 1998
  • In this paper, we propose a VLSI architecture of Reed-Solomon decoder. Our goal is the development of an architecture featuring parallel and pipelined processing to improve the speed and low power design. To achieve the this goal, we analyze the RS decoding algorithm to be used parallel and pipelined processing efficiently, and modified the Euclid's algorithm arithmetic part to apply the parallel structure in RS decoder. The overall RS decoder are compared to Shao's, and we show the 10% area efficiency than Shao's time domain decoder and three times faster, in addition, we approve the proposed RS decoders with Altera FPGA Flex 10K-50, and Implemeted with LG 0.6{\mu}$ processing.

  • PDF

A design of synchronous nonlinear and parallel for pipeline stage on IP-based H.264 decoder implementation (IP기반 H.264 디코더 설계를 위한 동기식 비선형 및 병렬화 파이프라인 설계)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.409-410
    • /
    • 2008
  • This paper presents nonlinear and parallel design for synchronous pipelining in IP-based H.264 decoder implementation. Since H.264 decoder includes the dataflow of feedback loop, the data dependency requires one NOP stage per pipelining latency to drop the throughput into 1/2. Further, it is found that, in execution time, the stage scheduled for MC is more occupied than that for CAVLD/ITQ/DF. The less efficient stage would be improved by nonlinear scheduling, while the fully-utilized stage could be accelerated by parallel scheduling of IP. The optimization yields 3 nonlinear {CAVLD&ITQ}|3 parallel (MC/IP&Rec.)| 3 nonlinear {DF} pipelined architecture for IP-based H.264 decoder. In experiments, the nonlinear and parallel pipelined H.264 decoder, including existing IPs, could deal with full HD video at 41.86MHz, in real time processing.

  • PDF

Design and Comparison of the Pipelined IFFT/FFT modules for IEEE 802.11a OFDM System (IEEE 802.11a OFDM System을 위한 파이프라인 구조 IFFT/FFT 모듈의 설계와 비교)

  • 이창훈;김주현;강봉순
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.3
    • /
    • pp.570-576
    • /
    • 2004
  • In this paper, we design the IFFT/FFT (Inverse fast Fourier Transform/Fast Fourier Transform) modules for IEEE 802.11a-1999, which is a standard of the High-speed Wireless LAN using the OFDM (Orthogonal Frequency Division Multiplexing). The designed IFFT/FFT is the 64-point FFT to be compatible with IEEE 802.11a and the pipelined architecture which needs neither serial-to-parallel nor parallel-to-serial converter. We compare four types of IFFT/FFT modules for the hardware complexity and operation : R22SDF (Radix-2 Single-path Delay feedback), the R2SDF (Radix-2 Single-path Delay feedback), R2SDF (Radix-4 Single-path Delay Feedback), and R4SDC (Radix-4 Single-path Delay Commutator). In order to minimize the error, we design the IFFT/FFT module to operate with additional decimal parts after butterfly operation. In case of the R22SDF, the IFFT/FFT module has 44,747 gate counts excluding RAMs and the minimized error rate as compared with other types. And we know that the R22SDF has a small hardware structure as compared with other types.

Pipelined Parallel CRC (파이프라인 구조를 적용한 병렬 CRC 회로 설계)

  • Kim, Ki-Tae;Yi, Hyun-Bean;Park, Sung-Ju;Park, Chang-Won
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.789-792
    • /
    • 2005
  • In this paper, we propose a method that applies pipeline architecture to parallel CRC circuits. We developed a logic partitioning algorithm for applying pipeline architecture. Our algorithm can be used for the polynomial and the input data width, both of arbitrary length and minimize the logic level. Design experiments show the superiority of our approach in reducing the delay in comparison with previous works.

  • PDF