• Title/Summary/Keyword: low-complexity hardware architecture

Search Result 86, Processing Time 0.023 seconds

A Design of Parameterized Viterbi Decoder for Multi-standard Applications (다중 표준용 파라미터화된 비터비 복호기 IP 설계)

  • Park, Sang-Deok;Jeon, Heung-Woo;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.6
    • /
    • pp.1056-1063
    • /
    • 2008
  • This paper describes an efficient design of a multi-standard Viterbi decoder that supports multiple constraint lengths and code rates. The Viterbi decoder is parameterized for the code rates 1/2, 1/3 and constraint lengths 7,9, thus it has four operation nodes. In order to achieve low hardware complexity and low power, an efficient architecture based on hardware sharing techniques is devised. Also, the optimization of ACCS (Accumulate-Subtract) circuit for the one-point trace-back algorithm reduces its area by about 35% compared to the full parallel ACCS circuit. The parameterized Viterbi decoder core has 79,818 gates and 25,600 bits memory, and the estimated throughput is about 105 Mbps at 70 MHz clock frequency. Also, the simulation results for BER (Bit Error Rate) performance show that the Viterbi decoder has BER of $10^{-4}$ at $E_b/N_o$ of 3.6 dB when it operates with code rate 1/3 and constraints 7.

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.75-81
    • /
    • 2015
  • This paper proposed the new eight-parallel MDC FFT processor using the eight-parallel MDC architecture and the efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on the modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme can reduce the number of complex multipliers from eight to six without increasing delay buffers and computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems required high throughput and low hardware complexity. The proposed FFT processor has been designed and implemented with a 90nm CMOS technology. The experimental result shows that the area of the proposed FFT processor is $0.27mm^2$. Furthermore, the proposed eight-parallel MDC FFT processor can achieve the throughput rate up to 2.7 GSample/s at 388MHz.

Systolic Arrays for Lattice-Reduction-Aided MIMO Detection

  • Wang, Ni-Chun;Biglieri, Ezio;Yao, Kung
    • Journal of Communications and Networks
    • /
    • v.13 no.5
    • /
    • pp.481-493
    • /
    • 2011
  • Multiple-input multiple-output (MIMO) technology provides high data rate and enhanced quality of service for wireless communications. Since the benefits from MIMO result in a heavy computational load in detectors, the design of low-complexity suboptimum receivers is currently an active area of research. Lattice-reduction-aided detection (LRAD) has been shown to be an effective low-complexity method with near-maximum-likelihood performance. In this paper, we advocate the use of systolic array architectures for MIMO receivers, and in particular we exhibit one of them based on LRAD. The "Lenstra-Lenstra-Lov$\acute{a}$sz (LLL) lattice reduction algorithm" and the ensuing linear detections or successive spatial-interference cancellations can be located in the same array, which is considerably hardware-efficient. Since the conventional form of the LLL algorithm is not immediately suitable for parallel processing, two modified LLL algorithms are considered here for the systolic array. LLL algorithm with full-size reduction-LLL is one of the versions more suitable for parallel processing. Another variant is the all-swap lattice-reduction (ASLR) algorithm for complex-valued lattices, which processes all lattice basis vectors simultaneously within one iteration. Our novel systolic array can operate both algorithms with different external logic controls. In order to simplify the systolic array design, we replace the Lov$\acute{a}$sz condition in the definition of LLL-reduced lattice with the looser Siegel condition. Simulation results show that for LR-aided linear detections, the bit-error-rate performance is still maintained with this relaxation. Comparisons between the two algorithms in terms of bit-error-rate performance, and average field-programmable gate array processing time in the systolic array are made, which shows that ASLR is a better choice for a systolic architecture, especially for systems with a large number of antennas.

An Area-efficient Implementation of Layered LDPC Decoder for IEEE 802.11n WLAN (IEEE 802.11n WLAN 표준용 Layered LDPC 복호기의 저면적 구현)

  • Jeong, Sang-Hyeok;Na, Young-Heon;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.05a
    • /
    • pp.486-489
    • /
    • 2010
  • This paper describes a layered LDPC decoder which supports block length of 1,944 bits and code rate 1/2 for IEEE 802.11n WLAN standard. To reduce the hardware complexity, the min-sum algorithm and layered architecture is adopted. A novel memory reduction technique suitable for min-sum algorithm reduces memory size by 75% compared with conventional method. The designed processor has 200,400 gates and 19,400 bits memory, and it is verified by FPGA implementation. The estimated throughput is about 200 Mbps at 120 MHz clock by using Xilinx Virtex-4 FPGA device.

  • PDF

A Study on the Hardware Architecture of Trinomial $GF(2^m)$ Multiplier (Trinomial $GF(2^m)$ 승산기의 하드웨어 구성에 관한 연구)

  • 변기영;윤광섭
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.41 no.5
    • /
    • pp.29-36
    • /
    • 2004
  • This study focuses on the arithmetical methodology and hardware implementation of low-system-complexity multiplier over GF(2$^{m}$ ) using the trinomial of degree a The proposed parallel-in parallel-out operator is composed of MR, PP, and MS modules, each can be established using the regular array structure of AND and XOR gates. The proposed multiplier is composed of $m^2$ 2-input AND gates and $m^2$-1 2-input XOR gates, and the propagation delay is $T_{A}$+(1+[lo $g_2$$^{m}$ ]) $T_{x}$ . Comparison result of the related multipliers of GF(2$^{m}$ ) are shown by table, it reveals that our operator involve more regular and generalized then the others, and therefore well-suited for VLSI implementation. Moreover, our multiplier is more suitable for any other GF(2$^{m}$ ) operational applications.s.

Design and Analysis of a $AB^2$ Systolic Arrays for Division/Inversion in$GF(2^m)$ ($GF(2^m)$상에서 나눗셈/역원 연산을 위한 $AB^2$ 시스톨릭 어레이 설계 및 분석)

  • 김남연;고대곤;유기영
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.1
    • /
    • pp.50-58
    • /
    • 2003
  • Among finite field arithmetic operations, the $AB^2$ operation is known as an efficient basic operation for public key cryptosystems over $GF(2^m)$,Division/Inversion is computed by performing the repetitive AB$^2$ multiplication. This paper presents two new $AB^2$algorithms and their systolic realizations in finite fields $GF(2^m)$.The proposed algorithms are based on the MSB-first scheme using standard basis representation and the proposed systolic architectures for $AB^2$ multiplication have a low hardware complexity and small latency compared to the conventional approaches. Additionally, since the proposed architectures incorporate simplicity, regularity, modularity, and pipelinability, they are well suited to VLSI implementation and can be easily applied to inversion architecture. Furthermore, these architectures will be utilized for the basic architecture of crypto-processor.

Design of Time Synchronizer for Advanced LR-WPAN Systems (개선된 LR-WPAN 시스템을 위한 시간 동기부 설계)

  • Park, Mincheol;Lee, Dongchan;Jang, Soohyun;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.18 no.5
    • /
    • pp.476-482
    • /
    • 2014
  • Recently, with the growth of various sensor applications, the need of wireless communication systems which can support variable data rate is increasing. IEEE 802.15.4 LR-WPAN system using 2.45 GHz frequency band is very popular for the sensor applications. However, since LR-WPAN only supports the data rate of 250 kbps, it has a limit to be applied to various sensor networks. Therefore, we define the preamble structure which can support the data rates of 31.25 kbps, 62.5 kbps, 125 kbps, and present the low-complexity hardware architecture for time synchronizer based on double-correlation algorithm which can resist the CFO (carrier frequency offset). Implementation results show that the proposed time synchronizer include the logic slice of 18.36 K and four DSP48s, which are reduced at the rate of 79.1% and 99.4%, respectively, compared with existing architecture.

A Design of High Performance Motion Estimation Hardware for H.264/AVC (H.264/AVC를 위한 고성능 움직임 예측 하드웨어 설계)

  • Park, Seungyong;Ryoo, Kwangki
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.1
    • /
    • pp.124-130
    • /
    • 2013
  • In this paper, a new motion estimation algorithm with low-computational complexity is proposed to improve the performance of H.264/AVC. The proposed architecture uses the directions of the median motion vector which is computed by the motion vectors of the three neighbor macroblocks in Integer Motion Estimation. By using the directions of the vector, the proposed architecture has a single computational level instead of multi-computational levels in Integer Motion Estimation. The proposed motion estimation is synthesized using the TSMC 0.18um standard cell library. The synthesis result shows that the gate count is about 217.92K at 166MHz and it was improved about 69% compared with previous one.

High Speed 8-Parallel Fft/ifft Processor using Efficient Pipeline Architecture and Scheduling Scheme (효율적인 파이프라인 구조와 스케줄링 기법을 적용한 고속 8-병렬 FFT/IFFT 프로세서)

  • Kim, Eun-Ji;SunWoo, Myung-Hoon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.3C
    • /
    • pp.175-182
    • /
    • 2011
  • This paper presents a novel eight-parallel 128/256-point mixed-radix multi-path delay commutator (MRMDC) FFT/IFFT processor for orthogonal frequency-division multiplexing (OFDM) systems. The proposed FFT architecture can provide a high throughput rate and low hardware complexity by using an eight-parallel data-path scheme, a modified mixed-radix multi-path delay commutator structure and an efficient scheduling scheme of complex multiplications. The efficient scheduling scheme can reduce the number of complex multipliers at the second stage from 88 to 40. The proposed FFT/IFFT processor has been designed and implemented with the 90nm CMOS technology. The proposed eight-parallel FFT/IFFT processor can provide a throughput rate of up to 27.5Gsample/s at 430MHz.

Hardware optimized high quality image signal processor for single-chip CMOS Image Sensor (Single-chip CMOS Image Sensor를 위한 하드웨어 최적화된 고화질 Image Signal Processor 설계)

  • Lee, Won-Jae;Jung, Yun-Ho;Lee, Seong-Joo;Kim, Jae-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.5
    • /
    • pp.103-111
    • /
    • 2007
  • In this paper, we propose a VLSI architecture of hardware optimized high quality image signal processor for a Single-chip CMOS Image Sensor(CIS). The Single-chip CIS is usually used for mobile applications, so it has to be implemented as small as possible while maintaining the image quality. Several image processing algorithms are used in ISP to improve captured image quality. Among the several image processing blocks, demosaicing and image filter are the core blocks in ISP. These blocks need line memories, but the number of line memories is limited in a low cost Single-chip CIS. In our design, high quality edge-adaptive and cross channel correlation considered demosaicing algorithm is adopted. To minimize the number of required line memories for image filter, we share the line memories using the characteristics of demosaicing algorithm which consider the cross correlation. Based on the proposed method, we can achieve both high quality and low hardware complexity with a small number of line memories. The proposed method was implemented and verified successfully using verilog HDL and FPGA. It was synthesized to gate-level circuits using 0.25um CMOS standard cell library. The total logic gate count is 37K, and seven and half line memories are used.