• Title/Summary/Keyword: Bit-Parallel


A memory management scheme for parallel Viterbi algorithm with multiple add-compare-select modules (다중의 Add-compare-select 모듈을 갖는 병렬 비터비 알고리즘의 메모리 관리 방법)

  • 지현순;박동선;송상섭
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.8 / pp.2077-2089 / 1996
  • In this paper, a memory organization and its control method are proposed for the implementation of parallel Viterbi decoders. The design focuses on lowering the hardware complexity of a parallel Viterbi decoder, which is used to shorten decoding time. The memories required in a Viterbi decoder are the SMM (State Metric Memory) and the TBM (Traceback Memory): the SMM stores the path metrics of the states, and the TBM stores the survivor path information. A general parallel Viterbi decoder for high data rates usually consists of multiple ACS (Add-Compare-Select) units and their corresponding memory modules. For parallel ACS units, the SMMs and TBMs are partitioned into smaller independent pairs of memory modules, which are separately interleaved to provide the maximum processing speed. In this design, the SMMs are controlled by address generators that can simultaneously compute the addresses of the new path metrics. A bit shuffle technique is employed to provide parallel access to the TBMs so that survivor path information from multiple ACS modules can be stored.
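To make the add-compare-select step concrete, here is a minimal Python sketch of one ACS stage. It assumes a plain shift-register trellis in which state `s` is reached from predecessors `s >> 1` and `(s >> 1) | (n >> 1)`; the function name `acs_step` and the metric layout are illustrative assumptions, not the memory organization proposed in the paper.

```python
def acs_step(path_metrics, branch_metrics):
    """One add-compare-select (ACS) stage over a shift-register trellis.

    path_metrics[s]      : accumulated metric of state s (lower is better)
    branch_metrics[p][b] : cost of leaving state p with input bit b
    Returns (new_metrics, decisions). decisions[s] is the one survivor bit
    per state that a traceback memory would record for this stage.
    """
    n = len(path_metrics)            # number of states, assumed a power of two
    new_metrics = [0] * n
    decisions = [0] * n
    for s in range(n):               # each iteration could map to its own ACS unit
        bit = s & 1                  # input bit that leads into state s
        p0 = s >> 1                  # predecessor whose oldest bit was 0
        p1 = p0 | (n >> 1)           # predecessor whose oldest bit was 1
        # Add: extend both candidate paths by one branch.
        m0 = path_metrics[p0] + branch_metrics[p0][bit]
        m1 = path_metrics[p1] + branch_metrics[p1][bit]
        # Compare and select: keep the cheaper path and remember which one won.
        if m1 < m0:
            new_metrics[s], decisions[s] = m1, 1
        else:
            new_metrics[s], decisions[s] = m0, 0
    return new_metrics, decisions
```

In this sketch the `new_metrics` list plays the role of the state metric memory and the `decisions` list is what one stage would write to the traceback memory; with several ACS units, the loop iterations would run side by side on separate memory banks.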


Design of Lightweight Parallel BCH Decoder for Sensor Network (센서네트워크 활용을 위한 경량 병렬 BCH 디코더 설계)

  • Choi, Won-Jung;Lee, Je-Hoon
    • Journal of Sensor Science and Technology / v.24 no.3 / pp.188-193 / 2015
  • This paper presents a new byte-wise BCH (4122, 4096, 2) decoder that performs byte-wise parallel operations to enhance its throughput. In particular, we evaluate parallel processing for the most time-consuming components, the syndrome generator and the Chien search, whose cost comes from their iterative operations. Although the syndrome generator is based on the conventional LFSR architecture, it accepts eight consecutive input bits in parallel and processes them in a single cycle, which reduces the number of cycles needed. In addition, the Chien search eliminates redundant operations to reduce hardware complexity. The proposed BCH decoder is implemented in VHDL and verified on a Xilinx FPGA. According to the simulation results, the proposed BCH decoder improves throughput by 43% and reduces hardware complexity by 67% compared with its counterpart employing a parallel processing architecture.
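The idea of feeding eight bits per cycle into an LFSR can be sketched in Python as follows. The divisor `POLY` (x^13 + x^4 + x^3 + x + 1) is only an example degree-13 polynomial chosen for illustration; the abstract does not state which polynomials the decoder uses, and the function names are likewise assumptions.

```python
POLY = 0x201B   # x^13 + x^4 + x^3 + x + 1, an example divisor (assumption)
DEG = 13        # degree of POLY


def lfsr_bit_serial(bits, state=0):
    """Remainder of the bit stream (MSB first) modulo POLY, one bit per step."""
    for bit in bits:
        state = (state << 1) | bit
        if state >> DEG:             # overflow bit set: subtract (XOR) the divisor
            state ^= POLY
    return state


def lfsr_byte_wise(data, state=0):
    """Same remainder, but consuming eight bits per outer iteration ("cycle")."""
    for byte in data:
        state = (state << 8) | byte
        # Fold the eight overflow bits back into the register, highest first.
        # In hardware this inner loop would be one combinational XOR network.
        for shift in range(7, -1, -1):
            if (state >> (DEG + shift)) & 1:
                state ^= POLY << shift
    return state
```

Both functions return the same remainder for the same input; the byte-wise version simply needs one eighth as many outer steps, which is the cycle saving the abstract describes.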

Area-Efficient Semi-Parallel Encoding Structure for Long Polar Codes (긴 극 부호를 위한 저 면적 부분 병렬 극 부호 부호기 설계)

  • Shin, Yerin;Choi, Soyeon;Yoo, Hoyoung
    • Journal of IKEEE / v.23 no.4 / pp.1288-1294 / 2019
  • Their capacity-achieving property has made polar codes attractive as error-correcting codes. However, this error-correction performance is asymptotic: it is achieved only when the code length is sufficiently long. An efficient architecture is therefore needed for very-large-scale integration (VLSI) implementations that handle long input data. Although the basic fully parallel encoder is intuitive and easy to implement, it is not suitable for long polar codes because of its high hardware complexity. To address this, a partially parallel encoder was proposed that achieves excellent results in terms of hardware area. Nevertheless, this method has not been completely generalized, and it has the disadvantage that different architectures result depending on the hardware designer. In this paper, we propose a hardware design scheme based on a systematic approach optimized for bit-dimension permutations. With this scheme, a generalized partially parallel encoder for long polar codes can be designed with the same intuitive architecture as a fully parallel encoder.
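For reference, the computation that a fully parallel polar encoder unrolls into hardware can be sketched in a few lines of Python. This is the generic butterfly form of x = u · F^{⊗n} over GF(2) with F = [[1, 0], [1, 1]], without bit-reversal; it is not the semi-parallel architecture the paper proposes, and the name `polar_encode` is an assumption.

```python
def polar_encode(u):
    """Polar transform x = u * F^(kron n) over GF(2), F = [[1, 0], [1, 1]]."""
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(u)
    step = 1
    while step < n:                          # log2(n) stages
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]          # one XOR per butterfly
        step *= 2
    return x
```

Each pass of the `while` loop corresponds to one stage of n/2 XOR gates. A fully parallel encoder instantiates every stage at once, whereas a partially parallel design reuses a smaller set of XORs over several cycles, which is why the data ordering between stages becomes the design problem.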

Design of High-Speed Parallel Multiplier over Finite Field $GF(2^m)$ (유한체 $GF(2^m)$상의 고속 병렬 승산기의 설계)

  • Seong Hyeon-Kyeong
    • Journal of the Institute of Electronics Engineers of Korea SC / v.43 no.5 s.311 / pp.36-43 / 2006
  • In this paper we present a new high-speed parallel multiplier that performs bit-parallel multiplication of two polynomials in the finite field $GF(2^m)$. Before constructing the multiplier circuit, we design a MOD operation part that generates the result of bit-parallel multiplication with one coefficient of the multiplier polynomial, after the multiplicand polynomial has been multiplied in parallel with an irreducible polynomial. The basic cell of the MOD operation part consists of two AND gates and two XOR gates. Using these MOD operation parts, we obtain the product through bit-parallel multiplication of the two polynomials. Extending this process, we show the design of generalized circuits for degree m and a simple example of constructing a multiplier circuit over the finite field $GF(2^4)$. The presented multiplier is simulated with PSpice. Because it uses the MOD operation parts with the basic cells repeatedly, the multiplier is easy to extend to the multiplication of two polynomials in finite fields of very large degree m, and it is suitable for VLSI. In addition, because the circuit uses no memory elements inside the multiplier, the propagation delay generated by the gates during operation is low, so the multiplier achieves high-speed operation.
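The arithmetic such a multiplier implements can be illustrated with a short Python sketch of multiplication in $GF(2^4)$. It assumes the irreducible polynomial x^4 + x + 1 (the abstract does not say which polynomial the paper uses) and applies the textbook shift-and-XOR method rather than the paper's MOD-cell circuit; `gf_mul` is a hypothetical name.

```python
def gf_mul(a, b, m=4, poly=0b10011):
    """Multiply a and b in GF(2^m); poly is the irreducible polynomial.

    Elements are integers whose bit i is the coefficient of x^i.
    The default poly encodes x^4 + x + 1 (an assumption for illustration).
    """
    result = 0
    for i in range(m):
        if (b >> i) & 1:     # AND: select the partial product a * x^i
            result ^= a      # XOR: addition in GF(2)
        a <<= 1              # multiply the running term by x
        if (a >> m) & 1:     # degree reached m: reduce modulo poly
            a ^= poly
    return result
```

For example, `gf_mul(2, 8)` multiplies x by x^3 and reduces x^4 to x + 1, giving 3. A bit-parallel circuit computes all of these AND and XOR terms combinationally instead of looping, which is why no memory elements are needed.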

Broadband $180^{\circ}$ Bit X-band Phase Shifter Using Parallel-Coupled Lines (평행 결합선로를 이용한 광대역 $180^{\circ}$ Bit X-대역 위상 변이기의 설계)

  • Sung Gyu-Je;Park Hyun-Sik;Kim Dong-Yen
    • Journal of the Microelectronics and Packaging Society / v.12 no.2 s.35 / pp.175-179 / 2005
  • A novel, simple, broadband $180^{\circ}$ bit X-band phase shifter is proposed and fabricated in a standard micromachining process. It is composed of two $90^{\circ}$ parallel-coupled lines, one of which is shorted and the other grounded. Design equations for the proposed $180^{\circ}$ bit phase shifter are derived by even- and odd-mode analysis. Based on these design equations, a $180^{\circ}$ bit phase shifter was designed and fabricated to operate from 7 to 13 GHz with ${\pm}5^{\circ}$ of phase deviation.


CMOS-Memristor Hybrid 4-bit Multiplier Circuit for Energy-Efficient Computing

  • Vo, Huan Minh;Truong, Son Ngoc;Shin, Sanghak;Min, Kyeong-Sik
    • Journal of IKEEE / v.18 no.2 / pp.228-233 / 2014
  • In this paper, we propose a CMOS-memristor hybrid circuit that can perform 4-bit multiplication for future energy-efficient computing in nano-scale digital systems. The proposed hybrid circuit is based on a parallel architecture with AND and OR planes. This parallel architecture can help improve the power-delay product of the proposed circuit compared with the conventional CMOS array multiplier. In particular, SPECTRE simulation of the proposed hybrid circuit with 0.13-${\mu}m$ CMOS devices and memristors estimates that the multiplier has a 48% better power-delay product than the conventional CMOS array multiplier. In addition to this improvement in energy efficiency, the 4-bit multiplier circuit can occupy a smaller area than the conventional array multiplier, because each cross-point memristor can be made as small as $4F^2$.

Acceleration Method of Inter Prediction using Advanced SIMD (Advanced SIMD를 이용한 화면 간 예측 고속화방법)

  • Kim, Wan-Su;Lee, Jae-Heung
    • Journal of IKEEE / v.16 no.4 / pp.382-388 / 2012
  • This paper presents a fast motion estimation method for H.264/AVC. NEON, an Advanced SIMD extension and one form of parallel processing, is supported on the ARM Cortex-A9 dual-core platform. NEON is applied to full search, one of several motion estimation methods, and the number of SAD operations for each macroblock is reduced to 1/4. Pixel values of the corresponding macroblock are assigned to eight 16-bit NEON registers, and NEON intrinsic functions carry out 128-bit arithmetic operations at once. In this way, the exact motion vector with the minimum SAD value among the calculated SAD values can be found. Experimental results show that performance improves by more than 30% on average, depending on the image and macroblock size.
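As a scalar reference for what the SIMD code accelerates, the following Python sketch computes the sum of absolute differences (SAD) and runs a full search over a small window. Frames are plain nested lists, and the names `sad` and `full_search` and their parameters are assumptions for illustration; the paper's NEON implementation is not reproduced here.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))


def full_search(cur, ref, top, left, size, radius):
    """Exhaustively search ref for the block of cur at (top, left).

    Returns (best_sad, (dy, dx)) for the displacement with minimum SAD,
    or None if no candidate fits inside the reference frame.
    """
    target = [row[left:left + size] for row in cur[top:top + size]]
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue                     # candidate falls outside the frame
            cand = [row[x:x + size] for row in ref[y:y + size]]
            cost = sad(target, cand)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

The inner `abs(p - q)` accumulation is the part a SIMD unit handles several pixels at a time; the search loop itself is unchanged, so the motion vector found is the same as in the scalar version.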

256-channel 1ks/s MCG Signal Acquisition System (256-channel 1 ksamples/sec 심자도 신호획득 시스템)

  • Lee, Dong-Ha;Yoo, Jae-Tack;Huh, Young
    • Proceedings of the KIEE Conference / 2004.11c / pp.538-540 / 2004
  • Electrical currents generated by human heart activity create magnetic fields, which are represented by an MCG (magnetocardiogram). Because an MCG signal acquisition system requires precise and stable operation, the system adopts hundreds of SQUID (Superconducting QUantum Interference Device) sensors for signal acquisition. Such a system requires fast real-time data acquisition within a required sampling interval, i.e., 1 millisecond for each sensor. This paper presents hardware designed to acquire data from 256 analog channels at 1 ksamples/sec, using 12-bit 8-channel ADC devices, SPI interfaces, parallel interfaces, 8-bit microprocessors, and a DSP processor. We implemented an SPI interface between the ADCs and a microprocessor, and parallel interfaces between microprocessors. Our results show that data collection for 256 SQUID sensors can be done within a $168{\mu}sec$ interval, which corresponds to a speed of about 6 ksamples/sec.
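As a quick check of the figures quoted in this abstract (plain arithmetic, not taken from the paper's body): one scan of all 256 sensors every $168{\mu}sec$ gives

$$ f_{scan} = \frac{1}{168 \times 10^{-6}\ \mathrm{s}} \approx 5.95 \times 10^{3}\ \mathrm{scans/s}, $$

which rounds to the quoted 6 ksamples/sec per channel and leaves roughly a sixfold margin over the required 1 ksamples/sec. At the required rate, the raw data volume is $256 \times 12\ \mathrm{bit} \times 1000\ \mathrm{s^{-1}} = 3.072\ \mathrm{Mbit/s}$, assuming each 12-bit sample is stored without padding.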


Lowering Error Floor of LDPC Codes Using an Improved Parallel WBF Algorithm

  • Ma, Kexiang;Li, Yongzhao;Zhu, Caizhi;Zhang, Hailin;Zhang, Yuming
    • ETRI Journal / v.36 no.1 / pp.171-174 / 2014
  • In weighted bit-flipping-based algorithms for low-density parity-check (LDPC) codes, due to the existence of overconfident incorrectly received bits, the metric values of the corresponding bits will always be wrong in the decoding process. Since these bits cannot be flipped, decoding failure results. To solve this problem, an improved parallel weighted bit flipping algorithm is proposed. Specifically, a reliability-saturation strategy is adopted to increase the flipping probability of the overconfident incorrectly received bits. Simulation results show that the error floor of LDPC codes is greatly lowered.
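The following Python sketch shows a basic parallel weighted bit-flipping loop on a toy parity-check matrix. It is a generic textbook-style variant, not the algorithm from the paper: the flipping metric, the fallback rule, and the `clip` argument (a loose stand-in for the reliability-saturation idea, whose actual rule the abstract does not give) are all assumptions for illustration.

```python
def syndrome(checks, bits):
    """Parity of each check; 1 means the check is unsatisfied."""
    return [sum(bits[n] for n in chk) % 2 for chk in checks]


def parallel_wbf(H, y, max_iters=20, clip=None):
    """Toy parallel weighted bit flipping.

    H    : parity-check matrix as rows of 0/1
    y    : received soft values; y[n] >= 0 is read as bit 0
    clip : hypothetical cap on |y[n]| (illustrative saturation only)
    Returns (bits, success).
    """
    rel = [abs(v) if clip is None else min(abs(v), clip) for v in y]
    bits = [0 if v >= 0 else 1 for v in y]
    checks = [[n for n, h in enumerate(row) if h] for row in H]
    weight = [min(rel[n] for n in chk) for chk in checks]
    for _ in range(max_iters):
        syn = syndrome(checks, bits)
        if not any(syn):
            return bits, True
        # Unsatisfied checks vote to flip, satisfied checks vote to keep,
        # and each bit's own reliability resists flipping.
        metric = [-r for r in rel]
        for m, chk in enumerate(checks):
            for n in chk:
                metric[n] += weight[m] if syn[m] else -weight[m]
        flips = [n for n, e in enumerate(metric) if e > 0]
        if not flips:    # nothing above threshold: flip the single best candidate
            flips = [max(range(len(bits)), key=metric.__getitem__)]
        for n in flips:  # all selected bits flip in the same iteration
            bits[n] ^= 1
    return bits, not any(syndrome(checks, bits))
```

With a (7,4) Hamming check matrix and a strongly wrong value such as -3.0 on one bit, capping the reliabilities at 1.0 gives that bit the largest metric, so it is the first to flip. Without the cap, its large reliability pushes its metric below that of correctly received bits, which is the kind of overconfident-bit failure the abstract describes.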

Design and breadboarding of parallel-series type 4-bit A/D converter (직병렬형 4비트 A/D 변환기 설계 및 제작)

  • Kim, T.H.;Bae, C.S.;Chung, H.S.;Lee, W.I.;Kuen, T.W.;Kim, J.S.
    • Proceedings of the KIEE Conference / 1987.07b / pp.1573-1576 / 1987
  • A 4-bit parallel-series A/D converter has been designed using a new matrix circuit and breadboarded with a transistor array (TPQ2483). The simple matrix circuit is substituted for the D/A converter and subtractor-multiplier. The system has been simulated with SPICE. The converter is capable of operating at a clock rate of 20 MHz.
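Assuming "parallel-series" here means the usual two-step (subranging) scheme, the conventional signal flow can be modeled in Python as below. The sketch uses ideal 2-bit quantizers and the explicit D/A, subtract, and gain steps that the paper's matrix circuit is said to replace; `two_step_adc` and `vref` are hypothetical names.

```python
def two_step_adc(vin, vref=1.0):
    """Ideal 4-bit conversion in two 2-bit steps (coarse, then fine)."""
    def flash2(v):
        # Ideal 2-bit flash quantizer over [0, vref), clamped to codes 0..3.
        return max(0, min(3, int(v / vref * 4)))

    coarse = flash2(vin)                       # upper two bits
    residue = (vin - coarse * vref / 4) * 4    # subtract the D/A level, gain of 4
    fine = flash2(residue)                     # lower two bits
    return (coarse << 2) | fine
```

For instance, with `vref = 1.0` an input of 0.5 yields code 8 (coarse 2, fine 0). The two-step approach needs far fewer comparators than a full 4-bit flash converter, at the cost of the residue stage in between.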
