• Title/Summary/Keyword: parallel decoding

Search Result 152, Processing Time 0.029 seconds

Design of Traceback Algorithm for Performance Improvement in Viterbi Decoder (비터비 디코더의 성능 향상을 위한 역추적 알고리듬의 설계)

  • 황의준;이종화;임신일;황선영
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.31A no.8
    • /
    • pp.100-110
    • /
    • 1994
  • This paper proposes an efficient traceback method for parallel hardware implementation of the Viterbi algorithm. Compared to the conventional Viterbi algorithm where initial state for traceback is selected arbitrarily the proposed algorithm decides decoding output by analyzing the survivor paths of consecutive tracebacks. This makes Viterbi algorithm more efficient in error correction event when more than one survivor path exists. The proposed traceback algorithm together with its hardware realization is presented in this paper. Experimental results show tht the proposed algorithms is efficient in error correction in noisy channels compared to the existing algorithms.

  • PDF

A Design of Superscalar Digital Signal Processor (다중 명령어 처리 DSP 설계)

  • Park, Sung-Wook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.3
    • /
    • pp.323-328
    • /
    • 2008
  • This paper presents a Digital Signal Processor achieving high through-put for both decision intensive and computation intensive tasks. The proposed processor employees a multiplier, two ALU and load/store. Unit as operational units. Those four units are controlled and works parallel by superscalar control scheme, which is different from prior DSP architecture. The performance evaluation was done by implementing AC-3 decoding algorithm and 37.8% improvement was achieved. This study is valuable especially for the consumer electronics applications, which require very low cost.

Parallelization Method of Slice-based video CODEC (슬라이스 기반 비디오 코덱 병렬화 기법)

  • Nam, Jung-Hak;Ji, Bong-Il;Jo, Hyun-Ho;Sim, Dong-Gyu;Cho, Dae-Sung
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.47 no.6
    • /
    • pp.48-56
    • /
    • 2010
  • Recently, we need to dramatically speed up real-time video encoding and decoding on mobile devices because complexity of video CODEC is significantly increasing along with the demand for multimedia service of high-quality and high-definition videos by users. A variety of research is conducted for parallelism of video processing using newly developed multi-core platforms. In this paper, we propose a method of parallelism based on slice partition of video compression CODEC. We propose a novel concept of a parallel slice for parallelism and propose a new coding order to be adequate to the parallel slice which keeps high coding efficiency. To minimize synchronization time of multiple parallel slices, we also propose a synchronization method to determinate whether the parallel slice could be independently decoded or not. Experimental results shows that we achieved 27.5% (40.7%) speed-up by parallelism with bit-rate increase of 3.4% (2.7%) for CIF sequences (720p sequences) by implementing the proposed algorithm on the H.264/AVC.

An Efficient Hardware Design for Scaling and Transform Coefficients Decoding (스케일링과 변환계수 복호를 위한 효율적인 하드웨어 설계)

  • Jung, Hongkyun;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.10
    • /
    • pp.2253-2260
    • /
    • 2012
  • In this paper, an efficient hardware architecture is proposed for inverse transform and inverse quantization of H.264/AVC decoder. The previous inverse transform and quantization architecture has a different AC and DC coefficients decoding order. In the proposed architecture, IQ is achieved after IT regardless of the DC or AC coefficients. A common operation unit is also proposed to reduce the computational complexity of inverse quantization. Since division operation is included in the previous architecture, it will generate errors if the processing order is changed. In order to solve the problem, the division operation is achieved after IT to prevent errors in the proposed architecture. The architecture is implemented with 3-stage pipeline and a parallel vertical and horizontal IDCT is also implemented to reduce the operation cycle. As a result of analyzing the proposed ITIQ architecture operation cycle for one macroblock, the proposed one has improved by 45% than the previous one.

Design of Reed-Solomon Decoder for High Speed Data Networks

  • Park, Young-Shig;Park, Heyk-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.1
    • /
    • pp.170-178
    • /
    • 2004
  • In this work a high speed 8-error correcting Reed-Solomon decoder is designed using the modified Euclid algorithm. Decoding algorithm of Reed-Solomon codes consists of four steps, those are, compute syndromes, find error-location polynomials, decide error-locations, and determine error values. The decoding speed is increased and the latency is reduced by using the parallel architecture in the syndrome generator and a faster clock speed in the modified Euclid algorithm block. In addition. the error locator polynomial in Chien search block is separated into even and odd terms to increase the overall speed of the decoder. All the functionalities of the decoder are verified first through C++ programs. Verilog is used for hardware description, and then the decoder is synthesized with a $.25{\mu}m$ CMOS TML library. The functionalities of the chip is also verified through test vectors. The clock speed of the chip is 250MHz, and the maximum data rate is 1Gbps.

Hardware Channel Decoder for Holographic WORM Storage (홀로그래픽 WORM의 하드웨어 채널 디코더)

  • Hwang, Eui-Seok;Yoon, Pil-Sang;Kim, Hak-Sun;Park, Joo-Youn
    • Transactions of the Society of Information Storage Systems
    • /
    • v.1 no.2
    • /
    • pp.155-160
    • /
    • 2005
  • In this paper, the channel decoder promising reliable data retrieving in noisy holographic channel has been developed for holographic WORM(write once read many) system. It covers various DSP(digital signal processing) blocks, such as align mark detector, adaptive channel equalizer, modulation decoder and ECC(error correction code) decoder. The specific schemes of DSP are designed to reduce the effect of noises in holographic WORM(H-WORM) system, particularly in prototype of DAEWOO electronics(DEPROTO). For real time data retrieving, the channel decoder is redesigned for FPGA(field programmable gate array) based hardware, where DSP blocks calculate in parallel sense with memory buffers between blocks and controllers for driving peripherals of FPGA. As an input source of the experiments, MPEG2 TS(transport stream) data was used and recorded to DEPROTO system. During retrieving, the CCD(charge coupled device), capturing device of DEPROTO, detects retrieved images and transmits signals of them to the FPGA of hardware channel decoder. Finally, the output data stream of the channel decoder was transferred to the MPEG decoding board for monitoring video signals. The experimental results showed the error corrected BER(bit error rate) of less than $10^{-9}$, from the raw BER of DEPROTO, about $10^{-3}$. With the developed hardware channel decoder, the real-time video demonstration was possible during the experiments. The operating clock of the FPGA was 60 MHz, of which speed was capable of decoding up to 120 mega channel bits per sec.

  • PDF

On Designing 4-way Superscalar Digital Signal Processor Core (4-way 수퍼 스칼라 디지털 시그널 프로세서 코어 설계)

  • 김준석;유선국;박성욱;정남훈;고우석;이근섭;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.6
    • /
    • pp.1409-1418
    • /
    • 1998
  • The recent audio CODEC(Coding/Decoding) algorithms are complex of several coding techniques, and can be divided into DSP tasks, controller tasks and mixed tasks. The traditional DSP processor has been designed for fast processing of DSP tasks only, but not for controller and mixed tasks. This paper presents a new architecture that achieves high throughput on both controller and mixed tasks of such algorithms while maintaining high performance for DSP tasks. The proposed processor, YSP-3, operates four algorithms while maintaining high performance for DSP tasks. The proposed processor, YSP-3, operates functional units (Multiplier, two ALUs, Load/Store Unit) in parallel via 4-issue super-scalar instruction structure. The performance evaluation of YSP-3 has been done through the implementation of the several DSP algorithms and the part of the AC-3 decoding algorithms.

  • PDF

A LDPC Decoder for DVB-S2 Standard Supporting Multiple Code Rates (DVB-S2 기반에서 다양한 부호화 율을 지원하는 LCPC 복호기)

  • Ryu, Hye-Jin;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.2
    • /
    • pp.118-124
    • /
    • 2008
  • For forward error correction, DVB-S2, which is the digital video broadcasting forward error coding and modulation standard for satellite television, uses a system based the concatenation of BCH with LDPC inner coding. In DVB-S2 the LDPC codes are defined for 11 different code rates, which means that a DVB-S2 LDPC decoder should support multiple code rates. Seven of the 11 code rates, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9, and 9/10, are regular and the rest four code rates, 1/4, 1/3, 2/5, and 1/2, are irregular. In this paper we propose a flexible decoder for the regular LDPC codes. We combined the partially parallel decoding architecture that has the advantages in the chip size, the memory efficiency, and the processing rate with Benes network to implement a DVB-S2 LDPC decoder that can support multiple code rates with a block size of 64,800 and can configure the interconnection between the variable nodes and the check nodes according to the parity-check matrix. The proposed decoder runs correctly at the frequency of 200MHz enabling 193.2Mbps decoding throughput. The area of the proposed decoder is $16.261m^2$ and the power dissipation is 198mW at a power supply voltage of 1.5V.

Area-efficient Interpolation Architecture for Soft-Decision List Decoding of Reed-Solomon Codes (연판정 Reed-Solomon 리스트 디코딩을 위한 저복잡도 Interpolation 구조)

  • Lee, Sungman;Park, Taegeun
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.3
    • /
    • pp.59-67
    • /
    • 2013
  • Reed-Solomon (RS) codes are powerful error-correcting codes used in diverse applications. Recently, algebraic soft-decision decoding algorithm for RS codes that can correct the errors beyond the error correcting bound has been proposed. The algorithm requires very intensive computations for interpolation, therefore an efficient VLSI architecture, which is realizable in hardware with a moderate hardware complexity, is mandatory for various applications. In this paper, we propose an efficient architecture with low hardware complexity for interpolation in soft-decision list decoding of Reed-Solomon codes. The proposed architecture processes the candidate polynomial in such a way that the terms of X degrees are processed in serial and the terms of Y degrees are processed in parallel. The processing order of candidate polynomials adaptively changes to increase the efficiency of memory access for coefficients; this minimizes the internal registers and the number of memory accesses and simplifies the memory structure by combining and storing data in memory. Also, the proposed architecture shows high hardware efficiency, since each module is balanced in terms of latency and the modules are maximally overlapped in schedule. The proposed interpolation architecture for the (255, 239) RS list decoder is designed and synthesized using the DongbuHitek $0.18{\mu}m$ standard cell library, the number of gate counts is 25.1K and the maximum operating frequency is 200 MHz.

Efficient DSP Architecture for Viterbi Algorithm (비터비 알고리즘의 효율적인 연산을 위한 DSP 구조 설계)

  • Park Weon heum;Sunwoo Myung hoon;Oh Seong keun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.3A
    • /
    • pp.217-225
    • /
    • 2005
  • This paper presents specialized DSP instructions and their architecture for the Viterbi algorithm used in various wireless communication standards. The proposed architecture can significantly reduce the Trace Back (TB) latency. The proposed instructions perform the Add Compare Select (ACS) and TB operations in parallel and the architecture has special hardware, called the Offset Calculation Unit (OCU), which automatically calculates data addresses for the trellis butterfly computations. Logic synthesis has been Performed using the Samsung SEC 0.18 μm standard cell library. OCU consists of 1,460 gates and the maximum delay of OCU is about 5.75 ns. The BER performance of the ACS-TB parallel method increases about 0.00022dB at 6dB Eb/No compared with the typical TB method, which is negligible. When the constraint length K is 5, the proposed DSP architecture can reduce the decoding cycles about 17% compared with the Carmel DSP and about 45% compared with 7MS320c15x.