• 제목/요약/키워드: low-complexity hardware architecture

Search Result 86, Processing Time 0.021 seconds

Design of an Area-Efficient Survivor Path Unit for Viterbi Decoder Supporting Punctured Codes (천공 부호를 지원하는 Viterbi 복호기의 면적 효율적인 생존자 경로 계산기 설계)

  • Kim, Sik;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.3A
    • /
    • pp.337-346
    • /
    • 2004
  • Punctured convolutional codes increase transmission efficiency without increasing hardware complexity. However, Viterbi decoder supporting punctured codes requires long decoding length and large survivor memory to achieve sifficiently low bit error rate (BER), when compared to the Viterbi decoder for a rate 1/2 convolutional code. This Paper presents novel architecture adopting a pipelined trace-forward unit reducing survivor memory requirements in the Viterbi decoder. The proposed survivor path architecture reduces the memory requirements by removing the initial decoding delay needed to perform trace-back operation and by accelerating the trace-forward process to identify the survivor path in the Viterbi decoder. Experimental results show that the area of survivor path unit has been reduced by 16% compared to that of conventional hybrid survivor path unit.

Performance Evaluation of Secure Embedded Processor using FEC-Based Instruction-Level Correlation Technique (오류정정 부호 기반 명령어 연관성 기법을 적용한 임베디드 보안 프로세서의 성능평가)

  • Lee, Seung-Wook;Kwon, Soon-Gyu;Kim, Jong-Tae
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.5B
    • /
    • pp.526-531
    • /
    • 2009
  • In this paper, we propose new novel technique (ILCT: Instruction-Level Correlation Technique) which can detect tempered instructions by software attacks or hardware attacks before their execution. In conventional works, due to both high complex computation of cipher process and low processing speed of cipher modules, existing secure processor architecture applying cipher technique can cause serious performance degradation. While, the secure processor architecture applying ILCT with FEC does not incur excessive performance decrease by complexity of computation and speed of tampering detection modules. According to experimental results, total memory overhead including parity are increased in average of 26.62%. Also, secure programs incur CPI degradation in average of $1.20%{\sim}1.97%$.

Three-Parallel Reed-Solomon based Forward Error Correction Architecture for 100Gb/s Optical Communications (100Gb/s급 광통신시스템을 위한 3-병렬 Reed-Solomon 기반 FEC 구조 설계)

  • Choi, Chang-Seok;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.48-55
    • /
    • 2009
  • This paper presents a high-speed Forward Error Correction (FEC) architecture based on three-parallel Reed-Solomon (RS) decoder for next-generation 100-Gb/s optical communication systems. A high-speed three-parallel RS(255,239) decoder has been designed and the derived structure can also be applied to implement the 100-Gb/s RS-FEC architecture. The proposed 100-Gb/s RS-FEC has been implemented with 0.13-${\mu}m$ CMOS standard cell technology in a supply voltage of 1.2V. The implementation results show that 16-Ch. RS-FEC architecture can operate at a clock frequency of 300MHz and has a throughput of 115-Gb/s for 0.13-${\mu}m$ CMOS technology. As a result, the proposed three-parallel RS-FEC architecture has a much higher data processing rate and low hardware complexity compared with the conventional two-parallel, three-parallel and serial RS-FEC architectures.

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

  • Park, Chester Sungchung;Park, Sungkyung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.3
    • /
    • pp.699-707
    • /
    • 2016
  • This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a $0.18{\mu}m$ digital CMOS technology with 1.8V supply, resulting in a gate count of merely 7700 at a 50MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.

Bit-serial Discrete Wavelet Transform Filter Design (비트 시리얼 이산 웨이블렛 변환 필터 설계)

  • Park Tae geun;Kim Ju young;Noh Jun rye
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.4A
    • /
    • pp.336-344
    • /
    • 2005
  • Discrete Wavelet Transform(DWT) is the oncoming generation of compression technique that has been selected for MPEG4 and JEPG2000, because it has no blocking effects and efficiently determines frequency property of temporary time. In this paper, we propose an efficient bit-serial architecture for the low-power and low-complexity DWT filter, employing two-channel QMF(Qudracture Mirror Filter) PR(Perfect Reconstruction) lattice filter. The filter consists of four lattices(filter length=8) and we determine the quantization bit for the coefficients by the fixed-length PSNR(peak-signal-to-noise ratio) analysis and propose the architecture of the bit-serial multiplier with the fixed coefficient. The CSD encoding for the coefficients is adopted to minimize the number of non-zero bits, thus reduces the hardware complexity. The proposed folded 1D DWT architecture processes the other resolution levels during idle periods by decimations and its efficient scheduling is proposed. The proposed architecture requires only flip-flops and full-adders. The proposed architecture has been designed and verified by VerilogHDL and synthesized by Synopsys Design Compiler with a Hynix 0.35$\mu$m STD cell library. The maximum operating frequency is 200MHz and the throughput is 175Mbps with 16 clock latencies.

High Throughput Parallel Design of 2-D $8{\times}8$ Integer Transforms for H.264/AVC (H.264/AVC 를 위한 높은 처리량의 2-D $8{\times}8$ integer transforms 병렬 구조 설계)

  • Sharma, Meeturani;Tiwari, Honey;Cho, Yong-Beom
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.8
    • /
    • pp.27-34
    • /
    • 2012
  • In this paper, the implementation of high throughput two-dimensional (2-D) $8{\times}8$ forward and inverse integer DCT transform for H.264 is presented. The forward and inverse transforms are represented using simple shift and addition operations. Matrix decomposition and matrix operation such as the Kronecker product and direct sum are used to reduce the computation complexity. The proposed design uses integer computations and does not use transpose memory and hence, the resource consumption is also reduced. The maximum operating frequency of the proposed pipelined architecture is 1.184 GHz, which achieves 25.27 Gpixels/sec throughput rate with the hardware cost of 44864 gates. High throughput and low hardware makes the proposed design useful for real time H.264/AVC high definition processing.

Sphere Decoding Algorithm and VLSI Implementation Using Two-Level Search (2 레벨 탐색을 이용한 스피어 디코딩 알고리즘과 VLSI 구현)

  • Huynh, Tronganh;Cho, Jong-Min;Kim, Jin-Sang;Cho, Won-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.6
    • /
    • pp.104-110
    • /
    • 2008
  • In this paper, a novel 2-level-search sphere decoding algorithm for multiple-input multiple-output (MIMO) detection and its VLSI implementation are presented. The proposed algorithm extends the search space by concurrently performing symbol detection on 2 level of the tree search. Therefore, the possibility of discarding good candidates can be avoided. Simulation results demonstrate the good performance of the proposed algorithm in terms of bit-error-rate (BER). From the proposed algorithm, an efficient very large scale integration (VLSI) architecture which incorporates low-complexity and fixed throughput features is proposed. The proposed architecture supports many modulation techniques such as BPSK, QPSK, 16-QAM and 64-QAM. The sorting block, which occupies a large portion of hardware utilization, is shared for different operating modes to reduce the area. The proposed hardware implementation results show the improvement in terms of area and BER performance compared with existing architectures.

Parallelized Architecture of Serial Finite Field Multipliers for Fast Computation (유한체 상에서 고속 연산을 위한 직렬 곱셈기의 병렬화 구조)

  • Cho, Yong-Suk
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.17 no.1
    • /
    • pp.33-39
    • /
    • 2007
  • Finite field multipliers are the basic building blocks in many applications such as error-control coding, cryptography and digital signal processing. Hence, the design of efficient dedicated finite field multiplier architectures can lead to dramatic improvement on the overall system performance. In this paper, a new bit serial structure for a multiplier with low latency in Galois field is presented. To speed up multiplication processing, we divide the product polynomial into several parts and then process them in parallel. The proposed multiplier operates standard basis of $GF(2^m)$ and is faster than bit serial ones but with lower area complexity than bit parallel ones. The most significant feature of the proposed architecture is that a trade-off between hardware complexity and delay time can be achieved.

A Fast Inversion for Low-Complexity System over GF(2 $^{m}$) (경량화 시스템에 적합한 유한체 $GF(2^m)$에서의 고속 역원기)

  • Kim, So-Sun;Chang, Nam-Su;Kim, Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.9 s.339
    • /
    • pp.51-60
    • /
    • 2005
  • The design of efficient cryptosystems is mainly appointed by the efficiency of the underlying finite field arithmetic. Especially, among the basic arithmetic over finite field, the rnultiplicative inversion is the most time consuming operation. In this paper, a fast inversion algerian in finite field $GF(2^m)$ with the standard basis representation is proposed. It is based on the Extended binary gcd algorithm (EBGA). The proposed algorithm executes about $18.8\%\;or\;45.9\%$ less iterations than EBGA or Montgomery inverse algorithm (MIA), respectively. In practical applications where the dimension of the field is large or may vary, systolic array sDucture becomes area-complexity and time-complexity costly or even impractical in previous algorithms. It is not suitable for low-weight and low-power systems, i.e., smartcard, the mobile phone. In this paper, we propose a new hardware architecture to apply an area-efficient and a synchronized inverter on low-complexity systems. It requires the number of addition and reduction operation less than previous architectures for computing the inverses in $GF(2^m)$ furthermore, the proposed inversion is applied over either prime or binary extension fields, more specially $GF(2^m)$ and GF(P) .

Design and Implementation of Feature Detector for Object Tracking (객체 추적을 위한 특징점 검출기의 설계 및 구현)

  • Lee, Du-hyeon;Kim, Hyeon;Cho, Jae-chan;Jung, Yun-ho
    • Journal of IKEEE
    • /
    • v.23 no.1
    • /
    • pp.207-213
    • /
    • 2019
  • In this paper, we propose a low-complexity feature detection algorithm for object tracking and present hardware architecture design and implementation results for real-time processing. The existing Shi-Tomasi algorithm shows good performance in object tracking applications, but has a high computational complexity. Therefore, we propose an efficient feature detection algorithm, which can reduce the operational complexity with the similar performance to Shi-Tomasi algorithm, and present its real-time implementation results. The proposed feature detector was implemented with 1,307 logic slices, 5 DSP 48s and 86.91Kbits memory with FPGA. In addition, it can support the real-time processing of 54fps at an operating frequency of 114MHz for $1920{\times}1080FHD$ images.