• Title/Summary/Keyword: 공통연산기

Search Result 46, Processing Time 0.027 seconds

Design of an Efficient AES-ARIA Processor using Resource Sharing Technique (자원 공유기법을 이용한 AES-ARIA 연산기의 효율적인 설계)

  • Koo, Bon-Seok;Ryu, Gwon-Ho;Chang, Tae-Joo;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.6A
    • /
    • pp.39-49
    • /
    • 2008
  • AEA and ARIA are next generation standard block cipher of US and Korea, respectively, and these algorithms are used in various fields including smart cards, electronic passport, and etc. This paper addresses the first efficient unified hardware architecture of AES and ARIA, and shows the implementation results with 0.25um CMOS library. We designed shared S-boxes based on composite filed arithmetic for both algorithms, and also extracted common terms of the permutation matrices of both algorithms. With the $0.25-{\mu}m$ CMOS technology, our processor occupies 19,056 gate counts which is 32% decreased size from discrete implementations, and it uses 11 clock cycles and 16 cycles for AES and ARIA encryption, which shows 720 and 1,047 Mbps, respectively.

Parallel Inverse Transform and Small-sized Inverse Quantization Architectures Design of H.264/AVC Decoder (H.264/AVC 복호기의 병렬 역변환 구조 및 저면적 역양자화 구조 설계)

  • Jung, Hong-Kyun;Cha, Ki-Jong;Park, Seung-Yong;Kim, Jin-Young;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.10a
    • /
    • pp.444-447
    • /
    • 2011
  • In this paper, parallel IT(inverse transform) architecture and IQ(inverse quantization) architecture with common operation unit for the H.264/AVC decoder are proposed. By using common operation unit, the area cost and computational complexity of IQ are reduced. In order to take four execution cycles to perform IT, the proposed IT architecture has parallel architecture with one horizontal DCT unit and four vertical DCT units. Furthermore, the execution cycles of the proposed architecture is reduced to five cycles by applying two state pipeline architecture. The proposed architecture is implemented to a single chip by using Magnachip 0.18um CMOS technology. The gate count of the proposed architecture is 14.3k at clock frequency of 13MHz and the area of proposed IQ is reduced 39.6% compared with the previous one. The experimental result shows that execution cycle the proposed architecture is about 49.09% higher than that of the previous one.

  • PDF

Design of Bit Manipulation Accelerator fo Communication DSP (통신용 DSP를 위한 비트 조작 연산 가속기의 설계)

  • Jeong Sug H.;Sunwoo Myung H.
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.8 s.338
    • /
    • pp.11-16
    • /
    • 2005
  • This paper proposes a bit manipulation accelerator (BMA) having application specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform bit manipulation functions since かey have multiply accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process bit manipulation functions using parallel shift and Exclusive-OR (XOR) operations and bit jnsertion/extraction operations on multiple data. The proposed BMA has been modeled by VHDL and synthesized using the SEC $0.18\mu m$ standard cell library and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced about $40\%\sim80\%$ for scrambling, convolutional encoding and interleaving compared with existing DSPs.

Design of High Performance Multi-mode 2D Transform Block for HEVC (HEVC를 위한 고성능 다중 모드 2D 변환 블록의 설계)

  • Kim, Ki-Hyun;Ryoo, Kwang-Ki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.2
    • /
    • pp.329-334
    • /
    • 2014
  • This paper proposes the hardware architecture of high performance multi-mode 2D forward transform for HEVC which has same number of cycles for processing any type of four TUs and yield high throughput. In order to make the original image which has high pixel and high resolution into highly compressed image effectively, the transform technique of HEVC supports 4 kinds of pixel units, TUs and it finds the optimal mode after performs each transform computation. As the proposed transform engine uses the common computation operator which is produced by analyzing the relationship among transform matrix coefficients, it can process every 4 kinds of TU mode matrix operation with 35cycles equally. The proposed transform block was designed by Verilog HDL and synthesized by using TSMC 0.18um CMOS processing technology. From the results of logic synthesis, the maximum operating frequency was 400MHz and total gate count was 214k gates which has the throughput of 10-Gpels/cycle with the $4k(3840{\times}2160)@30fps$ image.

An Efficient Hardware Design for Scaling and Transform Coefficients Decoding (스케일링과 변환계수 복호를 위한 효율적인 하드웨어 설계)

  • Jung, Hongkyun;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.10
    • /
    • pp.2253-2260
    • /
    • 2012
  • In this paper, an efficient hardware architecture is proposed for inverse transform and inverse quantization of H.264/AVC decoder. The previous inverse transform and quantization architecture has a different AC and DC coefficients decoding order. In the proposed architecture, IQ is achieved after IT regardless of the DC or AC coefficients. A common operation unit is also proposed to reduce the computational complexity of inverse quantization. Since division operation is included in the previous architecture, it will generate errors if the processing order is changed. In order to solve the problem, the division operation is achieved after IT to prevent errors in the proposed architecture. The architecture is implemented with 3-stage pipeline and a parallel vertical and horizontal IDCT is also implemented to reduce the operation cycle. As a result of analyzing the proposed ITIQ architecture operation cycle for one macroblock, the proposed one has improved by 45% than the previous one.

FPGA Design of High-Speed Motion Estimator (고속 움직임 예측기의 FPGA 설계)

  • Lim, Jeong-Hun;Seo, Young-Ho;Choi, Hyun-Jun;Kim, Dong-Wook
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.07a
    • /
    • pp.104-107
    • /
    • 2010
  • 본 논문은 H.264/AVC 디코더의 하드웨어 구현 시 가장 많은 시간을 소비하는 부분이 움직임 추정기를 하드웨어로 구현하였다. 움직임 추정을 함에 있어서 외부메모리 Access 량을 줄이고, SAD연산을 수행할 때 Clock의 손실 없이 계산을 하는 움직임 예측기를 제안한다. 제안한 구조는 재탐색 구간에서 이전 탐색 범위와 공통부분을 이루는 부분을 레지스터에 따로 저장해 두었다가, 재탐색시에 이전 Data를 사용하는 방법을 이용하였다. 움직임 추정을 수행할 때의 SAD (Sum of absolute differences)연산 부분과 Adder-tree를 묶은 PU Array와 SAD 누적기, 선택기를 Pipelining을 통하여 Clock의 손실 없이 연속적으로 계산하는 움직임 예측기를 설계하였다. 구현한 하드웨어는 최대 446.43MHz의 주파수에서 동작할 수 있었고, 탐색영역 64${\times}$64, 참조 프레임 3, 그리고 영상크기 1920${\times}$1080 기준으로 구현한 결과 50 프레임을 처리할 수 있는 성능을 보였다.

  • PDF

Low Area Hardware Design of Efficient SAO for HEVC Encoder (HEVC 부호기를 위한 효율적인 SAO의 저면적 하드웨어 설계)

  • Cho, Hyunpyo;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.169-177
    • /
    • 2015
  • This paper proposes a hardware architecture for an efficient SAO(Sample Adaptive Offset) with low area for HEVC(High Efficiency Video Coding) encoder. SAO is a newly adopted technique in HEVC as part of the in-loop filter. SAO reduces mean sample distortion by adding offsets to reconstructed samples. The existing SAO requires a great deal of computational and processing time for UHD(Ultra High Definition) video due to sample by sample processing. To reduce SAO processing time, the proposed SAO hardware architecture processes four samples simultaneously, and is implemented with a 2-step pipelined architecture. In addition, to reduce hardware area, it has a single architecture for both luma and chroma components and also uses optimized and common operators. The proposed SAO hardware architecture is designed using Verilog HDL(Hardware Description Language), and has a total of 190k gates in TSMC $0.13{\mu}m$ CMOS standard cell library. At 200MHz, it can support 4K UHD video encoding at 60fps in real time, but operates at a maximum of 250MHz.

Efficient Intra Predictor Design for H.264/AVC Decoder (H.264/AVC 복호기를 위한 효율적인 인트라 예측기 설계)

  • Kim, Ok;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.175-178
    • /
    • 2009
  • H.264/AVC is a video coding standard of ITU-T and ISO/IEC, and widely spreads its application due to its high compression ratio more than twice that of MPEG-2 and high image quality. In this paper, we explained Intra Prediction in H.264/AVC, which is able to achieve higher compressing efficiency from correlation removal of adjacent samples in spatial domain, and proposed efficient Intra Predictor architecture design for H.264/AVC decoder. The proposed system reduced computation cycle using processing element and precomputation processing element and also reduced the number of access to external memory using efficient register. We designed the proposed system with Verilog-HDL and verified with suitable test vector. The proposed Intra Predictor achieved about 60% cycle reduction comparing with existing Intra Predictors.

  • PDF

Low-area FFT Processor Structure using Common Sub-expression Sharing (Common Sub-expression Sharing을 사용한 저면적 FFT 프로세서 구조)

  • Jang, Young-Beom;Lee, Dong-Hoon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.4
    • /
    • pp.1867-1875
    • /
    • 2011
  • In this paper, a low-area 256-point FFT structure is proposed. For low-area implementation CSD(Canonic Signed Digit) multiplier method is chosen. Because multiplication type should be less for efficient CSD multiplier application to the FFT structure, the Radix-$4^2$ algorithm is chosen for those purposes. After, in the proposed structure, the number of multiplication type is minimized in each multiplication block, the CSD multipliers are applied for implementation of multiplication. Furthermore, in CSD multiplier implementation, cell-area is more reduced through common sub-expression sharing(CSS). The Verilog-HDL coding result shows 29.9% cell area reduction in the complex multiplication part and 12.54% cell area reduction in overall 256-point FFT structure comparison with those of the conventional structure.

Design and Analysis of a Linear Systolic Array for Modular Exponentation in GF(2m) (GF(2m) 상에서 모듈러 지수 연산을 위한 선형 시스톨릭 어레이 설계 및 분석)

  • Lee, Won-Ho;Lee, Geon-Jik;Yu, Gi-Yeong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.7
    • /
    • pp.743-751
    • /
    • 1999
  • 공개키 암호 시스템에서 모듈러 지수 연산은 주된 연산으로, 이 연산은 내부적으로 모듈러 곱셈을 반복적으로 수행함으로써 계산된다. 본 논문에서는 GF(2m)상에서 수행할 수 있는 Montgomery 알고리즘을 분석하여 right-to-left 방식의 모듈러 지수 연산에서 공통으로 계산 가능한 부분을 이용하여 모듈러 제곱과 모듈러 곱셈을 동시에 수행하는 선형 시스톨릭 어레이를 설계한다. 본 논문에서 설계한 시스톨릭 어레이는 기존의 곱셈기보다 모듈러 지수 연산시 약 0.67배 처리속도 향상을 가진다. 그리고, VLSI 칩과 같은 하드웨어로 구현함으로써 IC 카드에 이용될 수 있다.Abstract One of the main operations for the public key cryptographic system is the modular exponentiation, it is computed by performing the repetitive modular multiplications. In this paper, we analyze Montgomery's algorithm and design a linear systolic array to perform modular multiplication and modular squaring simultaneously. It is done by using common-multiplicand modular multiplication in the right-to-left modular exponentiation over GF(2m). The systolic array presented in this paper improves about 0.67 times than existing multipliers for performing the modular exponentiation. It could be designed on VLSI hardware and used in IC cards.