• Title/Summary/Keyword: 덧셈기

Search Result 164, Processing Time 0.023 seconds

Floating Point Converter Design Supporting Double/Single Precision of IEEE754 (IEEE754 단정도 배정도를 지원하는 부동 소수점 변환기 설계)

  • Park, Sang-Su;Kim, Hyun-Pil;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.10
    • /
    • pp.72-81
    • /
    • 2011
  • In this paper, we proposed and designed a novel floating point converter which supports single and double precisions of IEEE754 standard. The proposed convertor supports conversions between floating point number single/double precision and signed fixed point number(32bits/64bits) as well as conversions between signed integer(32bits/64bits) and floating point number single/double precision and conversions between floating point number single and double precisions. We defined a new internal format to convert various input types into one type so that overflow checking could be conducted easily according to range of output types. The internal format is similar to the extended format of floating point double precision defined in IEEE754 2008 standard. This standard specifies that minimum exponent bit-width of the extended format of floating point double precision is 15bits, but 11bits are enough to implement the proposed converting unit. Also, we optimized rounding stage of the convertor unit so that we could make it possible to operate rounding and represent correct negative numbers using an incrementer instead an adder. We designed single cycle data path and 5 cycles data path. After describing the HDL model for two data paths of the convertor, we synthesized them with TSMC 180nm technology library using Synopsys design compiler. Cell area of synthesis result occupies 12,886 gates(2 input NAND gate), and maximum operating frequency is 411MHz.

Optimization of Approximate Modular Multiplier for R-LWE Cryptosystem (R-LWE 암호화를 위한 근사 모듈식 다항식 곱셈기 최적화)

  • Jae-Woo, Lee;Youngmin, Kim
    • Journal of IKEEE
    • /
    • v.26 no.4
    • /
    • pp.736-741
    • /
    • 2022
  • Lattice-based cryptography is the most practical post-quantum cryptography because it enjoys strong worst-case security, relatively efficient implementation, and simplicity. Ring learning with errors (R-LWE) is a public key encryption (PKE) method of lattice-based encryption (LBC), and the most important operation of R-LWE is the modular polynomial multiplication of rings. This paper proposes a method for optimizing modular multipliers based on approximate computing (AC) technology, targeting the medium-security parameter set of the R-LWE cryptosystem. First, as a simple way to implement complex logic, LUT is used to omit some of the approximate multiplication operations, and the 2's complement method is used to calculate the number of bits whose value is 1 when converting the value of the input data to binary. We propose a total of two methods to reduce the number of required adders by minimizing them. The proposed LUT-based modular multiplier reduced both speed and area by 9% compared to the existing R-LWE modular multiplier, and the modular multiplier using the 2's complement method reduced the area by 40% and improved the speed by 2%. appear. Finally, the area of the optimized modular multiplier with both of these methods applied was reduced by up to 43% compared to the previous one, and the speed was reduced by up to 10%.

An Efficient Hardware Design of Intra Predictor for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 화면내 예측기의 효율적인 하드웨어 설계)

  • Jung, Hongkyun;Kang, Sukmin;Ryoo, Kwangki
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.668-671
    • /
    • 2012
  • 본 논문에서는 차세대 비디오 압축 표준인 HEVC(High Efficiency Video Coding) 복호기의 연산량과 하드웨어 면적을 감소시키기 위하여 화면내 예측 하드웨어 구조를 제안한다. 제안하는 하드웨어 구조는 공통 수식에 대한 연산을 공유하는 공유 연산기를 사용하여 연산량 및 연산기 개수를 감소시키고, $4{\times}4$ PU와 $64{\times}64$ PU의 필터링 수행 여부에 대한 연산을 수행하지 않고 나머지 PU에 대해서는 LUT를 이용하여 연산을 수행하기 때문에 연산량 및 연산 시간을 감소시킨다. 또한 하나의 공통 연산기만을 사용하여 예측 픽셀을 생성하기 때문에 하드웨어 면적이 감소한다. 제안하는 구조를 TSMC 0.18um 공정을 이용하여 합성한 결과 최대 동작 주파수는 100MHz이고, 게이트 수는 140,697이다. $4{\times}4$ PU를 기준으로 제안하는 구조의 처리 사이클 수는 11 사이클로 기존 구조 대비 54% 감소하였고, 16개 참조 픽셀의 필터링 처리를 기준으로 제안하는 구조의 덧셈 연산기 개수는 37개로 표준 draft 6에 비해 22.9% 감소하였다.

Optimal Bit-level Arithmetic Optimization for High-Speed Circuits (고속 회로를 위한 비트 단위의 연산 최적화)

  • 엄준형;김영태;김태환;여준기;홍성백
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.04a
    • /
    • pp.21-23
    • /
    • 2000
  • 고속 회로 합성에 있어서, Wallace 트리 스타일은 연산을 위한 가장 효율적인 수행방식의 하나로 인식되어 졌다. 그러나, 이러한 방법은 빠른 곱셈기의 수행이나 여러 가지 연산수행에 있어, 입력 시그널을 고려하지 않은 일반적인 구조로 수행되어졌다. 본 논문은 연산기에 있어서 이러한 제한점을 극복하는 문제를 다룬다. 우리는 캐리-세이브 방법을 덧셈, 뺄셈, 곱셈이 혼합되어 일T는 일반적인 연산 회로에 적용한다. 그 결과 효율적인 회로를 생성하며, 시그널들이 임의의 도달시간에 대해 회로의 도달시간을 최적화 한다. 또한, 우리는 최적 지연시간의 캐리-세이브 가산회로를 생성하는 효율적인 알고리즘을 제안하였다. 우리는 이러한 최적화 방법을 여러 고속 디지털 필터에 적용시켜 보았고 이는 기존의 비트 단위가 아닌 캐리-세이브 수행방법보다 5%에서 30%사이의 수행시간 향상을 가져왔다.

  • PDF

A Theoretical Consideration of Complex Processor Using RNS (Residue 수체계에 의한 복소 프로세서의 이론적 고찰)

  • Kim, Duck-Hyun;Kim, Jae-Kong
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.20 no.6
    • /
    • pp.69-74
    • /
    • 1983
  • This paper discussed the high speed complex multiplier based on the Residue Number System (RNS) using combinational logic circuits. In addition, the sigil determination and overflow correction problem in residue addition has been studied. The estimated multiplication time of considered processor were about 53.15 ns.

  • PDF

Cell array multiplier in GF(p$^{m}$ ) using Current mode CMOS (전류모드 CMOS를 이용한 GF(P$^{m}$ )상의 셀 배열 승산기)

  • 최재석
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.2 no.3
    • /
    • pp.102-109
    • /
    • 2001
  • In this paper, a new multiplication algorithm which describes the methods of constructing a multiplierover GF(p$^{m}$ ) was presented. For the multiplication of two elements in the finite field, the multiplication formula was derived. Multiplier structures which can be constructed by this formula were considered as well. For example, both GF(3) multiplication module and GF(3) addition module were realized by current-mode CMOS technology. By using these operation modules the basic cell used in GF(3$^{m}$ ) multiplier was realized and verified by SPICE simulation tool. Proposed multipliers consisted of regular interconnection of simple cells use regular cellular arrays. So they are simply expansible for the multiplication of two elements in the finite field increasing the degree m.

  • PDF

A 8b 1GS/s Fractional Folding-Interpolation ADC with a Novel Digital Encoding Technique (새로운 디지털 인코딩 기법을 적용한 8비트 1GS/s 프랙셔널 폴딩-인터폴레이션 ADC)

  • Choi, Donggwi;Kim, Daeyun;Song, Minkyu
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.1
    • /
    • pp.137-147
    • /
    • 2013
  • In this paper, an 1.2V 8b 1GS/s A/D Converter(ADC) based on a folding architecture with a resistive interpolation technique is described. In order to overcome the asymmetrical boundary-condition error of conventional folding ADCs, a novel scheme with an odd number of folding blocks and a fractional folding rate are proposed. Further, a new digital encoding technique with an arithmetic adder is described to implement the proposed fractional folding technique. The proposed ADC employs an iterating offset self-calibration technique and a digital error correction circuit to minimize device mismatch and external noise The chip has been fabricated with a 1.2V 0.13um 1-poly 6-metal CMOS technology. The effective chip area is $2.1mm^2$ (ADC core : $1.4mm^2$, calibration engine : $0.7mm^2$) and the power dissipation is about 350mW including calibration engine at 1.2V power supply. The measured result of SNDR is 46.22dB, when Fin = 10MHz at Fs = 1GHz. Both the INL and DNL are within 1LSB with the self-calibration circuit.

VLSI Design of DWT-based Image Processor for Real-Time Image Compression and Reconstruction System (실시간 영상압축과 복원시스템을 위한 DWT기반의 영상처리 프로세서의 VLSI 설계)

  • Seo, Young-Ho;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.102-110
    • /
    • 2004
  • In this paper, we propose a VLSI structure of real-time image compression and reconstruction processor using 2-D discrete wavelet transform and implement into a hardware which use minimal hardware resource using ASIC library. In the implemented hardware, Data path part consists of the DWT kernel for the wavelet transform and inverse transform, quantizer/dequantizer, the huffman encoder/huffman decoder, the adder/buffer for the inverse wavelet transform, and the interface modules for input/output. Control part consists of the programming register, the controller which decodes the instructions and generates the control signals, and the status register for indicating the internal state into the external of circuit. According to the programming condition, the designed circuit has the various selective output formats which are wavelet coefficient, quantization coefficient or index, and Huffman code in image compression mode, and Huffman decoding result, reconstructed quantization coefficient, and reconstructed wavelet coefficient in image reconstructed mode. The programming register has 16 stages and one instruction can be used for a horizontal(or vertical) filtering in a level. Since each register automatically operated in the right order, 4-level discrete wavelet transform can be executed by a programming. We synthesized the designed circuit with synthesis library of Hynix 0.35um CMOS fabrication using the synthesis tool, Synopsys and extracted the gate-level netlist. From the netlist, timing information was extracted using Vela tool. We executed the timing simulation with the extracted netlist and timing information using NC-Verilog tool. Also PNR and layout process was executed using Apollo tool. The Implemented hardware has about 50,000 gate sizes and stably operates in 80MHz clock frequency.

High Speed Modular Multiplication Algorithm for RSA Cryptosystem (RSA 암호 시스템을 위한 고속 모듈라 곱셈 알고리즘)

  • 조군식;조준동
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.3C
    • /
    • pp.256-262
    • /
    • 2002
  • This paper presents a novel radix-4 modular multiplication algorithm based on the sign estimation technique (3). The sign estimation technique detects the sign of a number represented in the form of a carry-sum pair. It can be implemented with 5-bit carry look-ahead adder. The hardware speed of the cryptosystem is dependent on the performance modular multiplication of large numbers. Our algorithm requires only (n/2+3) clock cycle for n bit modulus in performing modular multiplication. Our algorithm out-performs existing algorithm in terms of required clock cycles by a half, It is efficient for modular exponentiation with large modulus used in RSA cryptosystem. Also, we use high-speed adder (7) instead of CPA (Carry Propagation Adder) for modular multiplication hardware performance in fecal stage of CSA (Carry Save Adder) output. We apply RL (Right-and-Left) binary method for modular exponentiation because the number of clock cycles required to complete the modular exponentiation takes n cycles. Thus, One 1024-bit RSA operation can be done after n(n/2+3) clock cycles.

Variable Radix-Two Multibit Coding and Its VLSI Implementation of DCT/IDCT (가변길이 다중비트 코딩을 이용한 DCT/IDCT의 설계)

  • 김대원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.12
    • /
    • pp.1062-1070
    • /
    • 2002
  • In this paper, variable radix-two multibit coding algorithm is presented and applied in the implementation of discrete cosine transform(DCT) and inverse discrete cosine transform(IDCT). Variable radix-two multibit coding means the 2k SD (signed digit) representation of overlapped multibit scanning with variable shift method. SD represented by 2k generates partial products, which can be easily implemented with shifters and adders. This algorithm is most powerful for the hardware implementation of DCT/IDCT with constant coefficient matrix multiplication. This paper introduces the suggested algorithm, it's proof and the implementation of DCT/IDCT The implemented IDCT chip with 8 PEs(Processing Elements) and one transpose memory runs at a tate of 400 Mpixels/sec at 54MHz frequency for high speed parallel signal processing, and it's verified in HDTV and MPEG decoder.