• Title/Summary/Keyword: 곱셈 구조 (multiplication structure)

Search results: 342 (processing time: 0.026 seconds)

Design of Systolic Multipliers in $GF(2^{m})$ Using an Irreducible All One Polynomial (기약 All One Polynomial을 이용한 유한체 $GF(2^{m})$상의 시스톨릭 곱셈기 설계)

  • Gwon, Sun Hak;Kim, Chang Hun;Hong, Chun Pyo
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.8C / pp.1047-1054 / 2004
  • In this paper, we present two systolic arrays for computing multiplications in $GF(2^m)$ generated by an irreducible all one polynomial (AOP). Both proposed systolic arrays have a parallel-in parallel-out structure. The first systolic multiplier has area complexity $O(m^2)$ and time complexity $O(1)$: it consists of $m(m+1)/2$ identical cells and, after an initial delay of $m/2+1$ cycles, produces one multiplication result per clock cycle. Compared with the previously proposed related multiplier using an AOP, our design has 12 percent lower hardware complexity and 50 percent lower computation delay. The other systolic multiplier, designed for cryptographic applications, has area complexity $O(m)$ and time complexity $O(m)$; it is composed of $m+1$ identical cells and produces one multiplication result every $m/2+1$ clock cycles. Compared with other linear systolic multipliers, our design has at least 43 percent lower hardware complexity, 83 percent lower computation delay, and twice the throughput. Furthermore, since both architectures are highly regular and modular, they are well suited to VLSI implementation. Therefore, when the proposed architectures are used in $GF(2^m)$ applications, one can achieve maximum throughput with minimal hardware.
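
The property that makes AOP-based multipliers regular can be checked with a small bit-level model. Over GF(2), $(x+1)(1+x+\dots+x^m) = x^{m+1}+1$, so $x^{m+1} \equiv 1$ modulo the AOP: a product can be formed as a cyclic convolution on $m+1$ bits and then reduced by adding the all-ones polynomial whenever the $x^m$ coefficient is set. The sketch below is a functional model under that assumption, not the systolic cell array from the paper; `aop_multiply` and `poly_mulmod` are hypothetical names, and `poly_mulmod` is a generic shift-and-add reference used only for cross-checking. An AOP of degree $m$ is irreducible only when $m+1$ is prime and 2 is primitive modulo $m+1$, so $m = 4$ (AOP `0b11111`) is a valid small test case.

```python
def aop_multiply(a: int, b: int, m: int) -> int:
    """Multiply a*b in GF(2^m) defined by the AOP 1 + x + ... + x^m.

    a and b are bit vectors (bit i = coefficient of x^i) of degree < m.
    """
    n = m + 1                      # x^(m+1) = 1 modulo the AOP
    mask = (1 << n) - 1
    c = 0
    for i in range(n):
        if (a >> i) & 1:
            # b * x^i in the (m+1)-bit extended representation is a cyclic rotation
            c ^= ((b << i) | (b >> (n - i))) & mask
    # Back to the ordinary basis: if x^m is present, add the AOP (all ones)
    if (c >> m) & 1:
        c ^= mask
    return c


def poly_mulmod(a: int, b: int, modulus: int) -> int:
    """Generic shift-and-add multiplication modulo `modulus` (reference only)."""
    deg = modulus.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:
            a ^= modulus
    return result


# x * x = x^2 = x + 1 in GF(4), whose AOP is x^2 + x + 1
print(aop_multiply(0b10, 0b10, 2))  # 3
```

Each term of the loop is only a rotation and an XOR, which is the kind of uniform operation that maps naturally onto identical cells.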

Multiple-Valued Logic Multiplier for System-On-Panel (System-On-Panel을 위한 다치 논리 곱셈기 설계)

  • Hong, Moon-Pyo;Jeong, Ju-Young
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.2 / pp.104-112 / 2007
  • We developed a $7{\times}7$ parallel multiplier using LTPS TFTs. The proposed multiplier consists of a multiple-valued-logic 7-3 compressor with folding, a 3-2 compressor, and a final carry-propagation adder. This architecture minimizes carry propagation, and power consumption is reduced by switching the current sources of the circuits that operate in current mode. Compared with a Wallace tree multiplier, the proposed multiplier improves the power-delay product (PDP) by 23%, the energy-delay product (EDP) by 59%, and the propagation delay by 47%.
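
The datapath described above (7-3 compressors, then 3-2 compressors, then a final carry-propagation adder) can be sketched as a binary functional model of partial-product column reduction. This is an assumption-laden illustration: it models only the arithmetic of the compressors in ordinary binary logic, not the current-mode multiple-valued circuits or the folding technique from the paper, and the function names are hypothetical.

```python
def counter_7_3(bits):
    """7-3 compressor: counts the ones among up to 7 bits of equal weight."""
    assert len(bits) <= 7
    s = sum(bits)
    return [s & 1, (s >> 1) & 1, (s >> 2) & 1]  # weights 2^0, 2^1, 2^2


def counter_3_2(a, b, c):
    """3-2 compressor (full adder)."""
    return [a ^ b ^ c, (a & b) | (a & c) | (b & c)]  # weights 2^0, 2^1


def multiply_7x7(x, y):
    assert 0 <= x < 128 and 0 <= y < 128
    # Partial-product matrix: column k holds every AND term of weight 2^k
    cols = [[] for _ in range(14)]
    for i in range(7):
        for j in range(7):
            cols[i + j].append((x >> i) & (y >> j) & 1)
    # Reduce stage by stage until every column holds at most two bits
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 2)]
        for k, bits in enumerate(cols):
            bits = list(bits)
            while len(bits) >= 4:          # tall column: use a 7-3 compressor
                group, bits = bits[:7], bits[7:]
                for w, out in enumerate(counter_7_3(group)):
                    nxt[k + w].append(out)
            if len(bits) == 3:             # three bits left: use a 3-2 compressor
                for w, out in enumerate(counter_3_2(*bits)):
                    nxt[k + w].append(out)
                bits = []
            nxt[k].extend(bits)            # zero, one, or two bits pass through
        cols = nxt
    # Final carry-propagation adder on the two remaining rows
    row0 = sum(c[0] << k for k, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << k for k, c in enumerate(cols) if len(c) > 1)
    return row0 + row1


print(multiply_7x7(127, 127))  # 16129
```

For a 7×7 array the tallest column has exactly seven bits, so one 7-3 compressor per column is enough in the first stage, which is presumably why the 7-3 size suits this word length.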

A Truncated Booth Multiplier Architecture for Low Power Design (저전력 설계를 위한 전달된 Booth 곱셈기 구조)

  • Lee, Kwang-Hyun;Park, Chong-Suck
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.9 / pp.55-65 / 2000
  • In this paper, we propose a reduced-hardware multiplier for DSP applications. Many DSP applications do not use the full product of a multiplier, only its upper bits. Kidambi proposed a truncated unsigned multiplier based on this observation. In this paper, we apply this scheme to a Booth multiplier, which can be used in practical DSP systems. In addition, our design guarantees that a zero input produces a zero output, which the previous work did not provide. We also propose a bit-extension scheme to further reduce the truncation error. Finally, we apply this multiplier to FIR filters for a more efficient design.

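
The truncation idea can be illustrated with a simplified model of a radix-4 Booth multiplier whose partial-product bits below a chosen column are discarded. This is a sketch under stated assumptions, not the paper's design: truncation is modelled as an arithmetic floor of each partial product to a multiple of $2^t$, no compensation constant or bit-extension scheme is applied, and `extra` (the number of guard columns kept below the output bits) is a hypothetical parameter. With $n = 8$ and `extra = 2`, the four dropped remainders are each below $2^6$, so the upper half of the product is off by at most 1; zero inputs still give zero because no constant is added.

```python
def booth_radix4_digits(y, n):
    """Radix-4 Booth recoding of an n-bit two's-complement y (n even)."""
    u = y & ((1 << n) - 1)  # raw n-bit pattern of y

    def bit(k):
        return (u >> k) & 1 if k >= 0 else 0

    # digit i = y_{2i-1} + y_{2i} - 2*y_{2i+1}, each in {-2, -1, 0, 1, 2}
    return [bit(2 * i) + bit(2 * i - 1) - 2 * bit(2 * i + 1) for i in range(n // 2)]


def booth_multiply(x, y, n):
    """Full-width Booth product: sum of d_i * x * 4^i."""
    return sum(d * x * 4**i for i, d in enumerate(booth_radix4_digits(y, n)))


def truncated_booth_upper(x, y, n, extra=2):
    """Upper n bits of x*y, dropping partial-product bits below column n - extra."""
    t = n - extra
    acc = 0
    for i, d in enumerate(booth_radix4_digits(y, n)):
        pp = d * x * 4**i
        acc += (pp >> t) << t  # discard low columns (floor to a multiple of 2^t)
    return acc >> n
```

Increasing `extra` keeps more columns and shrinks the error at the cost of more adder cells, which is the trade-off truncated multipliers tune.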

Design of the Multiplier in case of P=2 over the Finite Fields based on the Polynomial (다항식에 기초한 유한체상의 P=2인 경우의 곱셈기 설계)

  • Park, Chun-Myoung
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.2 / pp.70-75 / 2016
  • This paper proposes a method for constructing an efficient multiplier over finite fields with characteristic $P=2$. The proposed multiplier consists of a polynomial arithmetic part, a mod $F(\alpha)$ part, and a modular arithmetic part. Because each part has a modular structure, it can be extended with $m$, and because it uses only AND and XOR gates, it is suitable for VLSI implementation. Compared with earlier multipliers, the proposed multiplier is more compact, regular, normalized, and extensible. It can also be applied to various fields, including the IoT, which has recently drawn much attention.
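
The first two parts named above can be sketched in software: a schoolbook polynomial product that uses only AND and XOR, followed by reduction modulo $F(\alpha)$. This is a minimal model, not the paper's circuit; the function names are hypothetical, coefficient lists are ordered from $\alpha^0$ upward, and $F(x) = x^4 + x + 1$ is an assumed example of a primitive polynomial for $m = 4$.

```python
def poly_product(a_bits, b_bits):
    """Polynomial arithmetic part: product over GF(2) using AND and XOR only."""
    m = len(a_bits)
    c = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            c[i + j] ^= a_bits[i] & b_bits[j]
    return c


def reduce_mod_f(c, f_bits):
    """mod F(alpha) part: clear each high-degree term by XORing in x^(d-m) * F(x)."""
    m = len(f_bits) - 1
    c = list(c)
    for d in range(len(c) - 1, m - 1, -1):
        if c[d]:
            for k in range(m + 1):
                c[d - m + k] ^= f_bits[k]
    return c[:m]


def gf_multiply(a_bits, b_bits, f_bits):
    return reduce_mod_f(poly_product(a_bits, b_bits), f_bits)


def gf_power(a_bits, e, f_bits):
    result = [1] + [0] * (len(a_bits) - 1)
    for _ in range(e):
        result = gf_multiply(result, a_bits, f_bits)
    return result


F = [1, 1, 0, 0, 1]          # assumed example: F(x) = 1 + x + x^4
alpha = [0, 1, 0, 0]
print(gf_power(alpha, 4, F))  # [1, 1, 0, 0], i.e. alpha^4 = 1 + alpha
```

Because both loops have a fixed structure that grows with $m$, the same pattern extends to larger fields by changing only the list lengths and $F$.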

The Architecture Design of 32-bit RISC Microprocessor with DSP Functional Unit (DSP 기능 유닛을 내장한 32비트 RISC 마이크로프로세서의 구조 설계)

  • An, Sang-Jun;Jeong, Wook-Kyeong;Kim, Moon-Gyung;Moon, Sang-Ook;Lee, Yong-Surk
    • Proceedings of the IEEK Conference / 1999.06a / pp.345-348 / 1999
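  • In this paper, we study and design an architecture that tightly integrates the functions of a RISC microprocessor and a DSP processor for embedded applications. To reduce program size, the RISC instructions use a 16-bit instruction set, and one delay slot is provided to reduce the penalty caused by branch instructions. DSP instructions are 32 bits long, and a single instruction can perform a multiplication, an addition (or subtraction), and two data moves, so up to four operations can be executed in one cycle. The pipeline consists of five stages: IF, ID, EX, MA, and WB/DSP. To support DSP functions, the processor has an internal loop buffer; the integer execution unit contains dedicated hardware for address generation, and the DSP unit contains a 17 × 17-bit multiplier to support multiply-accumulate operations. The proposed architecture was designed top-down in Verilog HDL and, after each function was verified, synthesized and laid out in a 3.3 V, 0.6 ㎛ CMOS triple-metal single-poly process.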

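
The 17 × 17-bit multiplier serves a multiply-accumulate (MAC) datapath, whose behaviour can be sketched functionally. Assumptions: operands are 17-bit two's-complement values (one plausible reason for 17 bits is that both signed and unsigned 16-bit values then fit, which is our reading rather than a statement from the paper), and the accumulator is a hypothetical 40-bit register that wraps on overflow. `mac`, `dot`, `to_signed`, and `ACC_BITS` are illustrative names only.

```python
ACC_BITS = 40  # hypothetical accumulator width; not given in the abstract


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc, a, b):
    """One multiply-accumulate step with 17-bit signed operands."""
    for v in (a, b):
        if not -(1 << 16) <= v < (1 << 16):
            raise ValueError("operand does not fit in 17-bit two's complement")
    return to_signed(acc + a * b, ACC_BITS)


def dot(xs, ys):
    """FIR-style inner product: one MAC per tap."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac(acc, a, b)
    return acc


print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(mac(0, 65535, 65535))       # 4294836225: unsigned 16-bit operands fit
```

In hardware, the loop buffer would let an inner loop like `dot` issue one DSP instruction per cycle without branch overhead; this model captures only the arithmetic.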

Design of a Low-Power Parallel Multiplier Using Low-Swing Technique (저 전압 스윙 기술을 이용한 저 전력 병렬 곱셈기 설계)

  • Kim, Jeong-Beom
    • The KIPS Transactions:PartA / v.14A no.3 s.107 / pp.147-150 / 2007
  • This paper describes a new low-swing inverter for low power consumption. To reduce power consumption, the output voltage swings only from 0 to $V_{DD}-2V_{TH}$. This is achieved by an inverter structure that accepts either a full-swing or a reduced-swing input without leakage current. Using this low-swing technique, we propose a low-power 16$\times$16-bit parallel multiplier. The proposed circuits are designed in a Samsung 0.35 $\mu$m standard CMOS process with a 3.3 V supply, and their validity and effectiveness are verified through HSPICE simulation. Compared with previous work, the circuit reduces power consumption by 17.3% and the power-delay product by 16.5%.
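
Why a reduced swing saves power can be seen from the first-order switching-power expression $P = \alpha\,C\,V_{DD}\,V_{swing}\,f$: a node that charges only to $V_{swing}$ draws charge $C\,V_{swing}$ from a supply at $V_{DD}$. The sketch below evaluates that ratio for the 3.3 V supply in the abstract. The threshold voltage of 0.6 V, the load, frequency, and activity values are hypothetical, and the estimate covers only one switching node, not the measured 17.3% figure for the whole multiplier.

```python
def dynamic_power(c_load, v_dd, v_swing, freq, activity):
    """First-order switching power for a node swinging by v_swing from supply v_dd."""
    return activity * c_load * v_dd * v_swing * freq


V_DD = 3.3
V_TH = 0.6  # hypothetical threshold voltage
full = dynamic_power(1e-12, V_DD, V_DD, 1e8, 0.5)
low = dynamic_power(1e-12, V_DD, V_DD - 2 * V_TH, 1e8, 0.5)
print(round(low / full, 3))  # 0.636
```

Under these assumptions the per-node switching power falls in proportion to $V_{swing}/V_{DD}$, while delay tends to rise, which is why the abstract also reports the power-delay product.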

An Efficient Multiplexer-based $AB^2$ Multiplier Using Redundant Basis over Finite Fields

  • Kim, Keewon
    • Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.13-19 / 2020
  • In this paper, we propose a multiplexer-based scheme that performs modular $AB^2$ multiplication using a redundant basis over a finite field. We then use this scheme to build an efficient multiplexer-based semi-systolic $AB^2$ multiplier. We derive a method that allows multiplexers to perform the operations in each cell of the modular $AB^2$ multiplier, and the cells are implemented with multiplexers to reduce cell latency. Compared with existing related structures, the proposed $AB^2$ multiplier reduces the AT (area-time) complexity by about 80.9%, 61.8%, 61.8%, and 9.5% relative to the multipliers of Liu et al., Lee et al., Ting et al., and Kim-Kim, respectively. Therefore, the proposed multiplier is well suited for VLSI implementation and can easily be applied to various applications.
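
A redundant basis makes $AB^2$ cheap because arithmetic happens in $GF(2)[x]/(x^n+1)$: squaring is just a bit permutation (coefficient $i$ moves to position $2i \bmod n$ for odd $n$), and multiplication is a cyclic convolution. The sketch below assumes $n = 5$, the redundant representation of $GF(2^4)$ whose field polynomial is $1+x+x^2+x^3+x^4$. Treating each bit $a_i$ as the select line of a 2-to-1 multiplexer that passes either 0 or a rotated copy of $B^2$ is our illustrative reading, not the cell design from the paper, and all function names are hypothetical.

```python
def cyclic_square(b, n):
    """Squaring in GF(2)[x]/(x^n + 1): bit i moves to position 2i mod n (n odd)."""
    r = 0
    for i in range(n):
        if (b >> i) & 1:
            r |= 1 << ((2 * i) % n)
    return r


def rotl(v, k, n):
    """Multiply v by x^k modulo x^n + 1 (cyclic left rotation, v < 2^n)."""
    mask = (1 << n) - 1
    return ((v << k) | (v >> (n - k))) & mask


def ab2(a, b, n):
    """A * B^2 in the redundant basis of length n."""
    b_sq = cyclic_square(b, n)  # pure rewiring, no gates needed
    r = 0
    for i in range(n):
        # a_i acts as a multiplexer select: pass 0 or the rotated B^2 term
        r ^= rotl(b_sq, i, n) if (a >> i) & 1 else 0
    return r


print(bin(ab2(0b10, 0b10, 5)))  # 0b1000, i.e. x * (x)^2 = x^3
```

Since squaring costs no logic in this representation, the $AB^2$ cell needs little more than the multiplexer-and-XOR structure of an ordinary cyclic multiplier.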

Montgomery Multiplier Based on Modified RBA and Hardware Architecture (변형된 RBA를 이용한 몽고메리 곱셈기와 하드웨어 구조)

  • Ji Sung-Yeon;Lim Dae-Sung;Jang Nam-Su;Kim Chang-Han;Lee Sang-Jin
    • Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2006.06a / pp.351-355 / 2006
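  • The RSA cryptosystem is widely used in IC cards, mobile and WPKI systems, electronic cash, SET, SSL, and similar systems. RSA is performed through modular exponentiation, for which a Montgomery multiplier is known to be efficient. In a Montgomery multiplier, the critical path delay depends on the addition of three operands, so how efficiently carry propagation is handled strongly affects the multiplier's efficiency. Recent research has continued to use carry-save adders (CSAs) to eliminate carry propagation. McIvor et al. proposed two Montgomery multipliers suited to exponentiation: one built from three CSA stages and one built from two CSA stages. In terms of time complexity, the latter is more efficient than the former. In this paper, to compute faster than the latter, we use the binary signed-digit (SD) number system, which eliminates carry propagation. We propose a new redundant binary adder (RBA) that adds two binary SD numbers and apply it to the Montgomery multiplier. Instead of the binary SD addition rules used in existing RBAs, we propose new addition rules, and we designed and simulated the multiplier using gates supported by the Samsung STD130 $0.18{\mu}m$ 1.8 V standard-cell library. As a result, it is at least 12.46% faster than McIvor's Method 2 and the existing RBA.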

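
The three-operand addition on the Montgomery critical path, and how carry-save adders avoid carry propagation there, can be sketched with a radix-2 Montgomery multiplier whose accumulator is kept as a (sum, carry) pair and updated by two CSA stages per iteration. This illustrates the CSA baseline the abstract describes, not the proposed signed-digit RBA; the function names are hypothetical, and the modulus $n$ must be odd with $n < 2^k$. The result is $a\,b\,2^{-k} \bmod n$.

```python
def csa(x, y, z):
    """3-2 carry-save adder on whole words: x + y + z == s + c, no carry chain."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def montgomery_csa(a, b, n, k):
    """Radix-2 Montgomery product a*b*2^-k mod n with a carry-save accumulator."""
    assert n % 2 == 1 and 0 <= a < n and 0 <= b < n and n < (1 << k)
    s, c = 0, 0
    for i in range(k):
        s, c = csa(s, c, b if (a >> i) & 1 else 0)  # CSA stage 1: add a_i * b
        q = (s ^ c) & 1                             # parity of s + c
        s, c = csa(s, c, n if q else 0)             # CSA stage 2: add q_i * n
        # c is even after a CSA and s + c is now even, so s is even too:
        # halving both words divides the total by 2 exactly
        s, c = s >> 1, c >> 1
    t = s + c  # one carry-propagate addition at the very end
    return t - n if t >= n else t
```

The loop invariant $s + c < 2n$ guarantees that a single final subtraction suffices; replacing the final `s + c` and the per-stage CSAs with signed-digit adders is the direction the paper takes.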

Design of Multipliers Optimized for CNN Inference Accelerators (CNN 추론 연산 가속기를 위한 곱셈기 최적화 설계)

  • Lee, Jae-Woo;Lee, Jaesung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1403-1408 / 2021
  • Recently, FPGA-based AI processors have been actively studied. Deep convolutional neural networks (CNNs) are the basic computational structures executed by AI processors and require a very large number of multiplications. Since the multiplication coefficients used in CNN inference are all constants, and an FPGA makes it easy to design a multiplier tailored to a specific coefficient, this paper proposes a methodology for optimizing such multipliers. The method uses two's complement and the distributive law to minimize the number of 1 bits in a multiplication coefficient, thereby reducing the number of stacked adders required. When applied to an actual FPGA implementation of a CNN, the method reduces logic usage by up to 30.2% and propagation delay by up to 22%. In an ASIC implementation, it reduces hardware area by up to 35% and delay by up to 19.2%.
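
Fewer 1 bits in a constant mean fewer adders in a shift-and-add multiplier; for example, $7x = 8x - x$ needs one subtractor instead of two adders. The sketch below uses non-adjacent form (NAF, also called canonical signed digit) recoding as a stand-in technique to show that effect. It is not necessarily the paper's exact procedure, and the function names are hypothetical.

```python
def naf_digits(k):
    """Non-adjacent form of integer k, least significant digit first; digits in {-1, 0, 1}."""
    digits = []
    while k != 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits


def shift_add_multiply(x, k):
    """Multiply x by the constant k using only shifts, additions, and subtractions."""
    acc = 0
    for i, d in enumerate(naf_digits(k)):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc


def adder_count(k):
    """Adders/subtractors needed with NAF recoding."""
    return max(sum(1 for d in naf_digits(k) if d) - 1, 0)


def binary_adder_count(k):
    """Adders needed with plain binary (one per extra 1 bit)."""
    return max(bin(abs(k)).count("1") - 1, 0)


print(naf_digits(7), adder_count(7), binary_adder_count(7))  # [-1, 0, 0, 1] 1 2
```

Because each CNN weight is fixed at synthesis time, this recoding can be done once per coefficient, and the savings apply to every multiplication that uses that weight.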

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.3 / pp.75-81 / 2015
  • This paper proposes a new eight-parallel multipath delay commutator (MDC) FFT processor that combines an eight-parallel MDC architecture with an efficient scheduling scheme. The proposed FFT processor supports a 256-point FFT based on a modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme reduces the number of complex multipliers from eight to six without increasing the delay buffers or the computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems that require high throughput and low hardware complexity. The processor was designed and implemented in a 90 nm CMOS technology. The experimental results show that its area is $0.27\,\mathrm{mm}^2$ and that it achieves a throughput of up to 2.7 GSample/s at 388 MHz.
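
A hardware FFT like this is typically checked against a software reference model. The sketch below is a plain radix-2 decimation-in-frequency FFT, a swapped-in stand-in rather than the modified radix-$2^6$ algorithm or the MDC scheduling of the paper, compared against a direct DFT for $N = 256$. Each stage computes sums and differences of samples $N/2$ apart and multiplies the differences by twiddle factors $W_N^i = e^{-2\pi j i/N}$; those twiddle products are what the complex multipliers in the pipeline implement. Function names and test data are illustrative.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    top = [x[i] + x[i + half] for i in range(half)]
    bottom = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n) for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)     # even-indexed outputs X[2k]
    out[1::2] = fft_dif(bottom)  # odd-indexed outputs X[2k+1]
    return out


def dft(x):
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


samples = [complex(i % 7, (3 * i) % 5) for i in range(256)]
error = max(abs(a - b) for a, b in zip(fft_dif(samples), dft(samples)))
print(error < 1e-6)  # True
```

Higher radices such as $2^6$ regroup these stages so that more twiddle factors become trivial ($\pm 1$, $\pm j$), which is how a design can reduce the number of general complex multipliers.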