• Title/Summary/Keyword: Multiplier-Accumulator

Search Result 30, Processing Time 0.268 seconds

A Design of 24-bit Floating Point MAC Unit for Transformation of 3D Graphics (3차원 그래픽의 트랜스포메이션을 위한 24-bit 부동 소수점 MAC 연산기의 설계)

  • Lee, Jungwoo;Kim, Woojin;Kim, Kichul
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.4 no.1
    • /
    • pp.1-8
    • /
    • 2009
  • This paper proposes a 24-bit floating point multiply and accumulate(MAC) unit that can be used in geometry transformation process in 3D graphics. The MAC unit is composed of floating point multiplier and floating point accumulator. When separate multiplier and accumulator are used, matrix calculation, used in the transformation process, can't use continuous accumulation values. In the proposed MAC unit the accumulator can get continuous input from the multiplier and the calculation time is reduced. The MAC unit uses about 4,300 gates and can be operated at 150 MHz frequency.

  • PDF

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.7
    • /
    • pp.53-61
    • /
    • 2004
  • In this paper, we proposed a new architecture of MAC(Multiplier-Accumulator) to operate high-speed multiplication-accumulation. We used the MBA(Modified radix-4 Booth Algorithm) which is based on the 1's complement number system, and CSA(Carry Save Adder) for addition of the partial products. During the addition of the partial product, the signed numbers with the 1's complement type after Booth encoding are converted in the 2's complement signed number in the CSA tree. Since 2-bit CLA(Carry Look-ahead Adder) was used in adding the lower bits of the partial product, the input bit width of the final adder and whole delay of the critical path were reduced. The proposed MAC was applied into the DWT(Discrete Wavelet Transform) filtering operation for JPEG2000, and it showed the possibility for the practical application. Finally we identified the improved performance according to the comparison with the previous architecture in the aspect of hardware resource and delay.

On the Digital Implementation of the Sigmoid function (시그모이드 함수의 디지털 구현에 관한 연구)

  • 이호선;홍봉화
    • The Journal of Information Technology
    • /
    • v.4 no.3
    • /
    • pp.155-163
    • /
    • 2001
  • In this paper, we implemented sigmoid active function which make it difficult to design of the digital neuron networks. Therefore, we designed of the high speed processing of the sigmoid function in order to digital neural networks. we designed of the MAC(Multiplier and Accumulator) operation unit used residue number system without carry propagation for the high speed operation. we designed of MAC operation unit and sigmoid processing unit are proved that it could run of the high speed. On the simulation, the faster than 4.6ns on the each order, we expected that it adapted to the implementation of the high speed digital neural network.

  • PDF

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.4
    • /
    • pp.94-104
    • /
    • 2008
  • In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.

Design of an Optimized 32-bit Multiplier for RSA Cryptoprocessors (RSA 암호화 프로세서에 최적화한 32비트 곱셈기 설계)

  • Moon, Sang-Ook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.1
    • /
    • pp.75-80
    • /
    • 2009
  • RSA cryptoprocessors equipped with more than 1024 bits of key space handle the entire key stream in units of blocks. The RSA processor which will be the target design in this paper defines the length of the basic word as 128 bits, and uses an 256-bits register as the accumulator. For efficient execution of 128-bit multiplication, 32b*32b multiplier was designed and adopted and the results are stored in 8 separate 128-bit registers according to the status flag. In this paper, a fast 32bit modular multiplier which is required to execute 128-bit MAC (multiplication and accumulation) operation is proposed. The proposed architecture prototype of the multiplier unit was automatically synthesized, and successfully operated at the frequency in the target RSA processor.

Study of a 32-bit Multiplier Suitable for Reconfigurable Cryptography Processor (재구성 가능한 암호화 프로세서에 적합한 32비트 곱셈기의 연구)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2008.10a
    • /
    • pp.740-743
    • /
    • 2008
  • RSA crypto-processors equipped with more than 1024 bits of key space handle the entire key stream in units of blocks. The RSA processor which will be the target design in this paper defines the length of the basic word as 128 bits, and uses an 256-bits register as the accumulator. For efficient execution of 128-bit multiplication, $32b^*32b$ multiplier was designed and adopted and the results are stored in 8 separate 128-bit registers according to the stalks flag. In this paper, a fast 32bit nodular multiplier which is required to execute 128-bit MAC (multiplication and accumulation) operation is proposed. The proposed architecture prototype of the multiplier unit was automatically synthesized, and successfully operated at the frequency in the target RSA processor.

  • PDF

A Study on Multiplier Architectures Optimized for 32-bit RISC Processor with 3-Stage Pipeline (32비트 3단 파이프라인을 가진 RISC 프로세서에 최적화된 Multiplier 구조에 관한 연구)

  • 정근영;박주성;김석찬
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.11
    • /
    • pp.123-130
    • /
    • 2004
  • This paper describes a multiplier architecture optimized for 32 bit RISC processor with 3-stage pipeline. The multiplier of ARM7, the target processor, is variably carried out on the execution stage of pipeline within 7 cycles. The included multiplier employs a modified Booth's algerian to produce 64 bit multiplication and addition product and it has 6 separate instructions. We analyzed several multiplication algorithm such as radix4-32${\times}$8, radix4-32${\times}$16 and radix8-32${\times}$32 to decide which multiplication architecture is most fit for a typical architecture of ARM7. VLSI area, cycle delay time and execution cycle number is the index of an efficient design and the final multiplier was designed on these indexes. To verify the operation of embedded multiplier, it was simulated with various audio algorithms.

On the Implementation of the Digital Neuron Processor (디지탈 뉴런프로세서의 구현에 관한 연구)

  • 홍봉화;이지영
    • Journal of the Korea Society of Computer and Information
    • /
    • v.4 no.2
    • /
    • pp.27-38
    • /
    • 1999
  • This paper proposes a high speed digital neuron processor which uses the residue number system, making the high speed operation possible without carry propagation,. Consisting of the MAC(Multiplier and with Accumulator) operation unit, quotient operation unit and sigmoid function operation unit, the neuron processor is designed through 0.8$\mu$m CMOS fabrication. The result shows that the new implemented neuron processor can run at the speed of 19.2 nSec and the size can be reduced to 1/2 compared to the neuron processor implemented by the real number operation unit.

  • PDF

FPGA Implementation of Real Time Image Compression CODEC Using Wavelet Transform (2차원 이산 웨이블릿 변환을 이용한 실시간 영상압축 코덱의 FPGA 구현)

  • 서영호;김왕현;김종현;김동욱
    • Proceedings of the IEEK Conference
    • /
    • 2001.06d
    • /
    • pp.49-52
    • /
    • 2001
  • This paper presents a FPGA Implementation of wavelet-based CODEC, which can compress 2-dimensional image. For real-time processing, a scheduling method of input image data is proposed and a new structure of MAC(multiplier-accumulator) is proposed for wavelet transforms. Also this study proposes global pipelining structure of wavelet CODEC and efficient buffering method at interfaces between each module with different clock frequency.

  • PDF

Design of Audio Sampling Rate Conversion Block (Audio Sampling Rate Conversion Block의 설계)

  • 정혜진;심윤정;이승준
    • Proceedings of the IEEK Conference
    • /
    • 2003.07b
    • /
    • pp.827-830
    • /
    • 2003
  • This paper proposes an area-efficient FIR filter architecture for sampling rate conversion of hi-fi audio data. Sampling rate conversion(SRC) block converts audio data sampled at 96KHz down to 48KHz sampled data and vice versa. 63-tap FIR filter coefficients have been synthesized that gives 100dB stop band attenuation and 5.2KHz transition bandwidth. Time-shared filter architecture requires only one multiplier and accumulator for 63-tap filter operation. This results in huge hardware saving of up to 10~19 times smaller compared with traditional FIR structure.

  • PDF