• Title/Summary/Keyword: Multiplier-Accumulator

Search Result 30, Processing Time 0.024 seconds

Design of FIR System and Hilbert Transformer Having Ability of Selecting Filter Length (필터 Length를 가변할 수 있는 FIR 디지털 필터 및 힐버트 변환기의 설계)

  • Kim, Se-Jung;Hwang, Ho-Jung
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.567-570
    • /
    • 1988
  • This paper describes the design of FIR filtering DSP-chip that can be operated without programming. The proposed DSP-chip has not only the improvement of execution time but also selectivity of filter length from N=1 to N=128. Hilbert Transformer can be designed from this chip. FIR filter system is composed of Data memory/Control Unit, external memory and multiplier-accumulator. Data memory/Control Unit is laid out in this paper.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진18진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.173-176
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs the radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of the radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

An Optimized Hybrid Radix MAC Design (최적화된 4진/8진 혼합 MAC 설계)

  • 정진우;김승철;이용주;이용석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06a
    • /
    • pp.125-128
    • /
    • 2002
  • This paper is about a high-speed MAC (multiplier and accumulator) design applying radix-4 and radix-8 Booth's algorithm at the same time. The optimized hybrid radix design for high speed MAC has taken advantage of both a radix-4 and a radix-8 architectures. A radix-4 architecture meets high-speed, but it takes much more power and chip area than a radix-8 architecture. A radix-8 architecture needs less power and chip area than the other, but it has a bottleneck of generating three times the multiplicand problem. An optimized hybrid architecture performs tile radix-4 multiplication partially in parallel with the generation of three times the multiplicand for use of tile radix-8 multiplication. It reduces the concerned bit width of multiplier in radix-8 multiplication.

  • PDF

A New Complex-Number Multiplication Algorithm using Radix-4 Booth Recoding and RB Arithmetic, and a 10-bit CMAC Core Design (Radix-4 Booth Recoding과 RB 연산을 이용한 새로운 복소수 승산 알고리듬 및 10-bit CMAC코어 설계)

  • 김호하;신경욱
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.9
    • /
    • pp.11-20
    • /
    • 1998
  • High-speed complex-number arithmetic units are essential to baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. In this paper, a new complex-number multiplication algorithm is proposed, which is based on redundant binary (RB) arithmetic combined with radix-4 Booth recoding scheme. The proposed algorithm reduces the number of partial product by one-half as compared with the conventional direct method using real-number multipliers and adders. It also leads to a highly parallel architecture and simplified circuit, resulting in high-speed operation and low power dissipation. To demonstrate the proposed algorithm, a prototype complex-number multiplier-accumulator (CMAC) core with 10-bit operands has been designed using 0.8-$\mu\textrm{m}$ N-Well CMOS technology. The designed CMAC core contains about 18,000 transistors on the area of about 1.60 ${\times}$ 1.93 $\textrm{mm}^2$. The functional and speed test results show that it can operate with 120-MHz clock at V$\sub$DD/=3.3-V, and its power consumption is given to about 63-mW.

  • PDF

A 200-MHZ@2.5-V Dual-Mode Multiplier for Single / Double -Precision Multiplications (단정도/배정도 승산을 위한 200-MHZ@2.5-V 이중 모드 승산기)

  • 이종남;박종화;신경욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.4 no.5
    • /
    • pp.1143-1150
    • /
    • 2000
  • A dual-mode multiplier (DMM) that performs single- and double-precision multiplications has been designed using a $0.25-\mum$ 5-metal CMOS technology. An algorithm for efficiently implementing double-precision multiplication with a single-precision multiplier was proposed, which is based on partitioning double-precision multiplication into four single-precision sub-multiplications and computing them with sequential accumulations. When compared with conventional double-precision multipliers, our approach reduces the hardware complexity by about one third resulting in small silicon area and low-power dissipation at the expense of increased latency and throughput cycles. The DMM consists of a $28-b\times28-b$ single-precision multiplier designed using radix-4 Booth receding and redundant binary (RB) arithmetic, an accumulator and a simple control logic for mode selection. It contains about 25,000 transistors on the area of about $0.77\times0.40-m^2$. The HSPICE simulation results show that the DMM core can safely operate with 200-MHZ clock at 2.5-V, and its estimated power dissipation is about 130-㎽ at double-precision mode.

  • PDF

A Pipelined Parallel Optimized Design for Convolution-based Non-Cascaded Architecture of JPEG2000 DWT (JPEG2000 이산웨이블릿변환의 컨볼루션기반 non-cascaded 아키텍처를 위한 pipelined parallel 최적화 설계)

  • Lee, Seung-Kwon;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.7
    • /
    • pp.29-38
    • /
    • 2009
  • In this paper, a high performance pipelined computing design of parallel multiplier-temporal buffer-parallel accumulator is present for the convolution-based non-cascaded architecture aiming at the real time Discrete Wavelet Transform(DWT) processing. The convolved multiplication of DWT would be reduced upto 1/4 by utilizing the filter coefficients symmetry and the up/down sampling; and it could be dealt with 3-5 times faster computation by LUT-based DA multiplication of multiple filter coefficients parallelized for product terms with an image data. Further, the reutilization of computed product terms could be achieved by storing in the temporal buffer, which yields the saving of computation as well as dynamic power by 50%. The convolved product terms of image data and filter coefficients are realigned and stored in the temporal buffer for the accumulated addition. Then, the buffer management of parallel aligned storage is carried out for the high speed sequential retrieval of parallel accumulations. The convolved computation is pipelined with parallel multiplier-temporal buffer-parallel accumulation in which the parallelization of temporal buffer and accumulator is optimize, with respect to the performance of parallel DA multiplier, to improve the pipelining performance. The proposed architecture is back-end designed with 0.18um library, which verifies the 30fps throughput of SVGA(800$\times$600) images at 90MHz.

A Study on Optimization Design of MPEG Layer 2 Audio Decoder for Digital Broadcasting (디지털 방송용 MPEG Layer 2 오디오 복호기의 최적화 설계에 관한 연구)

  • 박종진;조원경
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.37 no.5
    • /
    • pp.48-55
    • /
    • 2000
  • Recently due to rapid improvement of integrated circuit design environment, size of IC design is to become large to possible design System on Chip(SoC) that one chip with multi function enclosed. Also cause to this rapid change, consumption market is require to spend smallest time for new product development. In this paper to propose a methodology can design a large size IC for save time and applied to design of MPEG Layer 2 decoder to can use audio receiver in digital broadcast system. The digital broadcast audio decoder in this paper is pointed to save hardware size as optimizing algorithm. MPEG Layer 2 decoder algorithm is include MAC to can have an effect on hardware size. So coefficients are using sign digit expression. It is for hardware optimization. If using this method can design MAC without multiplier. The designed audio decoder is using 14,000 gates hardware size and save 22% (4000 gates) hardware usage than using multiplier. Also can design MPEG Layer 2 decoder usable digital broadcast receiver for short time.

  • PDF

A Design of 16${\times}$16-bit Redundant Binary MAC Using 0.25 ${\mu}{\textrm}{m}$ CMOS Technology

  • Kim, Tae-Min;Shin, Gun-Soon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.1
    • /
    • pp.122-128
    • /
    • 2003
  • In this paper, a 16${\times}$16-bit Multiplier and Accumulator (MAC) is designed using a Redundant Binary Adder (RBA) circuit so that it can make a fast addition of the Redundant Binary Partial Products (RB_PP's) by using Wallace-tree structure. Because a RBA adds two RB numbers, it acts as a 4-2 compressor, which reduces four inputs to two output signals. We propose a method to convert the Redundant Binary (RB) representation into the 2's complement binary representation. Instead of using the conventional full adders, a more efficient RB number to binary number converter can be designed with new conversion method.

Constraint Algorithm in Double-Base Number System for High Speed A/D Converters

  • Nguyen, Minh Son;Kim, Man-Ho;Kim, Jong-Soo
    • Journal of Electrical Engineering and Technology
    • /
    • v.3 no.3
    • /
    • pp.430-435
    • /
    • 2008
  • In the paper, an algorithm called a Constraint algorithm is proposed to solve the fan-in problem occurred in ADC encoding circuits. The Flash ADC architecture uses a double-base number system (DBNS). The DBNS has known to represent the multi-dimensional logarithmic number system (MDLNS) used for implementing the multiplier accumulator architecture of FIR filter in digital signal processing (DSP) applications. The authors use the DBNS with the base 2 and 3 to represent binary output of ADC. A symmetric map is analyzed first, and then asymmetric map is followed to provide addition read DBNS to DSP circuitry. The simulation results are shown for the Double-Base Integer Encoder (DBIE) of the 6-bit ADC to demonstrate an effectiveness of the Constraint algorithm, using $0.18{\mu}\;m$ CMOS technology. The DBIE’s processing speed of the ADC is fast compared to the FAT tree encoder circuit by 0.95 GHz.

A High-Security RSA Cryptoprocessor Embedded with an Efficient MAC Unit

  • Moon, Sang-Ook
    • Journal of information and communication convergence engineering
    • /
    • v.7 no.4
    • /
    • pp.516-520
    • /
    • 2009
  • RSA crypto-processors equipped with more than 1024 bits of key space handle the entire key stream in units of blocks. The RSA processor which will be the target design in this paper defines the length of the basic word as 128 bits, and uses an 256-bits register as the accumulator. For efficient execution of 128-bit multiplication, 32b*32b multiplier was designed and adopted and the results are stored in 8 separate 128-bit registers according to the status flag. In this paper, an efficient method to execute 128-bit MAC (multiplication and accumulation) operation is proposed. The suggested method pre-analyzed the all possible cases so that the MAC unit can remove unnecessary calculations to speed up the execution. The proposed architecture prototype of the MAC unit was automatically synthesized, and successfully operated at 20MHz, which will be the operation frequency in the RSA processor.