• Title/Summary/Keyword: low-power multiplier

Search Result 128, Processing Time 0.019 seconds

Novel Radix-26 DF IFFT Processor with Low Computational Complexity (연산복잡도가 적은 radix-26 FFT 프로세서)

  • Cho, Kyung-Ju
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.13 no.1
    • /
    • pp.35-41
    • /
    • 2020
  • Fast Fourier transform (FFT) processors have been widely used in various application such as communications, image, and biomedical signal processing. Especially, high-performance and low-power FFT processing is indispensable in OFDM-based communication systems. This paper presents a novel radix-26 FFT algorithm with low computational complexity and high hardware efficiency. Applying a 7-dimensional index mapping, the twiddle factor is decomposed and then radix-26 FFT algorithm is derived. The proposed algorithm has a simple twiddle factor sequence and a small number of complex multiplications, which can reduce the memory size for storing the twiddle factor. When the coefficient of twiddle factor is small, complex constant multipliers can be used efficiently instead of complex multipliers. Complex constant multipliers can be designed more efficiently using canonic signed digit (CSD) and common subexpression elimination (CSE) algorithm. An efficient complex constant multiplier design method for the twiddle factor multiplication used in the proposed radix-26 algorithm is proposed applying CSD and CSE algorithm. To evaluate performance of the previous and the proposed methods, 256-point single-path delay feedback (SDF) FFT is designed and synthesized into FPGA. The proposed algorithm uses about 10% less hardware than the previous algorithm.

Low-power Horizontal DA Filter Structure Using Radix-16 Modified Booth Algorithm (Radix-16 Modified Booth 알고리즘을 이용한 저전력 Horizontal DA 필터 구조)

  • Shin, Ji-Hye;Jang, Young-Beom
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.47 no.12
    • /
    • pp.31-38
    • /
    • 2010
  • In tins paper, a new DA(Distributed Arithmetic) tilter implementation technique has been proposed. Contrary to vertical directional calculation of input sample bit format in the conventional DA implementation technique, proposed implementation technique utilizes horizontal directional calculation of input sample bit format. Since proposed technique calculates in horizontal direction, it does not need ROM and utilizes the Modified Booth algorithm. Furthermore proposed technique can be applied to implement the variable coefficients filters in addition to the fixed coefficients filters. Using conventional and proposed techniques, a 20 tap filter is implemented by Verilog-HDL coding. Through Synopsis synthesis tool, it has been shown that 41.6% area reduction can be achieved.

Binary Image Based Fast DoG Filter Using Zero-Dimensional Convolution and State Machine LUTs

  • Lee, Seung-Jun;Lee, Kye-Shin;Kim, Byung-Gyu
    • Journal of Multimedia Information System
    • /
    • v.5 no.2
    • /
    • pp.131-138
    • /
    • 2018
  • This work describes a binary image based fast Difference of Gaussian (DoG) filter using zero-dimensional (0-d) convolution and state machine look up tables (LUTs) for image and video stitching hardware platforms. The proposed approach for using binary images to obtain DoG filtering can significantly reduce the data size compared to conventional gray scale based DoG filters, yet binary images still preserve the key features of the image such as contours, edges, and corners. Furthermore, the binary image based DoG filtering can be realized with zero-dimensional convolution and state machine LUTs which eliminates the major portion of the adder and multiplier blocks that are generally used in conventional DoG filter hardware engines. This enables fast computation time along with the data size reduction which can lead to compact and low power image and video stitching hardware blocks. The proposed DoG filter using binary images has been implemented with a FPGA (Altera DE2-115), and the results have been verified.

Design of Low Area Decimation Filters Using CIC Filters (CIC 필터를 이용한 저면적 데시메이션 필터 설계)

  • Kim, Sunhee;Oh, Jaeil;Hong, Dae-ki
    • Journal of the Semiconductor & Display Technology
    • /
    • v.20 no.3
    • /
    • pp.71-76
    • /
    • 2021
  • Digital decimation filters are used in various digital signal processing systems using ADCs, including digital communication systems and sensor network systems. When the sampling rate of digital data is reduced, aliasing occurs. So, an anti-aliasing filter is necessary to suppress aliasing before down-sampling the data. Since the anti-aliasing filter has to have a sharp transition band between the passband and the stopband, the order of the filter is very high. However, as the order of the filter increases, the complexity and area of the filter increase, and more power is consumed. Therefore, in this paper, we propose two types of decimation filters, focusing on reducing the area of the hardware. In both cases, the complexity of the circuit is reduced by applying the required down-sampling rate in two times instead of at once. In addition, CIC decimation filters without a multiplier are used as the decimation filter of the first stage. The second stage is implemented using a CIC filter and a down sampler with an anti-aliasing filter, respectively. It is designed with Verilog-HDL and its function and implementation are validated using ModelSim and Quartus, respectively.

A Design and Implementation of 32-bit RISC-V RV32IM Pipelined Processor in Embedded Systems (임베디드 환경에서의 32-bit RISC-V RV32IM 파이프라인 프로세서 설계 및 구현)

  • Subin Park;Yongwoo Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.81-86
    • /
    • 2023
  • Recently, demand for embedded systems requiring low power and high specifications has been increasing, and RISC-V processors are being widely applied. RISC-V, a RISC-based open instruction set architecture (ISA), has been developed and researched by UC Berkeley and other researchers since 2010. RV32I ISA is sufficient to support integer operations such as addition and subtraction instructions, but M-extension should be defined for multiplication and division instructions. This paper proposes an RV32I, RV32IM processor, and indicates benchmark performance scores compared to an existing processor. Additionally, A non-stalling method was proposed to support a 2-stage pipelined DSP multiplier to the 5-stage pipelined RV32IM processor. Proposed RV32I and RV32IM processors satisfied a maximum operating frequency of 50 MHz on Artix-7 FPGA. The performance of the proposed processors was verified using benchmark programs from Dhrystone and Coremark. As a result, the Coremark benchmark results of the proposed processor showed that it outperformed the existing RV32IM processor by 23.91%.

  • PDF

A New Small Size Digital Optical Ozone Monitor Using CCD Array as a UV Detector (UV 감지기로서 CCD어레이를 사용한 소형 디지털 광 오존모니터)

  • Chung, Wan-Young;Lee, Seung-Chul
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.1
    • /
    • pp.158-163
    • /
    • 2008
  • Ozone monitor based on UV techniques has been widely used due to their signal stability. The high concentration ozone monitor for real time ozone monitoring from ozone generator was composed of a low pressure mercury lamp as UV source and a photo multiplier tube as UV detector. The structure could be very useful for low price high concentration ozone monitor and showed good linearity to ozone in the concentration range between 0.05 and 2wt%. For accurate ambient ozone monitoring, the system composed of a high power pulsed xenon lamp as UV source, an optical spectrometer with a high sensitivity linear CCD array as UV detector. The optical signal form the CCD array was converted to digital signal, and the digital signal was displayed on screen using PC interface. The developed system showed good linearity and sensitivity in relatively low measuring range between 10ppm and 10,000ppm, and showed some feasibility of hish resolution ozone monitor using CCD array as a photodetecor.

Design of 24-GHz 1Tx 2Rx FMCW Transceiver (24 GHz 1Tx 2Rx FMCW 송수신기 설계)

  • Kim, Tae-Hyun;Kwon, Oh-Yun;Kim, Jun-Seong;Park, Jae-Hyun;Kim, Byung-Sung
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.29 no.10
    • /
    • pp.758-765
    • /
    • 2018
  • This paper presents a 24-GHz frequency-modulated continuous wave(FMCW) radar transceiver with two Rx and one Tx channels in 65-nm complementary metal-oxide-semiconductor(CMOS) process and implemented it on a radar system using the developed transceiver chip. The transceiver chip includes a $14{\times}$ frequency multiplier, low-noise amplifier, down-conversion mixer, and power amplifier(PA). The transmitter achieves >10 dBm output power from 23.8 to 24.36 GHz and the phase noise is -97.3 GHz/Hz at a 1-MHz offset. The receiver achieves 25.2 dB conversion gain and output $P_{1dB}$ of -31.7 dBm. The transceiver consumes 295 mW of power and occupies an area of $1.63{\times}1.6mm^2$. The radar system is fabricated on a low-loss Duroid printed circuit board(PCB) stacked on the low-cost FR4 PCBs. The chip and antenna are placed on the Duroid PCB with interconnects and bias, gain blocks and FMCW signal-generating circuitry are mounted on the FR4 PCB. The transmit antenna is a $4{\times}4$ patch array with 14.76 dBi gain and receiving antennas are two $4{\times}2$ patch antennas with a gain of 11.77 dBi. The operation of the radar is evaluated and confirmed by detecting the range and azimuthal angle of the corner reflectors.

A DLL-Based Multi-Clock Generator Having Fast-Relocking and Duty-Cycle Correction Scheme for Low Power and High Speed VLSIs (저전력 고속 VLSI를 위한 Fast-Relocking과 Duty-Cycle Correction 구조를 가지는 DLL 기반의 다중 클락 발생기)

  • Hwang Tae-Jin;Yeon Gyu-Sung;Jun Chi-Hoon;Wee Jae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.2 s.332
    • /
    • pp.23-30
    • /
    • 2005
  • This paper describes a DLL(delay locked loop)-based multi-clock generator having the lower active stand-by power as well as a fast relocking after re-activating the DLL. for low power and high speed VLSI chip. It enables a frequency multiplication using frequency multiplier scheme and produces output clocks with 50:50 duty-ratio regardless of the duty-ratio of system clock. Also, digital control scheme using DAC enables a fast relocking operation after exiting a standby-mode of the clock system which was obtained by storing analog locking information as digital codes in a register block. Also, for a clock multiplication, it has a feed-forward duty correction scheme using multiphase and phase mixing corrects a duty-error of system clock without requiring additional time. In this paper, the proposed DLL-based multi-clock generator can provides a synchronous clock to an external clock for I/O data communications and multiple clocks of slow and high speed operations for various IPs. The proposed DLL-based multi-clock generator was designed by the area of $1796{\mu}m\times654{\mu}m$ using $0.35-{\mu}m$ CMOS process and has $75MHz\~550MHz$ lock-range and maximum multiplication frequency of 800 MHz below 20psec static skew at 2.3v supply voltage.

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.3
    • /
    • pp.165-171
    • /
    • 2019
  • The study of the computer architecture has shown that major improvements in price-to-energy performance stems from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU) ASIC which can accelerate the reasoning of the artificial neural network (NN). The core device of the TPU is a MAC matrix multiplier capable of high-speed operation and software-managed on-chip memory. The execution model of the TPU can meet the reaction time requirements of the artificial neural network better than the existing CPU and the GPU execution models, with the small area and the low power consumption even though it has many MAC and large memory. Utilizing the TPU for the tensor flow benchmark framework, it can achieve higher performance and better power efficiency than the CPU or CPU. In this paper, we analyze TPU, simulate the Python modeled OpenTPU, and synthesize the matrix multiplication unit, which is the key hardware.

Floating Point Unit Design for the IEEE754-2008 (IEEE754-2008을 위한 고속 부동소수점 연산기 설계)

  • Hwang, Jin-Ha;Kim, Hyun-Pil;Park, Sang-Su;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.10
    • /
    • pp.82-90
    • /
    • 2011
  • Because of the development of Smart phone devices, the demands of high performance FPU(Floating-point Unit) becomes increasing. Therefore, we propose the high-speed single-/double-precision FPU design that includes an elementary add/sub unit and improved multiplier and compare and convert units. The most commonly used add/sub unit is optimized by the parallel rounding unit. The matrix operation is used in complex calculation something like a graphic calculation. We designed the Multiply-Add Fused(MAF) instead of multiplier to calculate the matrix more quickly. The branch instruction that is decided by the compare operation is very frequently used in various programs. We bypassed the result of the compare operation before all the pipeline processes ended to decrease the total execution time. And we included additional convert operations that are added in IEEE754-2008 standard. To verify our RTL designs, we chose four hundred thousand test vectors by weighted random method and simulated each unit. The FPU that was synthesized by Samsung's 45-nm low-power process satisfied the 600-MHz operation frequency. And we confirm a reduction in area by comparing the improved FPU with the existing FPU.