• Title/Summary/Keyword: radix-4 algorithm

Search Result 89, Processing Time 0.025 seconds

Design of Radix-4 FFT Processor Using Twice Perfect Shuffle (이중 완전 Shuffle을 이용한 Radix-4 FFT 프로세서의 설계)

  • Hwang, Myoung-Ha;Hwang, Ho-Jung
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.2
    • /
    • pp.144-150
    • /
    • 1990
  • This paper describes radix-4 Fast Fourier Transform (FFT) Processor designed with the new twice perfect shuffle developed from a perfect shuffle used in radix-2 FFT algorithm. The FFT Processor consists of a butterfly arithmetic circuit, address generators for input, output and coefficient, input and output registers and controller. Also, it requires the external ROM for storage of coefficient and RAM for input and output. The butterfly circuit includes 12 bit-serial ($16{\times}8$) multipliers, adders, subtractors and delay shift registers. Operating on 25 MHz two phase clock, this processor can compute 256 point FFT in 6168 clocks, i.e. 247 us and provides flexibility by allowing the user to select any size among 4,16,64,and256points. Being fabricated with 2-um double metal CMOS process, it includes about 28000 transistors and 55 pads in $8.0{\times}8.2mm^2$area.

  • PDF

Fast Motion Estimation Algorithm Using Motion Vectors of Neighboring Blocks (인접블록의 움직임벡터를 이용한 고속 움직임추정 방식)

  • So Hyeon-Ho;Kim Jinsang;Cho Won-Kyung;Kim Young-Soo;Suh Doug Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.12C
    • /
    • pp.1256-1261
    • /
    • 2005
  • In this paper, we propose a low-power Booth multiplication which reduces the switching activities of partial products during multiplication process. Radix-4 Booth algorithm has a characteristic that produces the Booth encoded products with zero when input data have sequentially equal values (0 or 1). Therefore, partial products have higher chances of being zero when an input with a smaller effective dynamic range of two multiplication inputs is used as a multiplier data instead of a multiplicand. The proposed multiplier divides a multiplication expression into several multiplication expressions with smaller bits than those of an original input data, and each multiplication is computed independently for the Booth encoding. Finally, the results of each multiplication are added. This means that the proposed multiplier has a higher chance to have zero encoded products so that we can implement a low power multiplier with the smaller switching activity. Implementation results show the proposed multiplier can save maximally about $20\%$ power dissipation than a previous Booth multiplier.

Design Method of Variable Point Prime Factor FFT For DRM Receiver (DRM 수신기의 효율적인 수신을 위한 가변 프라임펙터 FFT 설계)

  • Kim, Hyun-Sik;Lee, Youn-Sung;Seo, Jeong-Wook;Baik, Jong-Ho
    • 한국정보통신설비학회:학술대회논문집
    • /
    • 2008.08a
    • /
    • pp.257-261
    • /
    • 2008
  • The Digital Radio Mondiale (DRM) system is a digital broadcasting standard designed for use in the LF, MF and HF bands of the broadcasting bands below 30 MHz. The system provides both superior audio quality and improved user services / operability compared with existing AM transmissions. In this paper, we propose a variable point Prime Factor FFT design method for Digital Radio Mondiale (DRM) system. Proposed method processes a various size IFFT/FFT of Robustness Mode on DRM standard efficiently by composing Radix-Prime Factor FFT Processing Unit of form similar to Radix-4 by insertion of a variable Prime Factor Twiddle Factor and Garbage data. So, we improved limitation that cannot process 112/176/256/288 FFT of each mode of DRM system with a existent Radix Processor and increase memory size and memory access time for IFFT/FFT processing by software processing in case of implementation with a existent high speed DSP.

  • PDF

A Study on the Intelligent High Voltage Switchboard for Custormer (고압 수용가용 배전반의 intelligent화 연구)

  • Byun, Young-Bok;Joe, Ki-Youn;Koo, Heun-Hoi
    • Proceedings of the KIEE Conference
    • /
    • 1994.07a
    • /
    • pp.444-446
    • /
    • 1994
  • This paper describes the design of a digital multifunction controller for the intelligent high voltage customer switchboard and proposes a relaying algorithm for high impedance faults using back-propagation neural network. The hardware design uses the three microprocessors and global memory architecture to achive real time operation and control 4 feeders. The controller uses a 64-point radix-4 DIF FFT algorithm to measure the harmonic and relay parameters. Synthesized fault current waveforms are used to train and test the back - propagation network.

  • PDF

Design of a Booth's Multiplier Suitable for Embedded Systems (임베디드 시스템에 적용이 용이한 Booth 알고리즘 방식의 곱셈기 설계)

  • Moon, San-Gook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.838-841
    • /
    • 2007
  • In this study, we implemented a $17^*17b$ binary digital multiplier using radix-4 Booth's algorithm. Two stage pipeline architecture was applied to achieve higher throughput and 4:2 adders were used for regular layout structure in the Wallace tree partition. To evaluate the circuit, several MPW chips were fabricated using Hynix 0.6-um 3M N-well CMOS technology. Also we proposed an efficient test methodology and did fault simulations. The chip contains 9115 transistors and the core area occupies about $1135^*1545$ mm2. The functional tests using ATS-2 tester showed that it can operate with 24 MHz clock at 5.0 V at room temperature.

  • PDF

Design and Architecture of Low-Latency High-Speed Turbo Decoders

  • Jung, Ji-Won;Lee, In-Ki;Choi, Duk-Gun;Jeong, Jin-Hee;Kim, Ki-Man;Choi, Eun-A;Oh, Deock-Gil
    • ETRI Journal
    • /
    • v.27 no.5
    • /
    • pp.525-532
    • /
    • 2005
  • In this paper, we propose and present implementation results of a high-speed turbo decoding algorithm. The latency caused by (de)interleaving and iterative decoding in a conventional maximum a posteriori turbo decoder can be dramatically reduced with the proposed design. The source of the latency reduction is from the combination of the radix-4, center to top, parallel decoding, and early-stop algorithms. This reduced latency enables the use of the turbo decoder as a forward error correction scheme in real-time wireless communication services. The proposed scheme results in a slight degradation in bit error rate performance for large block sizes because the effective interleaver size in a radix-4 implementation is reduced to half, relative to the conventional method. To prove the latency reduction, we implemented the proposed scheme on a field-programmable gate array and compared its decoding speed with that of a conventional decoder. The results show an improvement of at least five fold for a single iteration of turbo decoding.

  • PDF

A 18-Mbp/s, 8-State, High-Speed Turbo Decoder

  • Jung Ji-Won;Kim Min-Hyuk;Jeong Jin-Hee
    • Journal of electromagnetic engineering and science
    • /
    • v.6 no.3
    • /
    • pp.147-154
    • /
    • 2006
  • In this paper, we propose and present implementation results of a high-speed turbo decoding algorithm. The latency caused by (de) interleaving and iterative decoding in a conventional maximum a posteriori(MAP) turbo decoder can be dramatically reduced with the proposed design. The source of the latency reduction is come from the combination of the radix-4, dual-path processing, parallel decoding, and rearly-stop algorithms. This reduced latency enables the use of the turbo decoder as a forward error correction scheme in real-time wireless communication services. The proposed scheme results in a slight degradation in bit-error rate(BER) performance for large block sizes because the effective interleaver size in a radix-4 implementation is reduced to half, relative to the conventional method. Fixed on the parameters of N=212, iteration=3, 8-states, 3 iterations, and QPSK modulation scheme, we designed the adaptive high-speed turbo decoder using the Xilinx chip (VIRTEX2P (XC2VP30-5FG676)) with the speed of 17.78 Mb/s. From the results, we confirmed that the decoding speed of the proposed decoder is faster than conventional algorithms by 8 times.

Low-Power Multiplier Using Input Data Partition (입력 데이터 분할을 이용한 저전력 부스 곱셈기 설계)

  • Park Jongsu;Kim Jinsang;Cho Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.11A
    • /
    • pp.1092-1097
    • /
    • 2005
  • In this paper, we propose a low-power Booth multiplication which reduces the switching activities of partial products during multiplication process. Radix-4 Booth algorithm has a characteristic that produces the Booth encoded products with zero when input data have sequentially equal values (0 or 1). Therefore, partial products have higher chances of being zero when an input with a smaller effective dynamic range of two multiplication inputs is used as a multiplier data instead of a multiplicand. The proposed multiplier divides a multiplication expression into several multiplication expressions with smaller bits than those of an original input data, and each multiplication is computed independently for the Booth encoding. Finally, the results of each multiplication are added. This means that the proposed multiplier has a higher chance to have zero encoded products so that we can implement a low power multiplier with the smaller switching activity. Implementation results show the proposed multiplier can save maximally about $20\%$ power dissipation than a previous Booth multiplier.

A High-Performance ECC Processor Supporting NIST P-521 Elliptic Curve (NIST P-521 타원곡선을 지원하는 고성능 ECC 프로세서)

  • Yang, Hyeon-Jun;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.4
    • /
    • pp.548-555
    • /
    • 2022
  • This paper describes the hardware implementation of elliptic curve cryptography (ECC) used as a core operation in elliptic curve digital signature algorithm (ECDSA). The ECC processor supports eight operation modes (four point operations, four modular operations) on the NIST P-521 curve. In order to minimize computation complexity required for point scalar multiplication (PSM), the radix-4 Booth encoding scheme and modified Jacobian coordinate system were adopted, which was based on the complexity analysis for five PSM algorithms and four different coordinate systems. Modular multiplication was implemented using a modified 3-Way Toom-Cook multiplication and a modified fast reduction algorithm. The ECC processor was implemented on xczu7ev FPGA device to verify hardware operation. Hardware resources of 101,921 LUTs, 18,357 flip-flops and 101 DSP blocks were used, and it was evaluated that about 370 PSM operations per second were achieved at a maximum operation clock frequency of 45 MHz.

High Speed Modular Multiplication Algorithm for RSA Cryptosystem (RSA 암호 시스템을 위한 고속 모듈라 곱셈 알고리즘)

  • 조군식;조준동
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.3C
    • /
    • pp.256-262
    • /
    • 2002
  • This paper presents a novel radix-4 modular multiplication algorithm based on the sign estimation technique (3). The sign estimation technique detects the sign of a number represented in the form of a carry-sum pair. It can be implemented with 5-bit carry look-ahead adder. The hardware speed of the cryptosystem is dependent on the performance modular multiplication of large numbers. Our algorithm requires only (n/2+3) clock cycle for n bit modulus in performing modular multiplication. Our algorithm out-performs existing algorithm in terms of required clock cycles by a half, It is efficient for modular exponentiation with large modulus used in RSA cryptosystem. Also, we use high-speed adder (7) instead of CPA (Carry Propagation Adder) for modular multiplication hardware performance in fecal stage of CSA (Carry Save Adder) output. We apply RL (Right-and-Left) binary method for modular exponentiation because the number of clock cycles required to complete the modular exponentiation takes n cycles. Thus, One 1024-bit RSA operation can be done after n(n/2+3) clock cycles.