• Title/Summary/Keyword: FLOPS

Search Result 128, Processing Time 0.017 seconds

Design of MRI Spectrometer Using 1 Giga-FLOPS DSP (1-GFLOPS DSP를 이용한 자기공명영상 스펙트로미터 설계)

  • 김휴정;고광혁;이상철;정민영;장경섭;이동훈;이흥규;안창범
    • Investigative Magnetic Resonance Imaging
    • /
    • v.7 no.1
    • /
    • pp.12-21
    • /
    • 2003
  • Purpose : In order to overcome limitations in the existing conventional spectrometer, a new spectrometer with advanced functionalities is designed and implemented. Materials and Methods : We designed a spectrometer using the TMS320C6701 DSP capable of 1 giga floating point operations per second (GFLOPS). The spectrometer can generate continuously varying complicate gradient waveforms by real-time calculation, and select image plane interactively. The designed spectrometer is composed of two parts: one is DSP-based digital control part, and the other is analog part generating gradient and RF waveforms, and performing demodulation of the received RF signal. Each recover board can measure 4 channel FID signals simultaneously for parallel imaging, and provides fast reconstruction using the high speed DSP. Results : The developed spectrometer was installed on a 1.5 Tesla whole body MRI system, and performance was tested by various methods. The accurate phase control required in digital modulation and demodulation was tested, and multi-channel acquisition was examined with phase-array coil imaging. Superior image quality is obtained by the developed spectrometer compared to existing commercial spectrometer especially in the fast spin echo images. Conclusion : Interactive control of the selection planes and real-time generation of gradient waveforms are important functions required for advanced imaging such as spiral scan cardiac imaging. Multi-channel acquisition is also highly demanding for parallel imaging. In this paper a spectrometer having such functionalities is designed and developed using the TMS320C6701 DSP having 1 GFLOPS computational power. Accurate phase control was achieved by the digital modulation and demodulation techniques. Superior image qualities are obtained by the developed spectrometer for various imaging techniques including FSE, GE, and angiography compared to those obtained by the existing commercial spectrometer.

  • PDF

An Implementation of Low Power MAC using Improvement of Multiply/Subtract Operation Method and PTL Circuit Design Methodology (승/감산 연산방법의 개선 및 PTL회로설계 기법을 이용한 저전력 MAC의 구현)

  • Sim, Gi-Hak;O, Ik-Gyun;Hong, Sang-Min;Yu, Beom-Seon;Lee, Gi-Yeong;Jo, Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.4
    • /
    • pp.60-70
    • /
    • 2000
  • An 8$\times$8+20-bit MAC is designed with low power design methodologies at each of the system design levels. At algorithm level, a new method for multipl $y_tract operation is proposed, and it saves the transistor counts over conventional methods in hardware realization. A new Booth selector circuit using NMOS pass-transistor logic is also proposed at circuit level. It is superior to other circuits designed by CMOS in power-delay-product. And at architecture level, we adopted an ELM adder that is known to be the most efficient in power consumption, operating frequency, area and design regularity as the final adder. For registers, dynamic CMOS single-edge triggered flip-flops are used because they need less transistors per bit. To increase the operating frequency 2-stage pipeline architecture is adopted, and fast 4:2 compressors are applied in Wallace tree block. As a simulation result, the designed MAC in 0.6${\mu}{\textrm}{m}$ 1-poly 3-metal CMOS process is operated at 200MHz, 3.3V and consumed 35㎽ of power in multiply operation, and operated at 100MHz consuming 29㎽ in MAC operations, respectively.ly.

  • PDF

Design of a Small-Area Finite-Field Multiplier with only Latches (래치구조의 저면적 유한체 승산기 설계)

  • Lee, Kwang-Youb
    • Journal of IKEEE
    • /
    • v.7 no.1 s.12
    • /
    • pp.9-15
    • /
    • 2003
  • An optimized finite-field multiplier is proposed for encryption and error correction devices. It is based on a modified Linear Feedback Shift Register (LFSR) which has lower power consumption and smaller area than prior LFSR-based finite-field multipliers. The proposed finite field multiplier for GF(2n) multiplies two n-bit polynomials using polynomial basis to produce $z(x)=a(x)^*b(x)$ mod p(x), where p(x) is a irreducible polynomial for the Galois Field. The LFSR based on a serial multiplication structure has less complex circuits than array structures and hybrid structures. It is efficient to use the LFSR structure for systems with limited area and power consumption. The prior finite-field multipliers need 3${\cdot}$m flip-flops for multiplication of m-bit polynomials. Consequently, they need 6${\cdot}$m latches because one flip-flop consists of two latches. The proposed finite-field multiplier requires only 4${\cdot}$m latches for m-bit multiplication, which results in 1/3 smaller area than the prior finite-field multipliers. As a result, it can be used effectively in encryption and error correction devices with low-power consumption and small area.

  • PDF

An Embedded FAST Hardware Accelerator for Image Feature Detection (영상 특징 추출을 위한 내장형 FAST 하드웨어 가속기)

  • Kim, Taek-Kyu
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.2
    • /
    • pp.28-34
    • /
    • 2012
  • Various feature extraction algorithms are widely applied to real-time image processing applications for extracting significant features from images. Feature extraction algorithms are mostly combined with image processing algorithms mostly for image tracking and recognition. Feature extraction function is used to supply feature information to the other image processing algorithms and it is mainly implemented in a preprocessing stage. Nowadays, image processing applications are faced with embedded system implementation for a real-time processing. In order to satisfy this requirement, it is necessary to reduce execution time so as to improve the performance. Reducing the time for executing a feature extraction function dose not only extend the execution time for the other image processing algorithms, but it also helps satisfy a real-time requirement. This paper explains FAST (Feature from Accelerated Segment Test algorithm) of E. Rosten and presents FPGA-based embedded hardware accelerator architecture. The proposed acceleration scheme can be implemented by using approximately 2,217 Flip Flops, 5,034 LUTs, 2,833 Slices, and 18 Block RAMs in the Xilinx Vertex IV FPGA. In the Modelsim - based simulation result, the proposed hardware accelerator takes 3.06 ms to extract 954 features from a image with $640{\times}480$ pixels and this result shows the cost effectiveness of the propose scheme.

The Design of 10-bit 200MS/s CMOS Parallel Pipeline A/D Converter (10-비트 200MS/s CMOS 병렬 파이프라인 아날로그/디지털 변환기의 설계)

  • Chung, Kang-Min
    • The KIPS Transactions:PartA
    • /
    • v.11A no.2
    • /
    • pp.195-202
    • /
    • 2004
  • This paper introduces the design or parallel Pipeline high-speed analog-to-digital converter(ADC) for the high-resolution video applications which require very precise sampling. The overall architecture of the ADC consists of 4-channel parallel time-interleaved 10-bit pipeline ADC structure a]lowing 200MSample/s sampling speed which corresponds to 4-times improvement in sampling speed per channel. Key building blocks are composed of the front-end sample-and-hold amplifier(SHA), the dynamic comparator and the 2-stage full differential operational amplifier. The 1-bit DAC, comparator and gain-2 amplifier are used internally in each stage and they were integrated into single switched capacitor architecture allowing high speed operation as well as low power consumption. In this work, the gain of operational amplifier was enhanced significantly using negative resistance element. In the ADC, a delay line Is designed for each stage using D-flip flops to align the bit signals and minimize the timing error in the conversion. The converter has the power dissipation of 280㎽ at 3.3V power supply. Measured performance includes DNL and INL of +0.7/-0.6LSB, +0.9/-0.3LSB.

Bit-serial Discrete Wavelet Transform Filter Design (비트 시리얼 이산 웨이블렛 변환 필터 설계)

  • Park Tae geun;Kim Ju young;Noh Jun rye
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.4A
    • /
    • pp.336-344
    • /
    • 2005
  • Discrete Wavelet Transform(DWT) is the oncoming generation of compression technique that has been selected for MPEG4 and JEPG2000, because it has no blocking effects and efficiently determines frequency property of temporary time. In this paper, we propose an efficient bit-serial architecture for the low-power and low-complexity DWT filter, employing two-channel QMF(Qudracture Mirror Filter) PR(Perfect Reconstruction) lattice filter. The filter consists of four lattices(filter length=8) and we determine the quantization bit for the coefficients by the fixed-length PSNR(peak-signal-to-noise ratio) analysis and propose the architecture of the bit-serial multiplier with the fixed coefficient. The CSD encoding for the coefficients is adopted to minimize the number of non-zero bits, thus reduces the hardware complexity. The proposed folded 1D DWT architecture processes the other resolution levels during idle periods by decimations and its efficient scheduling is proposed. The proposed architecture requires only flip-flops and full-adders. The proposed architecture has been designed and verified by VerilogHDL and synthesized by Synopsys Design Compiler with a Hynix 0.35$\mu$m STD cell library. The maximum operating frequency is 200MHz and the throughput is 175Mbps with 16 clock latencies.

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

A Security SoC embedded with ECDSA Hardware Accelerator (ECDSA 하드웨어 가속기가 내장된 보안 SoC)

  • Jeong, Young-Su;Kim, Min-Ju;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.7
    • /
    • pp.1071-1077
    • /
    • 2022
  • A security SoC that can be used to implement elliptic curve cryptography (ECC) based public-key infrastructures was designed. The security SoC has an architecture in which a hardware accelerator for the elliptic curve digital signature algorithm (ECDSA) is interfaced with the Cortex-A53 CPU using the AXI4-Lite bus. The ECDSA hardware accelerator, which consists of a high-performance ECC processor, a SHA3 hash core, a true random number generator (TRNG), a modular multiplier, BRAM, and control FSM, was designed to perform the high-performance computation of ECDSA signature generation and signature verification with minimal CPU control. The security SoC was implemented in the Zynq UltraScale+ MPSoC device to perform hardware-software co-verification, and it was evaluated that the ECDSA signature generation or signature verification can be achieved about 1,000 times per second at a clock frequency of 150 MHz. The ECDSA hardware accelerator was implemented using hardware resources of 74,630 LUTs, 23,356 flip-flops, 32kb BRAM, and 36 DSP blocks.