• Title/Summary/Keyword: FAST hardware accelerator

Search Result 15, Processing Time 0.024 seconds

FPGA based Implementation of FAST and BRIEF algorithm for Object Recognition (객체인식을 위한 FAST와 BRIEF 알고리즘 기반 FPGA 설계)

  • Heo, Hoon;Lee, Kwang-Yeob
    • Journal of IKEEE
    • /
    • v.17 no.2
    • /
    • pp.202-207
    • /
    • 2013
  • This paper implemented the conventional FAST and BRIEF algorithm as hardware on Zynq-7000 SoC Platform. Previous feature-based hardware accelerator is mostly implemented using the SIFT or SURF algorithm, but it requires excessive internal memory and hardware cost. The proposed FAST & BRIEF accelerator reduces approximately 57% of internal memory usage and 70% of hardware cost compared to the conventional SIFT or SURF accelerator, and it processes 0.17 pixel per Clock.

A Threshold Controller for FAST Hardware Accelerator (FAST 하드웨어 가속기를 위한 임계값 제어기)

  • Kim, Taek-Kyu;Suh, Yong-Suk
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.11
    • /
    • pp.187-192
    • /
    • 2014
  • Various researches are performed to extract significant features from continuous images. The FAST algorithm has the simple structure for arithmetic operation and it is easy to extraction the features in real time. For this reason, the FPGA based hardware accelerator is implemented and widely applied for the FAST algorithm. The hardware accelerator needs the threshold to extract the features from images. The threshold is influenced not only the number of extracted features but also the total execution time. Therefore, the way of threshold control is important to stabilize the total execution time and to extract features as much as possible. In order to control the threshold, this paper proposes the PI controller. The function and performance for the proposed PI controller are verified by using test images and the PI control logic is designed based on Xilinx Vertex IV FPGA. The proposed scheme can be implemented by adding 47 Flip Flops, 146 LUTs, and 91 Slices to the FAST hardware accelerator. This proposed approach only occupies 2.1% of Flip Flop, 4.4% of LUTs, and 4.5% of Slices and can be regarded as a small portion of hardware cost.

An Embedded FAST Hardware Accelerator for Image Feature Detection (영상 특징 추출을 위한 내장형 FAST 하드웨어 가속기)

  • Kim, Taek-Kyu
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.2
    • /
    • pp.28-34
    • /
    • 2012
  • Various feature extraction algorithms are widely applied to real-time image processing applications for extracting significant features from images. Feature extraction algorithms are mostly combined with image processing algorithms mostly for image tracking and recognition. Feature extraction function is used to supply feature information to the other image processing algorithms and it is mainly implemented in a preprocessing stage. Nowadays, image processing applications are faced with embedded system implementation for a real-time processing. In order to satisfy this requirement, it is necessary to reduce execution time so as to improve the performance. Reducing the time for executing a feature extraction function dose not only extend the execution time for the other image processing algorithms, but it also helps satisfy a real-time requirement. This paper explains FAST (Feature from Accelerated Segment Test algorithm) of E. Rosten and presents FPGA-based embedded hardware accelerator architecture. The proposed acceleration scheme can be implemented by using approximately 2,217 Flip Flops, 5,034 LUTs, 2,833 Slices, and 18 Block RAMs in the Xilinx Vertex IV FPGA. In the Modelsim - based simulation result, the proposed hardware accelerator takes 3.06 ms to extract 954 features from a image with $640{\times}480$ pixels and this result shows the cost effectiveness of the propose scheme.

Radix-2 16 Points FFT Algorithm Accelerator Implementation Using FPGA (FPGA를 사용한 radix-2 16 points FFT 알고리즘 가속기 구현)

  • Gyu Sup Lee;Seong-Min Cho;Seung-Hyun Seo
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.1
    • /
    • pp.11-19
    • /
    • 2024
  • The increased utilization of the FFT in signal processing, cryptography, and various other fields has highlighted the importance of optimization. In this paper, we propose the implementation of an accelerator that processes the radix-2 16 points FFT algorithm more rapidly and efficiently than FFT implementation of existing studies, using FPGA(Field Programmable Gate Array) hardware. Leveraging the hardware advantages of FPGA, such as parallel processing and pipelining, we design and implement the FFT logic in the PL (Programmable Logic) part using the Verilog language. We implement the FFT using only the Zynq processor in the PS (Processing System) part, and compare the computation times of the implementation in the PL and PS part. Additionally, we demonstrate the efficiency of our implementation in terms of computation time and resource usage, in comparison with related works.

Implementation of SDR-based LTE-A PDSCH Decoder for Supporting Multi-Antenna Using Multi-Core DSP (멀티코어 DSP를 이용한 다중 안테나를 지원하는 SDR 기반 LTE-A PDSCH 디코더 구현)

  • Na, Yong;Ahn, Heungseop;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.15 no.4
    • /
    • pp.85-92
    • /
    • 2019
  • This paper presents a SDR-based Long Term Evolution Advanced (LTE-A) Physical Downlink Shared Channel (PDSCH) decoder using a multicore Digital Signal Processor (DSP). For decoder implementation, multicore DSP TMS320C6670 is used, which provides various hardware accelerators such as turbo decoder, fast Fourier transformer and Bit Rate Coprocessors. The TMS320C6670 is a DSP specialized in implementing base station platforms and is not an optimized platform for implementing mobile terminal platform. Accordingly, in this paper, the hardware accelerator was changed to the terminal implementation to implement the LTE-A PDSCH decoder supporting the multi-antenna and the functions not provided by the hardware accelerator were implemented through core programming. Also pipeline using multicore was implemented to meet the transmission time interval. To confirm the feasibility of the proposed implementation, we verified the real-time decoding capability of the PDSCH decoder implemented using the LTE-A Reference Measurement Channel (RMC) waveform about transmission mode 2 and 3.

MultiRing An Efficient Hardware Accelerator for Design Rule Checking (멀티링 설계규칙검사를 위한 효과적인 하드웨어 가속기)

  • 노길수;경종민
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.6
    • /
    • pp.1040-1048
    • /
    • 1987
  • We propose a hardware architecture called Multiring which is applicable for various geometrical operations on rectilinear objects such as design rule checking in VLSI layout and many image processing operations including noise suppression and coutour extraction. It has both a fast execution speed and extremely high flexibility. The whole architecture is mainly divided into four parts` I/O between host and Multiring, ring memory, linear processor array and instruction decoder. Data transmission between host and Multiring is bit serial thereby reducing the bandwidth requirement for teh channel and the number of external pins, while each row data in the bit map stored in ring memory is processed in the corresponding processor in full parallelism. Each processor is simultaneously configured by the instruction decoder/controller to perform one of the 16 basic instructions such as Boolean (AND, OR, NOT, and Copy), geometrical(Expand and Shrink), and I/O operations each ring cycle, which gives Multiring maximal flexibility in terms of design rule change or the instruction set enhancement. Correct functional behavior of Multiring was confirmed by successfully running a software simulator having one-to-one structural correspondence to the Multiring hardware.

  • PDF

Host Interface Design for TCP/IP Hardware Accelerator (TCP/IP Hardware Accelerator를 위한 Host Interface의 설계)

  • Jung, Yeo-Jin;Lim, Hye-Sook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.2B
    • /
    • pp.1-10
    • /
    • 2005
  • TCP/IP protocols have been implemented in software program running on CPU in end systems. As the increased demand of fast protocol processing, it is required to implement the protocols in hardware, and Host Interface is responsible for communication between external CPU and the hardware blocks of TCP/IP implementation. The Host Interface follows AMBA AHB specification for the communication with external world. For control flow, the Host Interface behaves as a slave of AMBA AHB. Using internal Command/status Registers, the Host Interface receives commands from CPU and transfers hardware status and header information to CPU. On the other hand, the Host Interface behaves as a master for data flow. Data flow has two directions, Receive Flow and Transmit Flow. In Receive Flow, using internal RxFIFO, the Host Interface reads data from UDP FIFO or TCP buffer and transfers data to external RAM for CPU to read. For Transmit Flow, the Host Interface reads data from external RAM and transfers data to UDP buffer or TCP buffer through internal TxFIFO. TCP/IP hardware blocks generate packets using the data and transmit. Buffer Descriptor is one of the Command/Status Registers, and the information stored in Buffer Descriptor is used for external RAM access. Several testcases are designed to verify TCP/IP functions. The Host Interface is synthesized using the 0.18 micron technology, and it results in 173 K gates including the Command/status Registers and internal FIFOs.

Parallel Fuzzy Information Processing System - KAFA : KAist Fuzzy Accelerator -

  • Kim, Young-Dal;Lee, Hyung-Kwang;Park, Kyu-Ho
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.981-984
    • /
    • 1993
  • During the past decade, several specific hardwares for fast fuzzy inference have been developed. Most of them are dedicated to a specific inference method and thus cannot support other inference methods. In this paper, we present a hardware architecture called KAFA(KAist Fuzzy Accelerator) which provides various fuzzy inference methods and fuzzy set operators. The architecture has SIMD structure, which consists of two parts; system control/interface unit(Main Controller) and arithmetic units(FPEs). Using the parallel processing technology, the KAFA has the high performance for fuzzy information processing. The speed of the KAFA holds promise for the development of the new fuzzy application systems.

  • PDF

Novel IME Instructions and their Hardware Architecture for Fast Search Algorithm (고속 탐색 알고리즘에 적합한 움직임 추정 전용 명령어 및 구조 설계)

  • Bang, Ho-Il;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.12
    • /
    • pp.58-65
    • /
    • 2011
  • This paper presents an ASIP (Application-specific Instruction Processor) for motion estimation that employs specific IME instructions and its programmable and reconfigurable hardware architecture for various video codecs, such as H.264/AVC, MPEG4, etc. With the proposed specific instructions and variable point 2D SAD hardware accelerator, it can handle the real-time processing requirement of High Definition (HD) video. With the SAD unit and its parallel operations using pattern information, the proposed IME instructions support not only full search algorithms but also other fast search algorithms. The hardware size is 25.5K gates for each Processing Element Group (PEG) which has 128 SAD Processor Elements (PEs). The proposed ASIP has been verified by the Synopsys Processor Designer and implemented by the Design Compiler using the IBM 90nm process technology. The hardware size is 453K gates for the IME unit and the operating frequency is 188MHz for 1080p@30 frame in real time. The proposed ASIP can reduce the hardware size about 26% and the number of operation cycles about 18%.

Design and Implementation of Multi-mode Sensor Signal Processor on FPGA Device (다중모드 센서 신호 처리 프로세서의 FPGA 기반 설계 및 구현)

  • Soongyu Kang;Yunho Jung
    • Journal of Sensor Science and Technology
    • /
    • v.32 no.4
    • /
    • pp.246-251
    • /
    • 2023
  • Internet of Things (IoT) systems process signals from various sensors using signal processing algorithms suitable for the signal characteristics. To analyze complex signals, these systems usually use signal processing algorithms in the frequency domain, such as fast Fourier transform (FFT), filtering, and short-time Fourier transform (STFT). In this study, we propose a multi-mode sensor signal processor (SSP) accelerator with an FFT-based hardware design. The FFT processor in the proposed SSP is designed with a radix-2 single-path delay feedback (R2SDF) pipeline architecture for high-speed operation. Moreover, based on this FFT processor, the proposed SSP can perform filtering and STFT operation. The proposed SSP is implemented on a field-programmable gate array (FPGA). By sharing the FFT processor for each algorithm, the required hardware resources are significantly reduced. The proposed SSP is implemented and verified on Xilinxh's Zynq Ultrascale+ MPSoC ZCU104 with 53,591 look-up tables (LUTs), 71,451 flip-flops (FFs), and 44 digital signal processors (DSPs). The FFT, filtering, and STFT algorithm implementations on the proposed SSP achieve 185x average acceleration.