• Title/Summary/Keyword: FPGA acceleration

Search Result 28, Processing Time 0.023 seconds

Implementation of a Sensor Fusion FPGA for an IoT System (사물인터넷 시스템을 위한 센서 융합 FPGA 구현)

  • Jung, Chang-Min;Lee, Kwang-Yeob;Park, Tae-Ryong
    • Journal of IKEEE
    • /
    • v.19 no.2
    • /
    • pp.142-147
    • /
    • 2015
  • In this paper, a Kalman filter-based sensor fusion filter that measures posture by calibrating and combining information obtained from acceleration and gyro sensors was proposed. Recent advancements in sensor network technology have required sensor fusion technology. In the proposed approach, the nonlinear system model of the filter is converted to a linear system model through a Jacobian matrix operation, and the measurement value predicted via Euler integration. The proposed filter was implemented at an operating frequency of 74 MHz using a Virtex-6 FPGA Board from Xilinx Inc. Further, the accuracy and reliability of the measured posture were validated by comparing the values obtained using the implemented filters with those from existing filters.

Design and Implementation of Multi-mode Sensor Signal Processor on FPGA Device (다중모드 센서 신호 처리 프로세서의 FPGA 기반 설계 및 구현)

  • Soongyu Kang;Yunho Jung
    • Journal of Sensor Science and Technology
    • /
    • v.32 no.4
    • /
    • pp.246-251
    • /
    • 2023
  • Internet of Things (IoT) systems process signals from various sensors using signal processing algorithms suitable for the signal characteristics. To analyze complex signals, these systems usually use signal processing algorithms in the frequency domain, such as fast Fourier transform (FFT), filtering, and short-time Fourier transform (STFT). In this study, we propose a multi-mode sensor signal processor (SSP) accelerator with an FFT-based hardware design. The FFT processor in the proposed SSP is designed with a radix-2 single-path delay feedback (R2SDF) pipeline architecture for high-speed operation. Moreover, based on this FFT processor, the proposed SSP can perform filtering and STFT operation. The proposed SSP is implemented on a field-programmable gate array (FPGA). By sharing the FFT processor for each algorithm, the required hardware resources are significantly reduced. The proposed SSP is implemented and verified on Xilinxh's Zynq Ultrascale+ MPSoC ZCU104 with 53,591 look-up tables (LUTs), 71,451 flip-flops (FFs), and 44 digital signal processors (DSPs). The FFT, filtering, and STFT algorithm implementations on the proposed SSP achieve 185x average acceleration.

Design and Implementation of BNN-based Gait Pattern Analysis System Using IMU Sensor (관성 측정 센서를 활용한 이진 신경망 기반 걸음걸이 패턴 분석 시스템 설계 및 구현)

  • Na, Jinho;Ji, Gisan;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.5
    • /
    • pp.365-372
    • /
    • 2022
  • Compared to sensors mainly used in human activity recognition (HAR) systems, inertial measurement unit (IMU) sensors are small and light, so can achieve lightweight system at low cost. Therefore, in this paper, we propose a binary neural network (BNN) based gait pattern analysis system using IMU sensor, and present the design and implementation results of an FPGA-based accelerator for computational acceleration. Six signals for gait are measured through IMU sensor, and a spectrogram is extracted using a short-time Fourier transform. In order to have a lightweight system with high accuracy, a BNN-based structure was used for gait pattern classification. It is designed as a hardware accelerator structure using FPGA for computation acceleration of binary neural network. The proposed gait pattern analysis system was implemented using 24,158 logics, 14,669 registers, and 13.687 KB of block memory, and it was confirmed that the operation was completed within 1.5 ms at the maximum operating frequency of 62.35 MHz and real-time operation was possible.

Design of a Low-Vibration Micro-Stepping Controller for Pan-Tilt Camera (팬.틸트 카메라의 저 진동 마이크로스텝핑 제어기 설계)

  • Yoo, Jong-won;Kim, Jung-han
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.27 no.9
    • /
    • pp.43-51
    • /
    • 2010
  • Speed, accuracy and smoothness are the important properties of pan-tilt camera. In the case of a high ratio zoom lens system, low vibration characteristic is a crucial point in driving pan-tilt mechanism. In this paper, a novel micro-stepping controller with a function of reducing vibration was designed using field programmable gate arrays (FPGA) technology for high zoom ratio pan-tilt camera. The proposed variable reference current (VRC) control scheme reduces vibration decently and optimizing coil current in order to prevent the step motor from occurring missing steps. By employing VRC control scheme, the vibration in low speed could be significantly minimized. The proposed controller can also make very high speed of 378kpps micro-step driving, and increase maximum acceleration in motion profiles.

A Design of Floating-Point Geometry Processor for Embedded 3D Graphics Acceleration (내장형 3D 그래픽 가속을 위한 부동소수점 Geometry 프로세서 설계)

  • Nam Ki hun;Ha Jin Seok;Kwak Jae Chang;Lee Kwang Youb
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.2 s.344
    • /
    • pp.24-33
    • /
    • 2006
  • The effective geometry processing IP architecture for mobile SoC that has real time 3D graphics acceleration performance in mobile information system is proposed. Base on the proposed IP architecture, we design the floating point arithmetic unit needed in geometry process and the floating point geometry processor supporting the 3D graphic international standard OpenGL-ES. The geometry processor is implemented by 160k gate area in a Xilinx-Vertex FPGA and we measure the performance of geometry processor using the actual 3D graphic data at 80MHz frequency environment The experiment result shows 1.5M polygons/sec processing performance. The power consumption is measured to 83.6mW at Hynix 0.25um CMOS@50MHz.

Optimizing 2-stage Tiling-based Matrix Multiplication in FPGA-based Neural Network Accelerator (FPGA기반 뉴럴네트워크 가속기에서 2차 타일링 기반 행렬 곱셈 최적화)

  • Jinse, Kwon;Jemin, Lee;Yongin, Kwon;Jeman, Park;Misun, Yu;Taeho, Kim;Hyungshin, Kim
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.6
    • /
    • pp.367-374
    • /
    • 2022
  • The acceleration of neural networks has become an important topic in the field of computer vision. An accelerator is absolutely necessary for accelerating the lightweight model. Most accelerator-supported operators focused on direct convolution operations. If the accelerator does not provide GEMM operation, it is mostly replaced by CPU operation. In this paper, we proposed an optimization technique for 2-stage tiling-based GEMM routines on VTA. We improved performance of the matrix multiplication routine by maximizing the reusability of the input matrix and optimizing the operation pipelining. In addition, we applied the proposed technique to the DarkNet framework to check the performance improvement of the matrix multiplication routine. The proposed GEMM method showed a performance improvement of more than 2.4 times compared to the non-optimized GEMM method. The inference performance of our DarkNet framework has also improved by at least 2.3 times.

Design and Implementation of CNN-based HMI System using Doppler Radar and Voice Sensor (도플러 레이다 및 음성 센서를 활용한 CNN 기반 HMI 시스템 설계 및 구현)

  • Oh, Seunghyun;Bae, Chanhee;Kim, Seryeong;Cho, Jaechan;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.777-782
    • /
    • 2020
  • In this paper, we propose CNN-based HMI system using Doppler radar and voice sensor, and present hardware design and implementation results. To overcome the limitation of single sensor monitoring, the proposed HMI system combines data from two sensors to improve performance. The proposed system exhibits improved performance by 3.5% and 12% compared to a single radar and voice sensor-based classifier in noisy environment. In addition, hardware to accelerate the complex computational unit of CNN is implemented and verified on the FPGA test system. As a result of performance evaluation, the proposed HMI acceleration platform can be processed with 95% reduction in computation time compared to a single software-based design.

Design and Implementation of CNN-Based Human Activity Recognition System using WiFi Signals (WiFi 신호를 활용한 CNN 기반 사람 행동 인식 시스템 설계 및 구현)

  • Chung, You-shin;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.4
    • /
    • pp.299-304
    • /
    • 2021
  • Existing human activity recognition systems detect activities through devices such as wearable sensors and cameras. However, these methods require additional devices and costs, especially for cameras, which cause privacy issue. Using WiFi signals that are already installed can solve this problem. In this paper, we propose a CNN-based human activity recognition system using channel state information of WiFi signals, and present results of designing and implementing accelerated hardware structures. The system defined four possible behaviors during studying in indoor environments, and classified the channel state information of WiFi using convolutional neural network (CNN), showing and average accuracy of 91.86%. In addition, for acceleration, we present the results of an accelerated hardware structure design for fully connected layer with the highest computation volume on CNN classifiers. As a result of performance evaluation on FPGA device, it showed 4.28 times faster calculation time than software-based system.

Lightweight FPGA Implementation of Symmetric Buffer-based Active Noise Canceller with On-Chip Convolution Acceleration Units (온칩 컨볼루션 가속기를 포함한 대칭적 버퍼 기반 액티브 노이즈 캔슬러의 경량화된 FPGA 구현)

  • Park, Seunghyun;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1713-1719
    • /
    • 2022
  • As the noise canceler with a small processing delay increases the sampling frequency, a better-quality output can be obtained. For a single buffer, processing delay occurs because it is impossible to write new data while the processor is processing the data. When synthesizing with anti-noise and output signal, this processing delay creates additional buffering overhead to match the phase. In this paper, we propose an accelerator structure that minimizes processing delay and increases processing speed by alternately performing read and write operations using the Symmetric Even-Odd-buffer. In addition, we compare the structural differences between the two methods of noise cancellation (Fast Fourier Transform noise cancellation and adaptive Least Mean Square algorithm). As a result, using an Symmetric Even-Odd-buffer the processing delay was reduced by 29.2% compared to a single buffer. The proposed Symmetric Even-Odd-buffer structure has the advantage that it can be applied to various canceling algorithms.

An Embedded FAST Hardware Accelerator for Image Feature Detection (영상 특징 추출을 위한 내장형 FAST 하드웨어 가속기)

  • Kim, Taek-Kyu
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.2
    • /
    • pp.28-34
    • /
    • 2012
  • Various feature extraction algorithms are widely applied to real-time image processing applications for extracting significant features from images. Feature extraction algorithms are mostly combined with image processing algorithms mostly for image tracking and recognition. Feature extraction function is used to supply feature information to the other image processing algorithms and it is mainly implemented in a preprocessing stage. Nowadays, image processing applications are faced with embedded system implementation for a real-time processing. In order to satisfy this requirement, it is necessary to reduce execution time so as to improve the performance. Reducing the time for executing a feature extraction function dose not only extend the execution time for the other image processing algorithms, but it also helps satisfy a real-time requirement. This paper explains FAST (Feature from Accelerated Segment Test algorithm) of E. Rosten and presents FPGA-based embedded hardware accelerator architecture. The proposed acceleration scheme can be implemented by using approximately 2,217 Flip Flops, 5,034 LUTs, 2,833 Slices, and 18 Block RAMs in the Xilinx Vertex IV FPGA. In the Modelsim - based simulation result, the proposed hardware accelerator takes 3.06 ms to extract 954 features from a image with $640{\times}480$ pixels and this result shows the cost effectiveness of the propose scheme.