• Title/Summary/Keyword: Hardware Accelerator

FPGA Implementation of Levenberg-Marquardt Algorithm (LM(Levenberg-Marquardt) 알고리즘의 FPGA 구현)

  • Lee, Myung-Jin;Jung, Yong-Jin
    • Journal of the Institute of Electronics and Information Engineers, v.51 no.11, pp.73-82, 2014
  • The LM algorithm solves nonlinear least squares problems and is used in various fields. However, when the target function of the application is complicated and high-dimensional, the inner matrix and vector operations take a long time to compute. In such cases, the LM algorithm is unsuitable for embedded environments and requires a hardware accelerator. In this paper, we implemented the LM algorithm in hardware. In the implementation, we divided the target function computation into pipeline stages and reduced the data input period of the matrix and vector operations in order to accelerate execution. To measure the performance of the implemented hardware, we applied it to refining the fundamental matrix (RFM), which is part of a 3D reconstruction application. As a result, the implemented system showed performance similar to the software implementation, while the execution speed increased by a factor of 74.3.
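For reference, the core computation such an accelerator targets is the damped normal-equation update of the LM iteration. The Python sketch below shows that update in its simplest form; the residual and Jacobian callables and the damping schedule are illustrative assumptions, not the paper's hardware design.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, iters=50, lam=1e-3):
    """Minimal LM loop: minimize ||residual(x)||^2 for a nonlinear model."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)                     # residual vector (m,)
        J = jacobian(x)                     # Jacobian matrix (m, n)
        A = J.T @ J                         # Gauss-Newton approximation of the Hessian
        g = J.T @ r                         # gradient of the squared error
        # Damped normal equations: (J^T J + lambda*I) dx = -J^T r
        dx = np.linalg.solve(A + lam * np.eye(A.shape[0]), -g)
        if np.linalg.norm(residual(x + dx)) < np.linalg.norm(r):
            x, lam = x + dx, lam * 0.5      # accept the step, relax damping
        else:
            lam *= 2.0                      # reject the step, increase damping
    return x
```

The matrix products and the linear solve in this loop are exactly the inner matrix and vector operations that dominate runtime when the target function is high-dimensional, which is what the pipelined hardware accelerates.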

Performance Assessment of a Lithium-Polymer Battery for HEV Utilizing Pack-Level Battery Hardware-in-the-Loop-Simulation System

  • Han, Sekyung;Lim, Jawhwan
    • Journal of Electrical Engineering and Technology, v.8 no.6, pp.1431-1438, 2013
  • A pack-level battery hardware-in-the-loop simulation (B-HILS) platform is implemented. It consists of dynamic vehicle models using PSAT and multiple control interfaces, including a real-time 3D driving mode and a GPS mode. In the real-time 3D driving mode, the user can drive a virtual vehicle using actual driving equipment, such as a steering wheel and accelerator pedal, to generate the cycle profile of the battery. In the GPS mode, actual road traffic and terrain effects can be simulated using GPS data while the trajectory is displayed on Google Maps. In the latter part of the paper, several performance tests of an actual lithium-polymer battery pack are carried out using the developed system. All experiments are conducted as part of the actual development process of a commercial battery pack, with the 2nd-generation Prius as the target vehicle model. Through the experiments, the low-temperature performance and fuel efficiency of the battery are quantitatively investigated in comparison with the original nickel-metal hydride (NiMH) pack of the Prius.

Comparison of Artificial Neural Networks for Low-Power ECG-Classification System

  • Rana, Amrita;Kim, Kyung Ki
    • Journal of Sensor Science and Technology, v.29 no.1, pp.19-26, 2020
  • Electrocardiogram (ECG) classification has become an essential task of modern wearable devices and can be used to detect cardiovascular diseases. State-of-the-art artificial intelligence (AI)-based ECG classifiers have been designed using various artificial neural networks (ANNs). Despite their high accuracy, ANNs require significant computational resources and power. Herein, three different ANNs are compared for ECG classification: the multilayer perceptron (MLP), the convolutional neural network (CNN), and the spiking neural network (SNN). The ANN models were developed in Python with Theano, trained on a central processing unit (CPU) platform, and deployed on a PYNQ-Z2 FPGA board, where they were validated using a Jupyter notebook. The hardware accelerator is designed with Overlay, a hardware library on PYNQ. For classification, the MIT-BIH dataset obtained from the PhysioNet library is used. The resulting ANN system can accurately classify four ECG types: normal, atrial premature contraction, left bundle branch block, and premature ventricular contraction. The performance of the ECG classifier models is evaluated in terms of accuracy and power. Among the three AI algorithms, the SNN requires the lowest on-chip power consumption of 0.226 W, followed by the MLP (1.677 W) and the CNN (2.266 W). However, the highest accuracy is achieved by the CNN (95%), followed by the SNN (90%) and the MLP (76%).
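As a rough illustration of the kind of model being compared, a single-hidden-layer MLP forward pass for a four-class beat classifier could look like the following NumPy sketch; the window length, layer sizes, and random weights are arbitrary assumptions, not the trained models from the paper.

```python
import numpy as np

CLASSES = ["normal", "APC", "LBBB", "PVC"]    # the four beat types classified

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_forward(beat, w1, b1, w2, b2):
    """One hidden layer: beat is a fixed-length window of ECG samples."""
    h = relu(w1 @ beat + b1)
    return softmax(w2 @ h + b2)

# Arbitrary example: 180-sample beat window, 32 hidden units, 4 classes.
rng = np.random.default_rng(0)
w1, b1 = 0.05 * rng.standard_normal((32, 180)), np.zeros(32)
w2, b2 = 0.05 * rng.standard_normal((4, 32)), np.zeros(4)
probs = mlp_forward(rng.standard_normal(180), w1, b1, w2, b2)
print(CLASSES[int(np.argmax(probs))])
```

On PYNQ, a trained model of this kind would be mapped onto the programmable logic through an Overlay and invoked from a Jupyter notebook, which is the flow the abstract describes.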

Design and Implementation of Multi-mode Sensor Signal Processor on FPGA Device (다중모드 센서 신호 처리 프로세서의 FPGA 기반 설계 및 구현)

  • Soongyu Kang;Yunho Jung
    • Journal of Sensor Science and Technology, v.32 no.4, pp.246-251, 2023
  • Internet of Things (IoT) systems process signals from various sensors using signal processing algorithms suited to the signal characteristics. To analyze complex signals, these systems usually use frequency-domain signal processing algorithms such as the fast Fourier transform (FFT), filtering, and the short-time Fourier transform (STFT). In this study, we propose a multi-mode sensor signal processor (SSP) accelerator with an FFT-based hardware design. The FFT processor in the proposed SSP is designed with a radix-2 single-path delay feedback (R2SDF) pipeline architecture for high-speed operation. Moreover, based on this FFT processor, the proposed SSP can perform filtering and STFT operations. The proposed SSP is implemented on a field-programmable gate array (FPGA). By sharing the FFT processor among the algorithms, the required hardware resources are significantly reduced. The proposed SSP is implemented and verified on Xilinx's Zynq UltraScale+ MPSoC ZCU104 board with 53,591 look-up tables (LUTs), 71,451 flip-flops (FFs), and 44 digital signal processing (DSP) slices. The FFT, filtering, and STFT implementations on the proposed SSP achieve an average acceleration of 185x.
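The resource-sharing idea, one FFT core reused both for frequency-domain filtering and for the STFT, can be sketched in software as follows; the window length, hop size, and the use of NumPy's FFT as a stand-in for the R2SDF core are illustrative assumptions.

```python
import numpy as np

def fft_core(x):
    """Stand-in for the shared FFT processor (an R2SDF pipeline in the actual hardware)."""
    return np.fft.fft(x)

def filter_via_fft(x, freq_response):
    """Frequency-domain filtering: multiply the spectrum by a response, then invert."""
    return np.fft.ifft(fft_core(x) * freq_response).real

def stft_via_fft(x, win=256, hop=128):
    """STFT: windowed, overlapping frames, each transformed by the same FFT core."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.stack([fft_core(f) for f in frames])

# Toy signal: both modes go through the same fft_core routine.
sig = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1024))
flat_response = np.ones(1024)
print(filter_via_fft(sig, flat_response).shape, stft_via_fft(sig).shape)
```

Because every mode funnels through the single fft_core routine, a hardware version needs only one FFT datapath plus mode-specific pre- and post-processing, which is how the abstract's resource saving arises.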

An Implementation of 3D Graphic Accelerator for Phong Shading (퐁 음영법을 위한 3차원 그래픽 가속기의 구현)

  • Lee, Hyung;Park, Youn-Ok;Park, Jong-Won
    • Journal of Korea Multimedia Society, v.3 no.5, pp.526-534, 2000
  • There has been much research on 3D graphics accelerators for high-speed rendering, driven by the needs of CAD/CAM, 3D modeling, virtual reality, and medical imaging. In this paper, an SIMD processor architecture for a 3D graphics accelerator is proposed in order to improve the processing time of 3D graphics, and a parallel Phong shading algorithm is presented to estimate the performance of the proposed architecture. The proposed SIMD processor architecture consists of a PCI local bus interface, 16 processing elements (PEs), and Park's multi-access memory system (MAMS) with 17 memory modules. A serial Phong shading algorithm is modified for the architecture, the key idea being to divide a polygon into 4x4 squares. For processing a square, 4 PEs are logically regarded as a PE group. Since MAMS supports block access with interval 1, the 4 PE groups can process a square at a time, so 16 pixels are processed simultaneously. The proposed SIMD processor architecture is simulated with Cadence Verilog-XL, a hardware simulation package. The simulation produces the same results as the serial algorithm, and the speed-up of the parallel algorithm over the serial one is 5.68.
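For context, the per-pixel computation being parallelized across each 4x4 square is the Phong reflection model; a scalar Python sketch of one pixel's evaluation is given below, with the light, view, and material parameters as illustrative assumptions.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong_pixel(normal, light_dir, view_dir, ambient, diffuse, specular, shininess):
    """Phong reflection at one pixel: ambient + diffuse + specular terms."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    r = 2.0 * np.dot(n, l) * n - l                 # reflection of the light direction
    i_diffuse = diffuse * max(np.dot(n, l), 0.0)
    i_specular = specular * max(np.dot(r, v), 0.0) ** shininess
    return ambient + i_diffuse + i_specular

# Example pixel with assumed parameters; in the SIMD design each of the 16 PEs
# would evaluate one such pixel of a 4x4 square in parallel.
print(phong_pixel(np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0]),
                  np.array([0.0, 0.0, 1.0]), 0.1, 0.7, 0.2, 16))
```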

Hardware Design of Arccosine Function for Mobile Vector Graphics Processor (모바일 벡터 그래픽 프로세서용 역코사인 함수의 하드웨어 설계)

  • Choi, Byeong-Yoon;Lee, Jong-Hyoung
    • Journal of the Korea Institute of Information and Communication Engineering, v.13 no.4, pp.727-736, 2009
  • In this paper, an arccos (cos⁻¹) arithmetic unit for a mobile graphics accelerator is designed. Mobile vector graphics applications impose tighter constraints on area, execution time, power dissipation, and accuracy than desktop PC applications. The designed processor adopts a 2nd-order polynomial approximation scheme based on the IEEE floating-point data format to satisfy the speed and accuracy requirements, and reduces area via a hardware-sharing structure. The arccosine processor consists of 15,280 gates, and its estimated operating frequency is about 125 MHz under the operating conditions of a 0.35 μm CMOS technology. Because the processor can execute the arccosine function within 7 clock cycles, it achieves an execution rate of about 17 MOPS (million arccos operations per second) and is applicable to a mobile OpenVG processor. Because of its flexible architecture, it can also support various transcendental functions, such as exponential, trigonometric, and logarithmic functions, by replacing the ROM contents and making minor hardware modifications.
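The segmented 2nd-order polynomial idea can be sketched as follows: the input range is split into segments, each segment stores quadratic coefficients (held in ROM in the hardware), and evaluation is a table lookup plus a short multiply-add chain. The segment count and the least-squares fit used to derive the coefficients here are assumptions for illustration, not the paper's actual ROM contents.

```python
import numpy as np

SEGMENTS = 64   # assumed table size; the real ROM layout may differ

def build_arccos_table(segments=SEGMENTS):
    """Fit one quadratic per segment of [-1, 1]; coefficients would live in ROM."""
    edges = np.linspace(-1.0, 1.0, segments + 1)
    table = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xs = np.linspace(lo, hi, 32)
        table.append((lo, hi, np.polyfit(xs, np.arccos(xs), 2)))
    return table

def arccos_approx(x, table):
    """Select the segment containing x and evaluate its quadratic."""
    for lo, hi, coeffs in table:
        if lo <= x <= hi:
            return np.polyval(coeffs, x)
    raise ValueError("input outside [-1, 1]")

table = build_arccos_table()
print(arccos_approx(0.5, table), np.arccos(0.5))   # approximation vs. reference
```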

ASIP Design for Real-Time Processing of H.264 (실시간 H.264/AVC 처리를 위한 ASIP설계)

  • Kim, Jin-Soo;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea CI, v.44 no.5, pp.12-19, 2007
  • This paper presents an ASIP (application-specific instruction-set processor) for the implementation of H.264/AVC, called VSIP (video-specific instruction-set processor). The proposed VSIP has novel instructions and optimized hardware architectures for specific operations such as intra prediction, the in-loop deblocking filter, and the integer transform. Moreover, the VSIP has hardware accelerators for the computation-intensive parts of video signal processing, such as inter prediction and entropy coding. The VSIP has a much smaller area and can dramatically reduce the number of memory accesses compared with commercial DSP chips, which results in low power consumption. The proposed VSIP can efficiently perform real-time video processing and can support various profiles and standards.
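One of the kernels such dedicated instructions target is the H.264 4x4 integer transform. A plain NumPy sketch of the forward core transform is shown below; scaling is folded into quantization in the standard and is omitted here, and the toy residual block is an arbitrary example.

```python
import numpy as np

# H.264/AVC 4x4 forward core transform matrix (scaling merged into quantization).
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward_transform_4x4(block):
    """Apply the core transform to a 4x4 residual block: Y = C @ X @ C^T."""
    return C @ block @ C.T

residual = np.arange(16).reshape(4, 4)   # toy residual block
print(forward_transform_4x4(residual))
```

Because the matrix contains only ±1 and ±2, the transform reduces to additions and shifts, which is what makes it attractive for a small dedicated datapath.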

A Real-time Single-Pass Visibility Culling Method Based on a 3D Graphics Accelerator Architecture (실시간 단일 패스 가시성 선별 기법 기반의 3차원 그래픽스 가속기 구조)

  • Choo, Catherine;Choi, Moon-Hee;Kim, Shin-Dug
    • The KIPS Transactions: Part A, v.15A no.1, pp.1-8, 2008
  • An occlusion culling method, one of the visibility culling methods, excludes invisible objects or triangles that are covered by other objects. Because it reduces the amount of computation, occlusion culling is an effective way to handle complex scenes in real time. However, a common existing occlusion culling method, the hardware occlusion query, sends object data to the GPU twice, once for the occlusion culling test and once for rendering, which causes processing overhead. Another existing hardware occlusion culling method, VCBP, can test object visibility quickly, but it neither tests bounding volumes nor returns the test results to the application stage. In this paper, we propose a single-pass occlusion culling method that exploits temporal and spatial coherence, together with an effective occlusion culling hardware architecture. In our approach, the hardware performs the occlusion culling test rapidly using a cache at the rasterization stage, where triangles are transformed into fragments. At the same time, the hardware sends each primitive's visibility information to the application stage. As a result, the application stage reduces the amount of transmitted data by excluding covered objects based on the previous frame's visibility information and a hierarchical spatial tree. Our proposed method improved performance by up to 44% and at least 14% compared with the S&W method based on hardware occlusion queries, and by 25% and 17% compared with the maximum and minimum performance of the CHC method, respectively.
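The application-stage use of the previous frame's per-primitive visibility can be sketched roughly as follows; the scene-tree structure, the names, and the policy of treating unknown nodes as visible are assumptions for illustration, not the paper's hardware interface.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    is_leaf: bool = False
    obj: str = ""
    children: list = field(default_factory=list)

def collect_draw_list(node, prev_visibility, draw_list):
    """Skip subtrees whose bounding volumes were reported occluded last frame."""
    if not prev_visibility.get(node.id, True):    # temporal coherence: assume still covered
        return
    if node.is_leaf:
        draw_list.append(node.obj)                # submit once; hardware returns fresh visibility
    else:
        for child in node.children:
            collect_draw_list(child, prev_visibility, draw_list)

# Tiny example: a root with two leaves, one of which was occluded last frame.
scene = Node(0, children=[Node(1, True, "teapot"), Node(2, True, "wall")])
visible_last_frame = {1: False, 2: True}
draw = []
collect_draw_list(scene, visible_last_frame, draw)
print(draw)   # ['wall']
```

Since each object is submitted only once and the visibility feedback arrives as a by-product of rasterization, the second geometry pass required by conventional occlusion queries disappears, which is the single pass in the title.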

Hardware Design of SURF-based Feature extraction and description for Object Tracking (객체 추적을 위한 SURF 기반 특이점 추출 및 서술자 생성의 하드웨어 설계)

  • Do, Yong-Sig;Jeong, Yong-Jin
    • Journal of the Institute of Electronics and Information Engineers, v.50 no.5, pp.83-93, 2013
  • The SURF algorithm, widely used for object tracking in many computer vision applications, is a well-known scale- and rotation-invariant feature detection algorithm. Because of its high computational complexity, a hardware accelerator is essential if SURF is to be used as an IP in an embedded environment. However, SURF requires a huge local memory, which increases the chip size and decreases the value of the IP in ASIC and SoC designs. In this paper, we propose a way to design the SURF algorithm in hardware with a greatly reduced local memory by partitioning the algorithm into several sub-IPs that use external memory through a DMA. To demonstrate the validity of the proposed method, we developed an example of a simplified object tracking algorithm. The execution speed of the hardware IP was about 31 frames/sec, and the logic size was about 74 Kgates in a 30 nm technology with 81 Kbytes of local memory, on an embedded platform consisting of an ARM Cortex-M0 processor, an AMBA bus (AHB-Lite and APB), a DMA, and an SDRAM controller. Hence, it can be used as a hardware IP in an SoC chip. If an image processing algorithm akin to SURF is designed with the method proposed in this paper, an efficient hardware implementation for the target application can be expected.
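The SURF front end relies on an integral image so that box-filter responses (used to approximate the Hessian) can be computed in constant time at any scale. A small NumPy sketch of that building block is shown below; the function names and the toy image are illustrative, and the actual filter layouts of SURF are omitted.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y, x, h, w):
    """Sum of the h x w box with top-left corner (y, x), using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(25, dtype=float).reshape(5, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3), img[1:4, 1:4].sum())   # the two values match
```

Tables of this kind are among the structures that consume local memory in a SURF pipeline, which is why streaming them through external memory via DMA, as the paper proposes, reduces the on-chip footprint.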

MLP accelerator implementation by approximation of activation function (활성화 함수의 근사화를 통한 MLP 가속기 구현)

  • Lee, Sangil;Choi, Sejin;Lee, Kwangyeob
    • Journal of IKEEE, v.22 no.1, pp.197-200, 2018
  • In this paper, the sigmoid function, which is difficult to implement at the hardware level and slow to compute, is approximated using PLAN. We use this approximation as the activation function of an MLP structure to reduce resource consumption and increase speed. We show that the proposed method maintains 95% accuracy in 5×5-size recognition and is 1.83 times faster than a GPGPU. We also found that, with resources similar to those of existing MLP accelerators, the proposed design uses more neurons and converges to higher accuracy at higher speed.
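PLAN approximates the sigmoid with a few linear segments whose slopes are powers of two, so the multiplications reduce to shifts in hardware. The sketch below uses the commonly cited PLAN breakpoints; whether the paper uses exactly these coefficients is not stated in the abstract, so treat them as an assumption.

```python
def plan_sigmoid(x):
    """Piecewise linear approximation of the sigmoid (PLAN-style breakpoints)."""
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375   # slope 2^-5
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625       # slope 2^-3
    else:
        y = 0.25 * ax + 0.5          # slope 2^-2
    return y if x >= 0 else 1.0 - y  # exploit sigmoid symmetry for negative inputs

for x in (-6.0, -1.5, 0.0, 0.8, 3.0, 6.0):
    print(x, round(plan_sigmoid(x), 4))
```

Replacing the exponential of the exact sigmoid with these shift-and-add segments is what removes the hardware bottleneck the abstract refers to.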