• Title/Summary/Keyword: neural network accelerator

Search Result 37, Processing Time 0.026 seconds

Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices (모바일 디바이스를 위한 소형 CNN 가속기의 마이크로코드 기반 컨트롤러)

  • Na, Yong-Seok;Son, Hyun-Wook;Kim, Hyung-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.355-366
    • /
    • 2022
  • This paper proposes a microcode-based neural network accelerator controller for artificial intelligence accelerators that can be reconstructed using a programmable architecture and provide the advantages of low-power and ultra-small chip size. In order for the target accelerator to support various neural network models, the neural network model can be converted into microcode through microcode compiler and mounted on accelerator to control the operators of the accelerator such as datapath and memory access. While the proposed controller and accelerator can run various CNN models, in this paper, we tested them using the YOLOv2-Tiny CNN model. Using a system clock of 200 MHz, the Controller and accelerator achieved an inference time of 137.9 ms/image for VOC 2012 dataset to detect object, 99.5ms/image for mask detection dataset to detect wearing mask. When implementing an accelerator equipped with the proposed controller as a silicon chip, the gate count is 618,388, which corresponds to 65.5% reduction in chip area compared with an accelerator employing a CPU-based controller (RISC-V).

Implementation of Neural Network Accelerator for Rendering Noise Reduction (렌더링 노이즈 제거를 위한 뉴럴 네트워크 가속기 구현)

  • Nam, Kihun
    • Journal of IKEEE
    • /
    • v.21 no.4
    • /
    • pp.420-423
    • /
    • 2017
  • In this paper, we propose an implementation of a neural network accelerator for reducing the rendering noise. Among the rendering algorithms, we selects a ray tracing to assure a high-definition graphics. Ray tracing rendering uses ray to render. Less use of the ray will result in noise, and if used too much, it will produce a higher quality image, but will take longer. In order to quickly process such lace rendering, an algorithm is used that uses less rays and removes the noise generated. Among such algorithms, there is an algorithm using a neural network, and a neural network accelerator which obtains a filter parameter used in an operation is implemented in order to speed up the operation speed. The time it takes to calculate the parameters used for a pixel is 11.44us.

MODELLING THE DYNAMICS OF THE LEAD BISMUTH EUTECTIC EXPERIMENTAL ACCELERATOR DRIVEN SYSTEM BY AN INFINITE IMPULSE RESPONSE LOCALLY RECURRENT NEURAL NETWORK

  • Zio, Enrico;Pedroni, Nicola;Broggi, Matteo;Golea, Lucia Roxana
    • Nuclear Engineering and Technology
    • /
    • v.41 no.10
    • /
    • pp.1293-1306
    • /
    • 2009
  • In this paper, an infinite impulse response locally recurrent neural network (IIR-LRNN) is employed for modelling the dynamics of the Lead Bismuth Eutectic eXperimental Accelerator Driven System (LBE-XADS). The network is trained by recursive back-propagation (RBP) and its ability in estimating transients is tested under various conditions. The results demonstrate the robustness of the locally recurrent scheme in the reconstruction of complex nonlinear dynamic relationships.

A SoC Based on a Neural Network for Embedded Smart Applications (임베디드 스마트 응용을 위한 신경망기반 SoC)

  • Lee, Bong-Kyu
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.10
    • /
    • pp.2059-2063
    • /
    • 2009
  • This paper presents a programmable System-On-a-chip (SoC) for various embedded smart applications that need Neural Network computations. The system is fully implemented into a prototyping platform based on Field Programmable Gate Array (FPGA). The SoC consists of an embedded processor core and a reconfigurable hardware accelerator for neural computations. The performance of the SoC is evaluated using a real image processing application, an optical character recognition (OCR) system.

Optimizing 2-stage Tiling-based Matrix Multiplication in FPGA-based Neural Network Accelerator (FPGA기반 뉴럴네트워크 가속기에서 2차 타일링 기반 행렬 곱셈 최적화)

  • Jinse, Kwon;Jemin, Lee;Yongin, Kwon;Jeman, Park;Misun, Yu;Taeho, Kim;Hyungshin, Kim
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.6
    • /
    • pp.367-374
    • /
    • 2022
  • The acceleration of neural networks has become an important topic in the field of computer vision. An accelerator is absolutely necessary for accelerating the lightweight model. Most accelerator-supported operators focused on direct convolution operations. If the accelerator does not provide GEMM operation, it is mostly replaced by CPU operation. In this paper, we proposed an optimization technique for 2-stage tiling-based GEMM routines on VTA. We improved performance of the matrix multiplication routine by maximizing the reusability of the input matrix and optimizing the operation pipelining. In addition, we applied the proposed technique to the DarkNet framework to check the performance improvement of the matrix multiplication routine. The proposed GEMM method showed a performance improvement of more than 2.4 times compared to the non-optimized GEMM method. The inference performance of our DarkNet framework has also improved by at least 2.3 times.

A programmable Soc for Var ious Image Applications Based on Mobile Devices

  • Lee, Bongkyu
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.3
    • /
    • pp.324-332
    • /
    • 2014
  • This paper presents a programmable System-On-a-chip for various embedded applications that need Neural Network computations. The system is fully implemented into Field-Programmable Gate Array (FPGA) based prototyping platform. The SoC consists of an embedded processor core and a reconfigurable hardware accelerator for neural computations. The performance of the SoC is evaluated using real image processing applications, such as optical character recognition (OCR) system.

Convolutional Neural Network Based on Accelerator-Aware Pruning for Object Detection in Single-Shot Multibox Detector (싱글숏 멀티박스 검출기에서 객체 검출을 위한 가속 회로 인지형 가지치기 기반 합성곱 신경망 기법)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.1
    • /
    • pp.141-144
    • /
    • 2020
  • Convolutional neural networks (CNNs) show high performance in computer vision tasks including object detection, but a lot of weight storage and computation is required. In this paper, a pruning scheme is applied to CNNs for object detection, which can remove much amount of weights with a negligible performance degradation. Contrary to the previous ones, the pruning scheme applied in this paper considers the base accelerator architecture. With the consideration, the pruned CNNs can be efficiently performed on an ASIC or FPGA accelerator. Even with the constrained pruning, the resulting CNN shows a negligible degradation of detection performance, less-than-1% point degradation of mAP on VOD0712 test set. With the proposed scheme, CNNs can be applied to objection dtection efficiently.

An Implementation of a Convolutional Accelerator based on a GPGPU for a Deep Learning (Deep Learning을 위한 GPGPU 기반 Convolution 가속기 구현)

  • Jeon, Hee-Kyeong;Lee, Kwang-yeob;Kim, Chi-yong
    • Journal of IKEEE
    • /
    • v.20 no.3
    • /
    • pp.303-306
    • /
    • 2016
  • In this paper, we propose a method to accelerate convolutional neural network by utilizing a GPGPU. Convolutional neural network is a sort of the neural network learning features of images. Convolutional neural network is suitable for the image processing required to learn a lot of data such as images. The convolutional layer of the conventional CNN required a large number of multiplications and it is difficult to operate in the real-time on the embedded environment. In this paper, we reduce the number of multiplications through Winograd convolution operation and perform parallel processing of the convolution by utilizing SIMT-based GPGPU. The experiment was conducted using ModelSim and TestDrive, and the experimental results showed that the processing time was improved by about 17%, compared to the conventional convolution.

Development of the Power System Fault Diagnostic Algorithm for the Proton Accelerator Research Center of PEFP (양성자가속기 연구센터 전력계통 고장진단 알고리즘 개발)

  • Mun, Kyeong-Jun;Jeon, Gye-Po;Lee, Seok-Ki;Kim, Jun-Yeon;Jung, W.;Yoo, Suk-Tae
    • Proceedings of the KIEE Conference
    • /
    • 2007.07a
    • /
    • pp.685-686
    • /
    • 2007
  • This paper presents an application of power system fault diagnostic algorithm for the PEFP Proton Accelerator Research Center using neural network. Proposed fault diagnostic system is constructed by the radial basis function (RBF) neural network because it has the capabilities of the pattern classification and function approximation of any nonlinear function. Proposed system identifies faulted section in the power system based on information about the operation of protection devices such as relays and circuit breakers. In this paper, parameters of the RBF neural networks are tuned by the GA-TS algorithm, which has the global optimal solution searching capabilities. To show the validity of the proposed method, proposed algorithm has been tested with a practical power system in Proton Accelerator Research Center of PEFP.

  • PDF

Power-Efficient DCNN Accelerator Mapping Convolutional Operation with 1-D PE Array (1-D PE 어레이로 컨볼루션 연산을 수행하는 저전력 DCNN 가속기)

  • Lee, Jeonghyeok;Han, Sangwook;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.18 no.2
    • /
    • pp.17-26
    • /
    • 2022
  • In this paper, we propose a novel method of performing convolutional operations on a 2-D Processing Element(PE) array. The conventional method [1] of mapping the convolutional operation using the 2-D PE array lacks flexibility and provides low utilization of PEs. However, by mapping a convolutional operation from a 2-D PE array to a 1-D PE array, the proposed method can increase the number and utilization of active PEs. Consequently, the throughput of the proposed Deep Convolutional Neural Network(DCNN) accelerator can be increased significantly. Furthermore, the power consumption for the transmission of weights between PEs can be saved. Based on the simulation results, the performance of the proposed method provides approximately 4.55%, 13.7%, and 2.27% throughput gains for each of the convolutional layers of AlexNet, VGG16, and ResNet50 using the DCNN accelerator with a (weights size) x (output data size) 2-D PE array compared to the conventional method. Additionally the proposed method provides approximately 63.21%, 52.46%, and 39.23% power savings.