• Title/Summary/Keyword: neural processing unit

Rapid and Brief Communication GPU implementation of neural networks

  • Oh, Kyoung-Su;Jung, Kee-Chul
    • HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집) / 2007.02c / pp.322-325 / 2007
  • A graphics processing unit (GPU) is used to speed up an artificial neural network. The GPU implements the matrix multiplication of the neural network to improve the running time of a text detection system. Preliminary results show a 20-fold performance improvement on an ATI RADEON 9700 PRO board. The parallelism of the GPU is fully utilized by accumulating many input feature vectors and weight vectors, then converting the many inner-product operations into a single matrix operation. Further research areas include benchmarking performance on various hardware and GPU-aware learning algorithms. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
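
The batching idea in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' GPU code: the function names, vector sizes, and values are assumptions chosen for illustration. It only shows that stacking input vectors and weight vectors into matrices turns many separate inner products into one matrix product, which is the operation a GPU can parallelize.

```python
# Sketch: batch many inner products into one matrix product (pure Python).
# All names, sizes, and values here are illustrative assumptions.

def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def one_at_a_time(inputs, weights):
    """Baseline: one inner product per (input vector, neuron) pair."""
    return [[inner(x, w) for w in weights] for x in inputs]

def as_matrix_product(inputs, weights):
    """Treat inputs as rows of X and weights as rows of W, then form X * W^T.

    On a GPU, this single product is the unit of work that is parallelized.
    """
    w_t = list(zip(*weights))  # transpose W so that each column is one neuron
    return [[sum(x[k] * w_t[k][j] for k in range(len(x)))
             for j in range(len(weights))]
            for x in inputs]

inputs = [[1, 2, 3], [4, 5, 6]]    # two accumulated feature vectors
weights = [[1, 0, 1], [0, 1, 0]]   # weight vectors of two neurons
batched = as_matrix_product(inputs, weights)
```

Both functions return the same values; the difference is only in how the work is grouped.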

A layer-wise frequency scaling for a neural processing unit

  • Chung, Jaehoon;Kim, HyunMi;Shin, Kyoungseon;Lyuh, Chun-Gi;Cho, Yong Cheol Peter;Han, Jinho;Kwon, Youngsu;Gong, Young-Ho;Chung, Sung Woo
    • ETRI Journal / v.44 no.5 / pp.849-858 / 2022
  • Dynamic voltage frequency scaling (DVFS) has been widely adopted for runtime power management of various processing units. For neural processing units (NPUs), power management of neural network applications must adjust the frequency and voltage for every layer to account for the power behavior and performance of each layer. Unfortunately, DVFS is inappropriate for layer-wise runtime power management of NPUs because the latency of voltage scaling is long compared with the execution time of each layer. Because frequency scaling is fast enough to keep up with each layer, we propose a layer-wise dynamic frequency scaling (DFS) technique for an NPU. Our proposed DFS uses the highest frequency allowed under the power limit of the NPU for each layer. To determine the highest allowable frequency, we build a power model that predicts the power consumption of an NPU based on real measurements on the fabricated NPU. Our evaluation results show that our proposed DFS improves frames per second (FPS) by 33% and saves energy by 14% on average, compared with DVFS.
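
The per-layer selection step described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's power model: the linear model (static power plus a per-layer coefficient times frequency), the layer names, the frequency list, and every number are assumptions.

```python
# Sketch of layer-wise frequency selection under a power cap.
# The linear power model and all numbers are illustrative assumptions.

def predicted_power(static_w, dyn_w_per_mhz, freq_mhz):
    """Assumed model: static power plus dynamic power linear in frequency.

    Voltage is treated as fixed, since only the frequency is scaled.
    """
    return static_w + dyn_w_per_mhz * freq_mhz

def pick_frequencies(layers, freqs_mhz, power_limit_w, static_w=0.5):
    """For each layer, choose the highest frequency whose predicted power
    stays within the limit; fall back to the lowest frequency otherwise."""
    plan = {}
    for name, dyn_w_per_mhz in layers.items():
        allowed = [f for f in freqs_mhz
                   if predicted_power(static_w, dyn_w_per_mhz, f) <= power_limit_w]
        plan[name] = max(allowed) if allowed else min(freqs_mhz)
    return plan

# Hypothetical per-layer dynamic power coefficients (watts per MHz).
layers = {"conv1": 0.004, "conv2": 0.006, "fc": 0.002}
plan = pick_frequencies(layers, [200, 400, 600, 800], power_limit_w=3.0)
```

With these assumed coefficients, the power-hungry `conv2` layer is assigned a lower frequency than the lighter `fc` layer, which is the behavior the layer-wise scheme relies on.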

A Method for accelerating training of Convolutional Neural Network (합성곱 신경망의 학습 가속화를 위한 방법)

  • Choi, Se Jin;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology / v.3 no.4 / pp.171-175 / 2017
  • Training a convolutional neural network (CNN) entails many iterative computations. Methods that accelerate training through parallel processing on GPGPU hardware are therefore being actively researched. In this paper, the operations of the feature extraction unit and the classification unit are divided into GPGPU blocks and threads and processed in parallel. The convolution and pooling operations of the feature extraction unit are processed in parallel at once rather than sequentially. As a result, the proposed method improved the training time by about 314%.

On the Digital Implementation of the Sigmoid function (시그모이드 함수의 디지털 구현에 관한 연구)

  • Lee, Ho-Sun;Hong, Bong-Wha
    • The Journal of Information Technology / v.4 no.3 / pp.155-163 / 2001
  • In this paper, we implement the sigmoid activation function, which makes digital neural networks difficult to design. We design a high-speed sigmoid function unit for digital neural networks. For high-speed operation, we design a MAC (multiplier and accumulator) operation unit that uses the residue number system, which has no carry propagation. The designed MAC operation unit and sigmoid processing unit are shown to run at high speed. In simulation, each unit operated in less than 4.6 ns, and we expect the design to be applicable to the implementation of high-speed digital neural networks.

Use of High-performance Graphics Processing Units for Power System Demand Forecasting

  • He, Ting;Meng, Ke;Dong, Zhao-Yang;Oh, Yong-Taek;Xu, Yan
    • Journal of Electrical Engineering and Technology / v.5 no.3 / pp.363-370 / 2010
  • Load forecasting has always been essential to the operation and planning of power systems in deregulated electricity markets. Various methods have been proposed for load forecasting, and the neural network is one of the most widely accepted and used techniques. However, to obtain more accurate results, more information is needed as input variables, resulting in huge computational costs in the learning process. In this paper, to reduce training time in multi-layer perceptron-based short-term load forecasting, a graphics processing unit (GPU)-based computing method is introduced. The proposed approach is tested using the Korea electricity market historical demand data set. Results show that GPU-based computing greatly reduces computational costs.

The Implementation of Back Propagation Neural Network using the Residue Number System (잉여수계를 이용한 역전파 신경회로망 구현)

  • Hong, Bong-Wha;Lee, Ho-Sun
    • The Journal of Information Technology / v.2 no.2 / pp.145-161 / 1999
  • This paper proposes a high-speed back-propagation neural network that uses the residue number system, which makes high-speed operation possible because it has no carry propagation. The design consists of a MAC (multiplication and accumulation) operator unit using the residue number system and a sigmoid function operator unit using mixed residue conversion. The designed circuits are described in VHDL and synthesized with Compass tools. Simulation results show that the critical path delay is about 19 ns and that the size can be reduced to 40% compared with a neural network implemented with a real-number operation unit. The proposed circuits can be implemented in a parallel distributed processing system that requires real-time processing.

Role of Carbon Monoxide in Neurovascular Repair Processing

  • Choi, Yoon Kyung
    • Biomolecules & Therapeutics / v.26 no.2 / pp.93-100 / 2018
  • Carbon monoxide (CO) is a gaseous molecule produced from heme by heme oxygenase (HO). Endogenous CO production occurring at low concentrations is thought to have several useful biological roles. In mammals, especially humans, a proper neurovascular unit comprising endothelial cells, pericytes, astrocytes, microglia, and neurons is essential for the homeostasis and survival of the central nervous system (CNS). In addition, the regeneration of neurovascular systems from neural stem cells and endothelial precursor cells after CNS diseases is responsible for functional repair. This review focused on the possible role of CO/HO in the neurovascular unit in terms of neurogenesis, angiogenesis, and synaptic plasticity, ultimately leading to behavioral changes in CNS diseases. CO/HO may also enhance cellular networks among endothelial cells, pericytes, astrocytes, and neural stem cells. This review highlights the therapeutic effects of CO/HO on CNS diseases involved in neurogenesis, synaptic plasticity, and angiogenesis. Moreover, the cellular mechanisms and interactions by which CO/HO are exploited for disease prevention and their therapeutic applications in traumatic brain injury, Alzheimer's disease, and stroke are also discussed.

Forecasting of erythrocyte sedimentation rate using gated recurrent unit (GRU) neural network (Gated recurrent unit (GRU) 신경망을 이용한 적혈구 침강속도 예측)

  • Lee, Jaejin;Hong, Hyeonji;Song, Jae Min;Yeom, Eunseop
    • Journal of the Korean Society of Visualization / v.19 no.1 / pp.57-61 / 2021
  • To determine the erythrocyte sedimentation rate (ESR), which indicates acute-phase inflammation, the Westergren method has been widely used because it is cheap and easy to implement. However, the Westergren method requires a fairly long measurement time of 1 hour. In this study, a gated recurrent unit (GRU) neural network was used to reduce the measurement time of ESR evaluation. Sedimentation sequences of the erythrocytes were acquired by a camera, and the data obtained through image processing were used as input to the neural network models. The performance of the proposed models was evaluated based on mean absolute error. The results show that the GRU model provides more accurate predictions than the other models within 30 minutes.

Design of the Digital Neuron Processor (디지털 뉴런프로세서의 설계에 관한 연구)

  • Hong, Bong-Wha;Lee, Ho-Sun;Park, Wha-Se
    • Journal of the Institute of Electronics Engineers of Korea IE (전자공학회논문지 IE) / v.44 no.3 / pp.12-22 / 2007
  • In this paper, we design a high-speed digital neuron processor for digital neural networks. For high-speed operation, we design a MAC (multiplier and accumulator) operation unit that uses the residue number system, which has no carry propagation. We also implement the sigmoid activation function, which makes neuron processors difficult to design. The designed circuits are described in VHDL and synthesized with Compass tools. In simulation, the designed MAC operation unit and sigmoid processing unit ran in 19.6 ns, and the hardware size decreased to about 50%. The designed digital neuron processor can be implemented in a parallel distributed processing system that requires real-time processing.
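
Why a residue number system avoids carry propagation can be shown with a small software sketch. It is not the authors' VHDL design: the moduli (7, 11, 13), the function names, and the operand values are assumptions. Each residue channel computes its multiply-accumulate independently modulo its own modulus, so no carry passes between channels; the Chinese remainder theorem recovers the ordinary integer at the end.

```python
# Sketch of a residue number system (RNS) multiply-accumulate.
# The moduli (7, 11, 13) are an illustrative choice, not taken from the paper.
from math import prod

MODULI = (7, 11, 13)   # pairwise coprime; dynamic range is 7 * 11 * 13 = 1001

def to_rns(x):
    """Represent a non-negative integer by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """Compute acc + a * b channel by channel; no carries cross channels."""
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, MODULI))

def from_rns(residues):
    """Recover the integer with the Chinese remainder theorem."""
    total = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total

acc = to_rns(0)
for w, x in [(3, 4), (5, 6), (7, 8)]:   # hypothetical weight/input pairs
    acc = rns_mac(acc, to_rns(w), to_rns(x))
result = from_rns(acc)   # matches 3*4 + 5*6 + 7*8 while the sum stays below 1001
```

The result is only valid while the accumulated value stays inside the dynamic range, which is why a hardware design has to pick moduli large enough for its operands.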

Cycle-accurate NPU Simulator and Performance Evaluation According to Data Access Strategies (Cycle-accurate NPU 시뮬레이터 및 데이터 접근 방식에 따른 NPU 성능평가)

  • Kwon, Guyun;Park, Sangwoo;Suh, Taeweon
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.217-228 / 2022
  • There is increasing demand for applying deep neural networks (DNNs) to embedded-domain tasks such as classification and object detection. DNN processing in the embedded domain often requires custom hardware such as an NPU for acceleration because of constraints on power, performance, and area. Processing DNN models requires a large amount of data, and its seamless transfer to the NPU is crucial for performance. In this paper, we developed a cycle-accurate NPU simulator to evaluate diverse NPU microarchitectures. In addition, we propose a novel technique for reducing the number of memory accesses when processing convolutional layers in convolutional neural networks (CNNs) on the NPU. The main idea is to reuse data with memory interleaving, which recycles the overlapping data between the previous and current input windows. Data memory interleaving makes it possible to quickly read consecutive data in unaligned locations. We implemented the proposed technique in the cycle-accurate NPU simulator and measured the performance with LeNet-5, VGGNet-16, and ResNet-50. The experiment shows up to a 2.08x speedup in processing one convolutional layer, compared to the baseline.
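
The benefit of reusing overlapping window data can be estimated with a simple read count. This sketch does not model the paper's interleaved memory or its simulator: the function names, the row-wise reuse policy, and the 32 x 32 input with a 5 x 5 kernel are assumptions. It only counts how many input values are loaded when each window reloads everything versus when overlapping columns are kept between horizontally adjacent windows.

```python
# Sketch: count input reads for a sliding K x K window with and without reuse.
# This counts reads only; it does not model the paper's interleaved memory.

def reads_without_reuse(height, width, k, stride=1):
    """Every window reloads all k * k input values."""
    rows = (height - k) // stride + 1
    cols = (width - k) // stride + 1
    return rows * cols * k * k

def reads_with_row_reuse(height, width, k, stride=1):
    """Within one row of windows, keep the overlapping columns and load only
    the new ones: min(stride, k) new columns of k values per step."""
    rows = (height - k) // stride + 1
    cols = (width - k) // stride + 1
    per_row = k * k + (cols - 1) * k * min(stride, k)
    return rows * per_row

# Assumed example size: a 32 x 32 input with a 5 x 5 kernel and stride 1.
baseline = reads_without_reuse(32, 32, 5)
reused = reads_with_row_reuse(32, 32, 5)
```

The ratio of the two counts is a reduction in memory reads, not a speedup figure, so it should not be compared directly with the 2.08x reported in the abstract.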