• Title/Summary/Keyword: 하드웨어 가속기

Search Result 125, Processing Time 0.031 seconds

Techniques for Performance Improvement of Convolutional Neural Networks using XOR-based Data Reconstruction Operation (XOR연산 기반의 데이터 재구성 기법을 활용한 컨볼루셔널 뉴럴 네트워크 성능 향상 기법)

  • Kim, Young-Ung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.1
    • /
    • pp.193-198
    • /
    • 2020
  • The various uses of the Convolutional Neural Network technology are accelerating the evolution of the computing area, but the opposite is causing serious hardware performance shortages. Neural network accelerators, next-generation memory device technologies, and high-bandwidth memory architectures were proposed as countermeasures, but they are difficult to actively introduce due to the problems of versatility, technological maturity, and high cost, respectively. This study proposes DRAM-based main memory technology that enables read operations to be completed without waiting until the end of the refresh operation using pre-stored XOR bit values, even when the refresh operation is performed in the main memory. The results showed that the proposed technique improved performance by 5.8%, saved energy by 1.2%, and improved EDP by 10.6%.

Accuracy Analysis of Fixed Point Arithmetic for Hardware Implementation of Binary Weight Network (이진 가중치 신경망의 하드웨어 구현을 위한 고정소수점 연산 정확도 분석)

  • Kim, Jong-Hyun;Yun, SangKyun
    • Journal of IKEEE
    • /
    • v.22 no.3
    • /
    • pp.805-809
    • /
    • 2018
  • In this paper, we analyze the change of accuracy when fixed point arithmetic is used instead of floating point arithmetic in binary weight network(BWN). We observed the change of accuracy by varying total bit size and fraction bit size. If the integer part is not changed after fixed point approximation, there is no significant decrease in accuracy compared to the floating-point operation. When overflow occurs in the integer part, the approximation to the maximum or minimum of the fixed point representation minimizes the decrease in accuracy. The results of this paper can be applied to the minimization of memory and hardware resource requirement in the implementation of FPGA-based BWN accelerator.

Design of RISC-based Transmission Wrapper Processor IP for TCP/IP Protocol Stack (TCP/IP프로토콜 스택을 위한 RISC 기반 송신 래퍼 프로세서 IP 설계)

  • 최병윤;장종욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.6
    • /
    • pp.1166-1174
    • /
    • 2004
  • In this paper, a design of RISC-based transmission wrapper processor for TCP/IP protocol stack is described. The processor consists of input and output buffer memory with dual bank structure, 32-bit RISC microprocessor core, DMA unit with on-the-fly checksum capability, and memory module. To handle the various modes of TCP/IP protocol, hardware-software codesign approach based on RISC processor is used rather than the conventional state machine design. To eliminate large delay time due to sequential executions of data transfer and checksum operation, DMA module which can execute the checksum operation along with data transfer operation is adopted. The designed processor exclusive of variable-size input/output buffer consists of about 23,700 gates and its maximum operating frequency is about 167MHz under 0.35${\mu}m$ CMOS technology.

Design and Implementation of CNN-Based Human Activity Recognition System using WiFi Signals (WiFi 신호를 활용한 CNN 기반 사람 행동 인식 시스템 설계 및 구현)

  • Chung, You-shin;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.4
    • /
    • pp.299-304
    • /
    • 2021
  • Existing human activity recognition systems detect activities through devices such as wearable sensors and cameras. However, these methods require additional devices and costs, especially for cameras, which cause privacy issue. Using WiFi signals that are already installed can solve this problem. In this paper, we propose a CNN-based human activity recognition system using channel state information of WiFi signals, and present results of designing and implementing accelerated hardware structures. The system defined four possible behaviors during studying in indoor environments, and classified the channel state information of WiFi using convolutional neural network (CNN), showing and average accuracy of 91.86%. In addition, for acceleration, we present the results of an accelerated hardware structure design for fully connected layer with the highest computation volume on CNN classifiers. As a result of performance evaluation on FPGA device, it showed 4.28 times faster calculation time than software-based system.

Design of a High-Speed Data Packet Allocation Circuit for Network-on-Chip (NoC 용 고속 데이터 패킷 할당 회로 설계)

  • Kim, Jeonghyun;Lee, Jaesung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.459-461
    • /
    • 2022
  • One of the big differences between Network-on-Chip (NoC) and the existing parallel processing system based on an off-chip network is that data packet routing is performed using a centralized control scheme. In such an environment, the best-effort packet routing problem becomes a real-time assignment problem in which data packet arriving time and processing time is the cost. In this paper, the Hungarian algorithm, a representative computational complexity reduction algorithm for the linear algebraic equation of the allocation problem, is implemented in the form of a hardware accelerator. As a result of logic synthesis using the TSMC 0.18um standard cell library, the area of the circuit designed through case analysis for the cost distribution is reduced by about 16% and the propagation delay of it is reduced by about 52%, compared to the circuit implementing the original operation sequence of the Hungarian algorithm.

  • PDF

Design and Implementation of BNN-based Gait Pattern Analysis System Using IMU Sensor (관성 측정 센서를 활용한 이진 신경망 기반 걸음걸이 패턴 분석 시스템 설계 및 구현)

  • Na, Jinho;Ji, Gisan;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.26 no.5
    • /
    • pp.365-372
    • /
    • 2022
  • Compared to sensors mainly used in human activity recognition (HAR) systems, inertial measurement unit (IMU) sensors are small and light, so can achieve lightweight system at low cost. Therefore, in this paper, we propose a binary neural network (BNN) based gait pattern analysis system using IMU sensor, and present the design and implementation results of an FPGA-based accelerator for computational acceleration. Six signals for gait are measured through IMU sensor, and a spectrogram is extracted using a short-time Fourier transform. In order to have a lightweight system with high accuracy, a BNN-based structure was used for gait pattern classification. It is designed as a hardware accelerator structure using FPGA for computation acceleration of binary neural network. The proposed gait pattern analysis system was implemented using 24,158 logics, 14,669 registers, and 13.687 KB of block memory, and it was confirmed that the operation was completed within 1.5 ms at the maximum operating frequency of 62.35 MHz and real-time operation was possible.

Performance Optimization Considering I/O Data Coherency in Stream Processing (Stream Processing에서 I/O데이터 일관성을 고려한 성능 최적화)

  • Na, Hana;Yi, Joonwhan
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.8
    • /
    • pp.59-65
    • /
    • 2016
  • Performance optimization of applications with massive stream data processing has been performed by considering I/O data coherency problem where a memory is shared between processors and hardware accelerators. A formula for performance analyses is derived based on profiling results of system-level simulations. Our experimental results show that overall performance was improved by 1.40 times on average for various image sizes. Also, further optimization has been performed based on the parameters appeared in the derived formula. The final performance gain was 3.88 times comparing to the original design and we can find that the performance of the design with cacheable shared memory is not always.

Programmable Multimedia Platform for Video Processing of UHD TV (UHD TV 영상신호처리를 위한 프로그래머블 멀티미디어 플랫폼)

  • Kim, Jaehyun;Park, Goo-man
    • Journal of Broadcast Engineering
    • /
    • v.20 no.5
    • /
    • pp.774-777
    • /
    • 2015
  • This paper introduces the world's first programmable video-processing platform for the enhancement of the video quality of the 8K(7680x4320) UHD(Ultra High Definition) TV operating up to 60 frames per second. In order to support required computing capacity and memory bandwidth, the proposed platform implemented several key features such as symmetric multi-cluster architecture for parallel data processing, a ring-data path between the clusters for data pipelining and hardware accelerators for computing filter operations. The proposed platform based on RP(Reconfigurable Processor) processes video quality enhancement algorithms and handles effectively new UHD broadcasting standards and display panels.

3D Game Rendering Engine Degine using Empty space BSP tree (Empty space BSP트리를 이용한 3D 게임 렌더링 엔진 설계)

  • Kim Hak-Ran;Park Hwa-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.3 s.35
    • /
    • pp.345-352
    • /
    • 2005
  • This paper aims to design Game Rendering Engine for real-time 3D online games. Previous, in order to raise rendering speed, BSP tree was used to partitioned space in Quake Game Engine. A game engine is required to develop for rapidly escalating of 3D online games in Korea. too. Currently rendering time is saved with the hardware accelerator which is working on the high-level computer system. On the other hand, a game engine is needed to save rendering time for users with low-level computer system. Therefore, a game rendering engine is which reduces rendering time by PVS look-up table using Empty space BSP tree designed and implemented in this paper

  • PDF

Sparse Matrix Compression Technique and Hardware Design for Lightweight Deep Learning Accelerators (경량 딥러닝 가속기를 위한 희소 행렬 압축 기법 및 하드웨어 설계)

  • Kim, Sunhee;Shin, Dongyeob;Lim, Yong-Seok
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.17 no.4
    • /
    • pp.53-62
    • /
    • 2021
  • Deep learning models such as convolutional neural networks and recurrent neual networks process a huge amounts of data, so they require a lot of storage and consume a lot of time and power due to memory access. Recently, research is being conducted to reduce memory usage and access by compressing data using the feature that many of deep learning data are highly sparse and localized. In this paper, we propose a compression-decompression method of storing only the non-zero data and the location information of the non-zero data excluding zero data. In order to make the location information of non-zero data, the matrix data is divided into sections uniformly. And whether there is non-zero data in the corresponding section is indicated. In this case, section division is not executed only once, but repeatedly executed, and location information is stored in each step. Therefore, it can be properly compressed according to the ratio and distribution of zero data. In addition, we propose a hardware structure that enables compression and decompression without complex operations. It was designed and verified with Verilog, and it was confirmed that it can be used in hardware deep learning accelerators.