Search | Korea Science

Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources (한정된 자원을 갖는 FPGA에서의 이진가중치 신경망 가속처리 구조 설계 및 구현)

Kim, Jong-Hyun;Yun, SangKyun
- Journal of IKEEE / v.24 no.1 / pp.225-231 / 2020
In this paper, we propose a method for accelerating a binary weight network (BWN) on an FPGA with limited resources for embedded systems. Because the number of available logic elements is limited, a single computing unit capable of handling Conv layers and FC layers of various sizes must be designed and reused. Also, if the input feature map cannot be processed in parallel in a single pass, the output must be calculated by reading the inputs several times. Since the number of available BRAM modules is limited, the number of data bits in the BWN accelerator must be minimized. In image classification processing time, the BWN accelerator outperforms an embedded CPU, is faster than a desktop PC, and is 50% slower than a GPU system. Since the BWN accelerator runs on a slow 50 MHz clock, it is advantageous in performance relative to power consumption.
https://doi.org/10.7471/ikeee.2020.24.1.225
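
As a rough illustration of why binary weights suit a small FPGA, the sketch below computes a dot product with weights restricted to +1/-1, so each weight costs one bit and each term is an add or a subtract. The function names, the bit encoding, and the `tile` parameter are assumptions made for this example; they are not taken from the paper, and the tiling only mimics the idea of reading the inputs in several passes.

```python
def binary_weight_dot(inputs, weight_bits):
    """Dot product with weights restricted to +1/-1.

    weight_bits[i] == 1 encodes weight +1 and 0 encodes weight -1
    (an encoding chosen for this sketch), so no multiplier is needed.
    """
    if len(inputs) != len(weight_bits):
        raise ValueError("inputs and weight_bits must have the same length")
    acc = 0
    for x, bit in zip(inputs, weight_bits):
        acc += x if bit else -x  # add or subtract instead of multiply
    return acc


def binary_weight_dot_tiled(inputs, weight_bits, tile):
    """Same result, computed by reading the input in chunks of `tile`
    values, as a unit that cannot take all inputs at once would."""
    if tile < 1:
        raise ValueError("tile must be at least 1")
    acc = 0
    for start in range(0, len(inputs), tile):
        acc += binary_weight_dot(inputs[start:start + tile],
                                 weight_bits[start:start + tile])
    return acc
```

For example, `binary_weight_dot([3, -2, 5, 1], [1, 0, 0, 1])` accumulates 3 + 2 - 5 + 1, and the tiled version gives the same sum for any tile size.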

Low-Power Multiplication Processing Element Hardware to Support Parallel Convolutional Neural Network Processing (합성곱 신경망 병렬 연산처리를 지원하는 저전력 곱셈 프로세싱 엘리먼트 설계)

Eunpyoung Park;Jongsu Park
- Journal of Platform Technology / v.12 no.2 / pp.58-63 / 2024
CNNs tend to take a long time to learn and consume a lot of power due to lack of system resources with many data processing units when there are repetitive handles that do not have high performance in the image field. In this paper, we propose a handling method based on a low-power bus that can increase the exchange rate of multipliers and multiplicands within the convolution mixer, which is a tendency activity that occurs when a convolution mixer has multiplication, which is the core element of combination. Convolutional neural networks have proprietary low-power shared processor support and the design was implemented on an Intel DE1-SoC FPGA board using Verilog-HDL. The experiments validated the performance by comparing it with the exchange rate of the multiplier originally proposed by Shen on MNIST's numeric image database.

A study on the architecture and logic block design of FPGA (FPGA 구조 및 로직 블록의 설계에 관한 연구)

윤여환;문중석;문병모;안성근;정덕균
- Journal of the Korean Institute of Telematics and Electronics A / v.33A no.11 / pp.140-151 / 1996
In this study, we designed the routing structure and logic block of an SRAM cell-based FPGA with a symmetrical-array architecture. The routing structure is composed of switch matrices, routing channels, and I/O blocks, and the routing channels are subdivided into single-length, double-length, and global-length channels. Wires are interconnected through SRAM cell-controlled pass transistors. To reduce the signal delay in the pass transistors, we propose a scheme that raises the gate-control voltage to 7 V. The designed SRAM cells have built-in shift register capability, so separate shift registers are not needed. We designed the SRAM cells in the look-up tables (LUTs) so that write operations are performed synchronously with the clock, for ease of system application. Each logic block (LFU) has four 4-input LUTs, flip-flops, and other gates, and the LUTs can be used as SRAM memory. The LFU also has dedicated carry logic, so a 4-bit adder can be implemented in one LFU. We designed our FPGA in 0.6 µm CMOS technology, and simulation shows proper operation of a 4-bit counter at 100 MHz.
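
The idea of a 4-input LUT as a small SRAM, and of four such LUTs plus carry logic forming a 4-bit adder, can be sketched in software as follows. This is a behavioural model under stated assumptions, not the paper's circuit: the names `make_lut4`, `lut4_read`, and `add4` are hypothetical, and the carry expression merely stands in for the dedicated carry logic.

```python
def make_lut4(func):
    """Build the 16-entry truth table (the LUT's SRAM contents)
    for a 4-input Boolean function."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) & 1
            for i in range(16)]


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four input bits form the SRAM address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


# Sum bit of a full adder; the fourth LUT input is unused here.
SUM_LUT = make_lut4(lambda a, b, cin, _unused: a ^ b ^ cin)


def add4(x, y, carry_in=0):
    """4-bit ripple-carry add using one LUT per bit.

    Returns (4-bit sum, carry out).
    """
    carry = carry_in
    result = 0
    for bit in range(4):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        result |= lut4_read(SUM_LUT, a, b, carry, 0) << bit
        carry = (a & b) | (carry & (a ^ b))  # stand-in for the carry logic
    return result, carry
```

Reprogramming the table passed to `lut4_read` changes the logic function without changing the evaluation path, which is the property that lets the same cells double as memory.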

An FPGA Implementation of High-Speed Adaptive Turbo Decoder

Kim, Min-Huyk;Jung, Ji-Won;Bae, Jong-Tae;Choi, Seok-Soon;Lee, In-Ki
- The Journal of Korean Institute of Communications and Information Sciences / v.32 no.4C / pp.379-388 / 2007
In this paper, we propose an adaptive turbo decoding algorithm for high-order modulation schemes, built on a standard rate-1/2 turbo decoder originally designed for B/QPSK modulation. A transformation applied to the incoming I-channel and Q-channel symbols allows the use of an off-the-shelf B/QPSK turbo decoder without any modifications. The adaptive turbo decoder processes the received symbols recursively to improve performance. As the number of iterations increases, the execution time and power consumption also increase. The reduction in latency and power consumption comes from the combination of radix-4, dual-path processing, parallel decoding, and early-stop algorithms. We implemented the proposed scheme on a field-programmable gate array (FPGA) and compared its decoding speed with that of a conventional decoder. The implementation results confirm that the proposed adaptive decoding is 6.4 times faster than the conventional scheme.

Implementation of a Fieldbus System Based on EIA-709.1 Control Network Protocol (EIA-709.1 Control Network Protocol을 이용한 필드버스 시스템 구현)

Park, Byoung-Wook;Kim, Jung-Sub;Lee, Chang-Hee;Kim, Jong-Bae;Lim, Kye-Young
- Journal of Institute of Control, Robotics and Systems / v.6 no.7 / pp.594-601 / 2000
EIA-709.1 Control Network Protocol is the basic protocol of LonWorks systems, which are emerging as fieldbus devices. In this paper, the protocol is implemented using VHDL on an FPGA and a C program on an Intel 8051 processor. The protocol from the physical layer to the network layer of EIA-709.1 is implemented at the hardware level, which decreases the CPU load for implementing the protocol. We verify the commercial feasibility of the hardware through a communication test with a Neuron Chip based on the EIA-709.1 protocol, which is used in industrial fields. Because it is implemented in VHDL, the developed FPGA-based protocol can serve as an IP applicable to various industrial fields.

FPGA Implementation of Diode Clamped Multilevel Inverter for Speed Control of Induction Motor

Kuppuswamy, C.L.;Raghavendiran, T.A.
- Journal of Electrical Engineering and Technology / v.13 no.1 / pp.362-371 / 2018
This work proposes an FPGA implementation of carrier disposition PWM for a closed-loop seven-level diode-clamped multilevel inverter used for speed control of an induction motor. A VLSI architecture for carrier disposition is introduced, through which PWM signals are fed to the neutral-point seven-level diode-clamped multilevel inverter that controls the speed of the induction motor. The proposed VLSI architecture lets the power circuit work better, with reduced stress across the switches and very low voltage and current total harmonic distortion (THD). The output voltage, current, torque, and speed characteristics of the seven-level neutral-point diode-clamped multilevel inverter for an AC drive were studied. The proposed scheme was observed to introduce less distortion and fewer harmonics. The results were validated using real-time results.
https://doi.org/10.5370/JEET.2018.13.1.362
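
A minimal sketch of level-shifted carrier PWM for a seven-level output is given below, assuming the in-phase (phase disposition) variant; the abstract does not say which carrier disposition variant the authors use. Six triangular carriers are stacked in contiguous bands over a reference normalised to [-1, 1], and the output level is the number of carriers below the reference. The function names and the normalisation are assumptions for this example.

```python
def triangle(phase):
    """Unit triangle wave in [0, 1]; phase is measured in carrier cycles."""
    frac = phase % 1.0
    return 2.0 * frac if frac < 0.5 else 2.0 * (1.0 - frac)


def pd_pwm_level(reference, carrier_phase, levels=7):
    """Return the output level index for one sample.

    reference is normalised to [-1, 1]. The (levels - 1) carriers occupy
    contiguous bands and share the same phase (assumed PD variant).
    The result ranges from -(levels - 1) // 2 to +(levels - 1) // 2.
    """
    n_carriers = levels - 1
    band = 2.0 / n_carriers          # height of one carrier band
    tri = triangle(carrier_phase)
    above = 0
    for k in range(n_carriers):
        carrier = -1.0 + band * (k + tri)
        if reference > carrier:
            above += 1
    return above - n_carriers // 2
```

In hardware each comparison would map to a comparator whose outputs drive the gate signals; here they are only counted to give the level index.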

Implementation of Segment_LCD display based on SoC design

Ling, Ma;Kim, Kab-Il;Son, Young-I.
- Proceedings of the KIEE Conference / 2003.11b / pp.59-62 / 2003
The purpose of this paper is to present how to implement a Segment_LCD display using SoC design. The SoC design is achieved by using an ARM-based Excalibur device. The Excalibur device offers an outstanding embedded development platform with an ARM922T and an FPGA. The design in the Excalibur device uses the embedded ARM processor core and the AMBA high-performance bus (AHB) to write to a memory-mapped slave peripheral in the FPGA portion of the device. Here, Segment_LCD is one kind of memory-mapped slave peripheral. To implement the Segment_LCD display based on SoC design, four steps are followed. First, IP modules are written in Verilog HDL. Second, the ARM processor of the Excalibur is programmed in C using ADS (ARM Developer Suite). Third, the whole system is simulated and verified. Finally, the modules are downloaded to the SoCMaster kit. Quartus II and ModelSim 5.5e are the key software tools used during the design.

FPGA Design of Motion JPEG2000 Encoder for Digital Cinema (디지털 시네마용 Motion JPEG2000 인코더의 FPGA 설계)

Seo, Young-Ho;Choi, Hyun-Jun;Kim, Dong-Wook
- The Journal of Korean Institute of Communications and Information Sciences / v.32 no.3C / pp.297-305 / 2007
In this paper, a Motion JPEG2000 coder, which has been set as the standard for image compression by the Digital Cinema Initiatives (DCI), an organization composed of major movie studios, was implemented on a target FPGA. The lifting-based DWT (Discrete Wavelet Transform) and Tier 1 of EBCOT (Embedded Block Coding with Optimized Truncation), the major functional modules of JPEG2000, were implemented as dedicated hardware. The Tier 2 process was implemented in software. For digital cinema, the tile size was set to support $1024\times1024$ pixels. To ensure real-time operation, three entropy encoders were used. The hardware, described in Verilog-HDL, used 32,470 LEs of Altera's Stratix EP1S80 and operated stably at 150 MHz.
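
To make the lifting idea concrete, here is a one-level, one-dimensional sketch of the reversible 5/3 lifting transform with symmetric boundary extension. The 5/3 filter is chosen only because it is short and integer-exact; the abstract does not state which filter the authors implemented, and the function names are hypothetical.

```python
def lifting_53_forward(signal):
    """One level of the reversible 5/3 lifting transform on an
    even-length list of integers. Returns (lowpass, highpass)."""
    if len(signal) < 2 or len(signal) % 2:
        raise ValueError("expected a non-empty, even-length signal")
    even, odd = signal[0::2], signal[1::2]
    n = len(odd)
    # Predict: detail = odd sample minus the floor mean of its even neighbours.
    high = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1)
            for i in range(n)]
    # Update: adjust even samples with the neighbouring details.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(n)]
    return low, high


def lifting_53_inverse(low, high):
    """Undo lifting_53_forward by running the two steps in reverse order."""
    n = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
            for i in range(n)]
    odd = [high[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1)
           for i in range(n)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out
```

Because each step only adds or subtracts a value computed from the other half of the samples, the inverse recovers the input exactly, and the structure maps onto adders and shifters rather than multipliers.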

FPGA-based One-Chip Architecture and Design of Real-time Video CODEC with Embedded Blind Watermarking (블라인드 워터마킹을 내장한 실시간 비디오 코덱의 FPGA기반 단일 칩 구조 및 설계)

서영호;김대경;유지상;김동욱
- The Journal of Korean Institute of Communications and Information Sciences / v.29 no.8C / pp.1113-1124 / 2004
In this paper, we propose a hardware (H/W) structure that can compress and reconstruct the input image in real time, and we implement it on an FPGA platform using VHDL (VHSIC Hardware Description Language). All the image processing elements needed for both compression and reconstruction in a single FPGA were considered, and each was mapped into H/W with a structure efficient for the FPGA. We used the DWT (discrete wavelet transform), which transforms the data from the spatial domain to the frequency domain, because we considered Motion JPEG2000 as the application. The implemented H/W is separated into a data path part and a control part. The data path part consists of the image processing blocks and the data processing blocks. The image processing blocks consist of the DWT Kernel for filtering by DWT, the Quantizer/Huffman Encoder, the Inverse Adder/Buffer for adding the low-frequency coefficients to the high-frequency ones in the inverse DWT operation, and the Huffman Decoder. There are also interface blocks for communicating with external application environments and timing blocks for buffering between the internal blocks. The global operations of the designed H/W are image compression and reconstruction, and it operates in units of a field, synchronized with the A/D converter. The implemented H/W used 69% (16980) of the LAB (Logic Array Block) and 9% (28352) of the ESB (Embedded System Block) resources in the ALTERA APEX20KC EP20K600CB652-7 FPGA chip and operated stably at a 70 MHz clock frequency. This verified real-time operation at 60 fields/sec (30 frames/sec).

FPGA Implementation of VME System Controller (VME 시스템 제어기의 FPGA 구현)

Bae, Sang-Hyun;Lee, Kang-Hyeon
- The Transactions of the Korea Information Processing Society / v.4 no.11 / pp.2914-2922 / 1997
For FA (factory automation) and ATE (automatic test equipment) in the industrial area, the standard bus needs to increase the system performance of a multiprocessor environment. The VME (Versa Module European package format) bus is suitable as the standard bus, but it features a small package and low board density. Besides, board and semiconductor density have grown to become significant issues that affect development time, project cost, and field diagnostics. To fit this trend, in this paper we built an integrated environment conforming to Revision C.1 (IEEE std. P1014-1987) for the main functions, such as arbitration, interrupts, and the interface between the VMEbus and several control modules. The designed VME system controller is implemented on an FPGA and can be placed even in slot 1. The control and function modules are coded with the VHDL mid-fixed description method, and their operation is then verified by simulation. In the experiment, we confirmed the most important point, that the bus timer asserts the bus error signal within $56{\mu}s$, and that the control and function modules interoperate correctly. Thus, the constructed VHDL library can be applied to VMEbus-based systems and ASIC design.
