• Title/Summary/Keyword: C-to-FPGA

Cascade CNN with CPU-FPGA Architecture for Real-time Face Detection (실시간 얼굴 검출을 위한 Cascade CNN의 CPU-FPGA 구조 연구)

  • Nam, Kwang-Min;Jeong, Yong-Jin
    • Journal of IKEEE / v.21 no.4 / pp.388-396 / 2017
  • Face detection must cope with many variables, such as varied poses, illumination, and occlusion, so a high-performance detection system is required. Although CNNs excel at image classification, CNN operation requires high-performance hardware resources, whereas small and mobile systems need low-cost, low-power environments. In this paper, a CPU-FPGA integrated system is designed around a 3-stage cascade CNN architecture using a small FPGA. An adaptive Region of Interest (ROI) uses face information from the previous frame to reduce the number of CNN operations. A Field Programmable Gate Array (FPGA) accelerates the CNN computations: the accelerator reads multiple feature maps at once and performs Multiply-Accumulate (MAC) operations in parallel for the convolution. The system is implemented on an Altera Cyclone V FPGA with an embedded ARM Cortex-A9 and on-chip SRAM. It runs at 30 FPS on HD-resolution input images and shows 8.5 times the power efficiency of a CPU-only system.
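
As an illustration of the MAC-based convolution mentioned in the abstract, the following is a minimal software sketch, not the authors' accelerator. The function name `conv2d_mac` and the list-of-lists data layout are assumptions made for this example. The loop over input feature maps is the one a parallel design would unroll, with one MAC per feature map read at once.

```python
def conv2d_mac(feature_maps, kernels):
    """Valid 2-D convolution (CNN-style, no kernel flip) summed over input channels.

    feature_maps: list of C input maps, each an H x W list of lists
    kernels:      list of C kernels, each a K x K list of lists
    Returns an (H-K+1) x (W-K+1) output map.
    """
    channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    k = len(kernels[0])
    out = [[0] * (width - k + 1) for _ in range(height - k + 1)]
    for y in range(height - k + 1):
        for x in range(width - k + 1):
            acc = 0  # accumulator of the MAC chain
            # Hypothetical hardware mapping: this channel loop is unrolled,
            # so each feature map feeds its own multiplier in the same cycle.
            for c in range(channels):
                for dy in range(k):
                    for dx in range(k):
                        acc += feature_maps[c][y + dy][x + dx] * kernels[c][dy][dx]
            out[y][x] = acc
    return out
```

Calling `conv2d_mac` with two 3x3 feature maps and two 2x2 kernels yields a single 2x2 output map, since the per-channel products are accumulated into one sum.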

A Design and Implementation of 32-bit Five-Stage RISC-V Processor Using FPGA (FPGA를 이용한 32-bit RISC-V 5단계 파이프라인 프로세서 설계 및 구현)

  • Jo, Sangun;Lee, Jonghwan;Kim, Yongwoo
    • Journal of the Semiconductor & Display Technology / v.21 no.4 / pp.27-32 / 2022
  • RISC-V is an open instruction set architecture (ISA) developed in 2010 at UC Berkeley, and it is being actively researched as a processor architecture to compete with ARM. In this paper, we propose an SoC system that includes a 32-bit 5-stage pipeline processor based on the RV32I ISA and an AHB bus master. The proposed RISC-V processor supports 37 of the 40 RV32I instructions, excluding FENCE, ECALL, and EBREAK. In addition, the processor can be connected to peripheral devices such as BRAM, UART, and TIMER over the AHB-lite bus protocol through the proposed AHB bus master. The proposed SoC system was implemented on an Arty A7-35T FPGA with 1,959 LUTs and 1,982 flip-flops, and the hardware has a maximum operating frequency of 50 MHz. In the Dhrystone benchmark, the performance of the proposed processor was confirmed to be 0.48 DMIPS.
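
To make the instruction subset concrete, here is a small sketch of RV32I field extraction and of detecting the three excluded instructions. It follows the standard RV32I encoding and is not taken from the paper's decoder. The function names `decode_fields` and `excluded_name` are hypothetical.

```python
def decode_fields(instr):
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    imm_i = (instr >> 20) & 0xFFF
    if imm_i & 0x800:          # sign-extend the 12-bit I-type immediate
        imm_i -= 0x1000
    return {
        "opcode": instr & 0x7F,
        "rd": (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1": (instr >> 15) & 0x1F,
        "rs2": (instr >> 20) & 0x1F,
        "funct7": (instr >> 25) & 0x7F,
        "imm_i": imm_i,
    }


def excluded_name(instr):
    """Return 'FENCE', 'ECALL' or 'EBREAK' for those encodings, else None."""
    fields = decode_fields(instr)
    if fields["opcode"] == 0x0F:                  # MISC-MEM major opcode
        return "FENCE"
    if fields["opcode"] == 0x73 and fields["funct3"] == 0:   # SYSTEM major opcode
        imm12 = (instr >> 20) & 0xFFF
        if imm12 == 0:
            return "ECALL"
        if imm12 == 1:
            return "EBREAK"
    return None
```

For example, `decode_fields(0x00500093)` describes `addi x1, x0, 5`: opcode 0x13, `rd` 1, `rs1` 0, and an immediate of 5.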

Design of Space Vector Modulation PWM and Digital Control of System On Programmable-Chip Using FPGA (FPGA를 이용한 공간벡터 변조 PWM 및 디지털 제어부의 System On Programmable Chip 설계)

  • Hwang, Jeong-Won;Kim, Seung-Ho;Yang, Bin;Lee, Cheon-Gi;Park, Seung-Yub
    • The Transactions of the Korean Institute of Electrical Engineers P / v.61 no.1 / pp.47-54 / 2012
  • This paper presents an FPGA implementation of the SVPWM technique for a three-phase voltage source inverter. Software-based vector-control calculation has drawbacks, and hardware-based vector control is attempted in order to overcome them. The vector control algorithm is designed as a modular SOPC that needs no additional software and takes over most of the operations from the DSP. For the AC motor vector control algorithm, the SVPWM block, the dead-time block, and the speed detector used for speed control are designed in HDL as modules. The design is implemented and verified on an Altera Cyclone III EP3C16F484 device.
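
For readers unfamiliar with SVPWM, the sketch below shows the textbook sector and dwell-time calculation that such a modulator performs. It is a floating-point illustration under stated assumptions, not the paper's HDL design. The function name `svpwm_dwell_times` and its parameters are hypothetical, and the reference magnitude is assumed to stay in the linear range (at most `v_dc / sqrt(3)`).

```python
import math


def svpwm_dwell_times(v_ref, angle, v_dc, t_s):
    """Textbook SVPWM: return (sector, t1, t2, t0) for one switching period.

    v_ref: magnitude of the reference voltage vector
    angle: angle of the reference vector in radians
    v_dc:  DC-link voltage
    t_s:   switching period in seconds
    """
    angle = angle % (2 * math.pi)
    # Six 60-degree sectors, numbered 1..6; min() guards the 2*pi edge case.
    sector = min(int(angle // (math.pi / 3)), 5) + 1
    theta = angle - (sector - 1) * math.pi / 3    # angle inside the sector
    m = math.sqrt(3) * v_ref / v_dc
    t1 = m * t_s * math.sin(math.pi / 3 - theta)  # first active vector
    t2 = m * t_s * math.sin(theta)                # second active vector
    t0 = t_s - t1 - t2                            # shared by the zero vectors
    return sector, t1, t2, t0
```

At the middle of a sector (`theta` of 30 degrees) the two active-vector times are equal, which is a quick sanity check for a hardware implementation.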

Design and FPGA Implementation of a High-Speed RSA Algorithm for Digital Signature (디지털 서명을 위한 고속 RSA 암호 시스템의 설계 및 FPGA 구현)

  • 강민섭;김동욱
    • The KIPS Transactions:PartC / v.8C no.1 / pp.32-40 / 2001
  • In this paper, we propose a high-speed modular multiplication algorithm that revises the conventional Montgomery algorithm. A hardware architecture is also presented that implements a 1024-bit RSA cryptosystem for digital signatures based on the proposed algorithm. Each iteration in our approach requires only one addition of two n-bit integers, while each iteration in Montgomery's algorithm requires two additions over three n-bit integers. The system is modelled in VHDL (VHSIC Hardware Description Language) and functionally simulated with Synopsys(TM) tools on an Axil-320 workstation, with Altera 10K libraries used for logic synthesis. For the FPGA implementation, timing simulation is also performed with Altera MAX+PLUS II. Experimental results show that the proposed RSA cryptosystem is not only faster but also requires drastically less hardware area than the conventional approach.
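
The baseline that the abstract compares against can be sketched as follows. This is the conventional bit-serial (radix-2) Montgomery multiplication, shown to make the "two additions over three n-bit integers" per iteration visible. It is not the authors' revised algorithm, and the names `mont_mul` and `mont_pow` are hypothetical.

```python
def mont_mul(a, b, n, nbits):
    """Conventional radix-2 Montgomery product: a * b * 2**(-nbits) mod n.

    Requires n odd and 0 <= a, b < n < 2**nbits.
    """
    s = 0
    for i in range(nbits):
        a_i = (a >> i) & 1
        q = (s + a_i * b) & 1            # makes the sum below even
        # Two additions per iteration over three operands: s, b and n.
        s = (s + a_i * b + q * n) >> 1
    if s >= n:                           # single final correction
        s -= n
    return s


def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply built on mont_mul (n must be odd)."""
    nbits = n.bit_length()
    r2 = pow(2, 2 * nbits, n)                 # R^2 mod n, with R = 2**nbits
    x = mont_mul(base % n, r2, n, nbits)      # base in Montgomery form
    acc = mont_mul(1, r2, n, nbits)           # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, nbits)
        if bit == "1":
            acc = mont_mul(acc, x, n, nbits)
    return mont_mul(acc, 1, n, nbits)         # leave Montgomery form
```

The result of `mont_pow` can be compared against Python's built-in `pow(base, exp, n)`, which is a convenient software reference when checking a hardware modular exponentiator.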

ASIC Design of OpenRISC-based Multimedia SoC Platform (OpenRISC 기반 멀티미디어 SoC 플랫폼의 ASIC 설계)

  • Kim, Sun-Chul;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.281-284 / 2008
  • This paper describes the ASIC design of a multimedia SoC platform. The implemented platform consists of a 32-bit OpenRISC1200 microprocessor, a WISHBONE on-chip bus, a VGA controller, a debug interface, an SRAM interface, and a UART. The 32-bit OpenRISC1200 processor has a 5-stage pipeline and a Harvard architecture with separate instruction and data buses. The VGA controller can display RGB data on a CRT or LCD monitor. The debug interface supports debugging of the platform. The SRAM interface supports an 18-bit address bus and a 32-bit data bus. The UART provides the RS232 protocol for serial communication. The platform is designed and verified on a Xilinx Virtex-4 XC4VLX80 FPGA board. Test code is generated by a cross compiler, and JTAG utility software and gdb are used to download the test code to the FPGA board through a parallel cable. Finally, the platform is implemented as a single ASIC chip using the Chartered 0.18 µm process, and it can operate at a clock frequency of 100 MHz.

GCC2Verilog Compiler Toolset for Complete Translation of C Programming Language into Verilog HDL

  • Huong, Giang Nguyen Thi;Kim, Seon-Wook
    • ETRI Journal / v.33 no.5 / pp.731-740 / 2011
  • Reconfigurable computing using a field-programmable gate array (FPGA) device has become a promising solution in system design because of its power efficiency and design flexibility. To bring the benefits of FPGAs to application programmers, there has been intensive research on automatic translation from high-level programming languages (HLL) such as C and C++ into hardware. However, the large gap in syntax and semantics between hardware and software programming makes the translation challenging. In this paper, we introduce a new approach to the translation that uses the widely used GCC compiler. By simply adding a hardware description language (HDL) backend to the existing state-of-the-art compiler, we minimize the effort needed to implement the translator while supporting the full features of the HLL in HLL-to-HDL translation and providing high performance. Our translator, called GCC2Verilog, is implemented as a GCC cross compiler targeting FPGAs instead of microprocessor architectures. Our experiments show a speedup of up to 34 times, and 17 times on average, with 4-port memory over PICO microprocessor execution in selected EEMBC benchmarks.

Development Process of FPGA-based Departure from Nucleate Boiling Ratio Algorithm Using Systems Engineering Approach

  • Hwang, In Sok;Jung, Jae Cheon
    • Journal of the Korean Society of Systems Engineering / v.14 no.2 / pp.41-48 / 2018
  • This paper describes the systems engineering development process for an FPGA-based Departure from Nucleate Boiling Ratio (DNBR) algorithm. The current Core Protection Calculator System (CPCS) requirements and DNBR logic are analyzed in the reverse engineering phase, and the new FPGA-based DNBR algorithm is designed in the re-engineering phase. The FPGA-based DNBR algorithm is developed in VHSIC Hardware Description Language (VHDL) in the implementation phase, and the VHDL DNBR software is verified in the software Verification & Validation phase. Test cases are developed to perform the software module tests for the VHDL software modules. The APR 1400 simulator is used to collect input data at 100%, 75%, and 50% reactor power conditions. Test input signals are injected into the software modules following the test case tables, and the output signals are compared with the expected test values. The minimum DNBR value from the developed DNBR algorithm is validated at the KEPCO E&C CPCS development facility. This paper summarizes the process of developing the FPGA-based DNBR calculation algorithm using a systems engineering approach.

An Optimized Hardware Implementation of SHA-3 Hash Functions (SHA-3 해시 함수의 최적화된 하드웨어 구현)

  • Kim, Dong-Seong;Shin, Kyung-Wook
    • Journal of IKEEE / v.22 no.4 / pp.886-895 / 2018
  • This paper describes a hardware design of the Secure Hash Algorithm-3 (SHA-3) hash functions, the latest version of the SHA family of standards released by NIST, and an implementation of an ARM Cortex-M0 interface for security SoC applications. To achieve an optimized design, the tradeoff between hardware complexity and performance was analyzed for five hardware architectures, and the datapath of the round block was set to 1600 bits on the basis of the analysis results. In addition, the padder, with a 64-bit interface to the round block, was implemented in hardware. An SoC prototype that integrates the SHA-3 hash processor, the Cortex-M0, and an AHB interface was implemented in a Cyclone-V FPGA device, and hardware/software co-verification was carried out. The SHA-3 hash processor uses 1,672 slices of a Virtex-5 FPGA and has an estimated maximum clock frequency of 289 MHz, achieving a throughput of 5.04 Gbps.
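
One common way to carry out the software side of such a co-verification is to compare the hardware digest against a software golden model. The sketch below uses Python's standard `hashlib` SHA-3 functions for that purpose. It illustrates the general idea only and is not the authors' test flow. The names `sha3_rate_bits` and `matches_golden` are hypothetical, and the hardware digest is assumed to be available as bytes.

```python
import hashlib

STATE_BITS = 1600  # Keccak-f[1600] state width, matching the 1600-bit datapath

_GOLDEN = {
    224: hashlib.sha3_224,
    256: hashlib.sha3_256,
    384: hashlib.sha3_384,
    512: hashlib.sha3_512,
}


def sha3_rate_bits(digest_bits):
    """Message bits absorbed per permutation: state minus capacity (2 x digest)."""
    return STATE_BITS - 2 * digest_bits


def matches_golden(message, hw_digest, digest_bits=256):
    """Compare a digest read back from hardware with the hashlib reference."""
    return _GOLDEN[digest_bits](message).digest() == hw_digest
```

The rate matters for throughput estimates because each pass of the round block absorbs `sha3_rate_bits(d)` message bits, for example 1088 bits for SHA3-256.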

The Middleware Extension for guaranteeing the Implementation-Independency between C++ and VHDL (SCA에서 C++/VHDL 구현 독립성을 보장하기 위한 미들웨어의 확장)

  • Bae, Myung-Nam;Lee, Byung-Bog;Park, Ae-Soon;Lee, In-Hwan;Kim, Nae-Soo
    • Journal of the Institute of Electronics Engineers of Korea TC / v.46 no.6 / pp.66-77 / 2009
  • In this paper, we propose a CORBA middleware extension suited to SCA-based communication environments. The extension guarantees that components can interconnect with one another regardless of how they are implemented, and it enables developers to achieve performance improvements more easily than with the existing methodology. The extension includes the HAO, the IDL2VHDL compiler, and an improved ORBit. The HAO is an ORB implemented at the logic level, with some functions restricted according to the characteristics of the FPGA. The IDL2VHDL compiler provides the mapping from CORBA IDL to VHDL, the VHSIC hardware description language, along with the additional procedures for processing the component. Finally, the improved ORBit, a CORBA ORB on the GPP, can connect directly to the HAO on the FPGA.

Implementation of a Window-Masking Method and the Soft-core Processor based TDD Switching Control SoC FPGA System (윈도 마스킹 기법과 Soft-core Processor 기반 TDD 스위칭 제어 SoC 시스템 FPGA 구현)

  • Hee-Jin Yang;Jeung-Sub Lee;Han-Sle Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.3 / pp.166-175 / 2024
  • In this paper, to improve the performance and reduce the weight of a MANET (Mobile Ad-hoc Network) network synchronization system using time division duplexing (TDD), we propose a Window-Masking Method and propose converting the HAT (Hardware Attached Top) CPU SoM (System on Module) into a RISC-V based soft-core MCU mounted on an FPGA, a hardware accelerator. The proposal was also verified through experiment. In terms of performance, applying the proposed technique widens the synchronization acquisition range from -50 dBm to +10 dBm to -60 dBm to +10 dBm, improves the lowest input level for synchronization by 20% from -50 dBm to -60 dBm, and reduces the detection latency by 43%, from 220 ns to 125 ns. In terms of weight reduction, replacing the SoM with the soft-core MCU reduced computing resources (48%), size (33%), and weight (27%) by an average of 36%.
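
The percentages quoted in this abstract can be reproduced with simple arithmetic. The short sketch below recomputes them from the figures given in the abstract. The helper name `percent_change` is hypothetical, and the 20% figure is taken as a relative change of the dBm value, which is how the abstract appears to count it.

```python
def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent of |before|."""
    return (after - before) / abs(before) * 100.0


# Detection latency: 220 ns down to 125 ns.
latency_change = percent_change(220, 125)

# Lowest input level for synchronization: -50 dBm down to -60 dBm.
level_change = percent_change(-50, -60)

# Average of the three reported reductions (computing resources, size, weight).
mean_reduction = (48 + 33 + 27) / 3
```

The latency change rounds to a 43% reduction, the input-level change is 20% of the original dBm value, and the three reductions average to 36%, consistent with the abstract.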