• Title/Abstract/Keywords: Programmable System-on-Chip

Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices

  • 나용석;손현욱;김형원
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 26, No. 3
    • /
    • pp.355-366
    • /
    • 2022
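  • This paper proposes a microcode-based neural network accelerator controller for artificial intelligence accelerators that uses a programmable architecture to offer both reconfigurability and the advantages of low power and ultra-small size. So that the target accelerator can support various neural network models, a microcode compiler converts a neural network model into microcode, which controls the accelerator's memory accesses and all of its computation units. The controller was designed for a 200 MHz system clock and implemented to run the YOLOv2-Tiny CNN model. When implemented for inference on the VOC 2012 dataset for object detection, it achieved a detection speed of 137.9 ms/image; when implemented for inference on a mask detection dataset, which detects whether a mask is being worn, it achieved 99.5 ms/image. When the accelerator equipped with the proposed controller is implemented as a silicon chip, the gate count is 618,388, which corresponds to a chip area approximately 65.5% smaller than when a RISC-V (U5-MC2) is used as the CPU core.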

Core-A: A 32-bit Synthesizable Processor Core

  • Kim, Ji-Hoon;Lee, Jong-Yeol;Ki, Ando
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 4, No. 2
    • /
    • pp.83-88
    • /
    • 2015
  • Core-A is a 32-bit synthesizable processor core with a unique instruction set architecture (ISA). In this paper, the Core-A ISA is introduced, with discussion of its useful features and the development environment, including the software tool chain and hardware on-chip debugger. Core-A is described in Verilog-HDL and can be customized for a given application and synthesized for an application-specific integrated circuit or field-programmable gate array target. In addition, the GNU Compiler Collection has been ported to support Core-A, and various predesigned platforms come with an established design flow to speed up hardware/software co-design for a Core-A-based system.

FPGA-Based Hardware Accelerator for Feature Extraction in Automatic Speech Recognition

  • Choo, Chang;Chang, Young-Uk;Moon, Il-Young
    • Journal of information and communication convergence engineering
    • /
    • Vol. 13, No. 3
    • /
    • pp.145-151
    • /
    • 2015
  • We describe a hardware-based scheme for improving the speed of a real-time automatic speech recognition (ASR) system by designing a parallel feature extraction algorithm on a field-programmable gate array (FPGA). A computationally intensive block in the algorithm is identified and implemented in hardware logic on the FPGA. One such block is the mel-frequency cepstrum coefficient (MFCC) algorithm used for the feature extraction process. We demonstrate that the FPGA platform can perform feature extraction in the speech recognition system more efficiently than general-purpose CPUs, including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. With this implementation, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform can significantly improve the execution time of an ASR system compared to the CPU and ARM platforms.

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Korean Institute of Intelligent Systems: Conference Proceedings
    • /
    • Korea Fuzzy Logic and Intelligent Systems Society, 1993, Fifth International Fuzzy Systems Association World Congress 93
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents an overview of the author's research and development activities on fuzzy inference hardware. We have pursued two distinct approaches. The first approach is to use application-specific integrated circuit (ASIC) technology: the fuzzy inference method is implemented directly in silicon. The second approach, which is in its preliminary stage, is to use a more conventional microprocessor architecture. Here, we use a quantitative technique used by designers of reduced instruction set computers (RISC) to modify the architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly on silicon. The mechanism is based on the max-min compositional rule of inference and Mamdani's method of fuzzy implication. Two VLSI fuzzy inference chips were designed, fabricated, and fully tested. Both used full-custom CMOS technology. The second and more elaborate chip was designed at the University of North Carolina (UNC) in cooperation with MCNC. Both VLSI chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT&T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock and achieved approximately 80,000 Fuzzy Logical Inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof-of-concept prototype chip, it had a minimal amount of peripheral logic for system integration. The UNC/MCNC chip consists of 688,131 transistors, of which 476,160 are used for RAM. It ran with a 10 MHz clock. The chip has a three-stage pipeline and initiates the computation of a new inference every 64 cycles. This chip achieved approximately 160,000 FLIPS. The new architecture has the following important improvements over the AT&T chip: programmable rule set memory (RAM);
on-chip fuzzification by a table lookup method; on-chip defuzzification by a centroid method; a reconfigurable architecture for processing two rule formats; and RAM/datapath redundancy for higher yield. It can store and execute 51 if-then rules of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format, the chip takes two inputs and produces one output. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers use this board for experiments in autonomous robot navigation. The fuzzy logic system board places the fuzzy chip into a VMEbus environment. High-level C language functions hide the operational details of the board from the applications programmer. The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but it is limited in generality. Many aspects of the design are limited or fixed. We have proposed designing a fuzzy information processor as an application-specific processor using a quantitative approach. The quantitative approach was developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules from the introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union.
We performed measurements using a MIPS R3000 as a base microprocessor. The initial result is encouraging. We could achieve as much as a 2.5-fold increase in inference speed if the R3000 had min and max instructions. Also, they are useful for speeding up other fuzzy operations such as bounded product and bounded sum. The embedded processor's main task is to control some device or process. It usually runs a single or a [...] embedded processor to create an embedded processor for fuzzy control is very effective. Table I shows the measured speed of inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and the UNC/MCNC ASIC fuzzy inference chip. The software used on the microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules, where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for each device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes. An ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes in both run time and design time/cost.

TABLE I. INFERENCE TIME BY 51 RULES

    Time              MIPS R3000 (regular)   MIPS R3000 (with min/max)   ASIC
    6000 inferences   125 s                  49 s                        0.0038 s
    1 inference       20.8 ms                8.2 ms                      6.4 µs
    FLIPS             48                     122                         156,250
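
The max-min compositional rule with Mamdani implication described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fuzzy sets are discretized as 64-element lists (the array size quoted for the benchmark), the rule format is the simpler "IF A and B Then Do E", and every function name and membership function below is hypothetical rather than taken from the chip or its C library.

```python
N = 64  # elements per fuzzy set, as quoted for the benchmark

def triangle(center, width, n=N):
    """Hypothetical triangular membership function sampled at n points."""
    return [max(0.0, 1.0 - abs(i - center) / width) for i in range(n)]

def singleton(x, n=N):
    """Crisp input value represented as a fuzzy singleton."""
    return [1.0 if i == x else 0.0 for i in range(n)]

def match(antecedent, observation):
    """Max-min composition: height of the intersection of two fuzzy sets."""
    return max(min(a, o) for a, o in zip(antecedent, observation))

def infer(rules, inputs):
    """rules: list of ((A, B), E) tuples; inputs: (a_obs, b_obs)."""
    out = [0.0] * N
    for antecedents, consequent in rules:
        # AND of the antecedents: min of the individual match degrees
        w = min(match(a, o) for a, o in zip(antecedents, inputs))
        # Mamdani implication clips the consequent (min); rules combine by max
        out = [max(acc, min(w, c)) for acc, c in zip(out, consequent)]
    return out

def centroid(fs):
    """Centroid defuzzification over the index axis."""
    total = sum(fs)
    return sum(i * m for i, m in enumerate(fs)) / total if total else 0.0

# One illustrative rule: IF A is near 16 and B is near 16 THEN E is near 48
rules = [((triangle(16, 8), triangle(16, 8)), triangle(48, 8))]
result = infer(rules, (singleton(16), singleton(16)))
```

The inner loops consist almost entirely of min and max operations, which is consistent with the abstract's observation that these operations dominate fuzzy inference workloads.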

Design of an FPGA-Based RTL-Level CAN IP Using Functional Simulation for FCC of a Small UAV System

  • Choe, Won Seop;Han, Dong In;Min, Chan Oh;Kim, Sang Man;Kim, Young Sik;Lee, Dae Woo;Lee, Ha-Joon
    • International Journal of Aeronautical and Space Sciences
    • /
    • Vol. 18, No. 4
    • /
    • pp.675-687
    • /
    • 2017
  • In the aerospace industry, various models are produced according to operational conditions and the environment after development of the base model is completed. When a design change is necessary, the circuit therefore incurs modification and update costs whenever environment variables change. For these reasons, system designs that use field programmable gate arrays (FPGAs) to respond flexibly to changing environmental conditions have recently been attracting attention in various fields, and the rapidly changing aerospace industry also uses FPGAs to organize the system environment. In this paper, we design a controller area network (CAN) intellectual property (IP) core at the register-transfer level (RTL), which does not depend on the FPGA vendor, and verify its performance; the CAN protocol is used instead of avionics protocols such as ARINC-429 and MIL-STD-1553, which are not suitable for small unmanned aerial vehicle (UAV) systems. Consequently, a Spartan 6 FPGA-based system on chip (SoC) including an embedded system is constructed using the designed CAN communications IP and Xilinx MicroBlaze, and the configured SoC recorded an average logic element usage of only 32% on the Spartan 6 FPGA.

Method of SSO Noise Reduction on FPGA of Digital Optical Units in Optical Communication

  • Kim, Jae Wan;Eom, Doo Seop
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • Vol. 50, No. 1
    • /
    • pp.97-101
    • /
    • 2013
  • There is a growing need for optical communication systems that convert large volumes of data to optical signals and that accommodate and transmit the signals across long distances. A digital optical communication system consists of a master unit (MU) and a slave unit (SU). The MU transmits data to the SU using digital optical signals. However, digital optical units that are commercially available or under development transmit data using two's complement representation. At low input levels, a large number of simultaneous switching outputs (SSOs) occur because of the high rate of bit switching in two's complement, which increases the power noise. This problem reduces the overall system capability because a DSP (digital signal processor) chip (FPGA, CPLD, etc.) cannot be used efficiently and power noise increases. This paper proposes a change from two's complement to a more efficient method that produces less SSO noise and can be applied to existing digital optical units.
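
The effect described in this abstract, that a low-level signal toggles many output bits when encoded in two's complement, can be illustrated by counting bit flips between consecutive samples. The sketch below rests on assumptions that are not in the abstract: the 16-bit word width, the sample sequence, and the use of sign-magnitude as the comparison encoding are all hypothetical choices, since the abstract does not say which alternative representation the paper proposes.

```python
WIDTH = 16  # assumed output word width

def twos_complement(x, width=WIDTH):
    """Encode a signed integer as a two's complement bit pattern."""
    return x & ((1 << width) - 1)

def sign_magnitude(x, width=WIDTH):
    """Encode a signed integer as a sign bit plus magnitude (assumed alternative)."""
    sign = 1 << (width - 1) if x < 0 else 0
    return sign | abs(x)

def toggles(samples, encode):
    """Total number of output bits that switch between consecutive samples."""
    words = [encode(s) for s in samples]
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

# A small signal oscillating around zero, i.e. a low input level
low_level = [1, -1, 1, -1, 1]
tc = toggles(low_level, twos_complement)  # 15 bits flip on every transition
sm = toggles(low_level, sign_magnitude)   # only the sign bit flips
print(tc, sm)  # prints "60 4"
```

In this toy case every zero crossing flips 15 of the 16 bits in two's complement but only one bit in sign-magnitude, which is the kind of simultaneous switching the abstract attributes to two's complement at low input levels.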

FPGA Implementation and Experiment of a Time-Delayed Controller for Humanoid Robot Arm Control

  • 이운규;전효원;정슬
    • Journal of Institute of Control, Robotics and Systems
    • /
    • Vol. 13, No. 7
    • /
    • pp.649-655
    • /
    • 2007
  • In this paper, a time-delayed controller for position control of humanoid robot arms is designed and implemented on a field programmable gate array (FPGA) chip. The time-delayed control algorithm is simple to implement and robust in rejecting disturbances. The method uses information from the previous sample, delayed by one sampling period, to cancel out uncertainties in the system. Because current hardware allows a very short sampling time, the time-delayed controller can be implemented in practice. However, inertia values must be estimated correctly to achieve good performance. Position tracking tasks of humanoid robot arms are tested to compare the performance of several control algorithms, including the time-delayed controller.
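
The idea of cancelling unknown dynamics with one-sample-delayed information can be sketched with a commonly used form of the time-delayed control law, u(t) = u(t-L) - M̄·a(t-L) + M̄·(Kd·ė + Kp·e), where M̄ is the controller's inertia estimate. The Python simulation below is a minimal sketch, not the paper's FPGA design: the single-joint pendulum plant, its parameters, the gains, and the function name are all assumptions, and the previous acceleration is read directly from the simulated plant rather than estimated from encoder data.

```python
import math

def simulate_tdc(q_des=1.0, m_bar_scale=0.8, dt=1e-3, steps=3000):
    """Time-delayed control of a hypothetical 1-DOF arm; returns the final angle."""
    m, l, b, g = 1.0, 0.5, 0.1, 9.81   # assumed plant parameters
    inertia = m * l * l                # true inertia, not known to the controller
    m_bar = m_bar_scale * inertia      # controller's (imperfect) inertia estimate
    kp, kd = 100.0, 20.0               # gains defining the desired error dynamics

    q = v = 0.0
    u_prev = a_prev = 0.0              # torque and acceleration from the previous sample
    for _ in range(steps):
        e, e_dot = q_des - q, -v       # constant target, so its derivative is zero
        # u_prev - m_bar * a_prev estimates the unmodelled dynamics one sample ago
        u = u_prev - m_bar * a_prev + m_bar * (kd * e_dot + kp * e)
        # plant: inertia * a + b * v + m * g * l * sin(q) = u
        a = (u - b * v - m * g * l * math.sin(q)) / inertia
        v += a * dt
        q += v * dt
        u_prev, a_prev = u, a
    return q

final_angle = simulate_tdc()
```

The controller never uses the gravity or friction terms explicitly; they are absorbed by the delayed estimate. Varying `m_bar_scale` is a simple way to explore how the quality of the inertia estimate affects tracking in this toy model.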

Noise Reduction and Edge Enhancement Method and Architecture for Mobile Devices Supporting High Resolution Video

  • 이금석;전병우
    • Korean Institute of Information Scientists and Engineers: Conference Proceedings
    • /
    • Korean Institute of Information Scientists and Engineers, Proceedings of the 2006 Fall Conference, Vol. 33, No. 2 (D)
    • /
    • pp.502-505
    • /
    • 2006
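  • This paper concerns a method and architecture for effective image quality enhancement (noise reduction and edge enhancement) that is easy to implement in a system on a chip (SoC) used in mobile devices that process high-resolution video. With the recent development and evolution of mobile devices, many types of devices are being developed; among them, the recently popular portable media player (PMP) and HD (High Definition) class camcorders can be classified as mobile devices that process high-resolution video. If image quality enhancement for high-resolution video on such devices is performed with existing complex methods, the resulting increase in memory bandwidth and hardware size makes implementation on mobile devices difficult. This paper therefore describes a method and architecture that perform high-resolution image quality enhancement selectively according to the type of input video, without using memory bandwidth and with minimal hardware size, so that they are easy to implement in an FPGA (field programmable gate array) or ASIC (application specific integrated circuit). Experiments with real video confirmed a subjective improvement in image quality.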

Design and Test of On-Board Flight Data Acquisition System based on the RS485 Star Network

  • 이상래;이재득
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • Vol. 32, No. 7
    • /
    • pp.83-90
    • /
    • 2004
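  • This paper introduces the configuration of a system that can effectively collect real-time data from the many sensors distributed on board medium and large flight vehicles, describes the circuit designs of the various sub-units and the design of the communication protocol between the units, and finally reports the performance verified through experiments. For each unit, all digital circuits, such as control logic and sequence logic, were designed and simulated on the basis of FPGA devices. Based on the results, actual hardware was built and used to confirm that the sequence of remote acquisition, collection, and formatting of analog and digital data operates correctly.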

Effective SoC Architecture of a VDP for full HD TVs

  • 김지훈;김영철
    • Smart Media Journal
    • /
    • Vol. 1, No. 1
    • /
    • pp.1-9
    • /
    • 2012
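  • This paper proposes an efficient hardware architecture for the system on a chip (SoC) implementation of an image-quality-enhancing VDP (Video Display Processor) for Full HD TVs. As an approach to SoC design, the proposed architecture supports an efficient bus structure and flexible interfaces, enabling real-time video processing. The video IPs that make up the VDP were designed to provide high-quality video and improve image quality, and each IP uses the Avalon interface to guarantee real-time operation and to support hardware integration into the SoC. This shortens design time and simplifies IP verification and, in particular, adding, removing, and changing IPs when composing the SoC, which improves user convenience. In addition, the SoC's embedded software can control detailed video settings and data transfer mode settings in real time, so a flexible real-time processing system can be implemented. The VDP SoC was implemented on a Cyclone III SoPC (System on a Programmable Chip) platform, and experimental results show that high-quality video can be obtained by converting SD-resolution input video to Full HD resolution.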
