• Title/Summary/Keyword: FLOPS

Search Result 128, Processing Time 0.021 seconds

Performance Analysis of Processors for Next Generation Satellites (차세대 위성 프로세서 선정을 위한 성능 분석)

  • Yoo, Bum-Soo;Choi, Jong-Wook;Jeong, Jae-Yeop;Kim, Sun-Wook
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.1
    • /
    • pp.51-61
    • /
    • 2019
  • There are strict evaluation processes before using new processors to satellites. Engineers evaluate processors from various viewpoints including specification, development environment, and cost. From a viewpoint of computation power, manufacturers provide benchmark results with processors, and engineers decide which processors are adequate to their satellites by comparing the benchmark results with requirements of their satellites. However, the benchmark results depends on a test environment of manufacturers, and it is quite difficult to achieve similar performance in a target environment. Therefore, it is necessary to evaluate the processors in the target environment. This paper compares performance of a processor, AT697F/LEON2, in software testbed (STB) with three development boards of XC2V/LEON3, GR712RC/LEON3, and GR740/LEON4. Seven benchmark functions of Dhrystone, Stanford, Coremark, Whetstone, Flops, NBench, and MiBench are selected. Results are analyzed with hardware and software properties: hardware properties of core architecture, number of cores, cache, and memory; and software properties of build options and compilers. Based on the analysis, this paper describes a guideline for choosing processors for next generation satellites.

Training-Free Hardware-Aware Neural Architecture Search with Reinforcement Learning

  • Tran, Linh Tam;Bae, Sung-Ho
    • Journal of Broadcast Engineering
    • /
    • v.26 no.7
    • /
    • pp.855-861
    • /
    • 2021
  • Neural Architecture Search (NAS) is cutting-edge technology in the machine learning community. NAS Without Training (NASWOT) recently has been proposed to tackle the high demand of computational resources in NAS by leveraging some indicators to predict the performance of architectures before training. The advantage of these indicators is that they do not require any training. Thus, NASWOT reduces the searching time and computational cost significantly. However, NASWOT only considers high-performing networks which does not guarantee a fast inference speed on hardware devices. In this paper, we propose a multi objectives reward function, which considers the network's latency and the predicted performance, and incorporate it into the Reinforcement Learning approach to search for the best networks with low latency. Unlike other methods, which use FLOPs to measure the latency that does not reflect the actual latency, we obtain the network's latency from the hardware NAS bench. We conduct extensive experiments on NAS-Bench-201 using CIFAR-10, CIFAR-100, and ImageNet-16-120 datasets, and show that the proposed method is capable of generating the best network under latency constrained without training subnetworks.

High Throughput Multiplier Architecture for Elliptic Cryptographic Applications

  • Swetha, Gutti Naga;Sandi, Anuradha M.
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.414-426
    • /
    • 2022
  • Elliptic Curve Cryptography (ECC) is one of the finest cryptographic technique of recent time due to its lower key length and satisfactory performance with different hardware structures. In this paper, a High Throughput Multiplier architecture is introduced for Elliptic Cryptographic applications based on concurrent computations. With the aid of the concurrent computing approach, the High Throughput Concurrent Computation (HTCC) technology that was just presented improves the processing speed as well as the overall efficiency of the point-multiplier architecture. Here, first and second distinct group operation of point multiplier are combined together and synthesised concurrently. The synthesis of proposed HTCC technique is performed in Xilinx Virtex - 5 and Xilinx Virtex - 7 of Field-programmable gate array (FPGA) family. In terms of slices, flip flops, time delay, maximum frequency, and efficiency, the advantages of the proposed HTCC point multiplier architecture are outlined, and a comparison of these advantages with those of existing state-of-the-art point multiplier approaches is provided over GF(2163), GF(2233) and GF(2283). The efficiency using proposed HTCC technique is enhanced by 30.22% and 75.31% for Xilinx Virtex-5 and by 25.13% and 47.75% for Xilinx Virtex-7 in comparison according to the LC design as well as the LL design, in their respective fashions. The experimental results for Virtex - 5 and Virtex - 7 over GF(2233) and GF(2283)are also very satisfactory.

A Design and Implementation of 32-bit Five-Stage RISC-V Processor Using FPGA (FPGA를 이용한 32-bit RISC-V 5단계 파이프라인 프로세서 설계 및 구현)

  • Jo, Sangun;Lee, Jonghwan;Kim, Yongwoo
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.4
    • /
    • pp.27-32
    • /
    • 2022
  • RISC-V is an open instruction set architecture (ISA) developed in 2010 at UC Berkeley, and active research is being conducted as a processor to compete with ARM. In this paper, we propose an SoC system including an RV32I ISA-based 32-bit 5-stage pipeline processor and AHB bus master. The proposed RISC-V processor supports 37 instructions, excluding FENCE, ECALL, and EBREAK instructions, out of a total of 40 instructions based on RV32I ISA. In addition, the RISC-V processor can be connected to peripheral devices such as BRAM, UART, and TIMER using the AHB-lite bus protocol through the proposed AHB bus master. The proposed SoC system was implemented in Arty A7-35T FPGA with 1,959 LUTs and 1,982 flip-flops. Furthermore, the proposed hardware has a maximum operating frequency of 50 MHz. In the Dhrystone benchmark, the proposed processor performance was confirmed to be 0.48 DMIPS.

Design and Implementation of Multi-mode Sensor Signal Processor on FPGA Device (다중모드 센서 신호 처리 프로세서의 FPGA 기반 설계 및 구현)

  • Soongyu Kang;Yunho Jung
    • Journal of Sensor Science and Technology
    • /
    • v.32 no.4
    • /
    • pp.246-251
    • /
    • 2023
  • Internet of Things (IoT) systems process signals from various sensors using signal processing algorithms suitable for the signal characteristics. To analyze complex signals, these systems usually use signal processing algorithms in the frequency domain, such as fast Fourier transform (FFT), filtering, and short-time Fourier transform (STFT). In this study, we propose a multi-mode sensor signal processor (SSP) accelerator with an FFT-based hardware design. The FFT processor in the proposed SSP is designed with a radix-2 single-path delay feedback (R2SDF) pipeline architecture for high-speed operation. Moreover, based on this FFT processor, the proposed SSP can perform filtering and STFT operation. The proposed SSP is implemented on a field-programmable gate array (FPGA). By sharing the FFT processor for each algorithm, the required hardware resources are significantly reduced. The proposed SSP is implemented and verified on Xilinxh's Zynq Ultrascale+ MPSoC ZCU104 with 53,591 look-up tables (LUTs), 71,451 flip-flops (FFs), and 44 digital signal processors (DSPs). The FFT, filtering, and STFT algorithm implementations on the proposed SSP achieve 185x average acceleration.

The Design of the Ternary Sequential Logic Circuit Using Ternary Logic Gates (3치 논리 게이트를 이용한 3치 순차 논리 회로 설계)

  • 윤병희;최영희;이철우;김흥수
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.10
    • /
    • pp.52-62
    • /
    • 2003
  • This paper discusses ternary logic gate, ternary D flip-flop, and ternary four-digit parallel input/output register. The ternary logic gates consist of n-channel pass transistors and neuron MOS(νMOS) threshold inverters on voltage mode. They are designed with a transmission function using threshold inverter that are in turn, designed using Down Literal Circuit(DLC) that has various threshold voltages. The νMOS pass transistor is very suitable gate to the multiple-valued logic(MVL) and has the input signal of the multi-level νMOS threshold inverter. The ternary D flip-flop uses the storage element of the ternary data. The ternary four-digit parallel input/output register consists of four ternary D flip-flops which can temporarily store four-digit ternary data. In this paper, these circuits use 3.3V low power supply voltage and 0.35m process parameter, and also represent HSPICE simulation result.

Development of Optimized State Assignment Technique for Testing and Low Power (테스팅 및 저전력을 고려한 최적화된 상태할당 기술 개발)

  • Cho Sangwook;Yi Hyunbean;Park Sungju
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.1
    • /
    • pp.81-90
    • /
    • 2004
  • The state assignment for a finite state machine greatly affects the delay, area, power dissipation, and testabilities of the sequential circuits. In order to improve the testabilities and power consumption, a new state assignment technique . based on m-block partition is introduced in this paper. By the m-block partition algorithm, the dependencies among groups of state variables are minimized and switching activity is further reduced by assigning the codes of the states in the same group considering the state transition probability among the states. In the sequel the length and number of feedback cycles are reduced with minimal switching activity on state variables. It is inherently contradictory problem to optimize the testability and power consumption simultaneously, however our new state assignment technique is able to achieve high fault coverage with less number of scan nfp flops by reducing the number of feedback cycles while the power consumption is kept low upon the low switching activities among state variables. Experiment shows drastic improvement in testabilities and power dissipation for benchmark circuits.

Molecular Shuttle Memory System Based on Boron-Nitride Nanopeapod (질화붕소 나노피포드에 기반한 나노분자 메모리 시스템에 관한 연구)

  • Byun Ki Ryang;Kang Jeong Won;Choi Won Young;Hwang Ho Jung
    • Journal of the Korean Vacuum Society
    • /
    • v.14 no.1
    • /
    • pp.40-48
    • /
    • 2005
  • Bucky shuttle memory systems were investigated by the classical molecular dynamics(MD) simulations. Energetics and operating response of the shuttle-memory-elements u?ere examined by MD simulations of the C/sub 60/ shuttle in the nanomemory systems under various external force fields. Single-nanopeapod type was consisting of three fullerenes encapsulated in (10, 10) boron-nitride nanotube and filled Cu electrode. Studied systems could be applied to nonvolatile memory. MD simulation results showed that the stable bit flops could be achieved from the external force fields of 0.1 eV/Å for single-nanopeapod type.

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

Design and FPGA Implementation of FBMC Transmitter by using Clock Gating Technique based QAM, Inverse FFT and Filter Bank for Low Power and High Speed Applications

  • Sivakumar, M.;Omkumar, S.
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.6
    • /
    • pp.2479-2484
    • /
    • 2018
  • The filter bank multicarrier modulation (FBMC) technique is one of multicarrier modulation technique (MCM), which is mainly used to improve channel capacity of cognitive radio (CR) network and frequency spectrum access technique. The existing FBMC System contains serial to parallel converter, normal QAM modulation, Radix2 inverse FFT, parallel to serial converter and poly phase filter. It needs high area, delay and power consumption. To further reduce the area, delay and power of FBMC structure, a new clock gating technique is applied in the QAM modulation, radix2 multipath delay commutator (R2MDC) based inverse FFT and unified addition and subtraction (UAS) based FIR filter with parallel asynchronous self time adder (PASTA). The clock gating technique is mainly used to reduce the unwanted clock switching activity. The clock gating is nothing but clock signal of flip-flops is controlled by gate (i.e.) AND gate. Hence speed is high and power consumption is low. The comparison between existing QAM and proposed QAM with clock gating technique is carried out to analyze the results. Conversely, the proposed inverse R2MDC FFT with clock gating technique is compared with the existing radix2 inverse FFT. Also the comparison between existing poly phase filter and proposed UAS based FIR filter with PASTA adder is carried out to analyze the performance, area and power consumption individually. The proposed FBMC with clock gating technique offers low power and high speed than the existing FBMC structures.