• Title/Summary/Keyword: implementation algorithm

Search results: 4,233

A Lightweight Hardware Accelerator for Public-Key Cryptography (공개키 암호 구현을 위한 경량 하드웨어 가속기)

  • Sung, Byung-Yoon;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.12 / pp.1609-1617 / 2019
  • This paper describes the design of a hardware accelerator for implementing public-key cryptographic protocols (PKCPs) based on Elliptic Curve Cryptography (ECC) and RSA. It supports five elliptic curves (ECs) over GF(p) and three RSA key lengths defined in NIST standards. It supports four point operations over ECs and six modular arithmetic operations, which makes it suitable for hardware implementation of ECC- and RSA-based PKCPs. To achieve a small-area implementation, the finite field arithmetic circuit was designed with a 32-bit data path. It adopts the word-based Montgomery multiplication algorithm, the Jacobian coordinate system for EC point operations, and Fermat's little theorem for the modular multiplicative inverse. The hardware operation was verified on an FPGA device by implementing the EC-DH key exchange protocol and RSA operations. With a 180-nm CMOS cell library, it occupies 20,800 gate equivalents and 28 kbits of RAM at a 50 MHz clock frequency. On a Virtex-5 FPGA device, it occupies 1,503 slices and 2 BRAMs.
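The two arithmetic choices named above are word-based Montgomery multiplication on a 32-bit data path and Fermat-based inversion. A minimal software model of both follows. It assumes the CIOS (Coarsely Integrated Operand Scanning) variant of word-based Montgomery multiplication; the abstract does not say which word-level variant the circuit uses. The function names are hypothetical, and Jacobian-coordinate point operations are not shown.

```python
W = 32                                    # word size: one 32-bit data-path word
MASK = (1 << W) - 1


def to_words(x: int, s: int) -> list:
    """Split x into s little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def mont_mul_cios(a: int, b: int, n: int, s: int) -> int:
    """Word-serial Montgomery product a*b*R^-1 mod n, R = 2**(32*s) (CIOS form).

    Assumes n is odd, n < R, and 0 <= a, b < n.
    """
    n_prime = (-pow(n, -1, 1 << W)) & MASK          # -n^-1 mod 2^32
    a_w, b_w, n_w = to_words(a, s), to_words(b, s), to_words(n, s)
    t = [0] * (s + 2)
    for i in range(s):
        # t += a * b[i]: one row of 32x32-bit multiply-accumulates
        carry = 0
        for j in range(s):
            cs = t[j] + a_w[j] * b_w[i] + carry
            t[j], carry = cs & MASK, cs >> W
        cs = t[s] + carry
        t[s], t[s + 1] = cs & MASK, cs >> W
        # add m*n so the lowest word becomes zero, then shift down by one word
        m = (t[0] * n_prime) & MASK
        carry = (t[0] + m * n_w[0]) >> W
        for j in range(1, s):
            cs = t[j] + m * n_w[j] + carry
            t[j - 1], carry = cs & MASK, cs >> W
        cs = t[s] + carry
        t[s - 1], carry = cs & MASK, cs >> W
        t[s] = t[s + 1] + carry
    u = sum(w << (W * k) for k, w in enumerate(t[: s + 1]))
    return u - n if u >= n else u                   # final conditional subtraction


def mod_exp(base: int, e: int, n: int, s: int) -> int:
    """Left-to-right square-and-multiply; every product goes through mont_mul_cios."""
    r_mod = (1 << (W * s)) % n                       # R mod n = 1 in Montgomery form
    x = mont_mul_cios(base % n, r_mod * r_mod % n, n, s)   # base * R mod n
    acc = r_mod
    for bit in bin(e)[2:]:
        acc = mont_mul_cios(acc, acc, n, s)
        if bit == "1":
            acc = mont_mul_cios(acc, x, n, s)
    return mont_mul_cios(acc, 1, n, s)               # leave Montgomery form


def mod_inverse_fermat(a: int, p: int, s: int) -> int:
    """a^-1 mod prime p via Fermat's little theorem: a^(p-2) mod p."""
    return mod_exp(a, p - 2, p, s)
```

For example, with the NIST P-256 prime `P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1` and `s = 8` words, `mod_inverse_fermat(a, P256, 8)` reuses the multiplier for inversion. No separate inversion circuit is needed, which is the area argument behind the Fermat approach.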

Implementation of FPGA-based Accelerator for GRU Inference with Structured Compression (구조적 압축을 통한 FPGA 기반 GRU 추론 가속기 설계)

  • Chae, Byeong-Cheol
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.850-858 / 2022
  • To deploy Gated Recurrent Units (GRUs) on resource-constrained embedded devices, this paper presents a reconfigurable FPGA-based GRU accelerator that supports structured compression. First, the size of a dense GRU model is significantly reduced by hybrid quantization and structured top-k pruning. Second, the energy consumed by external memory accesses is greatly reduced by the proposed computation-reuse pattern. Finally, the accelerator can process structured sparse models, which benefit from an algorithm-hardware co-design workflow. Moreover, inference can be flexibly configured across feature dimensions, sequence lengths, and numbers of layers. Implemented on an Intel DE1-SoC FPGA, the proposed accelerator achieves 45.01 GOPS on a structured sparse GRU network without batching. Compared with CPU and GPU implementations, the low-cost FPGA accelerator achieves 57x and 30x improvements in latency and 300x and 23.44x improvements in energy efficiency, respectively. The proposed accelerator thus serves as an early study toward real-time embedded applications and shows potential for further development.
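Structured top-k pruning can be illustrated with a small pure-Python sketch. It assumes one common interpretation, which may differ from the paper's:

- each weight-matrix row is split into fixed-size groups;
- only the k largest-magnitude weights in each group are kept;
- every group is stored as exactly k (offset, value) pairs, so the sparse matrix-vector loop has a fixed trip count.

The function names and the `group`/`k` parameters are hypothetical, and hybrid quantization is not modeled.

```python
def topk_prune(weights, group, k):
    """Zero all but the k largest-magnitude weights in each run of `group` weights per row."""
    pruned = []
    for row in weights:
        if len(row) % group:
            raise ValueError("row length must be a multiple of group")
        new_row = [0.0] * len(row)
        for start in range(0, len(row), group):
            idx = sorted(range(start, start + group), key=lambda i: abs(row[i]), reverse=True)
            for i in idx[:k]:
                new_row[i] = row[i]
        pruned.append(new_row)
    return pruned


def pack(pruned, group, k):
    """Store exactly k (offset, value) pairs per group, padding with zero entries."""
    packed = []
    for row in pruned:
        groups = []
        for start in range(0, len(row), group):
            pairs = [(i - start, row[i]) for i in range(start, start + group) if row[i] != 0.0]
            pairs += [(0, 0.0)] * (k - len(pairs))   # fixed-size groups keep hardware regular
            groups.append(pairs)
        packed.append(groups)
    return packed


def sparse_matvec(packed, x, group):
    """y = W x using only the packed entries; each group costs exactly k multiply-accumulates."""
    y = []
    for groups in packed:
        acc = 0.0
        for g, pairs in enumerate(groups):
            for offset, value in pairs:
                acc += value * x[g * group + offset]
        y.append(acc)
    return y
```

The fixed k-per-group layout is the design choice that matters here. Unlike unstructured sparsity, every group needs the same number of MACs and the same storage, so the processing elements stay load-balanced.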

Time Domain Response of Random Electromagnetic Signals for Electromagnetic Topology Analysis Technique

  • Han, Jung-hoon
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.135-144 / 2022
  • The electromagnetic topology (EMT) technique analyzes each component of an electromagnetic propagation environment and combines the components into a network in order to model a complex propagation environment effectively. The propagation environment of a typical commercial communication channel is complex and difficult to predict. Probabilistic propagation channel models based on averaged solutions are therefore used despite their low accuracy. However, EMT-based modeling is being considered for applications that require relatively accurate electromagnetic propagation characteristics. Examples include propagation and coupling analysis of threat electromagnetic waves such as electromagnetic pulses, radio wave models used in electronic warfare, and local communication channel models for 5G and 6G communications. This paper describes an effective implementation method, algorithm, and program implementation of the EMT method analyzed in the frequency domain. It also discusses a method for deriving the time-domain response to an arbitrary applied source signal from the frequency-domain EMT analysis result.
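The last step, obtaining a time-domain response from a frequency-domain result, can be sketched as follows: multiply the spectrum of the applied source by a sampled transfer function, then apply an inverse DFT. The sketch makes these assumptions:

- The EMT analysis yields a transfer function H(f) sampled at the DFT bin frequencies.
- H(f) has Hermitian symmetry, so the output is real.
- The source is zero-padded enough to avoid circular wrap-around.

The function names are hypothetical, and the plain O(N²) DFT stands in for an FFT.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def time_response(source, transfer):
    """Time-domain output for a source waveform and a transfer function H(f_k) per DFT bin.

    Assumes `transfer` is Hermitian (H[n-k] == conj(H[k])), so the imaginary part
    of the result is only rounding noise and is discarded.
    """
    if len(source) != len(transfer):
        raise ValueError("source and transfer must have the same length")
    out = idft([s * h for s, h in zip(dft(source), transfer)])
    return [v.real for v in out]
```

As a check, a pure delay of d samples, H_k = exp(-2πj·k·d/N), should return the source shifted right by d samples. An all-ones H should return the source unchanged.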

Implementation of an alarm system with AI image processing to detect whether a helmet is worn or not and a fall accident (헬멧 착용 여부 및 쓰러짐 사고 감지를 위한 AI 영상처리와 알람 시스템의 구현)

  • Yong-Hwa Jo;Hyuek-Jae Lee
    • Journal of the Institute of Convergence Signal Processing / v.23 no.3 / pp.150-159 / 2022
  • This paper presents a real-time system that detects whether workers at an industrial site are wearing helmets and whether a fall accident has occurred. Each worker is extracted as an image object, and each extracted image is analyzed individually. YOLO, a deep learning-based computer vision model, was used to detect worker image objects. Helmet wearing was determined by applying a model trained on 5,000 different helmet images to the extracted images. To detect falls, the position of the head was tracked with the MediaPipe Pose real-time body-tracking algorithm, and its movement speed was calculated to determine whether the person fell. In addition, to make the fall detection result more reliable, a method that infers a person's posture from the size of the YOLO bounding box was proposed and implemented. Finally, a Telegram Bot API and a Firebase DB server were used to implement a notification service for administrators.
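The fall rule described above (fast downward head motion, confirmed by the bounding-box shape) can be sketched in a few lines. The sketch makes these assumptions:

- Head positions are normalized image coordinates with y increasing downward, as MediaPipe Pose landmarks are.
- The bounding-box width and height come from the YOLO detector.

The `Frame` type and both thresholds are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t: float        # timestamp in seconds
    head_y: float   # normalized head y-coordinate (0 = top of image, 1 = bottom)
    box_w: float    # person bounding-box width in pixels
    box_h: float    # person bounding-box height in pixels


def detect_fall(frames, speed_thr=1.0, aspect_thr=1.0):
    """Flag a fall when a fast head drop is followed by a wider-than-tall bounding box.

    speed_thr: downward head speed (image heights per second) treated as a drop.
    aspect_thr: width/height ratio above which the person is treated as lying down.
    Both defaults are hypothetical and would need tuning on real footage.
    """
    dropped = False
    for prev, cur in zip(frames, frames[1:]):
        dt = cur.t - prev.t
        if dt > 0 and (cur.head_y - prev.head_y) / dt > speed_thr:
            dropped = True                 # head moved down quickly
        if dropped and cur.box_w / cur.box_h > aspect_thr:
            return True                    # ... and the posture now looks horizontal
    return False
```

Requiring both cues mirrors the abstract's motivation for the bounding-box check. A single fast head movement, such as bending over, is not enough to raise an alarm on its own.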

How Supernovae Ejecta Is Transported In A Galaxy: Dependence On Hydrodynamic Schemes In Numerical Simulations

  • Shin, Eun-jin;Kim, Ji-hoon
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.48.4-48.4 / 2019
  • We studied the metal distribution of an isolated Milky Way-mass galaxy using various hydrodynamic solvers and investigated how the results differ between AMR and SPH codes. In particle-based codes, physical quantities such as mass or metallicity assigned to each particle are conserved unless explicitly injected by supernova feedback. In Eulerian codes, by contrast, diffusion arises naturally from the hydrodynamic equations. Therefore, unless explicit diffusion physics is included in SPH codes, metal mixing in the galaxy or the circumgalactic medium (CGM) can be accomplished only by the direct motion of particles. Moreover, standard SPH codes suppress the fluid instabilities that drive turbulent mixing. In this work, we ran simulations using the ENZO (AMR), GIZMO, and GADGET-2 codes. All runs shared the same initial conditions, the same gas physics such as cooling and heating models, and the same star formation and feedback. We additionally included a metal diffusion algorithm in the SPH codes, following the subgrid turbulent mixing model investigated by Shen et al. (2010). We then compared the effect of metal outflows on the halo region of the galaxy across hydro solvers. We also found that implementing the diffusion scheme in SPH codes requires a sufficient number of gas particles, which carry the metals. We therefore tested a new initial condition for a proper implementation of the diffusion scheme in the SPH simulations. By comparing the metal contamination of the CGM across hydrodynamic models, we quantify the diffusion strength of AMR codes using the diffusion parameterization of the SPH codes. We also suggest calibrations that reconcile the different behavior of the codes in metal outflows.
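The diffusion idea can be sketched in Python. Subgrid turbulent mixing models of this kind use a Smagorinsky-style coefficient D = C|S|h², where C is a dimensionless constant, |S| is the norm of the trace-free shear tensor, and h is the smoothing length. The pairwise exchange below is a deliberately simplified 1D toy with equal particle masses and a hypothetical linear kernel weight. It is not the discretization used by Shen et al. (2010) or by the authors. It does show two properties relevant to the abstract: metals move only between particles that are neighbours within h, and the total metal mass is conserved.

```python
def diffusion_coefficient(c, shear_norm, h):
    """Subgrid turbulent diffusion coefficient D = C * |S| * h**2."""
    return c * shear_norm * h * h


def diffuse_metals(x, z, d, h, dt):
    """One explicit metal-diffusion step for equal-mass particles on a line (toy model).

    x: positions, z: metallicities, d: per-particle diffusion coefficients,
    h: smoothing length, dt: time step. Pairs closer than h exchange metals in
    proportion to their metallicity difference. Each exchange is added to one
    particle and subtracted from the other, so sum(z) is conserved.
    """
    n = len(x)
    dz = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = abs(x[i] - x[j])
            if r >= h:
                continue                      # not neighbours: no direct mixing
            w = 1.0 - r / h                   # hypothetical linear kernel weight
            d_ij = 0.5 * (d[i] + d[j])        # pair-averaged coefficient
            flux = dt * d_ij * w * (z[j] - z[i]) / (h * h)
            dz[i] += flux
            dz[j] -= flux
    return [zi + dzi for zi, dzi in zip(z, dz)]
```

A particle with no neighbour inside h exchanges nothing, which illustrates in miniature why a sparse gas-particle distribution limits how well an SPH diffusion scheme can work.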

Design and Implementation of Automotive Intrusion Detection System Using Ultra-Lightweight Convolutional Neural Network (초경량 Convolutional Neural Network를 이용한 차량용 Intrusion Detection System의 설계 및 구현)

  • Myeongjin Lee;Hyungchul Im;Minseok Choi;Minjae Cha;Seongsoo Lee
    • Journal of IKEEE / v.27 no.4 / pp.524-530 / 2023
  • This paper proposes an efficient algorithm based on a lightweight CNN (Convolutional Neural Network) to detect CAN (Controller Area Network) bus attacks. An IDS (Intrusion Detection System) based on this algorithm was designed, implemented, and verified on an FPGA. Unlike conventional CNN-based IDSs, the proposed IDS detects CAN bus attacks on a frame-by-frame basis, enabling accurate and rapid response. Furthermore, because it uses only one convolutional layer, the proposed IDS requires significantly less hardware than conventional CNN-based IDSs. Simulation and implementation results show that the proposed IDS effectively detects various attacks on the CAN bus.
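A frame-by-frame classifier with a single convolutional layer can be sketched in pure Python. The sketch makes these assumptions:

- Each frame is encoded as its 11-bit standard identifier followed by its 8 data bytes as a bit vector.
- The network is a single 1D convolution with ReLU, then global max pooling, then one linear unit with a sigmoid.

The abstract does not give the paper's input format or layer sizes. The weights in any example are hypothetical and untrained, and the function names are illustrative only.

```python
import math


def frame_to_bits(can_id, data):
    """11-bit standard CAN identifier followed by 8 data bytes (zero-padded), MSB first."""
    if not 0 <= can_id < 2 ** 11 or len(data) > 8:
        raise ValueError("expected an 11-bit identifier and at most 8 data bytes")
    bits = [(can_id >> (10 - i)) & 1 for i in range(11)]
    for byte in list(data) + [0] * (8 - len(data)):
        bits += [(byte >> (7 - i)) & 1 for i in range(8)]
    return bits                                   # 11 + 64 = 75 inputs


def conv1d_relu(x, kernels, biases):
    """A single 1D convolution layer (stride 1, no padding) followed by ReLU."""
    width = len(kernels[0])
    return [[max(0.0, sum(k[t] * x[i + t] for t in range(width)) + b)
             for i in range(len(x) - width + 1)]
            for k, b in zip(kernels, biases)]


def attack_probability(bits, kernels, biases, out_w, out_b):
    """Conv layer -> global max pooling per channel -> one linear unit -> sigmoid."""
    pooled = [max(channel) for channel in conv1d_relu(bits, kernels, biases)]
    logit = sum(w * p for w, p in zip(out_w, pooled)) + out_b
    if logit >= 0:                                # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

Because each frame is scored independently, a decision is available as soon as that frame arrives. The single layer keeps the multiply-accumulate count small enough for a compact FPGA datapath.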

Real-Time Implementation of Active Classification Using Cumulative Processing (누적처리기법을 이용한 능동표적식별 시스템의 실시간 구현)

  • Park, Gyu-Tae;Bae, Eun-Hyon;Lee, Kyun-Kyung
    • The Journal of the Acoustical Society of Korea / v.26 no.2 / pp.87-94 / 2007
  • In an active sonar system, the aspect angle and length of a target can be estimated by calculating the cross-correlation between the left and right split-beam outputs of an LFM (Linear Frequency Modulated) signal. However, estimating this information for a remote target requires high resolution in bearing and range. This resolution requires a sampling frequency higher than the Nyquist rate, so the signals must be oversampled by interpolation. However, real-time split-beam processing of oversampled outputs on a COTS (commercial off-the-shelf) DSP platform is limited by the platform's throughput and memory capacity. To solve these problems, this paper proposes a cumulative processing algorithm for split-beam processing. The performance of the proposed method was verified through simulations, and the method was implemented as a real-time system using an ADSP-TS101.
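The split-beam cross-correlation step can be sketched as follows. This is not the paper's cumulative processing algorithm. To show why sub-sample resolution matters, it swaps in a common low-cost alternative to oversampling: parabolic interpolation around the correlation peak. The chirp parameters and function names are hypothetical.

```python
import math


def lfm_chirp(n, f0, f1, fs):
    """Real LFM pulse sweeping from f0 to f1 Hz over n samples at sampling rate fs."""
    k = (f1 - f0) / (n / fs)                      # sweep rate in Hz/s
    return [math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def xcorr(a, b, max_lag):
    """r[lag] = sum_n a[n] * b[n + lag] for lag in [-max_lag, max_lag]."""
    return {lag: sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
            for lag in range(-max_lag, max_lag + 1)}


def delay_estimate(left, right, max_lag):
    """Delay of `right` relative to `left` in samples, refined to sub-sample precision."""
    r = xcorr(left, right, max_lag)
    peak = max(r, key=r.get)
    if abs(peak) == max_lag:
        return float(peak)                        # no neighbour on both sides: no refinement
    y0, y1, y2 = r[peak - 1], r[peak], r[peak + 1]
    denom = y0 - 2 * y1 + y2
    # vertex of the parabola through the three points around the peak
    return peak + (0.5 * (y0 - y2) / denom if denom != 0 else 0.0)
```

For an integer delay, the two neighbours of the peak are equal and the refinement adds nothing. For fractional delays, the parabola gives an estimate between samples without storing an oversampled signal. This is the memory-versus-resolution trade-off the abstract describes, addressed here by a different technique.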

Montgomery Multiplier with Very Regular Behavior

  • Yoo-Jin Baek
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.17-28 / 2024
  • The National Institute of Standards and Technology (NIST) lists resistance to side-channel attacks as one of the most important requirements in its Post-Quantum Cryptography standardization process. Such resistance is considered critical when deploying cryptosystems in practice. In fact, cryptosystems can easily be broken by side-channel attacks even when they are considered secure from a mathematical point of view. The timing attack (TA) and the simple power analysis attack (SPA) are side-channel attacks of this kind. They can reveal sensitive information by analyzing the timing behavior or the power consumption pattern of cryptographic operations. Appropriate countermeasures must therefore be considered carefully early in the implementation of a cryptosystem. The Montgomery multiplier is a commonly used, classical building block for implementing big-number-based cryptosystems, including RSA and ECC. Big-number multipliers, including the Montgomery multiplier, have also recently been proposed as alternative building blocks for post-quantum cryptography such as lattice-based cryptography, so they still play a role in modern cryptography. However, despite its effectiveness and wide adoption, the Montgomery multiplier is known to be vulnerable to TA and SPA. This paper proposes a new countermeasure for the Montgomery multiplier against TA and SPA. Briefly, the new method first represents a multiplication operand without 0 digits, so that the resulting multiplication behaves in a very regular manner. The new algorithm also removes the extra final reduction (which is intrinsic to modular multiplication) to make the multiplier more timing-independent. Consequently, the resulting multiplier operates in constant time and removes TA and SPA vulnerabilities. Because the proposed method can process multiple bits at a time, implementers can also trade off performance against resource usage to obtain the desired implementation characteristics.
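The two ideas in the abstract are an operand with no zero digits and no final subtraction. Both can be illustrated with textbook techniques; the sketch is not claimed to be the paper's algorithm. It assumes:

1. The well-known recoding of an odd integer into digits in {−1, +1}. Every iteration then performs exactly one addition or subtraction of b.
2. A standard bound argument. If N is odd, the inputs are below 2N, and 2^r is large enough, the bit-serial Montgomery result stays below 2N. It can therefore feed the next multiplication without a final subtraction.

The usual condition is 4N < 2^r; 8N ≤ 2^r is used here to leave room for making even multipliers odd by adding N without a branch. Names are hypothetical. Python integers are not constant-time, so this models only the fixed operation sequence.

```python
def recode_nonzero(k: int, n: int) -> list:
    """Recode an odd k with 0 < k < 2**n as n digits in {-1, +1}, least significant first."""
    if k % 2 == 0 or not 0 < k < (1 << n):
        raise ValueError("k must be odd and fit in n bits")
    digits = [2 * ((k >> (i + 1)) & 1) - 1 for i in range(n - 1)]
    digits.append(1)                             # the top digit is always +1
    return digits


def mont_mul_regular(a: int, b: int, n_mod: int, r_bits: int) -> int:
    """Return t with t = a*b*2**-r_bits (mod n_mod) and 0 <= t < 2*n_mod.

    Assumes n_mod is odd, 8*n_mod <= 2**r_bits and 0 <= a, b < 2*n_mod.
    No final conditional subtraction is performed.
    """
    if n_mod % 2 == 0 or 8 * n_mod > (1 << r_bits):
        raise ValueError("need odd n_mod with 8*n_mod <= 2**r_bits")
    a_odd = a + (1 - (a & 1)) * n_mod            # add n_mod to even a, without a branch
    t = 0
    for d in recode_nonzero(a_odd, r_bits):      # exactly r_bits iterations
        t += d * b                               # every digit is +1 or -1: one add/sub
        t += (t & 1) * n_mod                     # make t even (& also works on negatives)
        t >>= 1                                  # exact division by 2
    return t
```

After r iterations, 2^r·t = a_odd·b + Q·N with 0 ≤ Q < 2^r. Hence t is congruent to a·b·2^(−r) mod N, t is non-negative, and t < 3N·2N/2^r + N < 2N. Processing several digits per iteration (a higher radix) is the natural way to trade area for speed, in the spirit of the abstract's last sentence.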

Implementation of Parallel Local Alignment Method for DNA Sequence using Apache Spark (Apache Spark을 이용한 병렬 DNA 시퀀스 지역 정렬 기법 구현)

  • Kim, Bosung;Kim, Jinsu;Choi, Dojin;Kim, Sangsoo;Song, Seokil
    • The Journal of the Korea Contents Association / v.16 no.10 / pp.608-616 / 2016
  • The Smith-Waterman (SW) algorithm is a local alignment algorithm and one of the important operations in DNA sequence analysis. The SW algorithm finds the optimal local alignment with respect to the scoring system used, but it requires a long execution time. To address this, several methods that perform SW in a distributed and parallel manner have been proposed. ADAM, a distributed and parallel processing framework for DNA sequences, provides a parallel SW. However, the parallel SW of ADAM does not take into account that SW is a dynamic programming method, which limits its performance. In this paper, we propose a method to enhance the parallel SW of ADAM. The proposed parallel SW (PSW) is performed in two phases. In the first phase, the PSW splits a DNA sequence into a number of partitions and assigns them to multiple nodes. The original Smith-Waterman algorithm is then performed in parallel at each node. In the second phase, the PSW estimates the portions of the sequence that must be recalculated, and these portions are recalculated in parallel at each node. In the experiments, we compare the proposed PSW with the parallel SW of ADAM to show the superiority of the PSW.
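The dynamic-programming core, and why naive partitioning can lose alignments, can be sketched in Python. `smith_waterman` computes the best local score with a linear gap penalty; the scoring values are hypothetical defaults. `partitioned_sw` is a simpler overlap-based variant, not the paper's recalculation phase: each chunk is extended by `overlap` characters so that alignments crossing a chunk boundary are still found. With the default scoring, any positive-scoring local alignment spans fewer than 2·len(query) reference characters. An overlap of that size therefore reproduces the full result.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of a and b with a linear gap penalty, O(len(a)*len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores never drop below zero
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def partitioned_sw(query, reference, chunk, overlap, **scoring):
    """Score overlapping reference chunks independently (parallelizable) and take the maximum."""
    starts = range(0, max(1, len(reference)), chunk)
    return max(smith_waterman(query, reference[s:s + chunk + overlap], **scoring)
               for s in starts)
```

With `overlap=0`, alignments that straddle a chunk boundary can be missed, so the partitioned score can fall below the true score. This is the dynamic-programming dependency that a plain split ignores and that the PSW's second phase is designed to repair.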

An Implementation of 3D Graphic Accelerator for Phong Shading (퐁 음영법을 위한 3차원 그래픽 가속기의 구현)

  • Lee, Hyung;Park, Youn-Ok;Park, Jong-Won
    • Journal of Korea Multimedia Society / v.3 no.5 / pp.526-534 / 2000
  • There has been much research on high-speed 3D graphics accelerators, driven by the needs of CAD/CAM, 3D modeling, virtual reality, and medical imaging. In this paper, an SIMD processor architecture for a 3D graphics accelerator is proposed to reduce the processing time of 3D graphics, and a parallel Phong shading algorithm is presented to evaluate the performance of the proposed architecture. The proposed architecture consists of a PCI local bus interface, 16 Processing Elements (PEs), and Park's multi-access memory system (MAMS), which has 17 memory modules. A serial algorithm for Phong shading is modified for the architecture; the key idea is to divide a polygon into $4 \times 4$ squares. To process a square, 4 PEs are logically regarded as one PE Group. Since the MAMS supports block access with interval 1, the 4 PE Groups can process a square at a time, so 16 pixels are processed simultaneously. The proposed SIMD processor architecture was simulated with CADENCE Verilog-XL, a hardware simulation package. The parallel algorithm produced the same results as the serial algorithm, with a speedup of 5.68 over the serial version.
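Per-pixel Phong shading over one $4 \times 4$ square can be sketched in Python to show why the 16 pixels can be assigned to 16 PEs: each pixel depends only on its own interpolated normal. The sketch makes these assumptions:

- The square's normal field is bilinearly interpolated from four corner normals.
- The lighting is a single directional light with ambient, diffuse and specular terms.

The corner layout and the coefficient names `ka`, `kd`, `ks` and `shininess` are hypothetical. This is not the paper's exact interpolation scheme.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_tile(corner_normals, light, view, ka=0.1, kd=0.7, ks=0.2, shininess=8):
    """Shade a 4x4 pixel square from its corner normals (n00, n03, n30, n33).

    n00 is the top-left corner, n03 top-right, n30 bottom-left, n33 bottom-right.
    Each pixel is computed independently of the others, which is what allows all
    16 pixels of a square to be processed in lock-step by 16 PEs.
    """
    n00, n03, n30, n33 = corner_normals
    l, v = normalize(light), normalize(view)
    tile = []
    for row in range(4):
        line = []
        for col in range(4):
            s, t = col / 3, row / 3
            # Phong shading: interpolate the normal, then renormalize it per pixel
            n = normalize(tuple((1 - s) * (1 - t) * p + s * (1 - t) * q + (1 - s) * t * r + s * t * u
                                for p, q, r, u in zip(n00, n03, n30, n33)))
            diffuse = max(0.0, dot(n, l))
            refl = tuple(2 * diffuse * nc - lc for nc, lc in zip(n, l))   # reflect L about N
            specular = max(0.0, dot(refl, v)) ** shininess if diffuse > 0 else 0.0
            line.append(ka + kd * diffuse + ks * specular)
        tile.append(line)
    return tile
```

With all four corner normals, the light and the view pointing along +z, every pixel evaluates to ka + kd + ks. Tilting the corner normals produces a smooth intensity gradient across the square.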
