• Title/Summary/Keyword: 병렬 연산 처리

Search Result 554, Processing Time 0.029 seconds

A Design of AES-based CCMP Core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP Core 설계)

  • Hwang Seok-Ki;Lee Jin-Woo;Kim Chay-Hyeun;Song You-Soo;Shin Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.4
    • /
    • pp.798-803
    • /
    • 2005
  • This paper describes a design of AES(Advanced Encryption Standard)-based CCMP core for IEEE 802.1li wireless LAN security. To maximize its performance, two AES cores ate used, one is for counter mode for data confidentiality and the other is for CBC(Cipher Block Chaining) mode for authentication and data integrity. The S-box that requires the largest hardware in AES core is implemented using composite field arithmetic, and the gate count is reduced by about $20\%$ compared with conventional LUT(Lookup Table)-based design. The CCMP core designed in Verilog-HDL has 13,360 gates, and the estimated throughput is about 168 Mbps at 54-MHz clock frequency. The functionality of the CCMP core is verified by Excalibur SoC implementation.

Research on efficient HW/SW co-design method of light-weight cryptography using GEZEL (경량화 암호의 GEZEL을 이용한 효율적인 하드웨어/소프트웨어 통합 설계 기법에 대한 연구)

  • Kim, Sung-Gon;Kim, Hyun-Min;Hong, Seok-Hie
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.24 no.4
    • /
    • pp.593-605
    • /
    • 2014
  • In this paper, we propose the efficient HW/SW co-design method of light-weight cryptography such as HIGHT, PRESENT and PRINTcipher using GEZEL. At first the symmetric cryptographic algorithms were designed using the GEZEL language which is efficiently used for HW/SW co-design. And for the improvement of performance the HW optimization theory such as unfolding, retiming and so forth were adapted to the cryptographic HW module conducted by FSMD. Also, the operation modes of those algorithms were implemented using C language in 8051 microprocessor, it can be compatible to various platforms. For providing reliable communication between HW/SW and preventing the time delay the improved handshake protocol was chosen for enhancing the performance of the connection between HW/SW. The improved protocol can process the communication-core and cryptography-core on the HW in parallel so that the messages can be transmitted to SW after HW operation and received from SW during encryption operation.

Development and Speed Comparison of Convolutional Neural Network Using CUDA (CUDA를 이용한 Convolutional Neural Network의 구현 및 속도 비교)

  • Ki, Cheol-min;Cho, Tai-Hoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.335-338
    • /
    • 2017
  • Currently Artificial Inteligence and Deep Learning are social issues, and These technologies are applied to various fields. A good method among the various algorithms in Artificial Inteligence is Convolutional Neural Network. Convolutional Neural Network is a form that adds convolution layers that extracts features by convolution operation on a general neural network method. If you use Convolutional Neural Network as small amount of data, or if the structure of layers is not complicated, you don't have to pay attention to speed. But the learning time is long as the size of the learning data is large and the structure of layers is complicated. So, GPU-based parallel processing is a lot. In this paper, we developed Convolutional Neural Network using CUDA and Learning speed is faster and more efficient than the method using the CPU.

  • PDF

A Design of AES-based CCMP core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP 코어 설계)

  • Hwang Seok-Ki;Kim Jong-Whan;Shin Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.6A
    • /
    • pp.640-647
    • /
    • 2006
  • This paper describes a design of AES-based CCMP(Counter mode with CBC-MAC Protocol) core for IEEE 802.11i wireless LAN security. To maximize the performance of CCMP core, two AES cores are used, one is the counter mode for data confidentiality and the other is the CBC node for authentication and data integrity. The S-box that requires the largest hardware in ARS core is implemented using composite field arithmetic, and the gate count is reduced by about 27% compared with conventional LUT(Lookup Table)-based design. The CCMP core was verified using Excalibur SoC kit, and a MPW chip is fabricated using a 0.35-um CMOS standard cell technology. The test results show that all the function of the fabricated chip works correctly. The CCMP processor has 17,000 gates, and the estimated throughput is about 353-Mbps at 116-MHz@3.3V, satisfying 54-Mbps data rate of the IEEE 802.11a and 802.11g specifications.

A Design of AES-based CCMP Core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP Core 설계)

  • Hwang, Seok-Ki;Lee, Jin-Woo;Kim, Chay-Hyeun;Song, You-Soo;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.1
    • /
    • pp.367-370
    • /
    • 2005
  • This paper describes a design of AES(Advanced Encryption Standard)-based CCMP core for IEEE 802.11i wireless LAN security. To maximize its performance, two AES cores are used, one is for counter mode for data confidentiality and the other is for CBC(Cipher Block Chaining)mode for authentication and data integrity. The S-box that requires the largest hardware in AES core is implemented using composite field arithmetic, and the gate count is reduced by about 25% compared with conventional LUT(Lookup Table)-based design. The CCMP core designed in Verilog-HDL has 15,450 gates, and the estimated throughput is about 128 Mbps at 50-MHz clock frequency). The functionality of the CCMP core is verified by Excalibur SoC implementation.

  • PDF

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering
    • /
    • v.26 no.2
    • /
    • pp.167-174
    • /
    • 2021
  • Recently, deep learning technology is widely used in various computer vision applications, such as object recognition, classification, and image generation. In particular, the deep learning-based super-resolution has been gaining significant performance improvement. Fast super-resolution convolutional neural network (FSRCNN) is a well-known model as a deep learning-based super-resolution algorithm that output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural networks accelerator that considers parallel computing efficiency. In addition, the proposed method proposes Optimal-FSRCNN, which is modified the structure of FSRCNN. The number of multipliers is compressed by 3.47 times compared to FSRCNN. Moreover, PSNR has similar performance to FSRCNN. We developed a real-time image processing technology that implements on FPGA.

GLSL based Additional Learning Nearest Neighbor Algorithm suitable for Locating Unpaved Road (추가 학습이 빈번히 필요한 비포장도로에서 주행로 탐색에 적합한 GLSL 기반 ALNN Algorithm)

  • Ku, Bon Woo;Kim, Jun kyum;Rhee, Eun Joo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.1
    • /
    • pp.29-36
    • /
    • 2019
  • Unmanned Autonomous Vehicle's driving road in the national defense includes not only paved roads, but also unpaved roads which have rough and unexpected changes. This Unmanned Autonomous Vehicles monitor and recon rugged or remote areas, and defend own position, they frequently encounter environments roads of various and unpredictable. Thus, they need additional learning to drive in this environment, we propose a Additional Learning Nearest Neighbor (ALNN) which is modified from Approximate Nearest Neighbor to allow for quick learning while avoiding the 'Forgetting' problem. In addition, since the Execution speed of the ALNN algorithm decreases as the learning data accumulates, we also propose a solution to this problem using GPU parallel processing based on OpenGL Shader Language. The ALNN based on GPU algorithm can be used in the field of national defense and other similar fields, which require frequent and quick application of additional learning in real-time without affecting the existing learning data.

A Execution Performance Analysis of Applications using Multi-Process Service over GPU (다중 프로세스 서비스를 이용한 GPU 응용 동시 실행 성능 분석)

  • Kim, Se-Jin;Oh, Ji-Sun;Kim, Yoonhee
    • KNOM Review
    • /
    • v.22 no.1
    • /
    • pp.60-67
    • /
    • 2019
  • Graphical Processing Units(GPUs) achieve high performance undertaking from relatively uniformed computation in parallel. The technology related to General Purpose GPU(GPGPU) has been enhanced, which provides concurrent kernel execution of multi and diverse applications at the same time, but it is still limited to support resource sharing or planning. NVIDIA recently introduces Multi-Process Service(MPS), which allows kernels from different applications can be execute concurrently. However, the strength of MPS comes along with the characteristics of applications and the order of their execution. This paper shows the performance analysis of diverse scientific applications in real world. Based on the analysis, we prove that it is important to the identify characteristics of co-run applications, and to schedule multiple applications via profiling to maximize MPS functionality.

Efficiently Hybrid $MSK_k$ Method for Multiplication in $GF(2^n)$ ($GF(2^n)$ 곱셈을 위한 효율적인 $MSK_k$ 혼합 방법)

  • Ji, Sung-Yeon;Chang, Nam-Su;Kim, Chang-Han;Lim, Jong-In
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.9
    • /
    • pp.1-9
    • /
    • 2007
  • For an efficient implementation of cryptosystems based on arithmetic in a finite field $GF(2^n)$, their hardware implementation is an important research topic. To construct a multiplier with low area complexity, the divide-and-conquer technique such as the original Karatsuba-Ofman method and multi-segment Karatsuba methods is a useful method. Leone proposed an efficient parallel multiplier with low area complexity, and Ernst at al. proposed a multiplier of a multi-segment Karatsuba method. In [1], the authors proposed new $MSK_5$ and $MSK_7$ methods with low area complexity to improve Ernst's method. In [3], the authors proposed a method which combines $MSK_2$ and $MSK_3$. In this paper we propose an efficient multiplication method by combining $MSK_2,\;MSK_3\;and\;MSK_5$ together. The proposed method reduces $116{\cdot}3^l$ gates and $2T_X$ time delay compared with Gather's method at the degree $25{\cdot}2^l-2^l with l>0.

Memory Delay Comparison between 2D GPU and 3D GPU (2차원 구조 대비 3차원 구조 GPU의 메모리 접근 효율성 분석)

  • Jeon, Hyung-Gyu;Ahn, Jin-Woo;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.7
    • /
    • pp.1-11
    • /
    • 2012
  • As process technology scales down, the number of cores integrated into a processor increases dramatically, leading to significant performance improvement. Especially, the GPU(Graphics Processing Unit) containing many cores can provide high computational performance by maximizing the parallelism. In the GPU architecture, the access latency to the main memory becomes one of the major reasons restricting the performance improvement. In this work, we analyze the performance improvement of the 3D GPU architecture compared to the 2D GPU architecture quantitatively and investigate the potential problems of the 3D GPU architecture. In general, memory instructions account for 30% of total instructions, and global/local memory instructions constitutes 60% of total memory instructions. Therefore, the performance of the 3D GPU is expected to be improved significantly compared to the 2D GPU by reducing the delay of memory instructions. However, according to our experimental results, the 3D architecture improves the GPU performance only by 2% compared to the 2D architecture due to the memory bottleneck, since the performance reduction due to memory bottleneck in the 3D GPU architecture increases by 245% compared to the 2D architecture. This paper provides the guideline for suitable memory design by analyzing the efficiency of the memory architecture in 3D GPU architecture.