• Title/Summary/Keyword: Efficient implementation

Search Result 2,934, Processing Time 0.03 seconds

Efficient FFT Algorithm and Hardware Implementation for High Speed Multimedia Communication Systems (고속 멀티미디어 통신시스템을 위한 효율적인 FFT 알고리즘 및 하드웨어 구현)

  • 정윤호;김재석
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.3
    • /
    • pp.55-64
    • /
    • 2004
  • In this paper, we propose an efficient FFT algorithm for high speed multimedia communication systems, and present its pipeline implementation results. Since the proposed algorithm is based on the radix-4 butterfly unit, the processing rate can be twice as fast as that based on the radix-2$^3$ algorithm. Also, its implementation is more area-efficient than the implementation from conventional radix-4 algorithm due to reduced number of nontrivial multipliers like using the radix-23 algorithm. In order to compare the proposed algorithm with the conventional radix-4 algorithm, the 64-point MDC pipelined FFT processor based on the proposed algorithm was implemented. After the logic synthesis using 0.6${\mu}{\textrm}{m}$ technology, the logic gate count for the processor with the proposed algorithm is only about 70% of that for the processor with the conventional radix-4 algorithm. Since the proposed algorithm can be achieve higher processing rate and better efficiency than the conventional algorithm, it is very suitable for the high speed multimedia communication systems such as WLAN, DAB, DVB, and ADSL/VDSL systems.

Optimization Study of Toom-Cook Algorithm in NIST PQC SABER Utilizing ARM/NEON Processor (ARM/NEON 프로세서를 활용한 NIST PQC SABER에서 Toom-Cook 알고리즘 최적화 구현 연구)

  • Song, JinGyo;Kim, YoungBeom;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.3
    • /
    • pp.463-471
    • /
    • 2021
  • Since 2016, National Institute of Standards and Technology (NIST) has been conducting a post quantum cryptography standardization project in preparation for a quantum computing environment. Three rounds are currently in progress, and most of the candidates (5/7) are lattice-based. Lattice-based post quantum cryptography is evaluated to be applicable even in an embedded environment where resources are limited by providing efficient operation processing and appropriate key length. Among them, SABER KEM provides the efficient modulus and Toom-Cook to process polynomial multiplication with computation-intensive tasks. In this paper, we present the optimized implementation of evaluation and interpolation in Toom-Cook algorithm of SABER utilizing ARM/NEON in ARMv8-A platform. In the evaluation process, we propose an efficient interleaving method of ARM/NEON, and in the interpolation process, we introduce an optimized implementation methodology applicable in various embedded environments. As a result, the proposed implementation achieved 3.5 times faster performance in the evaluation process and 5 times faster in the interpolation process than the previous reference implementation.

A Novel Recursive Algorithm for Efficient ZF-OSIC Detection in a V-BLAST System

  • Yin, Zuo-Liang;Mao, Xing-Peng;Zhang, Qin-Yu;Zhang, Nai-Tong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.5 no.12
    • /
    • pp.2326-2339
    • /
    • 2011
  • To reduce the implementation complexity of the Vertical Bell Labs layered space-time (V-BLAST) systems with respect to the zero-forcing (ZF) criterion, a computationally efficient recursive algorithm is proposed. A fast implementation of the proposed algorithm is developed and its complexity is analyzed in detail. The proposed algorithm matches the ZF-OSIC detection well, and its three significant advantages can be demonstrated by analyses and simulations. Firstly, its speedups over the conventional ZF-OSIC with norm-based ordering, the original fast recursive algorithm (FRA) and the fastest known algorithm (FKA) in the number of flops are 1.58, 2.33 and 1.22, respectively. Secondly, a much simpler implementation than FRA and FKA can be expected. Finally, the storage requirements are lower than those of FRA and FKA. These advantages make the proposed algorithm more efficient and practical.

A Study of Implementing Efficient Rotation for ARX Lightweight Block Cipher on Low-level Microcontrollers (저사양 마이크로 컨트롤러에서 ARX 경량 암호를 위한 효율적인 Rotation 구현 방법 연구)

  • Kim, Minwoo;Kwon, Taekyoung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.3
    • /
    • pp.623-630
    • /
    • 2016
  • Heterogeneous IoT devices must satisfy a certain level of security for mutual connections and communications. However, a performance degradation of cryptographic algorithms in resource constrained devices is inevitable and so an optimization or efficient implementation method is necessary. In this paper, we study an efficient implementation method for rotation operations regarding registers for running ARX lightweight block ciphers. In a practical sense, we investigate the performance of modified rotation operations through experiments using real experiment devices. We show the improved performance of modified rotation operations and discover the significant difference in measured performance between simulations and real experiments, particularly for 16-bit MSP microcontrollers.

Design and Implementation of a Latency Efficient Encoder for LTE Systems

  • Hwang, Soo-Yun;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • v.32 no.4
    • /
    • pp.493-502
    • /
    • 2010
  • The operation time of an encoder is one of the critical implementation issues for satisfying the timing requirements of Long Term Evolution (LTE) systems because the encoder is based on binary operations. In this paper, we propose a design and implementation of a latency efficient encoder for LTE systems. By virtue of 8-bit parallel processing of the cyclic redundancy checking attachment, code block (CB) segmentation, and a parallel processor, we are able to construct engines for turbo codings and rate matchings of each CB in a parallel fashion. Experimental results illustrate that although the total area and clock period of the proposed scheme are 19% and 6% larger than those of a conventional method based on a serial scheme, respectively, our parallel structure decreases the latency by about 32% to 65% compared with a serial structure. In particular, our approach is more latency efficient when the encoder processes a number of CBs. In addition, we apply the proposed scheme to a real system based on LTE, so that the timing requirement for ACK/NACK transmission is met by employing the encoder based on the parallel structure.

An Efficient Hardware Implementation of Block Cipher Algorithm LEA (블록암호 알고리듬 LEA의 효율적인 하드웨어 구현)

  • Sung, Mi-ji;Park, Jang-nyeong;Shin, Kyung-wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.10a
    • /
    • pp.777-779
    • /
    • 2014
  • The LEA(Lightweight Encryption Algorithm) is a 128-bit high-speed/lightweight block cipher algorithm developed by National Security Research Institute(NSRI) in 2012. The LEA encrypts plain text of 128-bit using cipher key of 128/192/256-bit, and produces cipher text of 128-bit, and vice versa. To reduce hardware complexity, we propose an efficient architecture which shares hardware resources for encryption and decryption in round transformation block. Hardware sharing technique for key scheduler was also devised to achieve area-efficient and low-power implementation. The designed LEA cryptographic processor was verified by using FPGA implementation.

  • PDF

Analysis of Sorting Algorithm for Efficient Hardware Implementation (효율적인 하드웨어 구현을 위한 정렬 알고리즘에 대한 분석)

  • Kim, Han Kyeol;Kang, Bongsoon
    • Journal of IKEEE
    • /
    • v.23 no.3
    • /
    • pp.978-983
    • /
    • 2019
  • Under the influence of Autonomous Driving and AI, it is important to accurately recognize and judge objects through cameras. In particular, since a method of recognizing an object using a camera can obtain a large amount of information visually compared to other methods, many image signal processing methods have been studied to extract an accurate image. In addition, a lot of research is being carried out to implementation about hardware. In this work, we compare the principles and characteristics of the sorting algorithms that are frequently used in image signal processing and summarize the performance evaluation. Based on this, we define an efficient algorithm when implemented in hardware among efficient sorting algorithms.

Efficient LDPC coding using a hybrid H-matrix

  • Kim Tae Jin;Lee Chan Ho;Yeo Soon Il;Roh Tae Moon
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.473-476
    • /
    • 2004
  • Low-Density Parity-Check (LDPC) codes are recently emerged due to its excellent performance to use. However, the parity check matrices (H) of the previous works are not adequate for hardware implementation of encoders or decoders. This paper proposes a hybrid parity check matrix for partially parallel decoder structures, which is efficient in hardware implementation of both decoders and encoders. Using proposed methods, the encoding design can become practical while keeping the hardware complexity of partially parallel decoder structures.

  • PDF

Implementation of CNN in the view of mini-batch DNN training for efficient second order optimization (효과적인 2차 최적화 적용을 위한 Minibatch 단위 DNN 훈련 관점에서의 CNN 구현)

  • Song, Hwa Jeon;Jung, Ho Young;Park, Jeon Gue
    • Phonetics and Speech Sciences
    • /
    • v.8 no.2
    • /
    • pp.23-30
    • /
    • 2016
  • This paper describes some implementation schemes of CNN in view of mini-batch DNN training for efficient second order optimization. This uses same procedure updating parameters of DNN to train parameters of CNN by simply arranging an input image as a sequence of local patches, which is actually equivalent with mini-batch DNN training. Through this conversion, second order optimization providing higher performance can be simply conducted to train the parameters of CNN. In both results of image recognition on MNIST DB and syllable automatic speech recognition, our proposed scheme for CNN implementation shows better performance than one based on DNN.

An Efficient Multiprocessor Implementation of Digital Filtering Algorithms (다중 프로세서 시스템을 이용한 디지털 필터링 알고리즘의 효율적 구현)

  • Won Yong Sung
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.28B no.5
    • /
    • pp.343-356
    • /
    • 1991
  • An efficient real-time implementation of digital filtering algorithms using a multiprocessor system in a ring network is investigated. The development time and cost for implementing a high speed signal processing system can be considerably reduced because algorithm are implemented in software using commercially available digital signal processors. This method is based on a parallel block processing approach, where a continuously supplied input data is divided into blocks, and the blocks are processed concurrently by being assigned to each processor in the system. This approach not only requires a simple interconnection network but also reduces the number of communications among the processors very much. The data dependency of the blocks to be processed concurrently brings on dependency problems between the processors in the system. A systematic scheduling method has been developed by using a processors which can be used efficiently, the methods for solving dependency problems between the processors are investigated. Implementation procedures and results for FIR, recursive (IIR), and adaptive filtering algorithms are illustrated.

  • PDF