• Title/Summary/Keyword: FPGA design

Search Result 1,021, Processing Time 0.023 seconds

Design of Hardwired Variable Length Decoder for H.264/AVC (하드웨어 구조의 H.264/AVC 가변길이 복호기 설계)

  • Yu, Yong-Hoon;Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.11
    • /
    • pp.71-76
    • /
    • 2008
  • H.264(or MPEG-4/AVC pt.10) is a high performance video coding standard, and is widely used. Variable length code (VLC) of the H.264 standard compresses data using the statistical distribution of values. A decoder parses the compressed bit stream and searches decoded values in lookup tables, and the decoding process is not easy to implement by hardware. We propose an architecture of variable length decoder(VLD) for the H.264 baseline profile(BP) L4. The CAVLD decodes syntax elements using the combination of arithmetic units and lookup tables for the optimized hardware architecture. A barral shifter and a first 1's detector parse NAL bit stream, and are shared by Exp-Golomb decoder and CAVLD. A FIFO memory between CAVLD and the reorder unit and a buffer at the output of the reorder unit eliminate the bottleneck of data stream. The proposed VLD is designed using Verilog-HDL and is implemented using an FPGA. The synthesis result using a 0.18um standard CMOS technology shows that the gate count is 22,604 and the decoder can process HD($1920{\times}1080$) video at 120MHz.

A Design of PRESENT Crypto-Processor Supporting ECB/CBC/OFB/CTR Modes of Operation and Key Lengths of 80/128-bit (ECB/CBC/OFB/CTR 운영모드와 80/128-비트 키 길이를 지원하는 PRESENT 암호 프로세서 설계)

  • Kim, Ki-Bbeum;Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.6
    • /
    • pp.1163-1170
    • /
    • 2016
  • A hardware implementation of ultra-lightweight block cipher algorithm PRESENT which was specified as a standard for lightweight cryptography ISO/IEC 29192-2 is described. The PRESENT crypto-processor supports two key lengths of 80 and 128 bits, as well as four modes of operation including ECB, CBC, OFB, and CTR. The PRESENT crypto-processor has on-the-fly key scheduler with master key register, and it can process consecutive blocks of plaintext/ciphertext without reloading master key. In order to achieve a lightweight implementation, the key scheduler was optimized to share circuits for key lengths of 80 bits and 128 bits. The round block was designed with a data-path of 64 bits, so that one round transformation for encryption/decryption is processed in a clock cycle. The PRESENT crypto-processor was verified using Virtex5 FPGA device. The crypto-processor that was synthesized using a $0.18{\mu}m$ CMOS cell library has 8,100 gate equivalents(GE), and the estimated throughput is about 908 Mbps with a maximum operating clock frequency of 454 MHz.

A small-area implementation of public-key cryptographic processor for 224-bit elliptic curves over prime field (224-비트 소수체 타원곡선을 지원하는 공개키 암호 프로세서의 저면적 구현)

  • Park, Byung-Gwan;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.6
    • /
    • pp.1083-1091
    • /
    • 2017
  • This paper describes a design of cryptographic processor supporting 224-bit elliptic curves over prime field defined by NIST. Scalar point multiplication that is a core arithmetic function in elliptic curve cryptography(ECC) was implemented by adopting the modified Montgomery ladder algorithm. In order to eliminate division operations that have high computational complexity, projective coordinate was used to implement point addition and point doubling operations, which uses addition, subtraction, multiplication and squaring operations over GF(p). The final result of the scalar point multiplication is converted to affine coordinate and the inverse operation is implemented using Fermat's little theorem. The ECC processor was verified by FPGA implementation using Virtex5 device. The ECC processor synthesized using a 0.18 um CMOS cell library occupies 2.7-Kbit RAM and 27,739 gate equivalents (GEs), and the estimated maximum clock frequency is 71 MHz. One scalar point multiplication takes 1,326,985 clock cycles resulting in the computation time of 18.7 msec at the maximum clock frequency.

A Hardware Design of Ultra-Lightweight Block Cipher Algorithm PRESENT for IoT Applications (IoT 응용을 위한 초경량 블록 암호 알고리듬 PRESENT의 하드웨어 설계)

  • Cho, Wook-Lae;Kim, Ki-Bbeum;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.7
    • /
    • pp.1296-1302
    • /
    • 2016
  • A hardware implementation of ultra-lightweight block cipher algorithm PRESENT that was specified as a block cipher standard for lightweight cryptography ISO/IEC 29192-2 is described in this paper. Two types of crypto-core that support master key size of 80-bit are designed, one is for encryption-only function, and the other is for encryption and decryption functions. The designed PR80 crypto-cores implement the basic cipher mode of operation ECB (electronic code book), and it can process consecutive blocks of plaintext/ciphertext without reloading master key. The PR80 crypto-cores were designed in soft IP with Verilog HDL, and they were verified using Virtex5 FPGA device. The synthesis results using $0.18{\mu}m$ CMOS cell library show that the encryption-only core has 2,990 GE and the encryption/decryption core has 3,687 GE, so they are very suitable for IoT security applications requiring small gate count. The estimated maximum clock frequency is 500 MHz for the encryption-only core and 444 MHz for the encryption/decryption core.

Design of a Binding for the performance Improvement of 3D Engine based on the Embedded Mobile Java Environment (자바 기반 휴대용 임베디드 기기의 삼차원 엔진 성능 향상을 위한 바인딩 구현)

  • Kim, Young-Ouk;Roh, Young-Sup
    • Journal of Korea Multimedia Society
    • /
    • v.10 no.11
    • /
    • pp.1460-1471
    • /
    • 2007
  • A 3-Dimensional engine in a mobile embedded device is divided into a C-based OpenGL/ES and a Java-based JSR184 which interprets and executes a byte code in a real-time. In these two standards, the JSR184 supporting Java objects uses more processor resources than an OpenGL/ES and thus has a constraint when it is used in an embedded device with a limited computing power. On the other hand, 3-Dimensional contents employed in existing personal computer are created by utilizing advantages of Java and secured numerous users in European market, due to the good quality in contents and extensive service in a commercial network, GSM. Because of the reason, a mobile embedded device used in a GSM network needs a JSR184 which can provide an existing Java-based 3-Dimensional contents without extra conversion processes, but the current version of Java-based 3-Dimensional engine has drawbacks in application to commercial products because it requires more computing power than the mobile embedded device. This paper proposes a binding technique with the advantages of Java objects to improve a processing speed of 3-Dimensional contents in limited resources of a mobile embedded device. The technique supports a JSR184 standard interface in the upper layer to utilize 3-Dimensional contents using Java, employs a different code-conversion language, KNI(Kilo Native Interface), in the middle layer to interface between OpenGL/ES and JSR184, and embodies an OpenGL/ES standard in the lower layer. The validity of the binding technique is demonstrated through a simulator and a FPGA embedding an ARM.

  • PDF

Low-power FFT/IFFT Processor for Wireless LAN Modem (무선 랜 모뎀용 저전력 FFT/IFFT프로세서 설계)

  • Shin Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11A
    • /
    • pp.1263-1270
    • /
    • 2004
  • A low-power 64-point FFT/IFFT processor core is designed, which is an essential block in OFDM-based wireless LAM modems. The radix-2/418 DIF (Decimation-ln-Frequency) FFT algorithm is implemented using R2SDF (Radix-2 Single-path Delay Feedback) structure. Some design techniques for low-power implementation are considered from algorithm level to circuit level. Based on the analysis on infernal data flow, some unnecessary switching activities have been eliminated to minimize power dissipation. In circuit level, constant multipliers and complex-number multiplier in data-path are designed using truncation structure to reduce gate counts and power dissipation. The 64-point FFT/IFFT core designed in Verilog-HDL has about 28,100 gates, and timing simulation results using gate-level netlist with extracted SDF data show that it can safely operate up to 50-MHz@2.5-V, resulting that a 64-point FFT/IFFT can be computed every 1.3-${\mu}\textrm{s}$. The functionality of the core was fully verified by FPGA implementation using various test vectors. The average SQNR of over 50-dB is achieved, and the average power consumption is about 69.3-mW with 50-MHz@2.5-V.

A Unified ARIA-AES Cryptographic Processor Supporting Four Modes of Operation and 128/256-bit Key Lengths (4가지 운영모드와 128/256-비트 키 길이를 지원하는 ARIA-AES 통합 암호 프로세서)

  • Kim, Ki-Bbeum;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.4
    • /
    • pp.795-803
    • /
    • 2017
  • This paper describes a dual-standard cryptographic processor that efficiently integrates two block ciphers ARIA and AES into a unified hardware. The ARIA-AES crypto-processor was designed to support 128-b and 256-b key sizes, as well as four modes of operation including ECB, CBC, OFB, and CTR. Based on the common characteristics of ARIA and AES algorithms, our design was optimized by sharing hardware resources in substitution layer and in diffusion layer. It has on-the-fly key scheduler to process consecutive blocks of plaintext/ciphertext without reloading key. The ARIA-AES crypto-processor that was implemented with a $0.18{\mu}m$ CMOS cell library occupies 54,658 gate equivalents (GEs), and it can operate up to 95 MHz clock frequency. The estimated throughputs at 80 MHz clock frequency are 787 Mbps, 602 Mbps for ARIA with key size of 128-b, 256-b, respectively. In AES mode, it has throughputs of 930 Mbps, 682 Mbps for key size of 128-b, 256-b, respectively. The dual-standard crypto-processor was verified by FPGA implementation using Virtex5 device.

A Study on the VLSI Design of Efficient Color Interpolation Technique Using Spatial Correlation for CCD/CMOS Image Sensor (화소 간 상관관계를 이용한 CCD/CMOS 이미지 센서용 색 보간 기법 및 VLSI 설계에 관한 연구)

  • Lee, Won-Jae;Lee, Seong-Joo;Kim, Jae-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.11 s.353
    • /
    • pp.26-36
    • /
    • 2006
  • In this paper, we propose a cost-effective color filter may (CFA) demosaicing method for digital still cameras in which a single CCD or CMOS image sensor is used. Since a CFA is adopted, we must interpolate missing color values in the red, green and blue channels at each pixel location. While most state-of-the-art algorithms invest a great deal of computational effort in the enhancement of the reconstructed image to overcome the color artifacts, we focus on eliminating the color artifacts with low computational complexity. Using spatial correlation of the adjacent pixels, the edge-directional information of the neighbor pixels is used for determining the edge direction of the current pixel. We apply our method to the state-of-the-art algorithms which use edge-directed methods to interpolate the missing color channels. The experiment results show that the proposed method enhances the demosaiced image qualify from $0.09{\sim}0.47dB$ in PSNR depending on the basis algorithm by removing most of the color artifacts. The proposed method was implemented and verified successfully using verilog HDL and FPGA. It was synthesized to gate-level circuits using 0.25um CMOS standard cell library. The total logic gate count is 12K, and five line memories are used.

Design and Optimization of Mu1ti-codec Video Decoder using ASIP (ASIP를 이용한 다중 비디오 복호화기 설계 및 최적화)

  • Ahn, Yong-Jo;Kang, Dae-Beom;Jo, Hyun-Ho;Ji, Bong-Il;Sim, Dong-Gyu;Eum, Nak-Woong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.1
    • /
    • pp.116-126
    • /
    • 2011
  • In this paper, we present a multi-media processor which can decode multiple-format video standards. The designed processor is evaluated with optimized MPEG-2, MPEG-4, and AVS (Audio video standard). There are two approaches for developing of real-time video decoders. First, hardware-based system is much superior to a processor-based one in execution time. However, it takes long time to implement and modify hardware systems. On the contrary, the software-based video codecs can be easily implemented and flexible, however, their performance is not so good for real-time applications. In this paper, in order to exploit benefits related to two approaches, we designed a processor called ASIP(Application specific instruction-set processor) for video decoding. In our work, we extracted eight common modules from various video decoders, and added several multimedia instructions to the processor. The developed processor for video decoders is evaluated with the Synopsys platform simulator and a FPGA board. In our experiment, we can achieve about 37% time saving in total decoding time.

A New Hardware Design for Generating Digital Holographic Video based on Natural Scene (실사기반 디지털 홀로그래픽 비디오의 실시간 생성을 위한 하드웨어의 설계)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.11
    • /
    • pp.86-94
    • /
    • 2012
  • In this paper we propose a hardware architecture of high-speed CGH (computer generated hologram) generation processor, which particularly reduces the number of memory access times to avoid the bottle-neck in the memory access operation. For this, we use three main schemes. The first is pixel-by-pixel calculation rather than light source-by-source calculation. The second is parallel calculation scheme extracted by modifying the previous recursive calculation scheme. The last one is a fully pipelined calculation scheme and exactly structured timing scheduling by adjusting the hardware. The proposed hardware is structured to calculate a row of a CGH in parallel and each hologram pixel in a row is calculated independently. It consists of input interface, initial parameter calculator, hologram pixel calculators, line buffer, and memory controller. The implemented hardware to calculate a row of a $1,920{\times}1,080$ CGH in parallel uses 168,960 LUTs, 153,944 registers, and 19,212 DSP blocks in an Altera FPGA environment. It can stably operate at 198MHz. Because of the three schemes, the time to access the external memory is reduced to about 1/20,000 of the previous ones at the same calculation speed.