• Title/Summary/Keyword: 1/ 4 LUT Generation


An Efficient Hardware Architecture of Coordinate Transformation for Panorama Unrolling of Catadioptric Omnidirectional Images

  • Lee, Seung-Ho
    • Journal of IKEEE / v.15 no.1 / pp.10-14 / 2011
  • In this paper, we present an efficient hardware architecture for the unrolling image mapper of catadioptric omnidirectional imaging systems. Such systems capture images with a 360-degree field of view, which must be transformed into panorama images in rectangular coordinates. In most applications, panorama unrolling has to be performed in real time and at low cost, especially for high-resolution images. The proposed hardware architecture adopts a software/hardware cooperative structure and employs several optimization schemes based on a look-up table (LUT) for coordinate conversion. To avoid the on-line division required by the coordinate transformation algorithm, the LUT stores pre-computed division factors. In addition, by exploiting symmetry, the LUT memory is reduced to 1/4 of that used by the conventional architecture. Experimental results show that the proposed hardware architecture achieves effective real-time performance at lower implementation cost, and that it can be applied to other kinds of catadioptric omnidirectional imaging systems.
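The 1/4 memory reduction described in the abstract above can be illustrated with a small software sketch. It assumes, since the abstract does not give the details, that the symmetry in question is the quadrant symmetry of the polar-to-rectangular mapping: offsets are stored only for angles in the first quadrant, and the other three quadrants are derived by 90-degree rotations. The function names (`build_quarter_lut`, `lookup`, `unroll_coord`) and the integer rounding are hypothetical and not taken from the paper.

```python
import math

def build_quarter_lut(width, radii):
    """Precompute (dx, dy) source offsets for first-quadrant angles only.

    width: number of panorama columns (angle steps over 360 degrees),
           assumed divisible by 4 so each quadrant has the same columns.
    radii: radius, in source pixels, for each panorama row.
    """
    assert width % 4 == 0
    table = []
    for t in range(width // 4):
        theta = 2.0 * math.pi * t / width
        table.append([(round(r * math.cos(theta)), round(r * math.sin(theta)))
                      for r in radii])
    return table

def lookup(table, width, t, r_idx):
    """Return the (dx, dy) offset for any column t using the quarter table."""
    quadrant, t0 = divmod(t, width // 4)
    dx, dy = table[t0][r_idx]
    # Each further quadrant is a 90-degree rotation: (dx, dy) -> (-dy, dx).
    for _ in range(quadrant):
        dx, dy = -dy, dx
    return dx, dy

def unroll_coord(table, width, t, r_idx, cx, cy):
    """Map panorama pixel (t, r_idx) to a source pixel around centre (cx, cy)."""
    dx, dy = lookup(table, width, t, r_idx)
    return cx + dx, cy + dy
```

The table holds `width // 4` columns instead of `width`, at the cost of a sign change and swap per lookup, which is cheap in hardware. No division appears in the lookup path, in the spirit of the pre-computed factors the abstract mentions.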

Energy-Efficient DNN Processor on Embedded Systems for Spontaneous Human-Robot Interaction

  • Kim, Changhyeon;Yoo, Hoi-Jun
    • Journal of Semiconductor Engineering / v.2 no.2 / pp.130-135 / 2021
  • Deep neural networks (DNNs) are now actively used for action control so that autonomous systems, such as robots, can perform human-like behaviors and operations. Unlike recognition tasks, action control requires real-time operation, and remote learning on a server reached over a network is too slow. New learning techniques, such as reinforcement learning (RL), are needed to determine and select the correct robot behavior locally. In this paper, we propose an energy-efficient DNN processor with a LUT-based processing engine and a near-zero skipper. A CNN-based facial emotion recognition model and an RNN-based emotional dialogue generation model are integrated into a natural human-robot interaction (HRI) system and tested with the proposed processor. It supports variable weight precision from 1b to 16b, with 57.6% and 28.5% lower energy consumption than conventional MAC arithmetic units at 1b and 16b weight precision, respectively. The near-zero skipper also reduces MAC operations by 36% and energy consumption by 28% for facial emotion recognition tasks. Implemented in a 65 nm CMOS process, the proposed processor occupies an area of 1784 × 1784 µm² and dissipates 0.28 mW and 34.4 mW for facial emotion recognition at 1 fps and 30 fps, respectively.
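The idea behind near-zero skipping can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say whether the skipper tests activations or weights, nor how the threshold is chosen, so the sketch below skips on small activation magnitudes and takes the threshold as a parameter. The function name `mac_with_near_zero_skip` is hypothetical.

```python
def mac_with_near_zero_skip(weights, activations, threshold):
    """Dot product that skips multiply-accumulates on near-zero activations.

    Returns the accumulated sum and the number of skipped MAC operations.
    Assumption: an activation with |a| < threshold contributes too little
    to matter, so its multiplication is not performed.
    """
    acc = 0.0
    skipped = 0
    for w, a in zip(weights, activations):
        if abs(a) < threshold:
            skipped += 1      # no multiplier activity for this pair
            continue
        acc += w * a
    return acc, skipped

# Two of the four activations fall below the threshold and are skipped.
total, skipped = mac_with_near_zero_skip(
    [1.0, 2.0, 3.0, 4.0], [0.001, 0.5, -0.002, 1.0], threshold=0.01)
```

The result is an approximation of the exact dot product; the trade-off between skipped work and accuracy depends on the threshold and on how sparse the activations are.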

A Security SoC embedded with ECDSA Hardware Accelerator (ECDSA 하드웨어 가속기가 내장된 보안 SoC)

  • Jeong, Young-Su;Kim, Min-Ju;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.7 / pp.1071-1077 / 2022
  • A security SoC for implementing elliptic curve cryptography (ECC)-based public-key infrastructures was designed. In its architecture, a hardware accelerator for the elliptic curve digital signature algorithm (ECDSA) is interfaced with the Cortex-A53 CPU over the AXI4-Lite bus. The ECDSA hardware accelerator, which consists of a high-performance ECC processor, a SHA3 hash core, a true random number generator (TRNG), a modular multiplier, BRAM, and a control FSM, was designed to perform ECDSA signature generation and signature verification at high performance with minimal CPU control. The security SoC was implemented in a Zynq UltraScale+ MPSoC device for hardware-software co-verification, and the evaluation showed that ECDSA signature generation or signature verification can be performed about 1,000 times per second at a clock frequency of 150 MHz. The ECDSA hardware accelerator was implemented using 74,630 LUTs, 23,356 flip-flops, 32 kb of BRAM, and 36 DSP blocks.
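For readers unfamiliar with the two operations the accelerator performs, the following is a minimal software sketch of ECDSA signature generation and verification. It is not the paper's design: it uses a textbook toy curve (y² = x³ + 2x + 2 over F₁₇, base point (5, 1) of prime order 19) chosen only so the arithmetic is easy to follow, and it offers no security. SHA3-256 is assumed as the hash because the abstract mentions a SHA3 core without naming a variant, and `secrets` stands in for the TRNG.

```python
import hashlib
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (illustrative only, not secure).
P_MOD, A = 17, 2
G, N = (5, 1), 19            # base point and its prime order

def point_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def hash_to_int(message):
    return int.from_bytes(hashlib.sha3_256(message).digest(), "big") % N

def sign(message, private_key):
    """Return an ECDSA signature (r, s), retrying on degenerate nonces."""
    e = hash_to_int(message)
    while True:
        k = secrets.randbelow(N - 1) + 1              # nonce in [1, N-1]
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (e + r * private_key) % N
        if r != 0 and s != 0:
            return r, s

def verify(message, signature, public_key):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    u1, u2 = hash_to_int(message) * w % N, r * w % N
    x = point_add(scalar_mult(u1, G), scalar_mult(u2, public_key))
    return x is not None and x[0] % N == r
```

A typical use is `d = 7; q = scalar_mult(d, G); verify(b"msg", sign(b"msg", d), q)`. The scalar multiplications and modular inversions are the costly steps here, which is consistent with the abstract listing an ECC processor and a modular multiplier among the accelerator's blocks.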