• Title/Summary/Keyword: hardware architecture

Search Result 1,325, Processing Time 0.025 seconds

Development of C-Model Simulator for H.264/SVC Decoder (H.264/SVC 복호기 C-Model 시뮬레이터 개발)

  • Cheong, Cha-Keon
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.3
    • /
    • pp.9-19
    • /
    • 2009
  • In this paper, we propose a novel hardware architecture to facilitate the applicable SoC chip design of H.264/SVC which has a great deal of advancement in the international standardization in recent. Moreover, a new C-model simulator based on the proposed hardware system will be presented to support optimal SoC circuit development. Since the proposed SVC decoder is consist of some hardware engine for processing of major decoding tools and core processor for software processing, the system is simply implemented with the conventional embedded system. To improve the feasibility and applicability, and reduce the decoder complexity, the hardware decoder architecture is constructed with only the consideration of IPPP structure scalability without using the full B-picture. Finally, we present results of decoder hardware implementation and decoded picture to show the effectiveness of the proposed hardware architecture and C-model simulator.

A VLSI architecture for fast motion estimation algorithm (고속 움직임 추정 알고리즘에 적합한 VLSI 구조 연구)

  • 이재헌;라종범
    • Proceedings of the IEEK Conference
    • /
    • 1998.06a
    • /
    • pp.717-720
    • /
    • 1998
  • In this paper, we propose a VLSI architecture for implementing a crecently proposed fast block matching algorithm, which is called the HSBMA3S. The proposed architecture consists of a systolic array based basic unit and two shift register arrays. And it covers a search range of -32 ~+31. By using a basic unit repeatedly, we can redcue the number of gates. To implement the basic unit, we can select one among various conventional systolic arrays by trading-off between speed and hardware cost. In this paper, the architecture for the basic unit is selected so that the hardware cost can be minimized. The proposed architecture is fast enough for low bit-rate applications (frame size of 352x288, 30 frames/sec) and can be implemented by less than 20,000 gates. Moreover, by simply modifying the basic unit, the architecture can be used for the higher bit-rate application of the frame size of 720*480 and 30 frames/sec.

  • PDF

Design of a new VLSI architecture for morphological filters (새로운 수리형태학 필터 VLSI 구조 설계)

  • 웅수환;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.8
    • /
    • pp.22-38
    • /
    • 1997
  • This paper proposes a new VLSI architecture for morphological filters and presents its chip design and implementation. The proposed architecture can significantly reduce hardware costs compared with existing architecture by using a feedback loop path to reuse partial results and a decoder/encoder pair to detect maximum/minimum values. In addition, the proposed architecture requires one common architecture for both diltion and erosion and fewer number of operations. Moreover, it can be easily extended for larger size morphologica operations. We developed VHDL (VHSIC hardware description language) models, performed logic synthesis using the SYNOPSYS CAD tool. We used the SOG (sea-of-gate) cell library and implemented the actual chip. The total number of gates is only 2,667 and the clock frequency is 30 MHz that meets real-time image processing requirements.

  • PDF

Test Vector Generator of timing simulation for 224-bit ECDSA hardware (224비트 ECDSA 하드웨어 시간 시뮬레이션을 위한 테스트벡터 생성기)

  • Kim, Tae Hun;Jung, Seok Won
    • Journal of Internet of Things and Convergence
    • /
    • v.1 no.1
    • /
    • pp.33-38
    • /
    • 2015
  • Hardware are developed in various architecture. It is necessary to verifying value of variables in modules generated in each clock cycles for timing simulation. In this paper, a test vector generator in software type generates test vectors for timing simulation of 224-bit ECDSA hardware modules in developing stage. It provides test vectors with GUI format and text file format.

Optimization Design Method for Inner Product Using CSHM Algorithm and its Application to 1-D DCT Processor (연산공유 승산 알고리즘을 이용한 내적의 최적화 및 이를 이용한 1차원 DCT 프로세서 설계)

  • 이태욱;조상복
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.2
    • /
    • pp.86-93
    • /
    • 2004
  • The DCT algorithm needs an efficient hardware architecture to compute inner product. The conventional design method, like ROM-based DA(Distributed Arithmetic), has large hardware complexity. Because of this reason, a CSHM(Computation Sharing Multiplication) was proposed for implementing inner product by Park. However, the Park's CSHM has inefficient hardware architecture in the precomputer and select units. Therefore it degrades the performance of the multiplier. In this paper, we presents the optimization design method for inner product using CSHM algorithm and applied it to implementation of 1-D DCT processor. The experimental results show that the proposed multiplier is more efficient than Park's when hardware architectures and logic synthesis results were compared. The designed 1-D DCT processor by using proposed design method is more high performance than typical methods.

FPGA Design of High-performance Display Converter (고성능 디스플레이 변환기의 FPGA 설계)

  • Choi, Hyun-Jun;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.8
    • /
    • pp.1895-1900
    • /
    • 2010
  • In this paper, we propose the hardware architecture of a display converter which is consisted of four functional blocks. The four functional blocks consists of a set of color space converter, de-interacer, video display scaler, and gamma corrector. After the proposed architecture was implemented into hardware, we verified that it operated exactly. The designed hardware has 7,629 LUT and 6,800 Logic Register in Stratix device of Altera and operates in 270 MHz clock frequency.

Scalable Big Data Pipeline for Video Stream Analytics Over Commodity Hardware

  • Ayub, Umer;Ahsan, Syed M.;Qureshi, Shavez M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.4
    • /
    • pp.1146-1165
    • /
    • 2022
  • A huge amount of data in the form of videos and images is being produced owning to advancements in sensor technology. Use of low performance commodity hardware coupled with resource heavy image processing and analyzing approaches to infer and extract actionable insights from this data poses a bottleneck for timely decision making. Current approach of GPU assisted and cloud-based architecture video analysis techniques give significant performance gain, but its usage is constrained by financial considerations and extremely complex architecture level details. In this paper we propose a data pipeline system that uses open-source tools such as Apache Spark, Kafka and OpenCV running over commodity hardware for video stream processing and image processing in a distributed environment. Experimental results show that our proposed approach eliminates the need of GPU based hardware and cloud computing infrastructure to achieve efficient video steam processing for face detection with increased throughput, scalability and better performance.

Trends in Lightweight Neural Network Algorithms and Hardware Acceleration Technologies for Transformer-based Deep Neural Networks (Transformer를 활용한 인공신경망의 경량화 알고리즘 및 하드웨어 가속 기술 동향)

  • H.J. Kim;C.G. Lyuh
    • Electronics and Telecommunications Trends
    • /
    • v.38 no.5
    • /
    • pp.12-22
    • /
    • 2023
  • The development of neural networks is evolving towards the adoption of transformer structures with attention modules. Hence, active research focused on extending the concept of lightweight neural network algorithms and hardware acceleration is being conducted for the transition from conventional convolutional neural networks to transformer-based networks. We present a survey of state-of-the-art research on lightweight neural network algorithms and hardware architectures to reduce memory usage and accelerate both inference and training. To describe the corresponding trends, we review recent studies on token pruning, quantization, and architecture tuning for the vision transformer. In addition, we present a hardware architecture that incorporates lightweight algorithms into artificial intelligence processors to accelerate processing.

High-Performance Hardware Architecture for Stereo Matching (스테레오 정합을 위한 고성능 하드웨어 구조)

  • Seo, Young-Ho;Kim, Woo-Youl;Lee, Yoon-Hyuk;Koo, Ja-Myung;Kim, Bo-Ra;Kim, Yoon-Ju;An, Ho-Myung;Choi, Hyun-Jun;Kim, Dong-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.05a
    • /
    • pp.635-637
    • /
    • 2013
  • This paper proposed a new hardware architecture for stereo matching in real time. We minimized the amount of calculation and the number of memory accesses through analyzing calculation of stereo matching. From this, we proposed a new stereo matching calculating cell and a new hardware architecture by expanding it in parallel, which concurrently calculates cost function for all pixels in a search range. After expanding it, we proposed a new hardware architecture to calculate cost function for 2-dimensional region. The implemented hardware can be operated with minimum 250Mhz clock frequence in FPGA environment, and has the performance of 813fps in case of the search range of 64 pixels and the image size of $640{\times}480$.

  • PDF

Hardware Implementation of Past Multi-resolution Motion Estimator for MPEG-4 AVC (MPEG-4 AVC를 위한 고속 다해상도 움직임 추정기의 하드웨어 구현)

  • Lim Young-hun;Jeong Yong-jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11C
    • /
    • pp.1541-1550
    • /
    • 2004
  • In this paper, we propose an advanced hardware architecture for fast multi-resolution motion estimation of the video coding standard MPEG-1,2 and MPEG-4 AVC. We describe the algorithm and derive hardware architecture emphasizing the importance of area for low cost and fast operation by using the shared memory, the special ram architecture, the motion vector for 4 pixel x 4 pixel, the spiral search and so on. The proposed architecture has been verified by ARM-interfaced emulation board using Excalibur Altera FPGA and also by ASIC synthesis using Samsung 0.18 m CMOS cell library. The ASIC synthesis result shows that the proposed hardware can operate at 140 MHz, processing more than 1,100 QCIF video frames or 70 4CIF video frames per second. The hardware is going to be used as a core module when implementing a complete MPEG-4 AVC video encoder ASIC for real-time multimedia application.