• Title/Summary/Keyword: Hardware architecture

High-Performance Line-Based Filtering Architecture Using Multi-Filter Lifting Method (다중필터 리프팅 방식을 이용한 고성능 라인기반 필터링 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.8
    • /
    • pp.75-84
    • /
    • 2004
  • In this paper, we propose an efficient hardware architecture for a line-based lifting algorithm for Motion JPEG2000. We propose a new lifting-based filtering cell with an optimized and simplified structure, implemented in hardware that accommodates both the (9,7) and (5,4) filters. Since the output rate is linearly proportional to the input rate, high throughput can be obtained through parallel operation simply by adding hardware units. The design was implemented in both ASIC and FPGA: the Samsung 0.35 µm CMOS library was used for the ASIC, and an Altera device was the FPGA target. In the ASIC, the proposed architecture used 41,592 gates for the lifting arithmetic and 128 Kbit of memory; in the FPGA, it used 6,520 LEs (Logic Elements) and 128 ESBs (Embedded System Blocks). The implementations operated stably at clock frequencies of 128 MHz and 52 MHz, respectively.
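
As a point of reference for the lifting datapath described above, the sketch below runs one level of 1-D lifting with the integer (5,3) wavelet filter used in JPEG2000. It is an illustrative behavioural model only, not the paper's (9,7)/(5,4)-capable line-based cell, and the boundary handling is a simple sample replication chosen for brevity.

```python
def lifting_53_forward(x):
    """One level of 1-D lifting with the integer (5,3) wavelet filter used in
    JPEG2000 (behavioural sketch; the paper's line-based cell follows the same
    predict/update pattern in hardware)."""
    assert len(x) % 2 == 0 and len(x) >= 4
    even, odd = x[0::2], x[1::2]

    # Predict: each odd sample is predicted from its two even neighbours.
    d = [odd[i] - ((even[i] + even[min(i + 1, len(even) - 1)]) >> 1)
         for i in range(len(odd))]

    # Update: even samples are corrected with the freshly computed details.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(len(even))]
    return s, d                        # low-pass / high-pass subbands

if __name__ == "__main__":
    print(lifting_53_forward([10, 12, 14, 13, 11, 9, 8, 8]))
```

Because each predict/update pair consumes and produces samples at the same rate, several such cells can run on different lines in parallel, which is the property the abstract exploits for throughput scaling.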

Hardware Design of High Performance ALF in HEVC Encoder for Efficient Filter Coefficient Estimation (효율적인 필터 계수 추출을 위한 HEVC 부호화기의 고성능 ALF 하드웨어 설계)

  • Shin, Seungyong;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.2
    • /
    • pp.379-385
    • /
    • 2015
  • This paper proposes a high-performance ALF (Adaptive Loop Filter) hardware architecture for efficient filter coefficient estimation. To compress a high-resolution, high-quality original image effectively while improving subjective image quality, the ALF technique of HEVC performs filtering by estimating filter coefficients from the statistical characteristics of the image. The proposed ALF hardware is designed as a 2-stage pipeline, reducing the cycle count by analyzing the operational dependencies of the Cholesky decomposition used for filter coefficient estimation. In addition, the square-root operation in the Cholesky decomposition is designed with a multiplexer, subtracter, and comparator to reduce logic area, computation time, and computational complexity. The proposed hardware architecture is designed using Xilinx ISE 14.3 for a Virtex-7 XC7VCX485T FPGA device and can support 4K UHD at 40 fps in real time at a maximum operating frequency of 186 MHz.
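
To illustrate the two ideas mentioned above, the following sketch solves a small Wiener-Hopf-style system R·c = p by Cholesky factorisation and computes an integer square root using only compare, subtract, and select steps, in the spirit of a multiplexer/subtracter/comparator datapath. It is a behavioural reference with made-up inputs, not the paper's 2-stage pipelined RTL.

```python
import numpy as np

def isqrt_compare_subtract(n, bits=16):
    """Digit-by-digit integer square root using only compare, subtract and
    select operations (behavioural sketch, not the paper's RTL)."""
    root, rem = 0, 0
    for i in reversed(range(bits)):
        rem = (rem << 2) | ((n >> (2 * i)) & 0b11)   # bring down two bits
        trial = (root << 2) | 1
        if rem >= trial:                             # comparator
            rem -= trial                             # subtracter
            root = (root << 1) | 1                   # mux selects '1'
        else:
            root = root << 1                         # mux selects '0'
    return root

def alf_like_coefficients(R, p):
    """Solve R c = p (auto-/cross-correlation statistics) via Cholesky
    factorisation, the numerical core of ALF-style coefficient estimation."""
    L = np.linalg.cholesky(R)        # R = L L^T (square roots on the diagonal)
    y = np.linalg.solve(L, p)        # forward substitution
    return np.linalg.solve(L.T, y)   # back substitution

if __name__ == "__main__":
    print(isqrt_compare_subtract(144))          # -> 12
    R = np.array([[4.0, 1.0], [1.0, 3.0]])
    p = np.array([1.0, 2.0])
    print(alf_like_coefficients(R, p))          # example filter coefficients
```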

A Methodology for Estimating Optimum Hardware Capacity E-learning System Development (E-러닝시스템 구축 프로젝트의 적정 하드웨어 산정방법론 연구)

  • Jung, Ji-Young;Baek, Dong-Hyun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.34 no.3
    • /
    • pp.49-56
    • /
    • 2011
  • Estimating the optimum hardware capacity of an e-learning system is a very important process for sizing the technical architecture and budget reasonably during the ISP (information strategic planning) and development stages, and it greatly influences the cost and quality of the whole project. While investment in information system hardware has continuously increased, there has been no certified method for estimating hardware capacity in e-learning system development. A guideline for hardware sizing of information systems was established by the Telecommunications Technology Association in 2008. However, the guideline is not appropriate for estimating the optimum hardware capacity of an e-learning system, because it was designed to provide general standards for many types of projects. The purpose of this paper is to provide a methodology for estimating optimum hardware capacity in e-learning system development. To develop the methodology, this study first analyzes two e-learning development projects in which the guideline was applied to estimate hardware capacity. It then identifies several key factors that influence hardware capacity. Finally, it suggests a methodology for estimating the optimum hardware capacity of an e-learning system, in which the weights of the factors are determined through AHP analysis.
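
As a rough illustration of the AHP weighting step mentioned above, the sketch below derives priority weights from a hypothetical pairwise-comparison matrix of sizing factors and checks Saaty's consistency ratio. The factor names and judgement values are invented for the example; they are not the paper's data or its full sizing formula.

```python
import numpy as np

# Hypothetical pairwise-comparison matrix over three sizing factors
# (e.g. concurrent users, content streaming load, peak-period ratio);
# the factors and judgement values are illustrative only.
pairwise = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
])

def ahp_weights(A):
    """AHP priority weights from the principal eigenvector of the
    pairwise-comparison matrix (the standard eigenvector method)."""
    vals, vecs = np.linalg.eig(A)
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    return principal / principal.sum()

def consistency_ratio(A):
    """Saaty consistency ratio; CR < 0.1 is conventionally acceptable."""
    n = A.shape[0]
    lam = float(np.max(np.real(np.linalg.eigvals(A))))
    ci = (lam - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]      # random-index table (partial)
    return ci / ri

w = ahp_weights(pairwise)
print("factor weights:", np.round(w, 3))      # roughly [0.65, 0.23, 0.12]
print("consistency ratio:", round(consistency_ratio(pairwise), 3))
```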

Area Efficient Hardware Design for Performance Improvement of SAO (SAO의 성능개선을 위한 저면적 하드웨어 설계)

  • Choi, Jisoo;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.2
    • /
    • pp.391-396
    • /
    • 2013
  • In this paper, an SAO hardware design with less processing time and reduced area is proposed for HEVC decoding. The proposed SAO hardware architecture processes 8×8 CUs to reduce the hardware area and uses internal registers to support 64×64 CU processing. Instead of the conventional top-down block partitioning, it uses bottom-up block partitioning to minimize the amount of computation and the processing time. Synthesized with a TSMC 0.18 µm library, the proposed architecture has a gate area of 30.7k and a maximum frequency of 250 MHz, and it can decode one macroblock in 64 cycles.
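
For context, the sketch below shows the per-sample SAO edge-offset classification and offset addition that such a decoder block carries out, here along a single row of reconstructed samples with an example offset table. It illustrates the operation being accelerated, not the proposed bottom-up partitioning hardware itself.

```python
def sao_edge_category(left, cur, right):
    """HEVC-style SAO edge-offset category of a sample given its two
    neighbours along the chosen direction (category 0 = no offset)."""
    sign = lambda a, b: (a > b) - (a < b)
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[sign(cur, left) + sign(cur, right)]

def sao_filter_row(row, offsets):
    """Apply edge offsets to one row of reconstructed samples; border
    samples are left unfiltered (per-sample sketch of the operation the
    block-based hardware performs)."""
    out = list(row)
    for i in range(1, len(row) - 1):
        cat = sao_edge_category(row[i - 1], row[i], row[i + 1])
        out[i] = row[i] + offsets[cat]
    return out

if __name__ == "__main__":
    offsets = {0: 0, 1: 2, 2: 1, 3: -1, 4: -2}     # example offset table
    print(sao_filter_row([10, 8, 12, 12, 15, 9], offsets))
```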

Hardware Implementation of a Fast Inter Prediction Engine for MPEG-4 AVC (MPEG-4 AVC를 위한 고속 인터 예측기의 하드웨어 구현)

  • Lim Young hun;Lee Dae joon;Jeong Yong jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.3C
    • /
    • pp.102-111
    • /
    • 2005
  • In this paper, we propose an advanced hardware architecture for a fast inter prediction engine for the video coding standard MPEG-4 AVC. We describe the algorithm and derive the hardware architecture, emphasizing real-time operation of the quarter-pel based motion estimation. The fast inter prediction engine is composed of block segmentation, motion estimation, motion compensation, and a fast quarter-pel calculator. The proposed architecture has been verified on an ARM-interfaced emulation board using Excalibur and Virtex2 FPGAs, and also by synthesis with Samsung 0.18 um CMOS technology. The synthesis result shows that the proposed hardware can operate at 62.5 MHz, in which case it can process about 88 QCIF video frames per second. The hardware is being used as a core module in implementing a complete MPEG-4 AVC video encoder chip for real-time multimedia applications.
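
The quarter-pel path referred to above can be pictured with the 1-D sketch below: a half-sample value from the H.264 6-tap filter (1, -5, 20, 20, -5, 1) followed by bilinear averaging for the quarter-sample position. It is a simplified scalar model with naive edge clamping, not the engine's 2-D interpolator.

```python
def half_pel_6tap(samples, i):
    """H.264-style half-sample value between samples[i] and samples[i+1]
    using the 6-tap filter (1, -5, 20, 20, -5, 1), rounded and clipped."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * samples[min(max(i - 2 + k, 0), len(samples) - 1)]
              for k, t in enumerate(taps))
    return min(max((acc + 16) >> 5, 0), 255)

def quarter_pel(samples, i):
    """Quarter-sample value by bilinear averaging of the integer-pel and
    half-pel positions, as in H.264 luma interpolation (1-D sketch)."""
    return (samples[i] + half_pel_6tap(samples, i) + 1) >> 1

if __name__ == "__main__":
    row = [100, 102, 110, 120, 118, 112, 105, 100]
    print(half_pel_6tap(row, 3), quarter_pel(row, 3))   # half- and quarter-pel
```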

The Architecture of Intra-prediction & DCTQ Hardware for H.264 Encoder (H.264 부호화기를 위한 Intra-prediction & DCTQ Hardware 구조)

  • Suh, Ki-Bum
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.5
    • /
    • pp.1-9
    • /
    • 2010
  • In this paper, a novel Intra-prediction & DCTQ hardware architecture that can process Full HD images (1980×1088 @ 30 fps) in real time is proposed, together with a cycle optimization method covering the overall cycle of prediction, transform, scaling, descaling, and reconstruction. To reduce cycles in the 4×4 prediction, the quantization process is performed during the prediction cycle, and pre-selection of 2 modes among the 9 modes is performed to reduce the hardware area. To reduce the hardware for the 16×16 and 8×8 predictions, logic shared between the two predictions is utilized. The proposed architecture can process Full HD video at 30 frames/sec with a 108 MHz clock and requires 425 cycles per macroblock.
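
The mode pre-selection idea can be sketched as follows: predict a 4x4 block with a few of the nine intra modes, rank them by SAD, and keep the two cheapest candidates for the full prediction/DCTQ path. The three modes and the SAD criterion used here are illustrative assumptions; the paper's selection rule may differ.

```python
import numpy as np

def predict_4x4(mode, top, left):
    """Three of the nine H.264 4x4 intra modes (vertical, horizontal, DC);
    enough to illustrate the predict-and-cost loop, not the full mode set."""
    if mode == "vertical":
        return np.tile(top, (4, 1))                    # each column copies top
    if mode == "horizontal":
        return np.tile(left.reshape(4, 1), (1, 4))     # each row copies left
    dc = int(round((top.sum() + left.sum()) / 8.0))    # "dc" mode
    return np.full((4, 4), dc)

def preselect_modes(block, top, left, keep=2):
    """Rank candidate modes by SAD and keep the cheapest few, mirroring the
    idea of pre-selecting 2 of the 9 modes before the full DCTQ path
    (illustrative; the paper's selection criterion may differ)."""
    costs = {m: int(np.abs(block - predict_4x4(m, top, left)).sum())
             for m in ("vertical", "horizontal", "dc")}
    return sorted(costs, key=costs.get)[:keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.integers(90, 110, (4, 4))
    top = rng.integers(90, 110, 4)
    left = rng.integers(90, 110, 4)
    print(preselect_modes(block, top, left))           # two surviving modes
```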

A Hardware Architecture of Hough Transform Using an Improved Voting Scheme (개선된 보팅 정책을 적용한 허프 변환 하드웨어 구조)

  • Lee, Jeong-Rok;Bae, Kyeong-Ryeol;Moon, Byungin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.9
    • /
    • pp.773-781
    • /
    • 2013
  • The Hough transform for line detection is widely used in many machine vision applications because of its robustness against data loss and distortion. However, it is not appropriate for real-time embedded vision systems, because it has an inefficient computation structure and demands a large number of memory accesses. Thus, this paper proposes an improved voting scheme for the Hough transform and applies this scheme to a Hough transform hardware architecture so that it can provide real-time performance with fewer hardware resources. The proposed voting scheme reduces the computation overhead of the voting procedure by exploiting the correlation between adjacent pixels, and improves computational efficiency by increasing the reusability of vote values. The proposed hardware architecture, which adopts this improved scheme, maximizes throughput by computing and storing vote values for many adjacent pixels in parallel. This parallelization for throughput improvement is accomplished with little hardware overhead compared with sequential computation.
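
As a reference point for the voting scheme, the sketch below accumulates Hough votes for runs of horizontally adjacent edge pixels, updating rho incrementally by cos(theta) from one pixel to the next. This is one simple way to reuse vote computation between neighbours; the paper's scheme and its parallel vote storage are more elaborate.

```python
import math

def hough_vote(edge_runs, width, n_theta=180):
    """Hough accumulation for runs of horizontally adjacent edge pixels.
    rho for the pixel at x+1 is obtained from the previous pixel's rho by
    adding cos(theta), so neighbouring votes reuse earlier arithmetic
    (a simple stand-in for the reuse idea described above)."""
    max_rho = int(math.hypot(width, width)) + 1
    acc = [[0] * (2 * max_rho) for _ in range(n_theta)]
    cos_t = [math.cos(math.pi * t / n_theta) for t in range(n_theta)]
    sin_t = [math.sin(math.pi * t / n_theta) for t in range(n_theta)]

    for (x, y), run in edge_runs:                  # run = consecutive edge pixels
        for ti in range(n_theta):
            rho = x * cos_t[ti] + y * sin_t[ti]    # computed once per run
            for _ in range(run):
                acc[ti][int(round(rho)) + max_rho] += 1
                rho += cos_t[ti]                   # neighbour at x+1 reuses rho
    return acc

if __name__ == "__main__":
    # two horizontal runs of 4 edge pixels each, all lying on the line y = 5
    acc = hough_vote([((2, 5), 4), ((10, 5), 4)], width=32)
    max_rho = int(math.hypot(32, 32)) + 1
    print("votes at theta=90deg, rho=5:", acc[90][5 + max_rho])    # expect 8
```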

Hardware Design of Efficient SAO for High Performance In-loop filters (고성능 루프내 필터를 위한 효율적인 SAO 하드웨어 설계)

  • Park, Seungyong;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.543-545
    • /
    • 2017
  • This paper describes an SAO hardware architecture design for a high-performance in-loop filter. SAO is an inner module of the in-loop filter that compensates for the information loss caused by block-based image compression and quantization. However, HEVC's SAO requires a long computation time because it performs pixel-level operations. Therefore, the SAO hardware architecture proposed in this paper is based on a 4×4 block operation and a 2-stage pipeline structure for high-speed operation. The information generation and offset computation structures for the SAO computation are designed in parallel to minimize computation time. The proposed hardware architecture was designed in Verilog HDL and synthesized with TSMC 130 nm and 65 nm cell libraries. The proposed design achieved a maximum frequency of 476 MHz with 163k gates and 312.5 MHz with 193.6k gates on the 130 nm and 65 nm processes, respectively.
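
Complementing the edge-offset example shown with the earlier SAO entry, the sketch below applies SAO band-offset filtering to a single 4×4 block: samples are binned into 32 bands by their upper bits, and the four consecutive bands starting at band_start receive offsets. The block contents and offset values are made up for illustration.

```python
def sao_band_offset_4x4(block, band_start, offsets, bit_depth=8):
    """HEVC-style SAO band offset on one 4x4 block: samples are binned into
    32 equal bands by their upper bits, and the four consecutive bands
    starting at band_start receive an offset (sketch of the per-block
    operation that the parallel offset-computation hardware performs)."""
    shift = bit_depth - 5                       # 32 bands
    out = []
    for row in block:
        new_row = []
        for s in row:
            band = s >> shift
            if band_start <= band < band_start + 4:
                s = min(max(s + offsets[band - band_start], 0),
                        (1 << bit_depth) - 1)
            new_row.append(s)
        out.append(new_row)
    return out

if __name__ == "__main__":
    block = [[16, 20, 25, 31],
             [33, 35, 40, 47],
             [50, 55, 60, 63],
             [64, 70, 90, 120]]
    print(sao_band_offset_4x4(block, band_start=2, offsets=[3, 2, 1, 0]))
```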

A VLSI Design of High Performance H.264 CAVLC Decoder Using Pipeline Stage Optimization (파이프라인 최적화를 통한 고성능 H.264 CAVLC 복호기의 VLSI 설계)

  • Lee, Byung-Yup;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.12
    • /
    • pp.50-57
    • /
    • 2009
  • This paper proposes a VLSI architecture for a CAVLC hardware decoder, a tool for eliminating statistical redundancy in H.264/AVC video compression. The previous CAVLC hardware decoder used four stages to decode five code symbols, but its performance suffered from unnecessary idle cycles between state transitions; likewise, the computation of the valid bit length includes an unnecessary idle cycle. This paper proposes a hardware architecture that eliminates these idle cycles efficiently, using two methods: one eliminates unnecessary buffering of decoded codes to form an efficient pipeline architecture, and the other uses shifter control to simplify the operations and control involved in calculating the valid bit length. The experimental results show that the proposed architecture needs only 89 cycles on average to decode one macroblock, improving performance by about 29% over previous designs. The synthesis results show that the design achieves a maximum operating frequency of 140 MHz with a hardware cost of about 11.5K gates in a 0.18 µm CMOS process. Compared with the previous design, it can achieve low-power operation because it is implemented with high throughput and a low gate count.
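
The shifter-based handling of the valid bit length can be pictured with the small bitstream front end below: a peek/flush window in which each decoded symbol's length is shifted out in a single step, shown here with an Exp-Golomb decode standing in for the CAVLC table lookups. The class and method names are illustrative, not the paper's design.

```python
class BitShifter:
    """Sketch of a shifter-style bitstream front end: a window over the
    stream exposes the next bits, and after each decoded symbol the consumed
    length is shifted out and the valid-bit count updated in one step.
    Illustrative of the idea described above, not the paper's datapath."""

    def __init__(self, data: bytes):
        self.bits = int.from_bytes(data, "big")
        self.valid = 8 * len(data)                 # remaining valid bit length

    def show(self, n: int) -> int:
        """Peek at the next n bits without consuming them."""
        return (self.bits >> (self.valid - n)) & ((1 << n) - 1)

    def flush(self, n: int) -> None:
        """Consume n bits: a single shift-and-subtract, no idle cycle."""
        self.valid -= n

    def read_ue(self) -> int:
        """Exp-Golomb ue(v) decode, standing in for the CAVLC VLC tables."""
        zeros = 0
        while self.show(1) == 0:                   # count leading zeros
            self.flush(1)
            zeros += 1
        self.flush(1)                              # terminating '1' bit
        suffix = self.show(zeros) if zeros else 0
        self.flush(zeros)
        return (1 << zeros) - 1 + suffix

if __name__ == "__main__":
    bs = BitShifter(bytes([0x4C, 0x80]))           # packs the ue codes 1, 2, 3
    print([bs.read_ue() for _ in range(3)], "valid bits left:", bs.valid)
```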

The New Architecture of Low Power Inner Product Processor for Reconfigurable Neural Networks (재구성 가능한 뉴럴 네트워크 구현을 위한 새로운 저전력 내적연산 프로세서 구조)

  • 임국찬;이현수
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.5
    • /
    • pp.61-70
    • /
    • 2004
  • The operation of a neural network is divided into learning and recognition. Learning is the process of updating weights until the network achieves the target result for the input patterns; recognition is the arithmetic processing of input patterns with the weights. Traditional inner product processors focus on improving processing speed and hardware complexity, and no hardware architecture distinguishes between the learning and recognition modes of a neural network. In this paper we propose a new low-power inner product processor architecture for reconfigurable neural networks. In learning mode, the proposed architecture is similar to a bit-serial inner product processor, which offers fast bit-level processing, suitability for hardware implementation, and a pipelined computation structure. In recognition mode, the proposed architecture minimizes the number of active units and reduces power consumption. Simulation results show that the number of active units depends on the bit representation of the weights and can be reduced by about 50 percent.