Search | Korea Science

Hardware Design of In-loop Filter for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 루프 내 필터 하드웨어 설계)

Park, Seungyong;Im, Junseong;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.20 no.2
- /
- pp.335-342
- /
- 2016
This paper proposes efficient hardware structure of in-loop filter for a high-performance HEVC (High Efficiency Video Coding) encoder. HEVC uses in-loop filter consisting of deblocking filter and SAO (Sample Adaptive Offset) to improve the picture quality in a reconstructed image due to a quantization error. However, in-loop filter causes an increase in complexity due to the additional encoder and decoder operations. A proposed in-loop filter is implemented as a three-stage pipeline to perform the deblocking filtering and SAO operation with a reduced number of cycles. The proposed deblocking filter is also implemented as a six-stage pipeline to improve efficiency and performs a new filtering order for efficient memory architecture. The proposed SAO processes six pixels parallelly at a time to reduce execution cycles. The proposed in-loop filter encoder architecture is designed by Verilog HDL, and implemented by 131K logic gates in TSMC $0.13{\mu}m$ process. At 164MHz, the proposed in-loop filter encoder can support 4K Ultra HD video encoding at 60fps in real time.
https://doi.org/10.6109/jkiice.2016.20.2.335 인용 PDF KSCI

Hardware Design of High Performance In-loop Filter in HEVC Encoder for Ultra HD Video Processing in Real Time (UHD 영상의 실시간 처리를 위한 고성능 HEVC In-loop Filter 부호화기 하드웨어 설계)

Im, Jun-seong;Dennis, Gookyi;Ryoo, Kwang-ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2015.10a
- /
- pp.401-404
- /
- 2015
This paper proposes a high-performance in-loop filter in HEVC(High Efficiency Video Coding) encoder for Ultra HD video processing in real time. HEVC uses in-loop filter consisting of deblocking filter and SAO(Sample Adaptive Offset) to solve the problems of quantization error which causes image degradation. In the proposed in-loop filter encoder hardware architecture, the deblocking filter and SAO has a 2-level hybrid pipeline structure based on the $32{\times}32CTU$ to reduce the execution time. The deblocking filter is performed by 6-stage pipeline structure, and it supports minimization of memory access and simplification of reference memory structure using proposed efficient filtering order. Also The SAO is implemented by 2-statge pipeline for pixel classification and applying SAO parameters and it uses two three-layered parallel buffers to simplify pixel processing and reduce operation cycle. The proposed in-loop filter encoder architecture is designed by Verilog HDL, and implemented by 205K logic gates in TSMC 0.13um process. At 110MHz, the proposed in-loop filter encoder can support 4K Ultra HD video encoding at 30fps in realtime.
PDF

An Intra Prediction Hardware Design for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 화면내 예측 하드웨어 설계)

Park, Seung-yong;Guard, Kanda;Ryoo, Kwang-ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2015.10a
- /
- pp.875-878
- /
- 2015
In this paper, we propose an intra prediction hardware architecture with less processing time, computations and reduced hardware area for a high performance HEVC encoder. The proposed intra prediction hardware architecture uses common operation units to reduce computational complexity and uses $4{\times}4$ block unit to reduce hardware area. In order to reduce operation time, common operation unit uses one operation unit to generate predicted pixels and filtered pixels in all prediction modes. Intra prediction hardware architecture introduces the $4{\times}4$ PU design processing to reduce the hardware area and uses intemal registers to support $32{\times}32$ PU processmg. The proposed hardware architecture uses ten common operation units which can reduce execution cycles of intra prediction. The proposed Intra prediction hardware architecture is designed using Verilog HDL(Hardware Description Language), and has a total of 41.5k gates in TSMC $0.13{\mu}m$ CMOS standard cell library. At 150MHz, it can support 4K UHD video encoding at 30fps in real time, and operates at a maximum of 200MHz.
PDF

An effective transform hardware design for real-time HEVC encoder (HEVC 부호기의 실시간처리를 위한 효율적인 변환기 하드웨어 설계)

Jo, Heung-seon;Kumi, Fred Adu;Ryoo, Kwang-ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2015.10a
- /
- pp.416-419
- /
- 2015
In this paper, we propose an effective design of transform hardware for real-time HEVC(High Efficiency Video Coding) encoder. HEVC encoder determines the transform mode($4{\times}4$, $8{\times}8$, $16{\times}16$, $32{\times}32$) by comparing RDCost. RDCost require a significant amount of computation and time because it is determined by bit-rate and distortion which is computated via transform, quantization, dequantization, and inverse transform. This paper therefore proposes a new method for transform mode determination using sum of transform coefficient. Also, proposed hardware architecture is implemented with multiplexer, recursive adder/subtracter, and shifter only to derive reduction of the computation. Proposed method for transform mode determination results in an increase of 0.096 in BD-PSNR, 0.057 in BD-Bitrate, and decrease of 9.3% in encoding time by comparing HM 10.0. The hardware which is proposed is implemented by 256K logic gates in TSMC 130nm process. Its maximum operation frequency is 200MHz. At 140MHz, the proposed hardware can support 4K Ultra HD video encoding at 60fps in real time.
PDF

Design of Message Passing Engine Based on Processing Node Status for MPI Collective Communication (MPI 집합통신을 위한 프로세싱 노드 상태 기반의 메시지 전달 엔진 설계)

Chung, Won-Young;Lee, Yong-Surk
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.37 no.8B
- /
- pp.668-676
- /
- 2012
In this paper, on the assumption that MPI collective communication function is converted into a group of point-to-point communication functions in the transaction level, an algorithm that optimizes broadcast, scatter and gather function among MPI collective communication is proposed. The MPI hardware engine that operates the proposed algorithm was designed, and it was named the OCC-MPE (Optimized Collective Communication Message Passing Engine). The OCC-MPE operates point-to-point communication by using the standard send mode. The transmission order is arranged according to the algorithm that proposes the most frequently used broadcast, scatter and gather functions among the collective communications, so the whole communication time is reduced. To measure the performance of the proposed algorithm, the OCC-MPE with the Bus Functional Model (BFM) based on SystemC was designed. After evaluating the performance through the BFM based on SystemC, the proposed OCC-MPE is designed by using VerilogHDL. As a result of synthesizing with the TSMC $0.18{\mu}m$, the gate count of each OCC-MPE is approximately 1978.95 with four processing nodes. That occupies approximately 4.15% in the whole system, which means it takes up a relatively small amount. Improved performance is expected with relatively small amounts of area increase if the OCC-MPE operated by the proposed algorithm is added to the MPSoC (Multi-Processor System on a Chip).
https://doi.org/10.7840/kics.2012.37B.8.668 인용 PDF KSCI

Design of an Improved Anti-Collision Unit for an RFID Reader System Based on Gen2 (Gen2 리더 시스템의 개선된 충돌방지 유닛 설계)

Sim, Jae-Hee;Lee, Yong-Joo;Lee, Yong-Surk
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.34 no.2A
- /
- pp.177-183
- /
- 2009
In this paper, we propose an improved anti-collision algorithm. We have designed an anti-collision unit using this algorithm for the 18000-6 Type C Class 1 Generation 2 standard (Gen2). The Gen2 standard uses a Q-algorithm for incremental method on the Dynamic Slot-Aloha algorithm. It has basically enhanced performance over the Slot-Aloha algorithm. Unfortunately, there are several non-clarified parts: initial $Q_{fp}$ value, weighted C, and the ending point of the algorithm. If an incorrect value is selected, it causes degradation in performance. Thus we propose an improved anti-collision algorithm by clearly defining the vague parts of the existing algorithm. Simulation results showed an improved performance of up to 34.8% using an optimized value of C and the initial $Q_{fp}$ value. With the ending condition, performance is 34.7%. The anti-collision unit is designed using the Verilog HDL. The module was synthesized using Synopsys' Design Compiler and the TSMC $0.2{\mu}m$ standard cell library. The synthesized result yielded 3,847 gates, and was guaranteed under the proposed working frequency of 19.2MHz.
PDF KSCI

An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder (HEVC 복호기의 연산 복잡도 감소를 위한 화면내 예측 하드웨어 구조 설계)

Jung, Hongkyun;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.17 no.5
- /
- pp.1203-1212
- /
- 2013
In this paper, an intra prediction hardware architecture is proposed to reduce computational complexity of intra prediction in HEVC decoder. The architecture uses shared operation units and common operation units and adopts a fast smoothing decision algorithm and a fast algorithm to generate coefficients of a filter. The shared operation unit shares adders processing common equations to remove the computational redundancy. The unit computes an average value in DC mode for reducing the number of execution cycles in DC mode. In order to reduce operation units, the common operation unit uses one operation unit generating predicted pixels and filtered pixels in all prediction modes. In order to reduce processing time and operators, the decision algorithm uses only bit-comparators and the fast algorithm uses LUT instead of multiplication operators. The proposed architecture using four shared operation units and eight common operation units which can reduce execution cycles of intra prediction. The architecture is synthesized using TSMC 0.13um CMOS technology. The gate count and the maximum operating frequency are 40.5k and 164MHz, respectively. As the result of measuring the performance of the proposed architecture using the extracted data from HM 7.1, the execution cycle of the architecture is about 93.7% less than the previous design.
https://doi.org/10.6109/jkiice.2013.17.5.1203 인용 PDF KSCI

Hardware Design of High Performance HEVC Deblocking Filter for UHD Videos (UHD 영상을 위한 고성능 HEVC 디블록킹 필터 설계)

Park, Jaeha;Ryoo, Kwangki
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.19 no.1
- /
- pp.178-184
- /
- 2015
This paper proposes a hardware architecture for high performance Deblocking filter(DBF) in High Efficiency Video Coding for UHD(Ultra High Definition) videos. This proposed hardware architecture which has less processing time has a 4-stage pipelined architecture with two filters and parallel boundary strength module. Also, the proposed filter can be used in low-voltage design by using clock gating architecture in 4-stage pipeline. The segmented memory architecture solves the hazard issue that arises when single port SRAM is accessed. The proposed order of filtering shortens the delay time that arises when storing data into the single port SRAM at the pre-processing stage. The DBF hardware proposed in this paper was designed with Verilog HDL, and was implemented with 22k logic gates as a result of synthesis using TSMC 0.18um CMOS standard cell library. Furthermore, the dynamic frequency can process UHD 8k($7680{\times}4320$) samples@60fps using a frequency of 150MHz with an 8K resolution and maximum dynamic frequency is 285MHz. Result from analysis shows that the proposed DBF hardware architecture operation cycle for one process coding unit has improved by 32% over the previous one.
https://doi.org/10.6109/jkiice.2015.19.1.178 인용 PDF KSCI KPUBS HTML

The Hardware Design of Effective Deblocking Filter for HEVC Encoder (HEVC 부호기를 위한 효율적인 디블록킹 하드웨어 설계)

Park, Jae-Ha;Park, Seung-yong;Ryoo, Kwang-ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2014.10a
- /
- pp.755-758
- /
- 2014
In this paper, we propose effective Deblocking Filter hardware architecture for High Efficiency Video Coding encoder. we propose Deblocking Filter hardware architecture with less processing time, filter ordering for low area design, effective memory architecture and four-pipeline for a high performance HEVC(High Efficiency Video Coding) encoder. Proposed filter ordering can be used to reduce delay according to preprocessing. It can be used for realtime single-port SRAM read and write. it can be used in parallel processing by using two filters. Using 10 memory is effective for solving the hazard caused by a single-port SRAM. Also the proposed filter can be used in low-voltage design by using clock gating architecture in 4-pipeline. The proposed Deblocking Filter encoder architecture is designed by Verilog HDL, and implemented by 100k logic gates in TSMC $0.18{\mu}m$ process. At 150MHz, the proposed Deblocking Filter encoder can support 4K Ultra HD video encoding at 30fps, and can be operated at a maximum speed of 200MHz.
PDF

Hardware Design of High-Performance SAO in HEVC Encoder for Ultra HD Video Processing in Real Time (UHD 영상의 실시간 처리를 위한 고성능 HEVC SAO 부호화기 하드웨어 설계)

Cho, Hyun-pyo;Park, Seung-yong;Ryoo, Kwang-ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2014.10a
- /
- pp.271-274
- /
- 2014
This paper proposes high-performance SAO(Sample Adaptive Offset) in HEVC(High Efficiency Video Coding) encoder for Ultra HD video processing in real time. SAO is a newly adopted technique belonging to the in-loop filter in HEVC. The proposed SAO encoder hardware architecture uses three-layered buffers to minimize memory access time and to simplify pixel processing and also uses only adder, subtractor, shift register and feed-back comparator to reduce area. Furthermore, the proposed architecture consists of pipelined pixel classification and applying SAO parameters, and also classifies four consecutive pixels into EO and BO concurrently. These result in the reduction of processing time and computation. The proposed SAO encoder architecture is designed by Verilog HDL, and implemented by 180k logic gates in TSMC $0.18{\mu}m$ process. At 110MHz, the proposed SAO encoder can support 4K Ultra HD video encoding at 30fps in real time.
PDF

Search Result 188, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)