• Title/Summary/Keyword: video encoder

Search Result 447, Processing Time 0.025 seconds

DCT-domain MPEG-2/H.264 Video Transcoder System Architecture for DMB Services (DMB 서비스를 위한 DCT 기반 MPEG-2/H.264 비디오 트랜스코더 시스템 구조)

  • Lee Joo-Kyong;Kwon Soon-Young;Park Seong-Ho;Kim Young-Ju;Chung Ki-Dong
    • The KIPS Transactions:PartB
    • /
    • v.12B no.6 s.102
    • /
    • pp.637-646
    • /
    • 2005
  • Most of the multimedia contents for DBM services art provided as MPEG-2 bit streams. However, they have to be transcoded to H.264 bit streams for practical services because the standard video codec for DMB is H.264. The existing transcoder architecture is Cascaded Pixel-Domain Transcoding Architecture, which consists of the MPEG-2 dacoding phase and the H.264 encoding phase. This architecture can be easily implemented using MPEG-2 decoder and H.264 encoder without source modifying. However. It has disadvantages in transcoding time and DCT-mismatch problem. In this paper, we propose two kinds of transcoder architecture, DCT-OPEN and DCT-CLOSED, to complement the CPDT architecture. Although DCT-OPEN has lower PSNR than CPDT due to drift problem, it is efficient for real-time transcoding. On the contrary, the DCT-CLOSED architecture has the advantage of PSNR over CPDT at the cost of transcoding time.

Selective Inter-layer Residual Prediction Coding and Fast Mode Decision for Spatial Enhancement Layers in Scalable Video Coding (스케일러블 비디오 부호화에서 선택적 계층간 차분 신호 부호화 및 공간적 향상 계층에서의 모드 결정)

  • Lee, Bum-Shik;Hahm, Sang-Jin;Park, Chang-Seob;Park, Keun-Soo;Kim, Mun-Churl
    • Journal of Broadcast Engineering
    • /
    • v.12 no.6
    • /
    • pp.596-610
    • /
    • 2007
  • In order to reduce the complexity of SVC encoding, we introduce a fast mode decision method in the enhancement layers of spatial scalability by selectively performing the inter-layer residual prediction of SVC. The Inter-layer residual prediction coding in Scalable Video Coding has a large advantage of enhancing the coding efficiency since it utilizes the correlation between two residuals from a lower spatial layer and its next higher spatial layer. However, this entails the dramatical increase in the complexity of SVC encoders. The proposed method is to analyze the characteristics of integer transform coefficients for the subtracted signal for two residuals from lower and upper spatial layers. Then it selectively performs the inter-layer residual prediction coding and rate-distortion optimizations in the upper spatial enhancement layer if the SAD values of residuals exceed adaptive threshold values. Therefore, by classifying the residuals according to the properties of integer-transform coefficients only with SAD of residuals between two layers, the SVC encoder can perform the inter-layer residual coding selectively, thus significantly reducing the total required encoding time. The proposed method results in reduction of the total encoding time with 51.5% in average while maintaining the RD performance with negligible amounts of quality degradation.

Signaling Method of Multiple Motion Vector Resolutions Using Contradiction Testing (모순 검증을 통한 다중 움직임 벡터 해상도 시그널링 방법)

  • Won, Kwanghyun;Park, Younghyeon;Jeon, Byeungwoo
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.7
    • /
    • pp.107-118
    • /
    • 2015
  • Although most current video coding standards set a fixed motion vector resolution like quarter-pel accuracy, a scheme supporting multiple motion vector resolutions can improve the coding efficiency of video since it can allow to use just required motion vector accuracy depending on the video content and at the same time to generate more accurate motion predictor. However, the selected motion vector resolution for each motion vector is a signaling overhead. This paper proposes a contradiction testing-based signaling scheme of the motion vector resolution. The proposed method selects a best resolution for each motion vector among multiple candidates in such a way to produce the minimum amount of coded bits for the motion vector. The signaling overhead is reduced by contradiction testing that operates under a predefined criterion at both encoder and decoder with a purpose of pruning irrelevant candidate motion vector resolutions from signaling responsibility. Experimental results verified that the proposed scheme is effective in reducing coded motion information by achieving its $Bj{\o}ntegaard$ delta bit rate (BDBR) gain of about 4.01% on average (and up to 15.17%) compared to the conventional scheme with a fixed motion vector resolution.

Fast Mode Decision using Block Size Activity for H.264/AVC (블록 크기 활동도를 이용한 H.264/AVC 부호화 고속 모드 결정)

  • Jung, Bong-Soo;Jeon, Byeung-Woo;Choi, Kwang-Pyo;Oh, Yun-Je
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.2 s.314
    • /
    • pp.1-11
    • /
    • 2007
  • H.264/AVC uses variable block sizes to achieve significant coding gain. It has 7 different coding modes having different motion compensation block sizes in Inter slice, and 2 different intra prediction modes in Intra slice. This fine-tuned new coding feature has achieved far more significant coding gain compared with previous video coding standards. However, extremely high computational complexity is required when rate-distortion optimization (RDO) algorithm is used. This computational complexity is a major problem in implementing real-time H.264/AVC encoder on computationally constrained devices. Therefore, there is a clear need for complexity reduction algorithm of H.264/AVC such as fast mode decision. In this paper, we propose a fast mode decision with early $P8\times8$ mode rejection based on block size activity using large block history map (LBHM). Simulation results show that without any meaningful degradation, the proposed method reduces whole encoding time on average by 53%. Also the hybrid usage of the proposed method and the early SKIP mode decision in H.264/AVC reference model reduces whole encoding time by 63% on average.

An Intra Prediction Hardware Design for High Performance HEVC Encoder (고성능 HEVC 부호기를 위한 화면내 예측 하드웨어 설계)

  • Park, Seung-yong;Guard, Kanda;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2015.10a
    • /
    • pp.875-878
    • /
    • 2015
  • In this paper, we propose an intra prediction hardware architecture with less processing time, computations and reduced hardware area for a high performance HEVC encoder. The proposed intra prediction hardware architecture uses common operation units to reduce computational complexity and uses $4{\times}4$ block unit to reduce hardware area. In order to reduce operation time, common operation unit uses one operation unit to generate predicted pixels and filtered pixels in all prediction modes. Intra prediction hardware architecture introduces the $4{\times}4$ PU design processing to reduce the hardware area and uses intemal registers to support $32{\times}32$ PU processmg. The proposed hardware architecture uses ten common operation units which can reduce execution cycles of intra prediction. The proposed Intra prediction hardware architecture is designed using Verilog HDL(Hardware Description Language), and has a total of 41.5k gates in TSMC $0.13{\mu}m$ CMOS standard cell library. At 150MHz, it can support 4K UHD video encoding at 30fps in real time, and operates at a maximum of 200MHz.

  • PDF

Multimodal Sentiment Analysis Using Review Data and Product Information (리뷰 데이터와 제품 정보를 이용한 멀티모달 감성분석)

  • Hwang, Hohyun;Lee, Kyeongchan;Yu, Jinyi;Lee, Younghoon
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.1
    • /
    • pp.15-28
    • /
    • 2022
  • Due to recent expansion of online market such as clothing, utilizing customer review has become a major marketing measure. User review has been used as a tool of analyzing sentiment of customers. Sentiment analysis can be largely classified with machine learning-based and lexicon-based method. Machine learning-based method is a learning classification model referring review and labels. As research of sentiment analysis has been developed, multi-modal models learned by images and video data in reviews has been studied. Characteristics of words in reviews are differentiated depending on products' and customers' categories. In this paper, sentiment is analyzed via considering review data and metadata of products and users. Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Self Attention-based Multi-head Attention models and Bidirectional Encoder Representation from Transformer (BERT) are used in this study. Same Multi-Layer Perceptron (MLP) model is used upon every products information. This paper suggests a multi-modal sentiment analysis model that simultaneously considers user reviews and product meta-information.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences
    • /
    • v.16 no.1
    • /
    • pp.67-76
    • /
    • 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. Especially, TTS models offering various voice characteristics and personalized speech, are widely utilized in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voice by generating speech using unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach not only includes an English one-shot multi-speaker TTS but also introduces a Korean one-shot multi-speaker TTS. We evaluate naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained naturalness mean opinion score (NMOS) of 3.36 and similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the performance of our proposed model is improved over the baseline models in terms of both naturalness and speaker similarity.

A Three-Step Mode Selection Algorithm for Fast Encoding in H.264/AVC (H.264/AVC에서 빠른 부호화를 위한 3단계 모드 선택 기법)

  • Jeon, Hyun-Gi;Kim, Sung-Min;Kang, Jin-Mi;Chung, Ki-Dong
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.2
    • /
    • pp.163-174
    • /
    • 2008
  • The H.264/AVC provides gains in compression efficiency of up to 50% over a wide range of bit rates and video resolutions compared to previous standards. However, to achieve such high coding efficiency, the complexity of H.264/AVC encoder is also increased drastically than previous ones, mainly because of mode decision. In this paper, we propose a three-step mode decision algorithm for fast encoding in H.264/AVC. In the first step, we select skip mode or inter mode by considering the temporal correlation and spatial correlation. In the second step, if the result of the first step is INTER mode, we select one group between two groups for final mode. In the third step, we select final mode by exploiting the pixel values of error macroblock or the modes of adjacent macroblocks. Simulations show that the proposed method reduces the encoding time by 42% on average without any significant PSNR losses.

  • PDF

A Study On Development of Fast Image Detector System (고속 영상 검지기 시스템 개발에 관한 연구)

  • Kim Byung Chul;Ha Dong Mun;Kim Yong Deak
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.41 no.1
    • /
    • pp.25-32
    • /
    • 2004
  • Nowadays image processing is very useful for some field of traffic applications. The one reason is we can construct the system in a low price, the other is the improvement of hardware processing power, it can be more fast to processing the data. In traffic field, the development of image using system is interesting issue. Because it has the advantage of price of installation and it does not obstruct traffic during the installation. In this study, 1 propose the traffic monitoring system that implement on the embedded system environment. The whole system consists of two main part, one is host controller board, the other is image processing board. The part of host controller board take charge of control the total system interface of external environment, and OSD(On screen display). The part of image processing board takes charge of image input and output using video encoder and decoder, Image classification and memory control of using FPGA, control of mouse signal. And finally, for stable operation of host controller board, uC/OS-II operating system is ported on the board.

Fast Sub-pixel Search Control by using Neighbor Motion Vector in H.264 (H.264에서 주변 움직임 벡터를 이용한 고속 부 화소 탐색 제어 기법)

  • La, Byeong-Du;Eom, Min-Young;Choe, Yoon-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.3
    • /
    • pp.16-22
    • /
    • 2007
  • Motion Estimation time in the H.264 has a large portion of encoding time and must be improved for real time application. Most of proposed motion estimation algorithm including Sub-pixel search use the fast search algorithm to speed up motion estimation by targeting the performance of full search in the reference code. This paper proposes a novel fast sub-pixel search control algorithm for H.264 encoder by using neighbor motion vector after analyzing the encoded Motion vector of video sequence. In addition the horizontal/vertical searching method is proposed with the horizontal/vertical directionality of motion vector. And the evaluation is performed with the proposed algorithms and other reference algorithms.