• Title/Summary/Keyword: Encoder-decoder

Search Result 447, Processing Time 0.038 seconds

VLSI Design of DWT-based Image Processor for Real-Time Image Compression and Reconstruction System (실시간 영상압축과 복원시스템을 위한 DWT기반의 영상처리 프로세서의 VLSI 설계)

  • Seo, Young-Ho;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.102-110
    • /
    • 2004
  • In this paper, we propose a VLSI structure of real-time image compression and reconstruction processor using 2-D discrete wavelet transform and implement into a hardware which use minimal hardware resource using ASIC library. In the implemented hardware, Data path part consists of the DWT kernel for the wavelet transform and inverse transform, quantizer/dequantizer, the huffman encoder/huffman decoder, the adder/buffer for the inverse wavelet transform, and the interface modules for input/output. Control part consists of the programming register, the controller which decodes the instructions and generates the control signals, and the status register for indicating the internal state into the external of circuit. According to the programming condition, the designed circuit has the various selective output formats which are wavelet coefficient, quantization coefficient or index, and Huffman code in image compression mode, and Huffman decoding result, reconstructed quantization coefficient, and reconstructed wavelet coefficient in image reconstructed mode. The programming register has 16 stages and one instruction can be used for a horizontal(or vertical) filtering in a level. Since each register automatically operated in the right order, 4-level discrete wavelet transform can be executed by a programming. We synthesized the designed circuit with synthesis library of Hynix 0.35um CMOS fabrication using the synthesis tool, Synopsys and extracted the gate-level netlist. From the netlist, timing information was extracted using Vela tool. We executed the timing simulation with the extracted netlist and timing information using NC-Verilog tool. Also PNR and layout process was executed using Apollo tool. The Implemented hardware has about 50,000 gate sizes and stably operates in 80MHz clock frequency.

Adaptive Intra Prediction Method using Modified Cubic-function and DCT-IF (변형된 3차 함수와 DCT-IF를 이용한 적응적 화면내 예측 방법)

  • Lee, Han-Sik;Lee, Ju-Ock;Moon, Joo-Hee
    • Journal of Broadcast Engineering
    • /
    • v.17 no.5
    • /
    • pp.756-764
    • /
    • 2012
  • In current HEVC, prediction pixels are finally calculated by linear-function interpolation on two reference pixels. It is hard to expect good performance on the case of occurring large difference between two reference pixels. This paper decides more accurate prediction pixel values than current HEVC using linear function. While existing prediction process only uses two reference pixels, proposed method uses DCT-IF. DCT-IF analyses frequency characteristics of more than two reference pixels in frequency domain. And proposed method calculates prediction value adaptively by using linear-function, DCT-IF and cubic-function to decide more accurate interpolation value than to only use linear function. Cubic-function has a steep slope than linear-function. So, using cubic-function is utilized on edge in prediction unit. The complexity of encoder and decoder in HM6.0 has increased 3% and 1%, respectively. BD-rate has decreased 0.4% in luma signal Y, 0.3% in chroma signal U and 0.3% in chroma signal V in average. Through this experiment, proposed adaptive intra prediction method using DCT-IF and cubic-function shows increased performance than HM6.0.

Adaptive In-loop Filter Method for High-efficiency Video Coding (고효율 비디오 부호화를 위한 적응적 인-루프 필터 방법)

  • Jung, Kwang-Su;Nam, Jung-Hak;Lim, Woong;Jo, Hyun-Ho;Sim, Dong-Gyu;Choi, Byeong-Doo;Cho, Dae-Sung
    • Journal of Broadcast Engineering
    • /
    • v.16 no.1
    • /
    • pp.1-13
    • /
    • 2011
  • In this paper, we propose an adaptive in-loop filter to improve the coding efficiency. Recently, there are post-filter hint SEI and block-based adaptive filter control (BAFC) methods based on the Wiener filter which can minimize the mean square error between the input image and the decoded image in video coding standards. However, since the post-filter hint SEI is applied only to the output image, it cannot reduce the prediction errors of the subsequent frames. Because BAFC is also conducted with a deblocking filter, independently, it has a problem of high computational complexity on the encoder and decoder sides. In this paper, we propose the low-complexity adaptive in-loop filter (LCALF) which has lower computational complexity by using H.264/AVC deblocking filter, adaptively, as well as shows better performance than the conventional method. In the experimental results, the computational complexity of the proposed method is reduced about 22% than the conventional method. Furthermore, the coding efficiency of the proposed method is about 1% better than the BAFC.

Real-time Implementation of the AMR Speech Coder Using $OakDSPCore^{\circledR}$ ($OakDSPCore^{\circledR}$를 이용한 적응형 다중 비트 (AMR) 음성 부호화기의 실시간 구현)

  • 이남일;손창용;이동원;강상원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.6
    • /
    • pp.34-39
    • /
    • 2001
  • An adaptive multi-rate (AMR) speech coder was adopted as a standard of W-CDMA by 3GPP and ETSI. The AMR coder is based on the CELP algorithm operating at rates ranging from 12.2 kbps down to 4.75 kbps, and it is a source controlled codec according to the channel error conditions and the traffic loading. In this paper, we implement the DSP S/W of the AMR coder using OakDSPCore. The implementation is based on the CSD17C00A chip developed by C&S Technology, and it is tested using test vectors, for the AMR speech codec, provided by ETSI for the bit exact implementation. The DSP B/W requires 20.6 MIPS for the encoder and 2.7 MIPS for the decoder. Memories required by the Am coder were 21.97 kwords, 6.64 kwords and 15.1 kwords for code, data sections and data ROM, respectively. Also, actual sound input/output test using microphone and speaker demonstrates its proper real-time operation without distortions or delays.

  • PDF

A Real-time H.264 to MPEG-2 Transcoding for Ship to Shore Communication (선박-육지간 통신을 위한 실시간 H.264 to MPEG-2 트랜스코딩)

  • Son, Nam-Rye;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.1
    • /
    • pp.90-102
    • /
    • 2011
  • Recently, the grade of users using wireless communication services which transmits and re-transmits to the signal via the broadcasting satellite have a variety. However the ships not preparing of H.264 standard devices should not received the realtime data because the broadcasting stations have transmitted the compressed video data through the satellite communication. Therefore this paper proposes H.264 to MPEG-2 transcoding for the ships using MPEG-2 devices. Proposed method improves a speed and object quality in H.264 to MPEG-2 transcoding by analysis features of macroblock modes in H.264. In the Intra mode of P-frame, it proposes new method by computing coincidence proportion after comparing of Intra mode methods of H.264 and MPEG-2. In the Inter mode, it proposes a PMV(predictive motion vector) considering movement of motion vectors in H.264 decoder. we reuses a PMV directly as like the final MV in MPEG-2 encoder and refinements the MV after coincidence ratio comparing of variable motion vectors of H.264 and these of MPEG-2. The experimental results from proposed method show a considerable reduction in processing time, as much as 70% and 67% respectively, with a small objective quality reduction in PSNR.

Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

  • Jeong, Hyeonjae;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.12 no.2
    • /
    • pp.29-37
    • /
    • 2020
  • Recently, the neural network-based deep learning algorithm has dramatically improved performance compared to the classical Gaussian mixture model based hidden Markov model (GMM-HMM) automatic speech recognition (ASR) system. In addition, researches on end-to-end (E2E) speech recognition systems integrating language modeling and decoding processes have been actively conducted to better utilize the advantages of deep learning techniques. In general, E2E ASR systems consist of multiple layers of encoder-decoder structure with attention. Therefore, E2E ASR systems require data with a large amount of speech-text paired data in order to achieve good performance. Obtaining speech-text paired data requires a lot of human labor and time, and is a high barrier to building E2E ASR system. Therefore, there are previous studies that improve the performance of E2E ASR system using relatively small amount of speech-text paired data, but most studies have been conducted by using only speech-only data or text-only data. In this study, we proposed a semi-supervised training method that enables E2E ASR system to perform well in corpus in different domains by using both speech or text only data. The proposed method works effectively by adapting to different domains, showing good performance in the target domain and not degrading much in the source domain.

Signaling Method of Multiple Motion Vector Resolutions Using Contradiction Testing (모순 검증을 통한 다중 움직임 벡터 해상도 시그널링 방법)

  • Won, Kwanghyun;Park, Younghyeon;Jeon, Byeungwoo
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.7
    • /
    • pp.107-118
    • /
    • 2015
  • Although most current video coding standards set a fixed motion vector resolution like quarter-pel accuracy, a scheme supporting multiple motion vector resolutions can improve the coding efficiency of video since it can allow to use just required motion vector accuracy depending on the video content and at the same time to generate more accurate motion predictor. However, the selected motion vector resolution for each motion vector is a signaling overhead. This paper proposes a contradiction testing-based signaling scheme of the motion vector resolution. The proposed method selects a best resolution for each motion vector among multiple candidates in such a way to produce the minimum amount of coded bits for the motion vector. The signaling overhead is reduced by contradiction testing that operates under a predefined criterion at both encoder and decoder with a purpose of pruning irrelevant candidate motion vector resolutions from signaling responsibility. Experimental results verified that the proposed scheme is effective in reducing coded motion information by achieving its $Bj{\o}ntegaard$ delta bit rate (BDBR) gain of about 4.01% on average (and up to 15.17%) compared to the conventional scheme with a fixed motion vector resolution.

Automatic Text Summarization based on Selective Copy mechanism against for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok;Seon, Choong-Nyoung;Jung, Youngim;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.8 no.2
    • /
    • pp.58-65
    • /
    • 2019
  • Automatic text summarization is a process of shortening a text document by either extraction or abstraction. The abstraction approach inspired by deep learning methods scaling to a large amount of document is applied in recent work. Abstractive text summarization involves utilizing pre-generated word embedding information. Low-frequent but salient words such as terminologies are seldom included to dictionaries, that are so called, out-of-vocabulary(OOV) problems. OOV deteriorates the performance of Encoder-Decoder model in neural network. In order to address OOV words in abstractive text summarization, we propose a copy mechanism to facilitate copying new words in the target document and generating summary sentences. Different from the previous studies, the proposed approach combines accurate pointing information and selective copy mechanism based on bidirectional RNN and bidirectional LSTM. In addition, neural network gate model to estimate the generation probability and the loss function to optimize the entire abstraction model has been applied. The dataset has been constructed from the collection of abstractions and titles of journal articles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (employed longest common subsequence) of the proposed Encoding-Decoding model have been improved to 47.01 and 29.55, respectively.

Development of Fender Segmentation System for Port Structures using Vision Sensor and Deep Learning (비전센서 및 딥러닝을 이용한 항만구조물 방충설비 세분화 시스템 개발)

  • Min, Jiyoung;Yu, Byeongjun;Kim, Jonghyeok;Jeon, Haemin
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.26 no.2
    • /
    • pp.28-36
    • /
    • 2022
  • As port structures are exposed to various extreme external loads such as wind (typhoons), sea waves, or collision with ships; it is important to evaluate the structural safety periodically. To monitor the port structure, especially the rubber fender, a fender segmentation system using a vision sensor and deep learning method has been proposed in this study. For fender segmentation, a new deep learning network that improves the encoder-decoder framework with the receptive field block convolution module inspired by the eccentric function of the human visual system into the DenseNet format has been proposed. In order to train the network, various fender images such as BP, V, cell, cylindrical, and tire-types have been collected, and the images are augmented by applying four augmentation methods such as elastic distortion, horizontal flip, color jitter, and affine transforms. The proposed algorithm has been trained and verified with the collected various types of fender images, and the performance results showed that the system precisely segmented in real time with high IoU rate (84%) and F1 score (90%) in comparison with the conventional segmentation model, VGG16 with U-net. The trained network has been applied to the real images taken at one port in Republic of Korea, and found that the fenders are segmented with high accuracy even with a small dataset.

Prediction of Music Generation on Time Series Using Bi-LSTM Model (Bi-LSTM 모델을 이용한 음악 생성 시계열 예측)

  • Kwangjin, Kim;Chilwoo, Lee
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.65-75
    • /
    • 2022
  • Deep learning is used as a creative tool that could overcome the limitations of existing analysis models and generate various types of results such as text, image, and music. In this paper, we propose a method necessary to preprocess audio data using the Niko's MIDI Pack sound source file as a data set and to generate music using Bi-LSTM. Based on the generated root note, the hidden layers are composed of multi-layers to create a new note suitable for the musical composition, and an attention mechanism is applied to the output gate of the decoder to apply the weight of the factors that affect the data input from the encoder. Setting variables such as loss function and optimization method are applied as parameters for improving the LSTM model. The proposed model is a multi-channel Bi-LSTM with attention that applies notes pitch generated from separating treble clef and bass clef, length of notes, rests, length of rests, and chords to improve the efficiency and prediction of MIDI deep learning process. The results of the learning generate a sound that matches the development of music scale distinct from noise, and we are aiming to contribute to generating a harmonistic stable music.