• Title/Summary/Keyword: encoder-decoder

Search Result 451, Processing Time 0.024 seconds

Hyperparameter experiments on end-to-end automatic speech recognition

  • Yang, Hyungwon;Nam, Hosung
    • Phonetics and Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.45-51
    • /
    • 2021
  • End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduced self-attention network, Transformer. However, due to training time and the number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameter plays a critical role in the task performance and training speed. The Transformer network for training has two encoder and decoder networks combined with Connectionist Temporal Classification (CTC). We have trained the model with Wall Street Journal (WSJ) SI-284 and tested on devl93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and varying ranges of values were used for experiments. The result shows that "num blocks" and "linear units" hyperparameters in the encoder and decoder networks reduce Word Error Rate (WER) significantly. However, performance gain is more prominent when they are altered in the encoder network. Training duration also linearly increased as "num blocks" and "linear units" hyperparameters' values grow. Based on the experimental results, we collected the optimal values from each hyperparameter and reduced the WER up to 2.9/1.9 from dev93 and eval93 respectively.

KI-HABS: Key Information Guided Hierarchical Abstractive Summarization

  • Zhang, Mengli;Zhou, Gang;Yu, Wanting;Liu, Wenfen
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.12
    • /
    • pp.4275-4291
    • /
    • 2021
  • With the unprecedented growth of textual information on the Internet, an efficient automatic summarization system has become an urgent need. Recently, the neural network models based on the encoder-decoder with an attention mechanism have demonstrated powerful capabilities in the sentence summarization task. However, for paragraphs or longer document summarization, these models fail to mine the core information in the input text, which leads to information loss and repetitions. In this paper, we propose an abstractive document summarization method by applying guidance signals of key sentences to the encoder based on the hierarchical encoder-decoder architecture, denoted as KI-HABS. Specifically, we first train an extractor to extract key sentences in the input document by the hierarchical bidirectional GRU. Then, we encode the key sentences to the key information representation in the sentence level. Finally, we adopt key information representation guided selective encoding strategies to filter source information, which establishes a connection between the key sentences and the document. We use the CNN/Daily Mail and Gigaword datasets to evaluate our model. The experimental results demonstrate that our method generates more informative and concise summaries, achieving better performance than the competitive models.

Implementation of Encoder and Decoder for MPEG-7 BiM (MPEG-7 BiM 부호화기 및 복호화기의 구현)

  • Yeom, Ji-Hyeon;Kim, Min-Je;Lee, Han-Kyu;Kim, Hyeok-Man
    • Journal of Broadcast Engineering
    • /
    • v.12 no.2
    • /
    • pp.159-176
    • /
    • 2007
  • In the paper, we implemented a software system that encodes XML instance documents conforming to a schema document according to the MPEG-7 BiM compression method, and decodes the encoded documents vice versa. We designed software structures of BiM encoder and decoder as class hierarchies, and then implemented the structures. The implemented BiM encoder shows a compression ratio of 9.44% on the average. The BiM encoder is a general-purpose XML compressor that can encode any instance documents conforming to a schema document described in XML Schema language including the MPEG-7 schema. The BiM encoder thus can be used in many application fields including digital broadcasting environment, where encoding XML instance documents is needed.

A Study on Residual U-Net for Semantic Segmentation based on Deep Learning (딥러닝 기반의 Semantic Segmentation을 위한 Residual U-Net에 관한 연구)

  • Shin, Seokyong;Lee, SangHun;Han, HyunHo
    • Journal of Digital Convergence
    • /
    • v.19 no.6
    • /
    • pp.251-258
    • /
    • 2021
  • In this paper, we proposed an encoder-decoder model utilizing residual learning to improve the accuracy of the U-Net-based semantic segmentation method. U-Net is a deep learning-based semantic segmentation method and is mainly used in applications such as autonomous vehicles and medical image analysis. The conventional U-Net occurs loss in feature compression process due to the shallow structure of the encoder. The loss of features causes a lack of context information necessary for classifying objects and has a problem of reducing segmentation accuracy. To improve this, The proposed method efficiently extracted context information through an encoder using residual learning, which is effective in preventing feature loss and gradient vanishing problems in the conventional U-Net. Furthermore, we reduced down-sampling operations in the encoder to reduce the loss of spatial information included in the feature maps. The proposed method showed an improved segmentation result of about 12% compared to the conventional U-Net in the Cityscapes dataset experiment.

Low Complexity Video Encoding Using Turbo Decoding Error Concealments for Sensor Network Application (센서네트워크상의 응용을 위한 터보 복호화 오류정정 기법을 이용한 경량화 비디오 부호화 방법)

  • Ko, Bong-Hyuck;Shim, Hyuk-Jae;Jeon, Byeung-Woo
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.1
    • /
    • pp.11-21
    • /
    • 2008
  • In conventional video coding, the complexity of encoder is much higher than that of decoder. However, as more needs arises for extremely simple encoder in environments having constrained energy such as sensor network, much investigation has been carried out for eliminating motion prediction/compensation claiming most complexity and energy in encoder. The Wyner-Ziv coding, one of the representative schemes for the problem, reconstructs video at decoder by correcting noise on side information using channel coding technique such as turbo code. Since the encoder generates only parity bits without performing any type of processes extracting correlation information between frames, it has an extremely simple structure. However, turbo decoding errors occur in noisy side information. When there are high-motion or occlusion between frames, more turbo decoding errors appear in reconstructed frame and look like Salt & Pepper noise. This severely deteriorates subjective video quality even though such noise rarely occurs. In this paper, we propose a computationally extremely light encoder based on symbol-level Wyner-Ziv coding technique and a new corresponding decoder which, based on a decision whether a pixel has error or not, applies median filter selectively in order to minimize loss of texture detail from filtering. The proposed method claims extremely low encoder complexity and shows improvements both in subjective quality and PSNR. Our experiments have verified average PSNR gain of up to 0.8dB.

Performance Analysis of the PCAE and PCAD in FO-CDMA Communication Network (FO-CDMA 통신망에서 PCAE와 PCAD 동작특성 분석)

  • Kang, Tae-Gu;Choi, Young-Wan
    • Journal of The Institute of Information and Telecommunication Facilities Engineering
    • /
    • v.2 no.4
    • /
    • pp.5-16
    • /
    • 2003
  • We have analyzed the performance of optical matched filters in the fiber-optic code division multiple access (FO-CDMA) system based on optical parallel coupler access encoder (PCAE) and parallel coupler access decoder (PCAD) by experiment. In previous studies, the performance evaluation of the FO-CDMA system using SCAE and SCAD was not sufficiently accurate because they analyzed system performance only considering the first order signals. Since optical SCAE and SCAD intrinsically have high order signals of various patterns as the number of coupler increases, they change auto- and cross-correlation intensities. Thus, it is necessary to investigate properties of the PCAE and PCAD so that we may analyze the exact performance of system. In this paper, it is found that the peak to sidelobe ratio using the PCAE and PCAD increases as $\alpha$ (coupling coefficient) value increases. Also, we found that the proposed PCAE and PCAD are superior to SCAE and SCAD in performance improvement.

  • PDF

Real-time Implementation of Variable Transmission Bit Rate Vocoder Improved Speech Quality in SOLA-B Algorithm & G.729A Vocoder Using on the TMS320C5416 (TMS320C5416을 이용한 SOLA-B 알고리즘과 G.729A 보코더의 음질 향상된 가변 전송률 보코더의 실시간 구현)

  • Ham, Myung-Kyu;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.241-250
    • /
    • 2003
  • In this paper, we implemented the vocoder of variable rate by applying the SOLA-B algorithm to the G.729A to the TMS320C5416 in real-time. This method using the SOLA-B algorithm is that it is reduced the duration of the speech in encoding and is played at the speed of normal by extending the duration of the speech in decoding. But the method applied to the existed G.729A and SOLA-B algorithm is caused the loss of speech quality in G.729A which is not reflected about length variation of speech. Therefore the proposed method is encoded according as it is modified the structure of LSP quantization table about the length of speech is reduced by using the SOLA-B algorithm. The vocoder of variable rate by applying the G.729A and SOLA-B algorithm is represented the maximum complexity of 10.2MIPS about encoder and 2.8MIPS about decoder in 8kbps transmission rate. Also it is evaluated 17.3MIPS about encoder, 9.9MIPS about decoder in 6kbps and 18.5MIPS about encoder, 11.1MIPS about decoder in 4kbps according to the transmission rate. The used memory is about program ROM 9.7kwords, table ROM 4.69kwords, RAM 5.2kwords. The waveform of output is showed by the result of C simulator and Bit Exact. Also, the result of MOS test for evaluation of speech quality of the vocoder of variable rate which is implemented in real-time, it is estimated about 3.68 in 4kbps.

  • PDF

Two-dimensional OCDMA Encoder/Decoder Composed of Double Ring Add/Drop Filters and All-pass Delay Filters (이중 링 Add/Drop 필터와 All-pass 지연 필터로 구성된 이차원 OCDMA 인코더/디코더)

  • Chung, Youngchul
    • Korean Journal of Optics and Photonics
    • /
    • v.33 no.3
    • /
    • pp.106-112
    • /
    • 2022
  • A two-dimensional optical code division multiple access (OCDMA) encoder/decoder, which is composed of add/drop filters and all-pass filters for delay operation, is proposed. An example design is presented, and its feasibility is illustrated through numerical simulations. The chip area of the proposed OCDMA encoder/decoder could be about one-third that of a previous OCDMA device employing delay waveguides. Its performance is numerically investigated using the transfer-matrix method combined with the fast Fourier transform. The autocorrelation peak level over the maximum cross-correlation level for incorrect wavelength hopping and spectral phase code combinations is greater than 3 at the center of the correctly decoded pulse, which assures a bit error rate lower than 10-3, corresponding to the forward error-correction limit.

Prediction of aerodynamic force coefficients and flow fields of airfoils using CNN and Encoder-Decoder models (합성곱 신경망과 인코더-디코더 모델들을 이용한 익형의 유체력 계수와 유동장 예측)

  • Janghoon, Seo;Hyun Sik, Yoon;Min Il, Kim
    • Journal of the Korean Society of Visualization
    • /
    • v.20 no.3
    • /
    • pp.94-101
    • /
    • 2022
  • The evaluation of the drag and lift as the aerodynamic performance of airfoils is essential. In addition, the analysis of the velocity and pressure fields is needed to support the physical mechanism of the force coefficients of the airfoil. Thus, the present study aims at establishing two different deep learning models to predict force coefficients and flow fields of the airfoil. One is the convolutional neural network (CNN) model to predict drag and lift coefficients of airfoil. Another is the Encoder-Decoder (ED) model to predict pressure distribution and velocity vector field. The images of airfoil section are applied as the input data of both models. Thus, the computational fluid dynamics (CFD) is adopted to form the dataset to training and test of both CNN models. The models are established by the convergence performance for the various hyperparameters. The prediction capability of the established CNN model and ED model is evaluated for the various NACA sections by comparing the true results obtained by the CFD, resulting in the high accurate prediction. It is noted that the predicted results near the leading edge, where the velocity has sharp gradient, reveal relatively lower accuracies. Therefore, the more and high resolved dataset are required to improve the highly nonlinear flow fields.

Design of Reed Solomon Encoder/Decoder for Compact Disks (컴팩트 디스크를 위한 Reed Solomon 부호기/복호기 설계)

  • 김창훈;박성모
    • Proceedings of the IEEK Conference
    • /
    • 2000.11b
    • /
    • pp.281-284
    • /
    • 2000
  • This paper describes design of a (32, 28) Reed Solomon decoder for optical compact disk with double error detecting and correcting capability. A variety of error correction codes(ECCs) have been used in magnetic recordings, and optical recordings. Among the various types of ECCs, Reed Solomon(RS) codes has emerged as one the most important ones. The most complex circuit in the RS decoder is the part for finding the error location numbers by solving error location polynomial, and the circuit has great influence on overall decoder complexity. We use RAM based architecture with Euclid's algorithm, Chien search algorithm and Forney algorithm. We have developed VHDL model and peformed logic synthesis using the SYNOPSYS CAD tool. The total umber of gate is about 11,000 gates.

  • PDF