• Title/Summary/Keyword: Encoder Model

Search Result 364, Processing Time 0.027 seconds

Prediction of aerodynamic force coefficients and flow fields of airfoils using CNN and Encoder-Decoder models (합성곱 신경망과 인코더-디코더 모델들을 이용한 익형의 유체력 계수와 유동장 예측)

  • Janghoon, Seo;Hyun Sik, Yoon;Min Il, Kim
    • Journal of the Korean Society of Visualization
    • /
    • v.20 no.3
    • /
    • pp.94-101
    • /
    • 2022
  • The evaluation of the drag and lift as the aerodynamic performance of airfoils is essential. In addition, the analysis of the velocity and pressure fields is needed to support the physical mechanism of the force coefficients of the airfoil. Thus, the present study aims at establishing two different deep learning models to predict force coefficients and flow fields of the airfoil. One is the convolutional neural network (CNN) model to predict drag and lift coefficients of airfoil. Another is the Encoder-Decoder (ED) model to predict pressure distribution and velocity vector field. The images of airfoil section are applied as the input data of both models. Thus, the computational fluid dynamics (CFD) is adopted to form the dataset to training and test of both CNN models. The models are established by the convergence performance for the various hyperparameters. The prediction capability of the established CNN model and ED model is evaluated for the various NACA sections by comparing the true results obtained by the CFD, resulting in the high accurate prediction. It is noted that the predicted results near the leading edge, where the velocity has sharp gradient, reveal relatively lower accuracies. Therefore, the more and high resolved dataset are required to improve the highly nonlinear flow fields.

Semi-supervised learning of speech recognizers based on variational autoencoder and unsupervised data augmentation (변분 오토인코더와 비교사 데이터 증강을 이용한 음성인식기 준지도 학습)

  • Jo, Hyeon Ho;Kang, Byung Ok;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.6
    • /
    • pp.578-586
    • /
    • 2021
  • We propose a semi-supervised learning method based on Variational AutoEncoder (VAE) and Unsupervised Data Augmentation (UDA) to improve the performance of an end-to-end speech recognizer. In the proposed method, first, the VAE-based augmentation model and the baseline end-to-end speech recognizer are trained using the original speech data. Then, the baseline end-to-end speech recognizer is trained again using data augmented from the learned augmentation model. Finally, the learned augmentation model and end-to-end speech recognizer are re-learned using the UDA-based semi-supervised learning method. As a result of the computer simulation, the augmentation model is shown to improve the Word Error Rate (WER) of the baseline end-to-end speech recognizer, and further improve its performance by combining it with the UDA-based learning method.

Zero-anaphora resolution in Korean based on deep language representation model: BERT

  • Kim, Youngtae;Ra, Dongyul;Lim, Soojong
    • ETRI Journal
    • /
    • v.43 no.2
    • /
    • pp.299-312
    • /
    • 2021
  • It is necessary to achieve high performance in the task of zero anaphora resolution (ZAR) for completely understanding the texts in Korean, Japanese, Chinese, and various other languages. Deep-learning-based models are being employed for building ZAR systems, owing to the success of deep learning in the recent years. However, the objective of building a high-quality ZAR system is far from being achieved even using these models. To enhance the current ZAR techniques, we fine-tuned a pretrained bidirectional encoder representations from transformers (BERT). Notably, BERT is a general language representation model that enables systems to utilize deep bidirectional contextual information in a natural language text. It extensively exploits the attention mechanism based upon the sequence-transduction model Transformer. In our model, classification is simultaneously performed for all the words in the input word sequence to decide whether each word can be an antecedent. We seek end-to-end learning by disallowing any use of hand-crafted or dependency-parsing features. Experimental results show that compared with other models, our approach can significantly improve the performance of ZAR.

Bird's Eye View Semantic Segmentation based on Improved Transformer for Automatic Annotation

  • Tianjiao Liang;Weiguo Pan;Hong Bao;Xinyue Fan;Han Li
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.8
    • /
    • pp.1996-2015
    • /
    • 2023
  • High-definition (HD) maps can provide precise road information that enables an autonomous driving system to effectively navigate a vehicle. Recent research has focused on leveraging semantic segmentation to achieve automatic annotation of HD maps. However, the existing methods suffer from low recognition accuracy in automatic driving scenarios, leading to inefficient annotation processes. In this paper, we propose a novel semantic segmentation method for automatic HD map annotation. Our approach introduces a new encoder, known as the convolutional transformer hybrid encoder, to enhance the model's feature extraction capabilities. Additionally, we propose a multi-level fusion module that enables the model to aggregate different levels of detail and semantic information. Furthermore, we present a novel decoupled boundary joint decoder to improve the model's ability to handle the boundary between categories. To evaluate our method, we conducted experiments using the Bird's Eye View point cloud images dataset and Cityscapes dataset. Comparative analysis against stateof-the-art methods demonstrates that our model achieves the highest performance. Specifically, our model achieves an mIoU of 56.26%, surpassing the results of SegFormer with an mIoU of 1.47%. This innovative promises to significantly enhance the efficiency of HD map automatic annotation.

A Study on Residual U-Net for Semantic Segmentation based on Deep Learning (딥러닝 기반의 Semantic Segmentation을 위한 Residual U-Net에 관한 연구)

  • Shin, Seokyong;Lee, SangHun;Han, HyunHo
    • Journal of Digital Convergence
    • /
    • v.19 no.6
    • /
    • pp.251-258
    • /
    • 2021
  • In this paper, we proposed an encoder-decoder model utilizing residual learning to improve the accuracy of the U-Net-based semantic segmentation method. U-Net is a deep learning-based semantic segmentation method and is mainly used in applications such as autonomous vehicles and medical image analysis. The conventional U-Net occurs loss in feature compression process due to the shallow structure of the encoder. The loss of features causes a lack of context information necessary for classifying objects and has a problem of reducing segmentation accuracy. To improve this, The proposed method efficiently extracted context information through an encoder using residual learning, which is effective in preventing feature loss and gradient vanishing problems in the conventional U-Net. Furthermore, we reduced down-sampling operations in the encoder to reduce the loss of spatial information included in the feature maps. The proposed method showed an improved segmentation result of about 12% compared to the conventional U-Net in the Cityscapes dataset experiment.

Hot Place Detection Based on ConvLSTM AutoEncoder Using Foot Traffic Data (유동인구를 활용한 ConvLSTM AutoEncoder 기반 핫플레이스 탐지)

  • Ju-Young Lee;Heon-Jin Park
    • The Journal of Bigdata
    • /
    • v.8 no.2
    • /
    • pp.97-107
    • /
    • 2023
  • Small business owners are relatively likely to be alienated from various benefits caused by the change to a big data/AI-based society. To support them, we would like to detect a hot place based on the floating population to support small business owners' decision-making in the start-up area. Through various studies, it is known that the population size of the region has an important effect on the sales of small business owners. In this study, inland regions were extracted from the Incheon floating population data from January 2019 to June 2022. the Data is consisted of a grid of 50m intervals, central coordinates and the population for each grid are presented, made image structure through imputation to maintain spatial information. Spatial outliers were removed and imputated using LOF and GAM, and temporal outliers were removed and imputated through LOESS. We used ConvLSTM which can take both temporal and spatial characteristics into account as a predictive model, and used AutoEncoder structure, which performs outliers detection based on reconstruction error to define an area with high MAPE as a hot place.

A BERT-Based Automatic Scoring Model of Korean Language Learners' Essay

  • Lee, Jung Hee;Park, Ji Su;Shon, Jin Gon
    • Journal of Information Processing Systems
    • /
    • v.18 no.2
    • /
    • pp.282-291
    • /
    • 2022
  • This research applies a pre-trained bidirectional encoder representations from transformers (BERT) handwriting recognition model to predict foreign Korean-language learners' writing scores. A corpus of 586 answers to midterm and final exams written by foreign learners at the Intermediate 1 level was acquired and used for pre-training, resulting in consistent performance, even with small datasets. The test data were pre-processed and fine-tuned, and the results were calculated in the form of a score prediction. The difference between the prediction and actual score was then calculated. An accuracy of 95.8% was demonstrated, indicating that the prediction results were strong overall; hence, the tool is suitable for the automatic scoring of Korean written test answers, including grammatical errors, written by foreigners. These results are particularly meaningful in that the data included written language text produced by foreign learners, not native speakers.

An Optimization on the Psychoacoustic Model for MPEG-2 AAC Encoder (MPEG-2 AAC Encoder의 심리음향 모델 최적화)

  • Park, Jong-Tae;Moon, Kyu-Sung;Rhee, Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.38 no.2
    • /
    • pp.33-41
    • /
    • 2001
  • Currently, the compression is one of the most important technology in multimedia society. Audio files arc rapidly propagated throughout internet Among them, the most famous one is MP-3(MPEC-1 Laver3) which can obtain CD tone from 128Kbps, but tone quality is abruptly down below 64Kbps. MPEC-II AAC(Advanccd Audio Coding) is not compatible with MPEG 1, but it has high compression of 1.4 times than MP 3, has max. 7.1 and 96KHz sampling rate. In this paper, we propose an algorithm that decreased the capacity of AAC encoding computation but increased the processing speed by optimizing psychoacoustic model which has enormous amount of computation in MPEG 2 AAC encoder. The optimized psychoacoustic model algorithm was implemented by C++ language. The experiment shows that the psychoacoustic model carries out FFT(Fast Fourier Transform) computation of 3048 point with 44.1 KHz sampling rate for SMR(Signal to Masking Ratio), and each entropy value is inputted to the subband filters for the control of encoder block. The proposed psychoacoustic model is operated with high speed because of optimization of unpredictable value. Also, when we transform unpredictable value into a tonality index, the speed of operation process is increased by a tonality index optimized in high frequency range.

  • PDF

Design of HEVC CABAC Encoder With Parallel Processing of Bypass Bins (우회 빈의 병렬처리가 가능한 HEVC CABAC 부호화기의 설계)

  • Kim, Doohwan;Moon, Jeonhak;Lee, Seongsoo
    • Journal of IKEEE
    • /
    • v.19 no.4
    • /
    • pp.583-589
    • /
    • 2015
  • In the HEVC CABAC, the probability model is updated after a bin is encoded and next bin is encoded based on the updated probability model. Conventional CABAC encoders can encode only one bin per cycle, which cannot increase the encoding throughput. The probability model does not need to be updated in the bypass bins. In this paper, a HEVC CABAC encoder is proposed to increase encoding throughput by parallel processing of bypass bins. The designed CABAC encoder can process either a regular bin or maximum 4 bypass bins in a cycle. On the average, it can process 1.15~1.92 bins in a cycle. Synthesized in 0.18 um technology, its gate count, maximum operating speed, and the maximum throughput are 78,698 gates, 136 MHz, and 261 Mbin/s, respectively.

A Performance Comparison of Spatial Scalable Encoders with the Constrained Coding Modes for T-DMB/AT-DMB Services (T-DMB/AT-DMB 서비스를 위한 부호화 모드 제한을 갖는 공간 확장성 부호기의 성능 비교)

  • Kim, Jin-Soo;Park, Jong-Kab;Kim, Kyu-Seok;Choi, Sung-Jin;Seo, Kwang-Deok;Kim, Jae-Gon
    • Journal of Broadcast Engineering
    • /
    • v.13 no.4
    • /
    • pp.501-515
    • /
    • 2008
  • Recently, as users' requests for high quality mobile multimedia services are rapidly increasing and additional bandwidth can be provided by adopting the hierarchical modulation transmission technology, the research on the Advanced Terrestrial DMB (AT-DMB) service using the SVC (Scalable Video Coding) scheme is being actively studied. But, in order to realize a compatible video service and to accelerate the successful standardization and commercialization, it is necessary to simplify the compatible encoder structure. In this parer, we propose a fast mode decision method by constraining the redundant coding modes in the spatial scalable encoder that keeps the current T-DMB video in base layer. The proposed method is based on the statistical characteristics of each coding mode at both base and enhancement layers, inter-layer predictions, which are derived by investigating macroblock-layer coding modes of the spatial scalable encoder's functional structure. Through computer simulations, it is shown that a simplified encoder model that reduces the heavy computational burden can be found, while keeping the objective visual quality very high.