• Title/Summary/Keyword: encoder- decoder

Search Result 454, Processing Time 0.03 seconds

2D and 3D Hand Pose Estimation Based on Skip Connection Form (스킵 연결 형태 기반의 손 관절 2D 및 3D 검출 기법)

  • Ku, Jong-Hoe;Kim, Mi-Kyung;Cha, Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.12
    • /
    • pp.1574-1580
    • /
    • 2020
  • Traditional pose estimation methods include using special devices or images through image processing. The disadvantage of using a device is that the environment in which the device can be used is limited and costly. The use of cameras and image processing has the advantage of reducing environmental constraints and costs, but the performance is lower. CNN(Convolutional Neural Networks) were studied for pose estimation just using only camera without these disadvantage. Various techniques were proposed to increase cognitive performance. In this paper, the effect of the skip connection on the network was experimented by using various skip connections on the joint recognition of the hand. Experiments have confirmed that the presence of additional skip connections other than the basic skip connections has a better effect on performance, but the network with downward skip connections is the best performance.

Reversible Watermarking in JPEG Compression Domain (JPEG 압축 영역에서의 리버서블 워터마킹)

  • Cui, Xue-Nan;Choi, Jong-Uk;Kim, Hak-Il;Kim, Jong-Weon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.17 no.6
    • /
    • pp.121-130
    • /
    • 2007
  • In this paper, we propose a reversible watermarking scheme in the JPEG compression domain. The reversible watermarking is useful to authenticate the content without the quality loss because it preserves the original content when embed the watermark information. In the internet, for the purpose to save the storage space and improve the efficiency of communication, digital image is usually compressed by JPEG or GIF. Therefore, it is necessary to develop a reversible watermarking in the JPEG compression domain. When the watermark is embedded, the lossless compression was used and the original image is recovered during the watermark extracting process. The test results show that PSNRs are distributed from 38dB to 42dB and the payload is from 2.5Kbits to 3.4Kbits where the QF is 75. Where the QF of the Lena image is varied from 10 to 99, the PSNR is directly proportional to the QF and the payload is around $1.6{\sim}2.8Kbits$.

End-to-end speech recognition models using limited training data (제한된 학습 데이터를 사용하는 End-to-End 음성 인식 모델)

  • Kim, June-Woo;Jung, Ho-Young
    • Phonetics and Speech Sciences
    • /
    • v.12 no.4
    • /
    • pp.63-71
    • /
    • 2020
  • Speech recognition is one of the areas actively commercialized using deep learning and machine learning techniques. However, the majority of speech recognition systems on the market are developed on data with limited diversity of speakers and tend to perform well on typical adult speakers only. This is because most of the speech recognition models are generally learned using a speech database obtained from adult males and females. This tends to cause problems in recognizing the speech of the elderly, children and people with dialects well. To solve these problems, it may be necessary to retain big database or to collect a data for applying a speaker adaptation. However, this paper proposes that a new end-to-end speech recognition method consists of an acoustic augmented recurrent encoder and a transformer decoder with linguistic prediction. The proposed method can bring about the reliable performance of acoustic and language models in limited data conditions. The proposed method was evaluated to recognize Korean elderly and children speech with limited amount of training data and showed the better performance compared of a conventional method.

Pedestrian and Vehicle Distance Estimation Based on Hard Parameter Sharing (하드 파라미터 쉐어링 기반의 보행자 및 운송 수단 거리 추정)

  • Seo, Ji-Won;Cha, Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.389-395
    • /
    • 2022
  • Because of improvement of deep learning techniques, deep learning using computer vision such as classification, detection and segmentation has also been used widely at many fields. Expecially, automatic driving is one of the major fields that applies computer vision systems. Also there are a lot of works and researches to combine multiple tasks in a single network. In this study, we propose the network that predicts the individual depth of pedestrians and vehicles. Proposed model is constructed based on YOLOv3 for object detection and Monodepth for depth estimation, and it process object detection and depth estimation consequently using encoder and decoder based on hard parameter sharing. We also used attention module to improve the accuracy of both object detection and depth estimation. Depth is predicted with monocular image, and is trained using self-supervised training method.

Attention-based word correlation analysis system for big data analysis (빅데이터 분석을 위한 어텐션 기반의 단어 연관관계 분석 시스템)

  • Chi-Gon, Hwang;Chang-Pyo, Yoon;Soo-Wook, Lee
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.27 no.1
    • /
    • pp.41-46
    • /
    • 2023
  • Recently, big data analysis can use various techniques according to the development of machine learning. Big data collected in reality lacks an automated refining technique for the same or similar terms based on semantic analysis of the relationship between words. Since most of the big data is described in general sentences, it is difficult to understand the meaning and terms of the sentences. To solve these problems, it is necessary to understand the morphological analysis and meaning of sentences. Accordingly, NLP, a technique for analyzing natural language, can understand the word's relationship and sentences. Among the NLP techniques, the transformer has been proposed as a way to solve the disadvantages of RNN by using self-attention composed of an encoder-decoder structure of seq2seq. In this paper, transformers are used as a way to form associations between words in order to understand the words and phrases of sentences extracted from big data.

A novel framework for correcting satellite-based precipitation products in Mekong river basin with discontinuous observed data

  • Xuan-Hien Le;Giang V. Nguyen;Sungho Jung;Giha Lee
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.173-173
    • /
    • 2023
  • The Mekong River Basin (MRB) is a crucial watershed in Asia, impacting over 60 million people across six developing nations. Accurate satellite-based precipitation products (SPPs) are essential for effective hydrological and watershed management in this region. However, the performance of SPPs has been varied and limited. The APHRODITE product, a unique gauge-based dataset for MRB, is widely used but is only available until 2015. In this study, we present a novel framework for correcting SPPs in the MRB by employing a deep learning approach that combines convolutional neural networks and encoder-decoder architecture to address pixel-by-pixel bias and enhance accuracy. The DLF was applied to four widely used SPPs (TRMM, CMORPH, CHIRPS, and PERSIANN-CDR) in MRB. For the original SPPs, the TRMM product outperformed the other SPPs. Results revealed that the DLF effectively bridged the spatial-temporal gap between the SPPs and the gauge-based dataset (APHRODITE). Among the four corrected products, ADJ-TRMM demonstrated the best performance, followed by ADJ-CDR, ADJ-CHIRPS, and ADJ-CMORPH. The DLF offered a robust and adaptable solution for bias correction in the MRB and beyond, capable of detecting intricate patterns and learning from data to make appropriate adjustments. With the discontinuation of the APHRODITE product, DLF represents a promising solution for generating a more current and reliable dataset for MRB research. This research showcased the potential of deep learning-based methods for improving the accuracy of SPPs, particularly in regions like the MRB, where gauge-based datasets are limited or discontinued.

  • PDF

Side-Channel Archive Framework Using Deep Learning-Based Leakage Compression (딥러닝을 이용한 부채널 데이터 압축 프레임 워크)

  • Sangyun Jung;Sunghyun Jin;Heeseok Kim
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.3
    • /
    • pp.379-392
    • /
    • 2024
  • With the rapid increase in data, saving storage space and improving the efficiency of data transmission have become critical issues, making the research on the efficiency of data compression technologies increasingly important. Lossless algorithms can precisely restore original data but have limited compression ratios, whereas lossy algorithms provide higher compression rates at the expense of some data loss. There has been active research in data compression using deep learning-based algorithms, especially the autoencoder model. This study proposes a new side-channel analysis data compressor utilizing autoencoders. This compressor achieves higher compression rates than Deflate while maintaining the characteristics of side-channel data. The encoder, using locally connected layers, effectively preserves the temporal characteristics of side-channel data, and the decoder maintains fast decompression times with a multi-layer perceptron. Through correlation power analysis, the proposed compressor has been proven to compress data without losing the characteristics of side-channel data.

Density map estimation based on deep-learning for pest control drone optimization (드론 방제의 최적화를 위한 딥러닝 기반의 밀도맵 추정)

  • Baek-gyeom Seong;Xiongzhe Han;Seung-hwa Yu;Chun-gu Lee;Yeongho Kang;Hyun Ho Woo;Hunsuk Lee;Dae-Hyun Lee
    • Journal of Drive and Control
    • /
    • v.21 no.2
    • /
    • pp.53-64
    • /
    • 2024
  • Global population growth has resulted in an increased demand for food production. Simultaneously, aging rural communities have led to a decrease in the workforce, thereby increasing the demand for automation in agriculture. Drones are particularly useful for unmanned pest control fields. However, the current method of uniform spraying leads to environmental damage due to overuse of pesticides and drift by wind. To address this issue, it is necessary to enhance spraying performance through precise performance evaluation. Therefore, as a foundational study aimed at optimizing drone-based pest control technologies, this research evaluated water-sensitive paper (WSP) via density map estimation using convolutional neural networks (CNN) with a encoder-decoder structure. To achieve more accurate estimation, this study implemented multi-task learning, incorporating an additional classifier for image segmentation alongside the density map estimation classifier. The proposed model in this study resulted in a R-squared (R2) of 0.976 for coverage area in the evaluation data set, demonstrating satisfactory performance in evaluating WSP at various density levels. Further research is needed to improve the accuracy of spray result estimations and develop a real-time assessment technology in the field.

Improved Deep Learning-based Approach for Spatial-Temporal Trajectory Planning via Predictive Modeling of Future Location

  • Zain Ul Abideen;Xiaodong Sun;Chao Sun;Hafiz Shafiq Ur Rehman Khalil
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.7
    • /
    • pp.1726-1748
    • /
    • 2024
  • Trajectory planning is vital for autonomous systems like robotics and UAVs, as it determines optimal, safe paths considering physical limitations, environmental factors, and agent interactions. Recent advancements in trajectory planning and future location prediction stem from rapid progress in machine learning and optimization algorithms. In this paper, we proposed a novel framework for Spatial-temporal transformer-based feed-forward neural networks (STTFFNs). From the traffic flow local area point of view, skip-gram model is trained on trajectory data to generate embeddings that capture the high-level features of different trajectories. These embeddings can then be used as input to a transformer-based trajectory planning model, which can generate trajectories for new objects based on the embeddings of similar trajectories in the training data. In the next step, distant regions, we embedded feedforward network is responsible for generating the distant trajectories by taking as input a set of features that represent the object's current state and historical data. One advantage of using feedforward networks for distant trajectory planning is their ability to capture long-term dependencies in the data. In the final step of forecasting for future locations, the encoder and decoder are crucial parts of the proposed technique. Spatial destinations are encoded utilizing location-based social networks(LBSN) based on visiting semantic locations. The model has been specially trained to forecast future locations using precise longitude and latitude values. Following rigorous testing on two real-world datasets, Porto and Manhattan, it was discovered that the model outperformed a prediction accuracy of 8.7% previous state-of-the-art methods.

Prediction of multipurpose dam inflow utilizing catchment attributes with LSTM and transformer models (유역정보 기반 Transformer및 LSTM을 활용한 다목적댐 일 단위 유입량 예측)

  • Kim, Hyung Ju;Song, Young Hoon;Chung, Eun Sung
    • Journal of Korea Water Resources Association
    • /
    • v.57 no.7
    • /
    • pp.437-449
    • /
    • 2024
  • Rainfall-runoff prediction studies using deep learning while considering catchment attributes have been gaining attention. In this study, we selected two models: the Transformer model, which is suitable for large-scale data training through the self-attention mechanism, and the LSTM-based multi-state-vector sequence-to-sequence (LSTM-MSV-S2S) model with an encoder-decoder structure. These models were constructed to incorporate catchment attributes and predict the inflow of 10 multi-purpose dam watersheds in South Korea. The experimental design consisted of three training methods: Single-basin Training (ST), Pretraining (PT), and Pretraining-Finetuning (PT-FT). The input data for the models included 10 selected watershed attributes along with meteorological data. The inflow prediction performance was compared based on the training methods. The results showed that the Transformer model outperformed the LSTM-MSV-S2S model when using the PT and PT-FT methods, with the PT-FT method yielding the highest performance. The LSTM-MSV-S2S model showed better performance than the Transformer when using the ST method; however, it showed lower performance when using the PT and PT-FT methods. Additionally, the embedding layer activation vectors and raw catchment attributes were used to cluster watersheds and analyze whether the models learned the similarities between them. The Transformer model demonstrated improved performance among watersheds with similar activation vectors, proving that utilizing information from other pre-trained watersheds enhances the prediction performance. This study compared the suitable models and training methods for each multi-purpose dam and highlighted the necessity of constructing deep learning models using PT and PT-FT methods for domestic watersheds. Furthermore, the results confirmed that the Transformer model outperforms the LSTM-MSV-S2S model when applying PT and PT-FT methods.