• Title/Summary/Keyword: encoder-decoder

Media-oriented e-Learning System supporting Execution-File Demonstration (실행파일 시연기능을 지원하는 미디어 지향적 e-러닝 시스템)

  • Jou, Wou-Seok;Lee, Kang-Sun;Meng, Je-An
    • The KIPS Transactions:PartA / v.13A no.6 s.103 / pp.555-560 / 2006
  • In contrast with earlier remote education, which simply recorded off-line classes, modern remote education emphasizes additional functions that maximize learning efficiency. The use of multimedia information such as text, graphics, sound, and animation is considered a fundamental element in offering these functions. This paper designs and implements an encoder/decoder that accommodates such multimedia information, with emphasis on demonstrating execution files. Instructors can demonstrate any type of execution file or application data file, and remote learners can freely try running the corresponding execution files themselves. Consequently, a high level of learning efficiency can be achieved with the proposed encoder/decoder.

A Study on the Development of MGCP and SDP Stack for VoIP Standard Protocols (VoIP 표준 프로토콜을 위한 MGCP 및 SDP 스택 개발에 관한 연구)

  • Ko, Kwang-Man
    • The Transactions of the Korea Information Processing Society / v.7 no.11S / pp.3668-3674 / 2000
  • VoIP (Voice over IP) technology has recently been emerging in the IP network market. Unfortunately, there has so far been no attempt to research the development of protocol stacks for gateway control protocols such as MGCP, MEGACO, SIP, and SDP. The reasons include the low level of infrastructure and the shortage of time and technology currently available. This paper therefore focuses on developing a protocol stack consisting of an encoder/decoder, a header file generator, and related components, based on the MGCP and SDP protocol grammars supported by the IETF. To this end, it first develops the syntax analyzer, the encoder/decoder, and the header file generator for encoding and decoding by applying a syntax-directed method to each protocol grammar.
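
To make the encoder/decoder idea concrete, here is a minimal Python sketch of decoding and encoding an SDP body. It is not the paper's stack, which is generated from the protocol grammars. It only assumes the standard SDP line shape `<type>=<value>` with a one-character type. The names `decode_sdp`, `encode_sdp`, and `EXAMPLE` are hypothetical.

```python
def decode_sdp(text):
    """Split an SDP body into (type, value) pairs, one per non-empty line."""
    fields = []
    for line in text.splitlines():
        if not line:
            continue
        # Only the first "=" separates type from value; values may contain "=".
        kind, sep, value = line.partition("=")
        if sep != "=" or len(kind) != 1:
            raise ValueError(f"malformed SDP line: {line!r}")
        fields.append((kind, value))
    return fields


def encode_sdp(fields):
    """Serialize (type, value) pairs back to SDP text with CRLF line endings."""
    return "".join(f"{kind}={value}\r\n" for kind, value in fields)


# Illustrative session description (the address is a documentation placeholder).
EXAMPLE = "v=0\r\nc=IN IP4 192.0.2.1\r\nm=audio 3456 RTP/AVP 0\r\n"

if __name__ == "__main__":
    for kind, value in decode_sdp(EXAMPLE):
        print(kind, value)
```

A grammar-driven stack would additionally validate field order and the internal syntax of each value, which this sketch leaves out.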

AI photo storyteller based on deep encoder-decoder architecture (딥인코더-디코더 기반의 인공지능 포토 스토리텔러)

  • Min, Kyungbok;Dang, L. Minh;Lee, Sujin;Moon, Hyeonjoon
    • Annual Conference of KIPS / 2019.10a / pp.931-934 / 2019
  • The use of artificial intelligence to generate captions for an image has been studied extensively. However, these systems are unable to create creative stories of more than one sentence based on image content. Stories are a way in which humans foster social cooperation and develop social norms. This paper proposes a framework that can generate a relatively short story based on the context of an image. The main contributions of this paper are (1) an unsupervised framework that uses a recurrent neural network structure and an encoder-decoder model to construct a short story for an image, and (2) a large English novel dataset covering horror and romance themes, manually collected and validated. An examination of the generated short stories shows that the proposed model can produce more creative content than existing intelligent systems, which can produce only one concise sentence. The framework demonstrated in this work is therefore expected to stimulate research on more robust AI story writers and to encourage the application of the proposed model in helping story writers find new ideas.

Semi-Supervised Spatial Attention Method for Facial Attribute Editing

  • Yang, Hyeon Seok;Han, Jeong Hoon;Moon, Young Shik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3685-3707 / 2021
  • In recent years, facial attribute editing based on generative adversarial networks and encoder-decoder models has been used successfully to change various attributes of face images. However, existing models have a limitation in that they may change an unintended part in the process of changing an attribute or may generate an unnatural result. In this paper, we propose a model that improves the learning of the attention mask by adding a spatial attention mechanism based on the unified selective transfer network (referred to as STGAN) using semi-supervised learning. The proposed model can edit multiple attributes while preserving details independent of the attributes being edited. This study makes two main contributions to the literature. First, we propose an encoder-decoder model structure that learns and edits multiple facial attributes and suppresses distortion using an attention mask. Second, we define guide masks and propose a method and an objective function that use the guide masks for multiple facial attribute editing through semi-supervised learning. Qualitative and quantitative evaluations of the experimental results show that the proposed method preserves image details better than existing methods by suppressing unintended changes.

Lossless Image Compression Based on Deep Learning (딥 러닝 기반의 무손실 영상압축 방법)

  • Rhee, Hochang;Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.67-70 / 2022
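  • With the recent advances in deep learning, deep-learning-based algorithms have shown large performance gains over earlier methods in many areas of image processing and computer vision. In lossy image compression, encoder-decoder networks have recently been replacing the transform used in image compression, and an additional encoder-decoder network for entropy coding of the transform outputs yields performance comparable to HEVC. In lossless compression as well, performing per-pixel prediction with a CNN greatly improves prediction performance over conventional prediction methods, and such approaches are reported to outperform methods that do not use deep learning, such as JPEG-2000 Lossless, FLIF, and JPEG-XL. However, computing a CNN prediction for every pixel requires running the CNN as many times as there are pixels in the image, and is known to be impractical: the fastest method known so far takes more than an hour for an HD-size image. Accordingly, methods that give up some performance but reduce the running time to a practical level have recently been proposed. Early versions of these methods performed worse than FLIF or JPEG-XL and were therefore still impractical, in that they used a GPU yet fell short of conventional methods. More recently, methods that better exploit signal characteristics have been proposed; these practical methods perform worse than running a CNN at every pixel, but outperform FLIF and JPEG-XL within a short time. In this study, we review several of these recent methods and evaluate auxiliary techniques that can further improve on them, as well as performance on raw images.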

Time-Series Forecasting Based on Multi-Layer Attention Architecture

  • Na Wang;Xianglian Zhao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.1-14 / 2024
  • Time-series forecasting is widely used in real-world applications. Recent research has shown that Transformers with a self-attention mechanism at their core perform better on such problems. However, most existing Transformer models for time-series prediction use the traditional encoder-decoder architecture, which is complex and leads to low processing efficiency, thus limiting the ability to mine deep time dependencies by increasing model depth. In addition, the quadratic computational complexity of the self-attention mechanism increases computational overhead and reduces processing efficiency. To address these issues, the paper designs an efficient multi-layer attention-based time-series forecasting model. This model has the following characteristics: (i) It abandons the traditional encoder-decoder based Transformer architecture and constructs a time-series prediction model based on a multi-layer attention mechanism, improving the model's ability to mine deep time dependencies. (ii) A cross-attention module based on a cross-attention mechanism is designed to enhance information exchange between historical and predictive sequences. (iii) A recently proposed sparse attention mechanism is applied to the model to reduce computational overhead and improve processing efficiency. Experiments on multiple datasets show that the model significantly outperforms current advanced Transformer methods in time-series forecasting, including LogTrans, Reformer, and Informer.

A dual path encoder-decoder network for placental vessel segmentation in fetoscopic surgery

  • Yunbo Rao;Tian Tan;Shaoning Zeng;Zhanglin Chen;Jihong Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.15-29 / 2024
  • A fetoscope is an optical endoscope, which is often applied in fetoscopic laser photocoagulation to treat twin-to-twin transfusion syndrome. In an operation, the clinician needs to observe the abnormal placental vessels through the endoscope, so as to guide the operation. However, low-quality imaging and narrow field of view of the fetoscope increase the difficulty of the operation. Introducing an accurate placental vessel segmentation of fetoscopic images can assist the fetoscopic laser photocoagulation and help identify the abnormal vessels. This study proposes a method to solve the above problems. A novel encoder-decoder network with a dual-path structure is proposed to segment the placental vessels in fetoscopic images. In particular, we introduce a channel attention mechanism and a continuous convolution structure to obtain multi-scale features with their weights. Moreover, a switching connection is inserted between the corresponding blocks of the two paths to strengthen their relationship. According to the results of a set of blood vessel segmentation experiments conducted on a public fetoscopic image dataset, our method has achieved higher scores than the current mainstream segmentation methods, raising the dice similarity coefficient, intersection over union, and pixel accuracy by 5.80%, 8.39% and 0.62%, respectively.
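
For reference, the three metrics named in this abstract can be computed from a binary prediction mask and a ground-truth mask as in the Python sketch below. It uses the standard definitions of the dice similarity coefficient, intersection over union, and pixel accuracy. The function name `segmentation_scores` and the flat 0/1 list representation of a mask are assumptions for illustration and are not taken from the paper.

```python
def segmentation_scores(pred, truth):
    """Return (dice, iou, pixel_accuracy) for two equal-length 0/1 masks."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # vessel pixel predicted as vessel
        elif p and not t:
            fp += 1          # background predicted as vessel
        elif t:
            fn += 1          # vessel pixel missed
        else:
            tn += 1          # background predicted as background
    overlap_total = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if overlap_total else 1.0
    iou = tp / overlap_total if overlap_total else 1.0
    accuracy = (tp + tn) / len(pred)
    return dice, iou, accuracy


if __name__ == "__main__":
    print(segmentation_scores([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because background pixels usually dominate vessel images, pixel accuracy tends to move much less than dice or intersection over union for the same change in the predicted mask.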

Deep learning framework for bovine iris segmentation

  • Heemoon Yoon;Mira Park;Hayoung Lee;Jisoon An;Taehyun Lee;Sang-Hee Lee
    • Journal of Animal Science and Technology / v.66 no.1 / pp.167-177 / 2024
  • Iris segmentation is an initial step for identifying the biometrics of animals when establishing a traceability system for livestock. In this study, we propose a deep learning framework for pixel-wise segmentation of bovine iris with a minimized use of annotation labels utilizing the BovineAAEyes80 public dataset. The proposed image segmentation framework encompasses data collection, data preparation, data augmentation selection, training of 15 deep neural network (DNN) models with varying encoder backbones and segmentation decoder DNNs, and evaluation of the models using multiple metrics and graphical segmentation results. This framework aims to provide comprehensive and in-depth information on each model's training and testing outcomes to optimize bovine iris segmentation performance. In the experiment, U-Net with a VGG16 backbone was identified as the optimal combination of encoder and decoder models for the dataset, achieving an accuracy and dice coefficient score of 99.50% and 98.35%, respectively. Notably, the selected model accurately segmented even corrupted images without proper annotation data. This study contributes to the advancement of iris segmentation and the establishment of a reliable DNN training framework.

Efficient Correlation Channel Modeling for Transform Domain Wyner-Ziv Video Coding (Transform Domain Wyner-Ziv 비디오 부호를 위한 효과적인 상관 채널 모델링)

  • Oh, Ji-Eun;Jung, Chun-Sung;Kim, Dong-Yoon;Park, Hyun-Wook;Ha, Jeong-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.3 / pp.23-31 / 2010
  • The increasing demand for low-power, low-complexity video encoders has motivated extensive research on distributed video coding (DVC), in which the encoder compresses frames without exploiting inter-frame statistical correlation. In a DVC encoder, in contrast to a conventional video encoder, an error-control code compresses the video frames by representing them as syndrome bits. Meanwhile, the DVC decoder generates side information, which is modeled as a noisy version of the original video frames, and the decoder of the error-control code corrects the errors in the side information using the syndrome bits. The noisy observation, i.e., the side information, can be understood as the output of a virtual channel whose input is the original video frames, and the conditional probability of the virtual channel model is assumed to follow a Laplacian distribution. Thus, the performance of a DVC system depends on the performance of the error-control code and of the optimal reconstruction step in the DVC decoder. In turn, the performance of these two constituent blocks is directly related to how well the parameter of the correlation channel is estimated. In this paper, we propose an algorithm to estimate the parameter of the correlation channel, together with a low-complexity version of the algorithm. In particular, the proposed algorithm minimizes the squared error between the Laplacian probability distribution and the empirical observations. Finally, we show that the conventional algorithm can be improved by adopting a confidential window. The proposed algorithm yields PSNR gains of up to 1.8 dB and 1.1 dB on the Mother and Foreman video sequences, respectively.
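
As background for the correlation channel model, the Python sketch below shows two textbook estimators of the Laplacian parameter $\alpha$ in $f(x) = \frac{\alpha}{2} e^{-\alpha |x|}$, applied to residuals between side information and original samples. These are not the squared-error fit the paper proposes. The function names and the zero-mean residual assumption are illustrative.

```python
import math


def laplacian_pdf(x, alpha):
    """Zero-mean Laplacian density f(x) = (alpha / 2) * exp(-alpha * |x|)."""
    return 0.5 * alpha * math.exp(-alpha * abs(x))


def estimate_alpha_mle(residuals):
    """Maximum-likelihood estimate: alpha = 1 / mean(|r|)."""
    mean_abs = sum(abs(r) for r in residuals) / len(residuals)
    return 1.0 / mean_abs


def estimate_alpha_from_variance(residuals):
    """Moment estimate: a zero-mean Laplacian has variance 2 / alpha**2."""
    second_moment = sum(r * r for r in residuals) / len(residuals)
    return math.sqrt(2.0 / second_moment)


if __name__ == "__main__":
    # Hypothetical residuals (side information minus original), not real data.
    residuals = [-2.0, 2.0, -2.0, 2.0]
    print(estimate_alpha_mle(residuals))
    print(estimate_alpha_from_variance(residuals))
```

The two estimators generally disagree on finite or non-Laplacian data, which is one reason the choice of estimation method affects decoder performance.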

Design and Implementation of 8b/10b Encoder/Decoder for Serial ATA (직렬 ATA용 8b/10b 인코더와 디코더 설계 및 구현)

  • Heo Jung-Hwa;Park Nho-Kyung;Park Sang-Bong
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.1A / pp.93-98 / 2004
  • The Serial ATA interface is comparatively inexpensive and offers superior performance, which makes it a suitable technology for current demands for high-speed data transmission and throughput. This paper describes the design and implementation of a Serial ATA Link layer, covering error detection and an 8b/10b encoder/decoder for DC balance, at a frequency of 150 MHz. The 8b/10b encoder is partitioned into a 5b/6b coder and a 3b/4b coder. The logical model of the block is described in Verilog HDL at the register transfer level, and the verified HDL is synthesized using standard cell libraries. The chip is fabricated with a $0.35\,\mu\mathrm{m}$ standard CMOS cell library, and the chip size is about $1500\,\mu\mathrm{m} \times 1500\,\mu\mathrm{m}$. The function of the chip has been verified and tested using a test board with FPGA equipment and IDEC ATS2 test equipment. Verification was performed at a frequency of 100 MHz and a supply voltage of 3.3 V, and the chip operated correctly at a system clock of 100 MHz. Each designed and verified block may be used as IP in the field of high-speed serial data communication.
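
The 5b/6b plus 3b/4b partition can be illustrated with a short Python sketch. It shows only how an input byte is split into the two sub-block inputs and how the disparity of a 10-bit code word is measured for DC balance. It does not reproduce the full code tables or the paper's Verilog design. The function names are hypothetical.

```python
def split_byte(byte):
    """Split a byte into (x, y): x = low 5 bits (5b/6b input), y = high 3 bits (3b/4b input)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return byte & 0x1F, byte >> 5


def symbol_name(byte, control=False):
    """Conventional 8b/10b name: Dx.y for data characters, Kx.y for control characters."""
    x, y = split_byte(byte)
    return f"{'K' if control else 'D'}{x}.{y}"


def disparity(code, width=10):
    """Number of ones minus number of zeros in a code word of the given width."""
    ones = bin(code).count("1")
    return ones - (width - ones)


if __name__ == "__main__":
    print(symbol_name(0xB5))                # data byte 0xB5
    print(symbol_name(0xBC, control=True))  # comma character
    # K28.5 as sent when the running disparity is negative: 001111 1010.
    print(disparity(0b0011111010))
```

A full encoder selects between a code word and its complement based on the running disparity, so that the long-term difference between transmitted ones and zeros stays bounded.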