• Title/Summary/Keyword: ENCODER

Search Results: 1,918

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei; Jia, Xiuhong; Yang, Dichun; Ai, Meihui; Li, Lirong; Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1778-1797 / 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.
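As context for the HDC module named in this abstract: hybrid dilated convolution stacks dilated convolutions whose rates (e.g. 1, 2, 5) share no common factor, so the receptive field grows without a gridding artifact. A minimal 1-D NumPy sketch; the rates, kernel taps, and signal length are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution with the kernel taps spaced `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1            # effective kernel footprint
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

# Stack dilation rates 1, 2, 5 (co-prime, in the HDC spirit) with 3-tap kernels.
rates = [1, 2, 5]
kernel = np.array([1.0, 1.0, 1.0]) / 3.0     # simple averaging taps
x = np.arange(64, dtype=float)
y = x
for r in rates:
    y = dilated_conv1d(y, kernel, r)

# Receptive field of the stack: 1 + sum((k-1)*r) = 1 + 2*(1+2+5) = 17 samples.
receptive_field = 1 + sum((len(kernel) - 1) * r for r in rates)
```

Because the input here is a linear ramp and each kernel averages, each stage shifts the ramp by its dilation; the point is only how quickly the 17-sample receptive field accumulates from three small kernels.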

KI-HABS: Key Information Guided Hierarchical Abstractive Summarization

  • Zhang, Mengli; Zhou, Gang; Yu, Wanting; Liu, Wenfen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.12 / pp.4275-4291 / 2021
  • With the unprecedented growth of textual information on the Internet, an efficient automatic summarization system has become an urgent need. Recently, neural network models based on the encoder-decoder architecture with an attention mechanism have demonstrated powerful capabilities in the sentence summarization task. However, for paragraphs or longer documents, these models fail to mine the core information in the input text, which leads to information loss and repetition. In this paper, we propose an abstractive document summarization method, denoted KI-HABS, that applies guidance signals from key sentences to the encoder of a hierarchical encoder-decoder architecture. Specifically, we first train an extractor to select key sentences in the input document with a hierarchical bidirectional GRU. Then, we encode the key sentences into a sentence-level key-information representation. Finally, we adopt a selective encoding strategy, guided by this key-information representation, to filter the source information, which establishes a connection between the key sentences and the document. We use the CNN/Daily Mail and Gigaword datasets to evaluate our model. The experimental results demonstrate that our method generates more informative and concise summaries, achieving better performance than the competitive models.
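The selective encoding idea described in this abstract — re-weighting each encoder state with a gate computed from a key-information vector — can be sketched as below. The dimensions and random weight matrices are stand-ins for a trained model, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # hidden size (illustrative)
H = rng.normal(size=(5, d))             # encoder states for 5 source tokens
k = rng.normal(size=d)                  # key-information vector from the extractor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Selective gate: each source state is re-weighted by how strongly it matches
# the key-sentence representation (the weights here are random stand-ins).
W_h = rng.normal(size=(d, d)) * 0.1
W_k = rng.normal(size=(d, d)) * 0.1
gate = sigmoid(H @ W_h.T + k @ W_k.T)   # shape (5, d), values in (0, 1)
H_filtered = gate * H                   # gated states fed to the decoder
```

Because every gate value lies strictly between 0 and 1, the filtered states can only attenuate, never amplify, the original encoder states.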

U-net with vision transformer encoder for polyp segmentation in colonoscopy images (비전 트랜스포머 인코더가 포함된 U-net을 이용한 대장 내시경 이미지의 폴립 분할)

  • Ayana, Gelan; Choe, Se-woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.97-99 / 2022
  • For the early identification and treatment of colorectal cancer, accurate polyp segmentation is crucial. However, polyp segmentation is a challenging task, and the majority of current approaches struggle with two issues. First, the position, size, and shape of each individual polyp vary greatly (intra-class inconsistency). Second, under certain circumstances, such as motion blur and light reflection, there is a significant degree of similarity between polyps and their surroundings (inter-class indistinction). U-net, which is composed of convolutional neural networks as encoder and decoder, is considered a standard for tackling this task. We propose an updated U-net architecture that replaces the encoder with a vision transformer network for polyp segmentation. The proposed architecture performed better than the standard U-net architecture for the task of polyp segmentation.
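The encoder swap rests on the vision transformer's first step: partitioning the image into non-overlapping patches that become tokens. A minimal sketch of that patch partition; the image and patch sizes are illustrative, not the paper's settings:

```python
import numpy as np

def patchify(img, p):
    """Split an H x W image into non-overlapping p x p patches, each flattened
    into one token vector, in row-major patch order."""
    H, W = img.shape
    assert H % p == 0 and W % p == 0
    return (img.reshape(H // p, p, W // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))

img = np.arange(16 * 16, dtype=float).reshape(16, 16)
tokens = patchify(img, 4)               # 16 tokens of length 16
```

In a full ViT encoder these tokens would be linearly projected, given position embeddings, and passed through self-attention layers before the U-net decoder upsamples them back.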


Land Cover Classifier Using Coordinate Hash Encoder (좌표 해시 인코더를 활용한 토지피복 분류 모델)

  • Yongsun Yoon; Dongjae Kwon
    • Korean Journal of Remote Sensing / v.39 no.6_3 / pp.1771-1777 / 2023
  • With the advancement of deep learning, many semantic segmentation-based methods for land cover classification have been proposed. However, existing deep learning-based models use only image information and cannot guarantee spatiotemporal consistency. In this study, we propose a land cover classification model that uses geographical coordinates. First, coordinate features are extracted through the Coordinate Hash Encoder, which extends the Multi-resolution Hash Encoder, an implicit neural representation technique, to the longitude-latitude coordinate system. Next, we propose an architecture that combines the extracted coordinate features with different levels of the U-net decoder. Experimental results show that the proposed method improves the mean intersection over union by about 32% and improves spatiotemporal consistency.
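The Coordinate Hash Encoder builds on multi-resolution hash encoding: at each resolution level, the corners of the grid cell containing a point are hashed into a small feature table and bilinearly interpolated. A simplified sketch with one scalar feature per table entry; the table size, level count, growth factor, and hash primes follow the original multi-resolution hash encoding style and are assumptions, not this paper's settings:

```python
import numpy as np

PRIMES = (1, 2_654_435_761)             # spatial-hash primes as in Instant-NGP

def hash2d(ix, iy, table_size):
    """Hash an integer grid corner into a table index."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size

def encode(lon_lat, tables, base_res=16, growth=2.0):
    """Multi-resolution hash encoding of a point in [0, 1]^2."""
    x, y = lon_lat
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        fx, fy = x * res, y * res
        x0, y0 = int(fx), int(fy)
        tx, ty = fx - x0, fy - y0
        # bilinear interpolation of the four hashed corner features
        f = 0.0
        for dx, wx in ((0, 1 - tx), (1, tx)):
            for dy, wy in ((0, 1 - ty), (1, ty)):
                f += wx * wy * table[hash2d(x0 + dx, y0 + dy, len(table))]
        feats.append(f)
    return np.array(feats)

rng = np.random.default_rng(0)
tables = [rng.normal(size=1024) for _ in range(4)]   # one learnable table per level
feat = encode((0.37, 0.81), tables)
corner = encode((0.0, 0.0), tables)
```

In the paper's setting the tables are trained jointly with the network, and the resulting per-level features are injected into the U-net decoder.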

Attention-based deep learning framework for skin lesion segmentation (피부 병변 분할을 위한 어텐션 기반 딥러닝 프레임워크)

  • Afnan Ghafoor; Bumshik Lee
    • Smart Media Journal / v.13 no.3 / pp.53-61 / 2024
  • This paper presents a novel M-shaped encoder-decoder architecture for skin lesion segmentation that achieves better performance than existing approaches. The proposed architecture uses its left and right legs to enable multi-scale feature extraction and is further enhanced by an attention module integrated into the skip connection. The image is partitioned into four distinct patches, facilitating enhanced processing within the encoder-decoder framework. A pivotal aspect of the proposed method is its attention mechanism, which focuses on critical image features and leads to refined segmentation. Experimental results highlight the effectiveness of the proposed approach, demonstrating superior accuracy, precision, and Jaccard Index compared to existing methods.
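The abstract does not detail its attention module, but a common choice for attention in a skip connection is the additive attention gate popularized by Attention U-Net. The sketch below illustrates that generic mechanism with random stand-in weights; it is not this paper's exact module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(skip, gating, W_x, W_g, psi):
    """Additive attention gate on a skip connection.

    skip:   encoder feature map, shape (H*W, C)
    gating: decoder feature map at the same resolution, shape (H*W, C)
    Returns the skip features scaled by per-position attention coefficients.
    """
    q = np.maximum(skip @ W_x + gating @ W_g, 0.0)  # ReLU of additive attention
    alpha = sigmoid(q @ psi)                        # (H*W, 1) coefficients in (0, 1)
    return alpha * skip, alpha

rng = np.random.default_rng(0)
C, N = 8, 16
skip = rng.normal(size=(N, C))
gating = rng.normal(size=(N, C))
W_x = rng.normal(size=(C, C)) * 0.1
W_g = rng.normal(size=(C, C)) * 0.1
psi = rng.normal(size=(C, 1))
gated, alpha = attention_gate(skip, gating, W_x, W_g, psi)
```

The coefficients suppress skip-connection activations that the decoder's gating signal deems irrelevant, which is how attention "focuses on critical image features" before the decoder fuses them.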

Load Prediction using Finite Element Analysis and Recurrent Neural Network (유한요소해석과 순환신경망을 활용한 하중 예측)

  • Jung-Ho Kang
    • Journal of the Korean Society of Industry Convergence / v.27 no.1 / pp.151-160 / 2024
  • Artificial neural networks, which enabled artificial intelligence, are being used in many fields. However, their application to mechanical structures faces several problems, and research remains incomplete. One of the problems is that it is difficult to secure the large amount of data necessary for training artificial neural networks. In particular, detecting and recognizing external forces and loads is important for safe operation and accident prevention in mechanical structures. This study examined the possibility of applying a Recurrent Neural Network to detect and recognize the load on a machine. Tens of thousands of data samples are generally required to train Recurrent Neural Networks; to secure this amount, this paper derives load data from ANSYS structural analysis results and applies a stacked auto-encoder technique to expand the dataset to a trainable size. The usefulness of the Stacked Auto-Encoder data was examined by comparing it with the ANSYS data. In addition, to improve the accuracy of load detection and recognition with a Recurrent Neural Network, optimal conditions are proposed by investigating the effects of the related functions.
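One layer of a stacked auto-encoder, trained by plain gradient descent to reconstruct load-like curves, can be sketched as below. The synthetic sinusoidal curves, layer sizes, and learning rate are illustrative assumptions; the paper derives its training data from ANSYS structural analysis results:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for load-time curves.
t = np.linspace(0, 1, 32)
X = np.stack([np.sin(2 * np.pi * (f + 1) * t) for f in rng.random(64)])

# One layer of a stacked auto-encoder: encode 32 samples to 8 units, decode back.
d_in, d_hid = X.shape[1], 8
W1 = rng.normal(size=(d_in, d_hid)) * 0.1
W2 = rng.normal(size=(d_hid, d_in)) * 0.1
lr = 0.01
errors = []
for _ in range(200):
    H = np.tanh(X @ W1)                 # encoder
    X_hat = H @ W2                      # decoder (linear output)
    err = X_hat - X
    errors.append(float(np.mean(err ** 2)))
    # plain gradient descent on the mean-squared reconstruction loss
    gW2 = H.T @ err / len(X)
    gH = err @ W2.T * (1 - H ** 2)      # backprop through tanh
    gW1 = X.T @ gH / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
```

In a stacked auto-encoder, the hidden codes of one trained layer become the input of the next; here only the first layer is shown, and the learned decoder is what lets new load-like samples be generated to enlarge the dataset.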

Accuracy Assessment of Forest Degradation Detection in Semantic Segmentation based Deep Learning Models with Time-series Satellite Imagery

  • Woo-Dam Sim; Jung-Soo Lee
    • Journal of Forest and Environmental Science / v.40 no.1 / pp.15-23 / 2024
  • This research aimed to assess the possibility of detecting forest degradation using time-series satellite imagery and three deep learning-based change detection techniques. The dataset used for the deep learning models was composed of two sets: one based on surface reflectance (SR) spectral information from satellite imagery, and one combining it with texture information (GLCM; Gray-Level Co-occurrence Matrix) and terrain information. The deep learning models employed for land cover change detection included image differencing with the Unet semantic segmentation model, a multi-encoder Unet model, and a multi-encoder Unet++ model. The study found no significant difference in accuracy between the deep learning models for forest degradation detection. Training and validation accuracies were approximately 89% and 92%, respectively. Among the three deep learning models, the multi-encoder Unet model showed the most efficient analysis time with comparable accuracy. Moreover, models that incorporated texture and gradient information in addition to spectral information had higher classification accuracy than models that used spectral information alone. Overall, the accuracy of forest degradation extraction was outstanding, reaching 98%.

The Region-of-Interest Based Pixel Domain Distributed Video Coding With Low Decoding Complexity (관심 영역 기반의 픽셀 도메인 분산 비디오 부호)

  • Jung, Chun-Sung; Kim, Ung-Hwan; Jun, Dong-San; Park, Hyun-Wook; Ha, Jeong-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.4 / pp.79-89 / 2010
  • Recently, distributed video coding (DVC) has been actively studied for low-complexity video encoders. The complexity of the encoder in DVC is much lower than that of traditional video coding schemes such as H.264/AVC, but the complexity of the decoder in DVC increases. In this paper, we propose a Region-Of-Interest (ROI) based DVC scheme with low decoding complexity. The proposed scheme uses the ROI, the region where objects are moving quickly, as the input of the Wyner-Ziv (WZ) encoder instead of the whole WZ frame. In this case, the complexity of the encoder and decoder is reduced, and the bit rate decreases. Experimental results show that the proposed scheme obtains a maximum PSNR gain of 0.95 dB on the Hall Monitor sequence and 1.87 dB on the Salesman sequence. Moreover, the complexity of the encoder and decoder in the proposed scheme is significantly reduced, by 73.7% and 63.3% respectively, over the traditional DVC scheme. In addition, as the Low-Density Parity-Check (LDPC) decoder, we employ the layered belief propagation (LBP) algorithm, whose decoding convergence is 1.73 times faster than that of the belief propagation algorithm, for low decoding complexity.
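Feeding only the fast-moving region to the WZ encoder presupposes locating that region first. A minimal frame-difference sketch; the threshold and frame contents are chosen purely for illustration and do not reflect the paper's motion estimation:

```python
import numpy as np

def motion_roi(prev, curr, thresh=10.0):
    """Bounding box (y0, y1, x0, x1) of pixels whose absolute frame
    difference exceeds thresh, or None if nothing moved."""
    moving = np.abs(curr.astype(float) - prev.astype(float)) > thresh
    if not moving.any():
        return None
    ys, xs = np.nonzero(moving)
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())

# A static background with one moving block between two frames.
prev = np.zeros((16, 16))
curr = prev.copy()
curr[4:8, 2:6] = 50.0                    # the "moving object"
roi = motion_roi(prev, curr)
```

Only the pixels inside this box would be Wyner-Ziv coded, which is why both the bit rate and the encoder/decoder complexity drop relative to coding the whole WZ frame.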

Multi-View Video System using Single Encoder and Decoder (단일 엔코더 및 디코더를 이용하는 다시점 비디오 시스템)

  • Kim Hak-Soo; Kim Yoon; Kim Man-Bae
    • Journal of Broadcast Engineering / v.11 no.1 s.30 / pp.116-129 / 2006
  • The progress of data transmission technology over the Internet has spread a variety of realistic content. One such content type is multi-view video, which is acquired from multiple camera sensors. In general, multi-view video processing requires as many encoders and decoders as there are cameras, and this processing complexity makes practical implementation difficult. To solve this problem, this paper considers a simple multi-view system utilizing a single encoder and a single decoder. On the encoder side, the input multi-view YUV sequences are combined on GOP units by a video mixer. Then, the mixed sequence is compressed by a single H.264/AVC encoder. The decoding side is composed of a single decoder and a scheduler controlling the decoding process. The goal of the scheduler is to assign an approximately identical number of decoded frames to each view sequence by estimating the decoder utilization of a GOP and subsequently applying frame skip algorithms. Furthermore, for the frame skip, efficient frame selection algorithms are studied for the H.264/AVC baseline and main profiles, based upon a cost function related to perceived video quality. Our proposed method has been tested on various multi-view test sequences adopted by MPEG 3DAV. Experimental results show that approximately identical decoder utilization is achieved for each view sequence, so that each view sequence is displayed fairly. The performance of the proposed method is also examined in terms of bit rate and PSNR using a rate-distortion curve.
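The scheduler's fairness goal (each view receiving an approximately identical number of decoded frames) can be illustrated with a greedy slot-assignment sketch. The view count and decode budget are arbitrary; the paper's scheduler additionally estimates decoder utilization per GOP and applies quality-aware frame skipping:

```python
def schedule_decodes(num_views, decode_budget):
    """Greedily hand each decode slot to the view with the fewest decoded
    frames so far, so every view ends up with a nearly equal share."""
    decoded = [0] * num_views
    order = []
    for _ in range(decode_budget):
        v = min(range(num_views), key=lambda i: decoded[i])
        decoded[v] += 1
        order.append(v)
    return decoded, order

decoded, order = schedule_decodes(num_views=8, decode_budget=100)
```

With 100 slots over 8 views, the greedy rule leaves every view with 12 or 13 decoded frames; frames a view does not receive a slot for are the ones the frame-skip algorithm must drop gracefully.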

A Performance Comparison of Spatial Scalable Encoders with the Constrained Coding Modes for T-DMB/AT-DMB Services (T-DMB/AT-DMB 서비스를 위한 부호화 모드 제한을 갖는 공간 확장성 부호기의 성능 비교)

  • Kim, Jin-Soo; Park, Jong-Kab; Kim, Kyu-Seok; Choi, Sung-Jin; Seo, Kwang-Deok; Kim, Jae-Gon
    • Journal of Broadcast Engineering / v.13 no.4 / pp.501-515 / 2008
  • Recently, as users' demand for high-quality mobile multimedia services is rapidly increasing and additional bandwidth can be provided by adopting hierarchical modulation transmission technology, research on the Advanced Terrestrial DMB (AT-DMB) service using the SVC (Scalable Video Coding) scheme is being actively pursued. However, in order to realize a compatible video service and to accelerate successful standardization and commercialization, it is necessary to simplify the compatible encoder structure. In this paper, we propose a fast mode decision method that constrains the redundant coding modes in a spatial scalable encoder that keeps the current T-DMB video in the base layer. The proposed method is based on the statistical characteristics of each coding mode at the base and enhancement layers, including the inter-layer predictions, which are derived by investigating macroblock-layer coding modes in the spatial scalable encoder's functional structure. Through computer simulations, it is shown that a simplified encoder model that reduces the heavy computational burden can be found while keeping the objective visual quality very high.
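Constraining coding modes amounts to pruning the rate-distortion search: the base-layer macroblock mode selects a reduced candidate set for the enhancement layer, so fewer modes are cost-evaluated. The pruning table and costs below are hypothetical, purely to illustrate the mechanism; the paper derives its constraints from measured mode statistics:

```python
# Hypothetical pruning table: base-layer mode -> enhancement-layer candidates.
CANDIDATES = {
    "SKIP":       ["SKIP", "BASE_MODE"],
    "INTER16x16": ["SKIP", "INTER16x16", "BASE_MODE"],
    "INTER8x8":   ["INTER8x8", "INTER16x16", "BASE_MODE"],
    "INTRA":      ["INTRA", "BASE_MODE"],
}
ALL_MODES = ["SKIP", "INTER16x16", "INTER8x8", "INTRA", "BASE_MODE"]

def decide_mode(base_mode, rd_cost):
    """Pick the cheapest mode, searching only the constrained candidate set."""
    candidates = CANDIDATES.get(base_mode, ALL_MODES)
    return min(candidates, key=rd_cost)

# Toy rate-distortion costs for one macroblock.
costs = {"SKIP": 9.0, "INTER16x16": 5.0, "INTER8x8": 7.5,
         "INTRA": 12.0, "BASE_MODE": 6.0}
best = decide_mode("INTER16x16", costs.__getitem__)
tested = len(CANDIDATES["INTER16x16"])
```

The speed-up comes from `tested` being smaller than the full mode list; the quality risk is bounded as long as the pruned modes are statistically unlikely winners.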