• Title/Summary/Keyword: encoder-decoder


Attention based multimodal model for Korean speech recognition post-editing (한국어 음성인식 후처리를 위한 주의집중 기반의 멀티모달 모델)

  • Jeong, Yeong-Seok; Oh, Byoung-Doo; Heo, Tak-Sung; Choi, Jeong-Myeong; Kim, Yu-Seop
    • Annual Conference on Human and Language Technology / 2020.10a / pp.145-150 / 2020
  • Recently, neural-network-based end-to-end models have been proposed for speech recognition. These models take speech as direct input and generate a transcribed sentence. Because such a model consumes raw speech, data quality strongly affects its performance. To mitigate this weakness of end-to-end models, this paper proposes a multimodal model that post-edits speech recognition results. The proposed model takes both the speech and the transcribed sentence as input. Each input is passed through its own encoder to extract features, and the extracted information is delivered to the decoder through an attention mechanism. The decoder then generates post-edited tokens from the attention output. Word error rate (WER) was used to evaluate the post-editing model, and experiments showed an 8% reduction in WER compared with the Google Cloud Speech-to-Text model.
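The pipeline described above (two encoders feeding one decoder through attention) can be sketched minimally in PyTorch. Everything below, the class and parameter names, dimensions, and the use of GRUs with multi-head attention, is illustrative, not the paper's actual code:

```python
# Minimal sketch of a multimodal encoder-decoder with attention for
# ASR post-editing (illustrative names and sizes; not the paper's code).
import torch
import torch.nn as nn

class MultimodalPostEditor(nn.Module):
    def __init__(self, vocab_size=8000, n_mels=80, d=256):
        super().__init__()
        self.speech_enc = nn.GRU(n_mels, d, batch_first=True)      # encodes audio features
        self.text_emb = nn.Embedding(vocab_size, d)
        self.text_enc = nn.GRU(d, d, batch_first=True)             # encodes the ASR transcript
        self.attn_speech = nn.MultiheadAttention(d, 4, batch_first=True)
        self.attn_text = nn.MultiheadAttention(d, 4, batch_first=True)
        self.dec = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(2 * d, vocab_size)

    def forward(self, mels, transcript, prev_tokens):
        hs, _ = self.speech_enc(mels)                       # (B, Ts, d)
        ht, _ = self.text_enc(self.text_emb(transcript))    # (B, Tt, d)
        hd, _ = self.dec(self.text_emb(prev_tokens))        # (B, To, d)
        # attend over both modalities from the decoder states
        cs, _ = self.attn_speech(hd, hs, hs)
        ct, _ = self.attn_text(hd, ht, ht)
        return self.out(torch.cat([cs, ct], dim=-1))        # (B, To, vocab)

model = MultimodalPostEditor()
logits = model(torch.randn(2, 120, 80),
               torch.randint(0, 8000, (2, 30)),
               torch.randint(0, 8000, (2, 30)))
```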


Denoising Diffusion Null-space Model and Colorization based Image Compression

  • Indra Imanuel; Dae-Ki Kang; Suk-Ho Lee
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.22-30 / 2024
  • Image compression and decompression methods have become increasingly crucial in modern times, facilitating the transfer of high-quality images while minimizing file size and internet traffic. Early image compression relied on rudimentary codecs that aimed to compress and decompress data with minimal loss of image quality. Recently, a compression framework leveraging colorization techniques has emerged. These methods, originally developed for infusing grayscale images with color, have found application in image compression as colorization-based coding. Within this framework, the encoder automatically extracts representative pixels, referred to as color seeds, and transmits them to the decoder; the decoder, using colorization methods, reconstructs color information for the remaining pixels from the transmitted data. In this paper, we propose a novel approach to image compression in which the compression task is decomposed into a grayscale image compression task and a colorization task. Unlike conventional colorization-based coding, our method focuses on the colorization process rather than the extraction of color seeds. Moreover, we employ the Denoising Diffusion Null-Space Model (DDNM) for colorization, ensuring high-quality color restoration and contributing to superior compression rates. Experimental results demonstrate that our method achieves higher-quality decompressed images than the standard JPEG and JPEG2000 compression schemes, particularly at high compression rates.
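DDNM restores only the null-space component with the diffusion prior while copying the range-space component from the observation, x̂ = A†y + (I − A†A)x̄. For colorization, a natural choice (assumed here for illustration, not stated in the abstract) is A = channel-mean grayscale operator with A† = channel replication; a NumPy sketch of the consistency step:

```python
# Null-space data-consistency step used in DDNM-style colorization:
# x_hat = A^+ y + (I - A^+ A) x_bar, with A = channel mean (RGB -> gray)
# and A^+ = replicate the gray value across the three channels.
import numpy as np

def A(x):          # forward operator: (H, W, 3) -> (H, W) grayscale
    return x.mean(axis=-1)

def A_pinv(y):     # pseudo-inverse: (H, W) -> (H, W, 3)
    return np.repeat(y[..., None], 3, axis=-1)

def ddnm_consistency(x_bar, y):
    """Project a denoiser estimate x_bar so that A(x_hat) == y exactly."""
    return A_pinv(y) + x_bar - A_pinv(A(x_bar))

rng = np.random.default_rng(0)
y = rng.random((4, 4))            # observed grayscale image
x_bar = rng.random((4, 4, 3))     # diffusion-model color estimate
x_hat = ddnm_consistency(x_bar, y)
assert np.allclose(A(x_hat), y)   # range-space part matches the observation
```

Because A A† = I for this operator pair, the corrected sample reproduces the grayscale observation exactly while keeping the prior's color estimate in the null space.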

Multi-scale context fusion network for melanoma segmentation

  • Zhenhua Li; Lei Zhang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.7 / pp.1888-1906 / 2024
  • Melanoma images have fuzzy edges, low contrast with the background, and hair occlusion, all of which make accurate segmentation difficult. To address these problems, this paper proposes MSCNet, a melanoma segmentation model based on the U-Net framework. First, a multi-scale pyramid fusion module is designed to reconstruct the skip connections and transmit global information to the decoder. Second, a contextual information conduction module is added to the top of the encoder; it provides different receptive fields for the segmentation target by using atrous (dilated) convolutions with different dilation rates, so as to better fuse multi-scale contextual information. In addition, to suppress redundant information in the input image and attend more closely to melanoma features, a global channel attention mechanism is introduced into the decoder. Finally, to address lesion class imbalance, a combined loss function is used. The algorithm is validated on the ISIC 2017 and ISIC 2018 public datasets, and the experimental results indicate that it segments melanoma more accurately than other CNN-based image segmentation algorithms.
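A minimal sketch of two mechanisms the abstract names: parallel atrous convolutions with different dilation rates, and a combined loss for class imbalance. The specific rates and the BCE+Dice combination are assumptions, not the paper's exact design:

```python
# Illustrative context module: parallel dilated (atrous) convolutions give
# several receptive fields, then the branches are fused with a 1x1 conv.
import torch
import torch.nn as nn

class ContextConduction(nn.Module):
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

def combined_loss(logits, target, eps=1.0):
    """BCE + Dice, a common remedy for lesion class imbalance."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, target)
    p = torch.sigmoid(logits)
    dice = 1 - (2 * (p * target).sum() + eps) / (p.sum() + target.sum() + eps)
    return bce + dice

x = torch.randn(1, 64, 32, 32)
print(ContextConduction(64)(x).shape)   # torch.Size([1, 64, 32, 32])
```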

Automatic Generation of Bridge Defect Descriptions Using Image Captioning Techniques

  • Chengzhang Chai; Yan Gao; Haijiang Li; Guanyu Xiong
    • International Conference on Construction Engineering and Project Management / 2024.07a / pp.327-334 / 2024
  • Bridge inspection is crucial for infrastructure maintenance. Current computer-vision-based inspections primarily focus on identifying simple defects such as cracks or corrosion, and these detection results can serve merely as preliminary references for bridge inspection reports. To generate detailed reports, on-site engineers must still describe the structural conditions in lengthy text, a process that is time-consuming, costly, and prone to human error. To bridge this gap, we propose a deep-learning-based framework that generates detailed and accurate textual descriptions, laying the foundation for automated bridge inspection reports. The framework is built around an encoder-decoder architecture that uses a Convolutional Neural Network (CNN) to encode image features and a Gated Recurrent Unit (GRU) network as the decoder, combined with a dynamically adaptive attention mechanism. The experimental results demonstrate the approach's effectiveness and show that introducing the attention mechanism improves the generated descriptions. It is worth noting that comparative experiments on image restoration revealed that the model's explainability requires further improvement. In summary, this study demonstrates the potential and practical application of image captioning techniques for bridge defect detection; future research can further explore the integration of domain knowledge with artificial intelligence (AI).
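The CNN-GRU captioner described above can be sketched as a single decoding step: CNN grid features, additive attention over them, and a GRU cell. Names and dimensions are illustrative, and this fixed additive attention stands in for the paper's dynamically adaptive variant:

```python
# Sketch of one decoding step of a CNN-GRU captioner with additive
# (Bahdanau-style) attention over spatial image features.
import torch
import torch.nn as nn

class AttentiveGRUStep(nn.Module):
    def __init__(self, feat_dim=512, hid=512, vocab=1000):
        super().__init__()
        self.emb = nn.Embedding(vocab, hid)
        self.att_f = nn.Linear(feat_dim, hid)
        self.att_h = nn.Linear(hid, hid)
        self.att_v = nn.Linear(hid, 1)
        self.cell = nn.GRUCell(hid + feat_dim, hid)
        self.out = nn.Linear(hid, vocab)

    def forward(self, feats, h, token):
        # feats: (B, L, feat_dim) CNN grid features; h: (B, hid) GRU state
        e = self.att_v(torch.tanh(self.att_f(feats) + self.att_h(h)[:, None, :]))
        alpha = torch.softmax(e.squeeze(-1), dim=1)        # attention weights
        ctx = (alpha[..., None] * feats).sum(dim=1)        # context vector
        h = self.cell(torch.cat([self.emb(token), ctx], dim=-1), h)
        return self.out(h), h, alpha

step = AttentiveGRUStep()
logits, h, alpha = step(torch.randn(2, 49, 512),
                        torch.zeros(2, 512),
                        torch.tensor([1, 2]))
```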

Advanced Error Tracking Algorithm for H.263 (H.263에 적합한 개선된 에러 트래킹 알고리즘)

  • Hyo-seok Lee; Soo-Mok Jung
    • Journal of the Korea Computer Industry Society / v.5 no.1 / pp.123-130 / 2004
  • In this paper, an advanced error tracking algorithm that uses a feedback channel is proposed for error-resilient transmission. The proposed algorithm reduces the propagation of errors in decoded data over bit-error-prone networks. The decoder reports the addresses of corrupted blocks to the encoder through negative acknowledgments on the feedback channel, and the encoder tracks the propagated errors by examining the backward motion dependency of pixels in the frame currently being encoded. Error propagation is then terminated completely by INTRA-refreshing the affected macroblocks. By using a selective four-corner approximation, the error tracking computation of the proposed algorithm is lower than that of an algorithm that examines every pixel, without substantial degradation in video quality. The proposed algorithm tracks errors rapidly and accurately.
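A toy sketch of the tracking idea: given corrupted macroblock addresses reported over the feedback channel, follow backward motion dependencies into the current frame, using the four-corner approximation instead of testing every pixel. The data layout and demo values are invented for illustration:

```python
# Sketch of feedback-channel error tracking with the four-corner
# approximation: a macroblock is marked corrupted if any of its four
# corner pixels predicts from a corrupted area in the reference frame.
MB = 16  # macroblock size in pixels

def track_errors(corrupted_prev, motion_vectors, width, height):
    """corrupted_prev: set of (x, y) MB indices damaged in the reference
    frame; motion_vectors[(mbx, mby)] = (dx, dy) for the current frame.
    Returns the MBs of the current frame to refresh with INTRA coding."""
    to_refresh = set()
    for (mbx, mby), (dx, dy) in motion_vectors.items():
        x0, y0 = mbx * MB + dx, mby * MB + dy
        # four-corner approximation instead of testing all 256 pixels
        corners = [(x0, y0), (x0 + MB - 1, y0),
                   (x0, y0 + MB - 1), (x0 + MB - 1, y0 + MB - 1)]
        for (px, py) in corners:
            if 0 <= px < width and 0 <= py < height and \
               (px // MB, py // MB) in corrupted_prev:
                to_refresh.add((mbx, mby))
                break
    return to_refresh

# NACK reports MB (2, 1) corrupted; one MB in the next frame references it.
mvs = {(2, 1): (0, 0), (3, 1): (-18, 0)}
print(track_errors({(2, 1)}, mvs, 176, 144))   # {(2, 1), (3, 1)}
```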


3D Volumetric Medical Image Coding Using Unbalanced Tree (3차원 불균형 트리 구조를 가진 의료 영상 압축에 대한 연구)

  • Kim, Young-Seop; Cho, Jae-Hoon
    • Journal of the Semiconductor & Display Technology / v.5 no.2 s.15 / pp.19-25 / 2006
  • This paper focuses on lossy compression methods for medical images that operate on a three-dimensional (3-D) irreversible integer wavelet transform. We apply an unbalanced tree structure algorithm to medical images, using a 3-D unbalanced wavelet decomposition and a 3-D unbalanced spatial dependence tree. The wavelet decomposition is computed with integer wavelet filters implemented via the lifting method. We tested our encoder on volumetric medical images using different integer filters and coding unit sizes. Coding units of 16 slices save considerable dynamic memory (RAM) and coding delay compared with the full-sequence coding units used in previous work. By allowing trees of different lengths, more than three transaxial scales can be accommodated; the encoder and decoder then track the length of the tree in which each pixel resides through the sequence of decompositions. Results show that, even with these small coding units, our algorithm with certain filters performs as well as or better in lossy coding than previous systems using 3-D unbalanced integer wavelet transforms on volumetric medical images.
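The lifting method the abstract mentions can be illustrated with one level of the reversible LeGall 5/3 integer wavelet in 1-D (whether this particular filter is among those the paper tested is not stated; the boundary handling here is a simple index clamp):

```python
# One level of the reversible LeGall 5/3 integer wavelet via lifting,
# the kind of transform a 3-D decomposition is built from (1-D here
# for brevity; even-length signal assumed).
def lift53_forward(x):
    s, d = x[0::2], x[1::2]                      # even / odd samples
    d = [d[i] - ((s[i] + s[min(i + 1, len(s) - 1)]) >> 1)
         for i in range(len(d))]                 # predict step
    s = [s[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(len(s))]                 # update step
    return s, d                                  # low-pass, high-pass

def lift53_inverse(s, d):
    s = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(len(s))]
    d = [d[i] + ((s[i] + s[min(i + 1, len(s) - 1)]) >> 1) for i in range(len(d))]
    x = [0] * (len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x

sig = [5, 7, 3, 4, 9, 8, 2, 1]
lo, hi = lift53_forward(sig)
assert lift53_inverse(lo, hi) == sig             # lifting is exactly invertible
```

Each lifting step is invertible regardless of the integer rounding, which is what makes integer-to-integer wavelets attractive for memory-constrained coding units.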


Correcting Misclassified Image Features with Convolutional Coding

  • Mun, Ye-Ji; Kim, Nayoung; Lee, Jieun; Kang, Je-Won
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.11-14 / 2018
  • The aim of this study is to rectify misclassified image features and enhance the performance of image classification by incorporating a channel-coding technique widely used in telecommunication. Specifically, the proposed algorithm combines the error-correcting mechanism of convolutional coding with convolutional neural networks (CNNs), the state-of-the-art image classifiers. We develop an encoder and a decoder to exploit the error-correcting capability of convolutional coding. In the encoder, the label values of the image data are converted to convolutional codes that are used as target outputs of the CNN, and the network is trained to minimize the Euclidean distance between the target output codes and the actual output codes. To correct misclassified features, the outputs of the network are decoded through the trellis structure with the Viterbi algorithm before the final prediction is determined. This paper demonstrates that the proposed architecture improves the performance of neural networks compared with the traditional one-hot encoding method.
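A compact sketch of the underlying channel-coding machinery: bits encoded with a rate-1/2, constraint-length-3 convolutional code (generators 7, 5 octal) and recovered by hard-decision Viterbi decoding. The paper trains a CNN to emit such codes; here only the encode/decode pair is shown, with parameters chosen for illustration:

```python
# Rate-1/2 convolutional encoding (K=3, generators 7,5 octal) plus
# hard-decision Viterbi decoding, so isolated output errors are corrected.
G = (0b111, 0b101)  # generator polynomials

def conv_encode(bits):
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & 0b111       # 3-bit shift register
        out += [bin(state & g).count("1") & 1 for g in G]
    return out

def viterbi_decode(received, n_bits):
    # path metric and survivor path per 2-bit state (add-compare-select)
    metrics = {0: (0, [])}
    for t in range(n_bits):
        r = received[2 * t:2 * t + 2]
        nxt = {}
        for st, (m, path) in metrics.items():
            for b in (0, 1):
                full = ((st << 1) | b) & 0b111   # hypothetical register
                exp = [bin(full & g).count("1") & 1 for g in G]
                cost = m + sum(x != y for x, y in zip(exp, r))
                ns = full & 0b011
                if ns not in nxt or cost < nxt[ns][0]:
                    nxt[ns] = (cost, path + [b])
        metrics = nxt
    return min(metrics.values())[1]              # best survivor's info bits

msg = [1, 0, 1, 1, 0, 0, 1, 0]
code = conv_encode(msg)
code[3] ^= 1                      # flip one transmitted bit
assert viterbi_decode(code, len(msg)) == msg
```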


Bird's Eye View Semantic Segmentation based on Improved Transformer for Automatic Annotation

  • Tianjiao Liang; Weiguo Pan; Hong Bao; Xinyue Fan; Han Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.1996-2015 / 2023
  • High-definition (HD) maps provide precise road information that enables an autonomous driving system to navigate a vehicle effectively. Recent research has focused on leveraging semantic segmentation to annotate HD maps automatically. However, existing methods suffer from low recognition accuracy in autonomous driving scenarios, leading to inefficient annotation. In this paper, we propose a novel semantic segmentation method for automatic HD map annotation. Our approach introduces a new encoder, a convolutional-transformer hybrid, to enhance the model's feature extraction capability. Additionally, we propose a multi-level fusion module that lets the model aggregate different levels of detail and semantic information, and a novel decoupled boundary joint decoder that improves handling of the boundaries between categories. We evaluated our method on a Bird's Eye View point cloud image dataset and the Cityscapes dataset. Comparative analysis against state-of-the-art methods demonstrates that our model achieves the highest performance: an mIoU of 56.26%, surpassing SegFormer by 1.47% mIoU. This innovation promises to significantly enhance the efficiency of automatic HD map annotation.
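A sketch of what a multi-level fusion module can look like: project features from several encoder stages to a common width, upsample everything to the finest resolution, and fuse. The channel counts and the concat-then-conv fusion are assumptions, not the paper's exact module:

```python
# Illustrative multi-level feature fusion across encoder stages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFusion(nn.Module):
    def __init__(self, in_chs=(64, 128, 320, 512), out_ch=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)
        self.fuse = nn.Conv2d(out_ch * len(in_chs), out_ch, 3, padding=1)

    def forward(self, feats):
        size = feats[0].shape[-2:]                    # finest feature map
        ups = [F.interpolate(p(f), size=size, mode="bilinear",
                             align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.fuse(torch.cat(ups, dim=1))

feats = [torch.randn(1, c, 64 // 2**i, 64 // 2**i)
         for i, c in enumerate((64, 128, 320, 512))]
print(MultiLevelFusion()(feats).shape)   # torch.Size([1, 128, 64, 64])
```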

TSDnet: Three-scale Dense Network for Infrared and Visible Image Fusion (TSDnet: 적외선과 가시광선 이미지 융합을 위한 규모-3 밀도망)

  • Zhang, Yingmei; Lee, Hyo Jong
    • Annual Conference of KIPS / 2022.11a / pp.656-658 / 2022
  • The purpose of infrared and visible image fusion is to integrate images of different modalities, with different details, into a single result image with rich information, which is convenient for high-level computer vision tasks. Since many deep networks operate at only a single scale, this paper proposes a novel image fusion method based on a three-scale dense network that preserves the content and key target features of the input images in the fused image. It comprises an encoder, a three-scale block, a fusion strategy, and a decoder, and can capture rich background detail and prominent target detail. The encoder extracts three-scale dense features from the source images for the initial fusion; an l1-norm fusion strategy then fuses features across scales; finally, the fused image is reconstructed by the decoding network. Compared with existing methods, the proposed method achieves state-of-the-art fusion performance under subjective observation.
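The l1-norm fusion strategy named above is the one popularized by DenseFuse-style networks: per-location activity is the l1-norm over channels, and the two encoders' feature maps are blended by normalized activity. This is a simplified per-pixel variant (the paper may use block-averaged activity maps):

```python
# l1-norm fusion of infrared and visible encoder features.
import torch

def l1_fusion(feat_ir, feat_vis, eps=1e-8):
    """feat_*: (B, C, H, W) encoder features of the two source images."""
    a_ir = feat_ir.abs().sum(dim=1, keepdim=True)    # (B, 1, H, W) activity
    a_vis = feat_vis.abs().sum(dim=1, keepdim=True)
    w_ir = a_ir / (a_ir + a_vis + eps)               # normalized weight map
    return w_ir * feat_ir + (1 - w_ir) * feat_vis

fused = l1_fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)   # torch.Size([1, 64, 32, 32])
```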

Adaptive Hard Decision Aided Fast Decoding Method using Parity Request Estimation in Distributed Video Coding (패리티 요구량 예측을 이용한 적응적 경판정 출력 기반 고속 분산 비디오 복호화 기술)

  • Shim, Hiuk-Jae; Oh, Ryang-Geun; Jeon, Byeung-Woo
    • Journal of Broadcast Engineering / v.16 no.4 / pp.635-646 / 2011
  • In distributed video coding, a low-complexity encoder can be realized by shifting complex encoder-side processes to the decoder. However, this imposes not only motion estimation/compensation but also the complex LDPC decoding process on the Wyner-Ziv decoder, so decoder-side complexity has become an important issue. LDPC decoding consists of many iterative decoding passes, and complexity grows with the number of iterations; this iterative process accounts for more than 60% of the whole WZ decoding complexity, making it the main target for complexity reduction. Previously, the HDA (Hard Decision Aided) method was introduced for fast LDPC decoding. For the parity bits received so far, HDA certainly reduces decoding complexity, but LDPC decoding is still attempted even when the amount of requested parity is insufficient for successful decoding. Complexity can therefore be reduced further by avoiding decoding attempts on insufficient parity bits. This paper proposes a parity request estimation method that uses bit-plane-wise correlation and temporal correlation. The joint use of the HDA method and the proposed method reduces LDPC decoding complexity by about 72%, while rate-distortion performance degrades by only 0.0275 dB in BD-PSNR.
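A sketch of the control flow implied above: hard-decide the channel LLRs and test the syndrome before running costly belief-propagation iterations, and skip decoding entirely when the estimated parity requirement is not yet met. The `bp_decode` callback, the estimation inputs, and the toy matrix are placeholders, not the paper's method:

```python
# Hard-decision early check in the spirit of HDA-style fast decoding:
# run the iterative LDPC decoder only when parity bits look sufficient.
import numpy as np

def syndrome_ok(H, llr):
    hard = (llr < 0).astype(np.uint8)        # hard decision per bit
    return not (H @ hard % 2).any()          # all parity checks satisfied?

def decode_or_skip(H, llr, received_parity, estimated_need, bp_decode):
    if syndrome_ok(H, llr):
        return (llr < 0).astype(np.uint8)    # HDA: no iterations needed
    if received_parity < estimated_need:     # parity request estimation:
        return None                          # ask for more parity instead
    return bp_decode(H, llr)                 # fall back to full BP decoding

# toy parity-check matrix and LLRs whose hard decisions form a codeword
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]], dtype=np.uint8)
llr = np.array([-2.0, -1.5, 3.0, 4.0, -2.5, -1.0])   # hard: 1 1 0 0 1 1
print(syndrome_ok(H, llr))   # True -> iterative decoding can be skipped
```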