• Title/Summary/Keyword: encoder-decoder


Deep Learning Based Gray Image Generation from 3D LiDAR Reflection Intensity (딥러닝 기반 3차원 라이다의 반사율 세기 신호를 이용한 흑백 영상 생성 기법)

  • Kim, Hyun-Koo;Yoo, Kook-Yeol;Park, Ju H.;Jung, Ho-Youl
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.1
    • /
    • pp.1-9
    • /
    • 2019
  • In this paper, we propose a method of generating a 2D gray image from 3D LiDAR reflection intensity. The proposed method uses a Fully Convolutional Network (FCN) to generate the gray image from the 2D reflection intensity obtained by projecting the LiDAR 3D intensity. Both the encoder and decoder of the FCN are configured with several convolution blocks in a symmetric fashion. Each convolution block consists of a convolution layer with a 3×3 filter, a batch normalization layer, and an activation function. The performance of the proposed architecture is empirically evaluated by varying the depth of the convolution blocks. The well-known KITTI dataset, covering various scenarios, is used for training and performance evaluation. The simulation results show that the proposed method yields improvements of 8.56 dB in peak signal-to-noise ratio (PSNR) and 0.33 in structural similarity index measure (SSIM) over conventional interpolation methods such as inverse distance weighting and nearest neighbor. The proposed method could be used as an assistive tool in night-time driving systems for autonomous vehicles.
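The PSNR metric behind the reported 8.56 dB gain can be made concrete with a minimal sketch (NumPy assumed; the images and values below are illustrative, not from the paper):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Illustrative comparison: a flat gray image versus two reconstructions.
truth = np.full((4, 4), 100.0)
coarse = truth + 8.0   # e.g., a nearest-neighbor interpolation result
fine = truth + 1.0     # e.g., a learned reconstruction with smaller error

print(psnr(truth, coarse))  # lower PSNR for the coarser estimate
print(psnr(truth, fine))    # higher PSNR for the better estimate
```

A higher PSNR means lower mean-squared error relative to the peak value, which is how the interpolation baselines and the FCN output are compared.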

Detection and Localization of Image Tampering using Deep Residual UNET with Stacked Dilated Convolution

  • Aminu, Ali Ahmad;Agwu, Nwojo Nnanna;Steve, Adeshina
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.9
    • /
    • pp.203-211
    • /
    • 2021
  • Image tampering detection and localization have become an active area of research in digital image forensics in recent times, owing to the widespread occurrence of malicious image tampering. This study presents a new method for image tampering detection and localization that combines the advantages of dilated convolution, residual networks, and the UNET architecture. Using the UNET architecture as a backbone, we built the proposed network from two kinds of residual units, one for the encoder path and the other for the decoder path. The residual units speed up the training process and facilitate information propagation between the lower and higher layers, which are often difficult to train. To capture global image tampering artifacts and reduce the computational burden of the proposed method, we enlarge the receptive field of the convolutional kernels by adopting dilated convolutions in the residual units used to build the network. In contrast to existing deep learning methods, which have many layers and network parameters and are often difficult to train, the proposed method achieves excellent performance with fewer parameters and lower computational cost. To test the proposed method, we evaluate its performance on four benchmark image forensics datasets. Experimental results show that the proposed method outperforms existing methods and could potentially be used to enhance image tampering detection and localization.
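The receptive-field enlargement that dilated convolutions buy without extra parameters can be computed directly (a standard formula sketch, not code from the paper):

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of a dilated kernel: d * (k - 1) + 1 input positions."""
    return dilation * (kernel_size - 1) + 1

def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolution layers,
    each given as a (kernel_size, dilation) pair."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel_size(k, d) - 1
    return rf

# Two stacked 3x3 convolutions: plain versus dilated (rates 1 and 2).
print(receptive_field([(3, 1), (3, 1)]))  # plain stack
print(receptive_field([(3, 1), (3, 2)]))  # dilated stack sees wider context
```

The dilated stack covers more input context with the same parameter count, which is the trade-off the residual units above exploit.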

Context-Awareness Cat Behavior Captioning System (반려묘의 상황인지형 행동 캡셔닝 시스템)

  • Chae, Heechan;Choi, Yoona;Lee, Jonguk;Park, Daihee;Chung, Yongwha
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.1
    • /
    • pp.21-29
    • /
    • 2021
  • With the recent increase in the number of households raising pets, various engineering studies on pets have been conducted. The final goal of this study is to automatically generate context-aware captions that express the implicit intentions behind the behavior and sounds of cats, by embedding the already mature pet behavior detection technology as a basic component of video captioning research. As a pilot project toward this goal, this paper proposes a high-level captioning system that uses the optical-flow, RGB, and sound information of cat videos. The proposed system uses video datasets collected in an actual breeding environment to extract feature vectors from the video and sound, and then trains a hierarchical LSTM encoder and decoder to identify the cat's behavior and its implicit intentions and to generate context-aware captions. The performance of the proposed system was verified experimentally using video data collected in an environment where actual cats are raised.

A Study on Error-Resilient, Scalable Video Codecs Based on the Set Partitioning in Hierarchical Trees(SPIHT) Algorithm (계층적 트리의 집합 분할 알고리즘(SPIHT)에 기반한 에러에 강하고 가변적인 웨이브렛 비디오 코덱에 관한 연구)

  • Inn-Ho, Jee
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.1
    • /
    • pp.37-43
    • /
    • 2023
  • Compressed still-image or video bitstreams require protection from channel errors in a wireless channel. Embedded Zerotree Wavelet (EZW) coding and SPIHT provide unprecedented performance in image compression with low complexity. When bit errors arise from wireless transmission problems, the loss of synchronization between encoder and decoder causes serious performance degradation. Because wavelet zerotree coding algorithms produce variable-length codewords, they are extremely sensitive to bit errors. The idea is to partition the lifting transform coefficients: dividing the coefficients into many partitions confines channel errors from the wireless channel to individual partitions. As a result, the synchronization problem that caused quality deterioration in still images and video streams is alleviated.
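The error-containment idea can be sketched with a toy interleaved partitioning of a coefficient list (an illustrative sketch under simplified assumptions, not the paper's actual lifting-coefficient scheme):

```python
def partition(coefficients, n_partitions):
    """Interleave coefficients into independent partitions, so each
    partition can be coded and transmitted separately."""
    return [coefficients[i::n_partitions] for i in range(n_partitions)]

def merge(partitions):
    """Reassemble the coefficient list from interleaved partitions."""
    n = sum(len(p) for p in partitions)
    out = [0] * n
    for i, part in enumerate(partitions):
        for j, value in enumerate(part):
            out[i + j * len(partitions)] = value
    return out

coeffs = list(range(12))
parts = partition(coeffs, 4)
parts[1] = [0] * len(parts[1])  # simulate a channel error wiping one partition
damaged = merge(parts)
# Only every 4th coefficient (offset 1) is lost; the rest survive intact.
print(sum(1 for a, b in zip(coeffs, damaged) if a != b))
```

Because each partition is decoded independently, a synchronization loss inside one partition cannot corrupt the others, which is the mechanism the abstract describes.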

Crack segmentation in high-resolution images using cascaded deep convolutional neural networks and Bayesian data fusion

  • Tang, Wen;Wu, Rih-Teng;Jahanshahi, Mohammad R.
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.221-235
    • /
    • 2022
  • Manual inspection of steel box girders on long-span bridges is time-consuming and labor-intensive, and the quality of inspection relies on the subjective judgement of the inspectors. This study proposes an automated approach to detect and segment cracks in high-resolution images. An end-to-end cascaded framework is proposed to first detect the existence of cracks using a deep convolutional neural network (CNN) and then segment the cracks using a modified U-Net encoder-decoder architecture. A Naïve Bayes data fusion scheme is proposed to effectively reduce false positives and false negatives. To generate the binary crack mask, the original images are first divided into 448 × 448 overlapping image patches, which are classified as crack versus non-crack patches using a deep CNN. Next, a modified U-Net is trained from scratch using only the crack patches for segmentation. A customized loss function that combines binary cross-entropy loss and Dice loss is introduced to enhance the segmentation performance. Additionally, a Naïve Bayes fusion strategy is employed to integrate the crack score maps from different overlapping crack patches and to decide whether a pixel belongs to a crack. Comprehensive experiments have demonstrated that the proposed approach achieves an 81.71% mean intersection over union (mIoU) score across 5 different training/test splits, which is 7.29% higher than the baseline reference implemented with the original U-Net.
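The Dice and IoU overlap measures that drive the loss and the mIoU evaluation above can be sketched on binary masks (a standard-formula sketch with NumPy, not the authors' code):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between binary masks (1.0 = perfect match)."""
    intersection = np.sum(pred * target)
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

def iou(pred, target, eps=1e-7):
    """Intersection over union between binary masks."""
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return (intersection + eps) / (union + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(pred, target))  # 2*2 / (3 + 3)
print(iou(pred, target))               # 2 / 4
```

Dice loss is simply 1 minus the Dice coefficient; combining it with binary cross-entropy, as the abstract describes, balances region overlap against per-pixel accuracy.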

Boundary and Reverse Attention Module for Lung Nodule Segmentation in CT Images (CT 영상에서 폐 결절 분할을 위한 경계 및 역 어텐션 기법)

  • Hwang, Gyeongyeon;Ji, Yewon;Yoon, Hakyoung;Lee, Sang Jun
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.5
    • /
    • pp.265-272
    • /
    • 2022
  • As the risk of lung cancer has increased, early detection and treatment of cancers have received considerable attention. Among various medical imaging approaches, computed tomography (CT) has been widely utilized to examine the size and growth rate of lung nodules. However, manual examination is time-consuming and causes physical and mental fatigue for medical professionals. Recently, many computer-aided diagnostic methods have been proposed to reduce their workload. In recent studies, encoder-decoder architectures have shown reliable performance in medical image segmentation and have been adopted to predict lesion candidates. However, localizing nodules in lung CT images is a challenging problem due to their extremely small sizes and unstructured shapes. To address these problems, we add atrous spatial pyramid pooling (ASPP) to a general U-Net baseline model to minimize the loss of information and extract rich representations from various receptive fields. Moreover, we propose a mixed attention mechanism combining reverse attention, boundary attention, and the convolutional block attention module (CBAM) to improve segmentation accuracy for small nodules of various shapes. The performance of the proposed model is compared with several previous attention mechanisms on the LIDC-IDRI dataset, and experimental results demonstrate that the reverse, boundary, and CBAM (RB-CBAM) attention combination is effective in the segmentation of small nodules.
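The reverse-attention ingredient can be illustrated in a few lines: feature maps are re-weighted by one minus the coarse prediction, steering later layers toward regions that are not yet confidently segmented (a toy NumPy sketch under simplified assumptions, not the paper's module):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(features, coarse_logits):
    """Weight features by (1 - sigmoid(logits)) so the network attends
    to regions the coarse prediction is NOT yet confident about."""
    weights = 1.0 - sigmoid(coarse_logits)
    return features * weights

feats = np.ones((2, 2))
logits = np.array([[8.0, 0.0],    # confident foreground, uncertain
                   [0.0, -8.0]])  # uncertain, confident background
out = reverse_attention(feats, logits)
print(out)  # near 0 where the prediction is already confidently foreground
```

Suppressing already-confident regions is what lets the refinement stages concentrate on faint nodule boundaries.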

Hybrid model-based and deep learning-based metal artifact reduction method in dental cone-beam computed tomography

  • Jin Hur;Yeong-Gil Shin;Ho Lee
    • Nuclear Engineering and Technology
    • /
    • v.55 no.8
    • /
    • pp.2854-2863
    • /
    • 2023
  • Objective: To present a hybrid approach that incorporates a constrained beam-hardening estimator (CBHE) and deep learning (DL)-based post-refinement for metal artifact reduction in dental cone-beam computed tomography (CBCT). Methods: The CBHE is derived from a polychromatic X-ray attenuation model expressed with respect to X-ray transmission length, and its associated parameters are calculated numerically. Deep-learning-based post-refinement with an artifact disentanglement network (ADN) is then performed to mitigate the remaining dark shading regions around metal. The ADN supports an unsupervised learning approach in which no paired CBCT images are required; the network consists of an encoder that separates artifacts from content and a decoder for the content. Additionally, the ADN with data normalization replaces metal regions with values from bone or soft-tissue regions. Finally, the metal regions obtained from the CBHE are blended into the reconstructed images. The proposed approach is systematically assessed using a dental phantom with two types of metal objects for qualitative and quantitative comparisons. Results: The proposed hybrid scheme provides improved image quality in areas surrounding the metal while preserving native structures. Conclusion: This study may significantly improve the detection of areas of interest in many dentomaxillofacial applications.
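The polychromatic attenuation model underlying the CBHE can be demonstrated numerically: for a polychromatic beam, the measured attenuation is no longer linear in path length, which is exactly the beam-hardening effect being estimated (a toy two-energy NumPy sketch with illustrative spectrum values, not the paper's model parameters):

```python
import numpy as np

# Toy two-energy spectrum: relative photon weights and per-length
# attenuation coefficients at each energy (illustrative values).
spectrum = np.array([0.5, 0.5])
mu = np.array([0.5, 0.2])

def measured_attenuation(length):
    """-ln of polychromatic transmission for a path of the given length:
    -ln( sum_e s_e * exp(-mu_e * L) )."""
    transmission = np.sum(spectrum * np.exp(-mu * length))
    return -np.log(transmission)

# A monochromatic beam would give attenuation linear in length; the
# polychromatic measurement falls below that line (beam hardening),
# because low-attenuation (hard) photons dominate long paths.
for L in (1.0, 2.0, 4.0):
    print(L, measured_attenuation(L), measured_attenuation(1.0) * L)
```

The CBHE's job, per the abstract, is to fit the parameters of such a model numerically so the nonlinearity can be corrected before DL-based refinement.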

Multi-Scale Dilation Convolution Feature Fusion (MsDC-FF) Technique for CNN-Based Black Ice Detection

  • Sun-Kyoung KANG
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.3
    • /
    • pp.17-22
    • /
    • 2023
  • In this paper, we propose a black ice detection system using Convolutional Neural Networks (CNNs). Black ice poses a serious threat to road safety, particularly in winter conditions. To address this problem, we introduce a CNN-based encoder-decoder architecture specifically designed for real-time black ice detection from thermal images. To train the network, we establish a specialized experimental platform to capture thermal images of various black ice formations on diverse road surfaces, including cement and asphalt. This enables us to curate a comprehensive dataset of thermal road black ice images for training and evaluation. Additionally, to enhance the accuracy of black ice detection, we propose a multi-scale dilation convolution feature fusion (MsDC-FF) technique that dynamically adjusts the dilation ratios based on the input image's resolution, improving the network's ability to capture fine-grained details. Experimental results demonstrate the superior performance of the proposed model compared to conventional image segmentation models: our model achieved an mIoU of 95.93%, while LinkNet achieved 95.39%. Therefore, the proposed model could offer a promising solution for real-time black ice detection, enhancing road safety in winter conditions.
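One way the resolution-dependent dilation adjustment could look is a simple proportional scaling rule; the sketch below is a hypothetical illustration of the idea (the paper's exact rule is not given here, and `base_rates` and `reference_width` are assumed names):

```python
def dilation_rates(input_width, base_rates=(1, 2, 4), reference_width=512):
    """Hypothetical MsDC-FF-style rule: scale the dilation rates of the
    parallel convolution branches in proportion to input resolution, so
    higher-resolution images get proportionally wider receptive fields."""
    scale = max(1, round(input_width / reference_width))
    return tuple(r * scale for r in base_rates)

print(dilation_rates(512))   # base rates at the reference resolution
print(dilation_rates(1024))  # doubled rates for a higher-resolution input
```

The point of any such rule is that a fixed set of dilation rates covers a smaller fraction of the scene as resolution grows, so the rates must grow with the input size to keep the same contextual coverage.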

Developing radar-based rainfall prediction model with GAN(Generative Adversarial Network) (생성적 적대 신경망(GAN)을 활용한 강우예측모델 개발)

  • Choi, Suyeon;Sohn, Soyoung;Kim, Yeonjoo
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.185-185
    • /
    • 2021
  • As abnormal climate phenomena such as sudden localized rainfall increase due to climate change, accurate rainfall prediction is becoming ever more important. Traditional rainfall prediction uses numerical weather models or radar-based extrapolation techniques; with the recent development of machine learning, radar-data-based rainfall prediction methods using it have been developed. Existing machine-learning rainfall prediction models mainly use techniques suited to time-series image prediction, such as two-dimensional recurrent neural networks (Convolutional Long Short-Term Memory, ConvLSTM) or convolutional neural networks (Convolutional Neural Network (CNN) Encoder-Decoder). In this study, future rainfall prediction was performed using a Generative Adversarial Network (GAN). In the GAN methodology, a generator that creates images and a discriminator that distinguishes them from real images are trained in competition, and GANs currently show high performance in image generation. The GAN-based model developed in this study was trained to perform very-short-term and short-term rainfall prediction using radar image data from 2016 to 2019 provided by the Korea Meteorological Administration, and short-term rainfall prediction was simulated using 2020 radar image data. In addition, comparing the rainfall predictions of models based on existing machine learning techniques with those of the GAN-based model confirmed that the rainfall prediction model developed in this study shows excellent short-term rainfall prediction performance.
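The adversarial objective behind the generator/discriminator competition can be sketched with plain binary cross-entropy losses (a minimal NumPy sketch of the standard GAN losses, not the model from this study):

```python
import numpy as np

def bce(probabilities, labels, eps=1e-7):
    """Binary cross-entropy averaged over a batch of probabilities."""
    p = np.clip(probabilities, eps, 1.0 - eps)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """Discriminator: score real radar frames as 1, generated frames as 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Generator: fool the discriminator into scoring its frames as 1."""
    return bce(d_fake, np.ones_like(d_fake))

# A discriminator that separates real from fake well has low loss,
# while the generator's loss is high until its outputs improve.
d_real = np.array([0.9, 0.8])  # discriminator scores on real frames
d_fake = np.array([0.1, 0.2])  # discriminator scores on generated frames
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

Training alternates minimization of these two losses, which is the competition the abstract describes.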


Joint Training of Neural Image Compression and Super Resolution Model (신경망 이미지 부호화 모델과 초해상화 모델의 합동훈련)

  • Cho, Hyun Dong;Kim, YeongWoong;Cha, Junyeong;Kim, DongHyun;Lim, Sung Chang;Kim, Hui Yong
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2022.06a
    • /
    • pp.1191-1194
    • /
    • 2022
  • With the development of the Internet, countless images and videos have become readily available. As the amount of image and video data grows exponentially, coding technologies such as JPEG, HEVC, and VVC have emerged to store images and videos efficiently. Recently, as learning-based models using artificial neural networks have advanced, research on image and video compression using them has progressed rapidly. NNIC (Neural Network based Image Coding) refers to such trainable neural-network-based image coding technology. This paper presents a method that can achieve higher performance than an existing NNIC model by jointly training the NNIC model with a neural-network-based super-resolution model. First, before the image is input to the NNIC encoder, it is downscaled with bicubic interpolation to reduce the number of pixels, and then encoded. The encoded image is decoded with the NNIC decoder, and the decoded image is restored to the original resolution by upscaling with super-resolution. The NNIC model and the super-resolution model are trained jointly. As a result, we observed the possibility of higher performance at low bit rates. Moreover, the overall performance gain from joint training suggests that, with longer training and a super-resolution model tailored to compression noise, the method could outperform existing NNIC.
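The downscale → encode/decode → upscale pipeline order can be sketched with stand-in components (a toy NumPy sketch: block averaging for downscaling, uniform quantization for the codec, and nearest-neighbor upscaling stand in for bicubic interpolation, NNIC, and the super-resolution model, which are all learned or more sophisticated in the paper):

```python
import numpy as np

def downscale2x(image):
    """Average 2x2 blocks (a stand-in for bicubic downscaling)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def toy_codec(image, step=16.0):
    """Stand-in for an NNIC encode/decode round trip: uniform quantization."""
    return np.round(image / step) * step

def upscale2x(image):
    """Nearest-neighbor upscaling (a stand-in for the super-resolution model)."""
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

original = np.arange(16, dtype=np.float64).reshape(4, 4) * 10
reconstructed = upscale2x(toy_codec(downscale2x(original)))
print(reconstructed.shape)  # matches the original resolution
```

Because the codec sees a quarter as many pixels, it spends fewer bits; joint training lets the upscaler learn to undo both the downscaling loss and the codec's compression noise, which is the gain the abstract reports.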
