• Title/Summary/Keyword: Spatial attention mechanism

Crack detection based on ResNet with spatial attention

  • Yang, Qiaoning;Jiang, Si;Chen, Juan;Lin, Weiguo
    • Computers and Concrete / v.26 no.5 / pp.411-420 / 2020
  • Deep convolutional neural networks (DCNNs) have been widely used in the health maintenance of civil infrastructure, and using DCNNs to improve crack detection performance has attracted many researchers' attention. In this paper, a lightweight spatial attention network module is proposed to strengthen the representation capability of ResNet and improve crack detection performance. It uses an attention mechanism to strengthen the objects of interest in the global receptive field of ResNet convolution layers. Global average spatial information over all channels is used to construct an attention scalar. The scalar is combined with an adaptive weighted sigmoid function to activate the output of each channel's feature maps. Salient objects in the feature maps are refined by the attention scalar. The proposed spatial attention module is stacked in ResNet50 to detect cracks. Experimental results show that the proposed module achieves a significant performance improvement in crack detection.
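The gating idea in this abstract can be illustrated with a minimal sketch. It is not the authors' module: the abstract does not give the exact formulation, so the code assumes a per-position average over channels, a sigmoid gate, and a single hypothetical `weight` parameter standing in for the learned adaptive weight.

```python
import math


def spatial_attention(features, weight=1.0):
    """Gate feature maps with a sigmoid of their channel-wise mean.

    features: list of C channels, each an H x W nested list of floats.
    weight:   hypothetical stand-in for a learned scaling parameter.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])

    # Average across channels at every spatial position.
    mean_map = [
        [sum(features[c][i][j] for c in range(channels)) / channels
         for j in range(width)]
        for i in range(height)
    ]

    # Weighted sigmoid squashes each averaged value into (0, 1).
    gate = [
        [1.0 / (1.0 + math.exp(-weight * value)) for value in row]
        for row in mean_map
    ]

    # Every channel is rescaled by the same spatial gate.
    return [
        [[features[c][i][j] * gate[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]
```

In a real network the gate would be computed on tensors inside a ResNet block and `weight` would be trained; the nested lists here only keep the sketch dependency-free.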

Intra Prediction Method for Depth Picture Using CNN and Attention Mechanism (CNN과 Attention을 통한 깊이 화면 내 예측 방법)

  • Jae-hyuk Yoon;Dong-seok Lee;Byoung-ju Yun;Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems / v.29 no.2 / pp.35-45 / 2024
  • In this paper, we propose an intra prediction method for depth pictures using a CNN and an attention mechanism. The proposed method allows each pixel in a block to be predicted by selecting pixels from the reference area. Spatial features in the vertical and horizontal directions for reference pixels are extracted from the top and left areas adjacent to the block, respectively, through a CNN layer. The two spatial features are merged along the feature direction and the spatial direction to predict features for the prediction block and the reference pixels, respectively. The correlation between the prediction block and the reference pixels is predicted through an attention mechanism. The predicted correlations are restored to the pixel domain through CNN layers to predict the pixels in the block. The average prediction error of intra prediction is reduced by 5.8% when the proposed method is added to the VVC intra modes.
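As a rough illustration of how attention can let a block pixel "select" among reference pixels, the sketch below uses generic scaled dot-product attention. This is an assumed stand-in, not the network described in the paper; the function names and the scalar `values` are hypothetical.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and many keys."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    # Subtract the maximum score for numerical stability.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Predict one pixel as the attention-weighted mix of reference values."""
    weights = attention_weights(query, keys)
    return sum(w * v for w, v in zip(weights, values))
```

Here `query` would be the feature vector of a pixel in the prediction block, `keys` the feature vectors of the reference pixels, and `values` their depth values, so a reference pixel with a more similar feature contributes more to the prediction.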

MSaGAN: Improved SaGAN using Guide Mask and Multitask Learning Approach for Facial Attribute Editing

  • Yang, Hyeon Seok;Han, Jeong Hoon;Moon, Young Shik
    • Journal of the Korea Society of Computer and Information / v.25 no.5 / pp.37-46 / 2020
  • Recently, studies of facial attribute editing have obtained realistic results using generative adversarial networks (GANs) and encoder-decoder structures. Spatial attention GAN (SaGAN), one of the latest studies, is a method that can change only the desired attribute in a face image using a spatial attention mechanism. However, unnatural results are sometimes obtained due to insufficient information on face areas. In this paper, we propose an improved SaGAN (MSaGAN) that uses a guide mask for learning and applies a multitask learning approach to overcome the limitations of existing methods. Through extensive experiments, we evaluated the results of facial attribute editing in terms of the mask loss function and the neural network structure. The proposed method was shown to efficiently produce more natural results than previous methods.

Semi-Supervised Spatial Attention Method for Facial Attribute Editing

  • Yang, Hyeon Seok;Han, Jeong Hoon;Moon, Young Shik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3685-3707 / 2021
  • In recent years, facial attribute editing based on generative adversarial networks and encoder-decoder models has been used successfully to change various attributes of face images. However, existing models have a limitation in that they may change an unintended part of the image while changing an attribute, or may generate an unnatural result. In this paper, we propose a model that improves the learning of the attention mask by adding a spatial attention mechanism to the unified selective transfer network (referred to as STGAN) using semi-supervised learning. The proposed model can edit multiple attributes while preserving details independent of the attributes being edited. This study makes two main contributions to the literature. First, we propose an encoder-decoder model structure that learns and edits multiple facial attributes and suppresses distortion using an attention mask. Second, we define guide masks and propose a method and an objective function that use the guide masks for multiple facial attribute editing through semi-supervised learning. Qualitative and quantitative evaluations of the experimental results show that the proposed method yields better results than existing methods, preserving image details by suppressing unintended changes.

The Effect of Spatial Attention in Hangul Word Recognition: Depending on Visual Factors (한글 단어 재인에서 시각적 요인에 따른 공간주의의 영향)

  • Ko Eun Lee;Hye-Won Lee
    • Korean Journal of Cognitive Science / v.34 no.1 / pp.1-20 / 2023
  • In this study, we examined the effects of spatial attention in Hangul word recognition depending on visual factors. The visual complexity of words (Experiment 1) and contrast (Experiment 2) were manipulated to examine whether the effect of spatial attention differs depending on visual quality. Participants responded to words with and without codas in Experiment 1 and to words in high-contrast and low-contrast conditions in Experiment 2. The effects of spatial attention were measured as cuing effects, calculated as the difference in performance between the condition where spatial cues were given at the target location (valid trials) and the condition where they were not (invalid trials). The cuing effects were similar regardless of word complexity, indicating that the effects of spatial attention did not differ across the visual complexity conditions. The cuing effects were greater in the low-contrast condition than in the high-contrast condition. The greater effect of spatial attention under low contrast was explained as a mechanism of signal enhancement.
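The cuing effect described in this abstract is a simple difference score. The sketch below assumes performance is measured as response time in milliseconds, which the abstract does not state; the function name is illustrative and the numbers in the usage note are made up.

```python
from statistics import mean


def cuing_effect(valid_rts, invalid_rts):
    """Mean response time on invalid trials minus mean on valid trials.

    A positive value means responses were faster when the cue
    appeared at the target location.
    """
    return mean(invalid_rts) - mean(valid_rts)
```

For example, `cuing_effect([500, 520], [560, 580])` compares a mean of 510 ms on valid trials with 570 ms on invalid trials. If accuracy were the measure instead, the subtraction would run the other way (valid minus invalid).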

Image Captioning with Synergy-Gated Attention and Recurrent Fusion LSTM

  • Yang, You;Chen, Lizhi;Pan, Longyue;Hu, Juntao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.10 / pp.3390-3405 / 2022
  • Long Short-Term Memory (LSTM) combined with an attention mechanism is extensively used to generate semantic sentences for images in image captioning models. However, features of salient regions and spatial information are not utilized sufficiently in most related works. Meanwhile, the LSTM also suffers from the problem of underutilized information in a single time step. In this paper, two approaches are proposed to solve these problems. First, the Synergy-Gated Attention (SGA) method is proposed, which can process the spatial features and the salient region features of given images simultaneously. SGA establishes a gated mechanism through the global features to guide the interaction of information between these two features. Then, the Recurrent Fusion LSTM (RF-LSTM) mechanism is proposed, which can predict the next hidden vectors in one time step and improve linguistic coherence by fusing future information. Experimental results on the MSCOCO benchmark dataset show that, compared with state-of-the-art methods, the proposed method improves the performance of image captioning models and achieves competitive performance on multiple evaluation indicators.

A Neural Network Model for Visual Selection: Top-down mechanism of Feature Gate model (시각적 선택에 대한 신경 망 모형FeatureGate 모형의 하향식 기제)

  • 김민식
    • Korean Journal of Cognitive Science / v.10 no.3 / pp.1-15 / 1999
  • Based on known physiological and psychophysical results, a neural network model for visual selection, called FeatureGate, is proposed. The model consists of a hierarchy of spatial maps, and the flow of information from each level of the hierarchy to the next is controlled by attentional gates. The gates are jointly controlled by a bottom-up system favoring locations with unique features and a top-down mechanism favoring locations with features designated as target features. The present study focuses on the top-down mechanism of the FeatureGate model, which produces results similar to those of Moran and Desimone (1985) that many current models have failed to explain. The FeatureGate model allows a consistent interpretation of many different experimental results in visual attention, including parallel feature searches and serial conjunction searches, attentional gradients triggered by cuing, feature-driven spatial selection, split attention, inhibition of distractor locations, and flanking inhibition. This framework can be extended to produce a model of shape recognition using upper-level units that respond to configurations of features.

STAGCN-based Human Action Recognition System for Immersive Large-Scale Signage Content (몰입형 대형 사이니지 콘텐츠를 위한 STAGCN 기반 인간 행동 인식 시스템)

  • Jeongho Kim;Byungsun Hwang;Jinwook Kim;Joonho Seon;Young Ghyu Sun;Jin Young Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.6 / pp.89-95 / 2023
  • In recent decades, human action recognition (HAR) has demonstrated potential applications in sports analysis, human-robot interaction, and large-scale signage content. In this paper, a spatial temporal attention graph convolutional network (STAGCN)-based HAR system is proposed. Spatial-temporal features of skeleton sequences are assigned different weights by STAGCN, enabling the consideration of key joints and viewpoints. Simulation results show that the proposed model improves classification accuracy on the NTU RGB+D dataset.

EDMFEN: Edge detection-based multi-scale feature enhancement Network for low-light image enhancement

  • Canlin Li;Shun Song;Pengcheng Gao;Wei Huang;Lihua Bi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.4 / pp.980-997 / 2024
  • The main objective of low-light image enhancement (LLIE) is to improve the brightness of images and reveal hidden information in dark areas. LLIE methods based on deep learning show good performance. However, these methods have some limitations: complex network models require highly configured environments, and deficient enhancement of edge details leads to blurring of the target content. Single-scale feature extraction results in insufficient recovery of the hidden content of the enhanced images. This paper proposes an edge detection-based multi-scale feature enhancement network for LLIE (EDMFEN). To reduce the loss of edge details in the enhanced images, an edge extraction module consisting of a Sobel operator is introduced to obtain edge information by computing gradients of images. In addition, a multi-scale feature enhancement module (MSFEM) consisting of a multi-scale feature extraction block (MSFEB) and a spatial attention mechanism is proposed to thoroughly recover the hidden content of the enhanced images and obtain richer features. Since the fused features may contain some useless information, the MSFEB is introduced to obtain image features with different perceptual fields. To use the multi-scale features more effectively, a spatial attention mechanism module is used to retain the key features and improve the model performance after fusing multi-scale features. Experimental results on two datasets and five baseline datasets show that EDMFEN performs well compared with state-of-the-art LLIE methods.
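The Sobel step mentioned in this abstract can be sketched as follows. This shows only the standard 3x3 Sobel gradient magnitude on a grayscale image, not the EDMFEN edge extraction module itself; leaving border pixels at zero is an assumption made to keep the sketch short.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a 2-D grayscale image.

    image: H x W nested list of numbers. Border pixels, which lack a
    full 3x3 neighbourhood, are left at 0.0.
    """
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            gx = gy = 0
            for a in range(3):
                for b in range(3):
                    pixel = image[i - 1 + a][j - 1 + b]
                    gx += SOBEL_X[a][b] * pixel
                    gy += SOBEL_Y[a][b] * pixel
            edges[i][j] = math.hypot(gx, gy)
    return edges
```

On an image with a vertical step edge, the horizontal gradient dominates and the response peaks along the step, which is the edge information such a module would pass on to the enhancement network.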

Electroencephalogram-based emotional stress recognition according to audiovisual stimulation using spatial frequency convolutional gated transformer (공간 주파수 합성곱 게이트 트랜스포머를 이용한 시청각 자극에 따른 뇌전도 기반 감정적 스트레스 인식)

  • Kim, Hyoung-Gook;Jeong, Dong-Ki;Kim, Jin Young
    • The Journal of the Acoustical Society of Korea / v.41 no.5 / pp.518-524 / 2022
  • In this paper, we propose a method that combines convolutional neural networks and an attention mechanism to improve the recognition of emotional stress from electroencephalogram (EEG) signals. In the proposed method, EEG signals are decomposed into five frequency domains, and spatial information of EEG features is obtained by applying a convolutional neural network layer to each frequency domain. Next, salient frequency information is learned in each frequency band using a gate transformer-based attention mechanism, and complementary frequency information is further learned through inter-frequency mapping and reflected in the final attention representation. Through an EEG stress recognition experiment involving the DEAP dataset and six subjects, we show that the proposed method improves EEG-based stress recognition performance compared to existing methods.