• Title/Summary/Keyword: attention mechanism

Region of Interest Detection Based on Visual Attention and Threshold Segmentation in High Spatial Resolution Remote Sensing Images

  • Zhang, Libao;Li, Hao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.8 / pp.1843-1859 / 2013
  • The continuous increase in the spatial resolution of remote sensing images poses great challenges to image analysis and processing. Traditional prior knowledge-based region detection and target recognition algorithms for high resolution remote sensing images generally employ a global search, which results in prohibitive computational complexity. In this paper, a more efficient region of interest (ROI) detection algorithm based on visual attention and threshold segmentation (VA-TS) is proposed, wherein a visual attention mechanism is used to avoid applying image segmentation and feature detection to the entire image. The input image is subsampled to decrease the amount of data, and the discrete moment transform (DMT) feature is extracted to provide a finer description of the edges. The feature maps are combined with weights according to the number of "strong points" and "salient points". A threshold segmentation strategy is employed to obtain more accurate ROI shape information with very low computational complexity. Experimental statistics show that the proposed algorithm is computationally efficient and provides more visually accurate detection results. The calculation time is only about 0.7% of that of the traditional Itti model.
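
A minimal sketch of the last two steps named in the abstract above: a weighted combination of feature maps followed by threshold segmentation. The weights, the mean-based threshold, the toy maps, and the function names are all illustrative assumptions; the abstract does not specify them, so this should not be read as the VA-TS implementation.

```python
from typing import List

Grid = List[List[float]]


def combine_maps(maps: List[Grid], weights: List[float]) -> Grid:
    """Weighted average of equally sized feature maps (weights are assumed inputs)."""
    rows, cols = len(maps[0]), len(maps[0][0])
    total = sum(weights)
    return [
        [sum(w * m[r][c] for m, w in zip(maps, weights)) / total for c in range(cols)]
        for r in range(rows)
    ]


def threshold_segment(saliency: Grid) -> List[List[int]]:
    """Binary ROI mask: 1 where saliency exceeds the map mean (an assumed threshold)."""
    values = [v for row in saliency for v in row]
    threshold = sum(values) / len(values)
    return [[1 if v > threshold else 0 for v in row] for row in saliency]


# Two hypothetical 3x3 feature maps standing in for real edge/intensity features.
edge_map = [[0.0, 0.2, 0.0], [0.1, 0.9, 0.8], [0.0, 0.7, 0.1]]
intensity_map = [[0.1, 0.1, 0.0], [0.0, 0.8, 0.9], [0.1, 0.6, 0.0]]

saliency = combine_maps([edge_map, intensity_map], weights=[2.0, 1.0])
roi_mask = threshold_segment(saliency)
```

Because the mask comes from a single pass over a small, subsampled saliency map, the segmentation step itself stays cheap, which is consistent with the low computational cost the abstract reports.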

MLSE-Net: Multi-level Semantic Enriched Network for Medical Image Segmentation

  • Di Gai;Heng Luo;Jing He;Pengxiang Su;Zheng Huang;Song Zhang;Zhijun Tu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2458-2482 / 2023
  • Medical image segmentation techniques based on convolutional neural networks over-rely on feature extraction, which causes parameter redundancy and unsatisfactory target localization and, in turn, yields segmentation results that are less accurate for assisting doctors in diagnosis. In this paper, we propose a multi-level semantic-rich encoding-decoding network, which consists of a Pooling-Conv-Former (PCFormer) module and a CBAM-Dilated-Transformer (CDT) module. The PCFormer module is used to tackle the parameter explosion of the conventional transformer and to compensate for the feature loss in the down-sampling process. In the CDT module, the CBAM attention module is adopted to highlight the feature regions implicitly by blending attention mechanisms, and the Dilated convolution-Concat (DCC) module is designed as a parallel concatenation of multiple atrous convolution blocks to expand the receptive field explicitly. In addition, a MultiHead Attention-DwConv-Transformer (MDTransformer) module is used to clearly distinguish the target region from the background region. Extensive experiments on medical image segmentation on the GlaS, SIIM-ACR, ISIC and LGG datasets demonstrate that the proposed network outperforms existing advanced methods in terms of both objective evaluation and subjective visual performance.
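
To illustrate the idea of a parallel concatenation of atrous (dilated) convolutions mentioned in the abstract above, here is a minimal 1D sketch. It is not the DCC module itself: the kernel, the dilation rates, and the function names are assumptions chosen for illustration, and a real network would use learned 2D kernels.

```python
from typing import List


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Same-length 1D cross-correlation with zero padding and the given dilation."""
    half = (len(kernel) // 2) * dilation  # offset of the first tap from the center
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - half + k * dilation  # taps are spaced `dilation` samples apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def parallel_dilated_features(
    signal: List[float], kernel: List[float], dilations: List[int]
) -> List[List[float]]:
    """Run one branch per dilation rate and stack the branches as channels."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# A unit impulse makes each branch's receptive field visible in its output.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
features = parallel_dilated_features(signal, kernel=[1.0, 1.0, 1.0], dilations=[1, 2, 3])
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 samples while using the same number of weights, which is the sense in which the receptive field is expanded without adding parameters per branch.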

CAttNet: A Compound Attention Network for Depth Estimation of Light Field Images

  • Dingkang Hua;Qian Zhang;Wan Liao;Bin Wang;Tao Yan
    • Journal of Information Processing Systems / v.19 no.4 / pp.483-497 / 2023
  • Depth estimation is one of the most complicated and difficult problems in light field processing. In this paper, a compound attention convolutional neural network (CAttNet) is proposed to extract depth maps from light field images. To make more effective use of the sub-aperture images (SAIs) of the light field and reduce the redundancy among SAIs, we use a compound attention mechanism to weight the channel and spatial dimensions of the feature map after extracting the primary features, so that the network can more efficiently select the required view and the important area within the view. We modified various feature extraction layers to extract features more efficiently without adding parameters. By exploring the characteristics of the light field, we increased the network depth and optimized the network structure to reduce the adverse impact of this change. CAttNet can efficiently use the correlations and features of different SAIs to generate a high-quality light field depth map. The experimental results show that CAttNet has advantages in both accuracy and time.
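
The following is a simplified sketch of weighting a feature map first along the channel dimension and then along the spatial dimensions, as the abstract above describes in general terms. It is not CAttNet: the gates here are parameter-free (sigmoid of pooled averages), whereas a trained network would learn them, and the names and toy values are assumptions.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_then_spatial_attention(features: FeatureMap) -> FeatureMap:
    """Reweight channels by their global mean, then positions by their cross-channel mean."""
    # Channel attention: one gate per channel from global average pooling.
    channel_gates = [
        sigmoid(sum(v for row in ch for v in row) / (len(ch) * len(ch[0])))
        for ch in features
    ]
    gated = [[[g * v for v in row] for row in ch] for g, ch in zip(channel_gates, features)]

    # Spatial attention: one gate per position from the mean over channels.
    rows, cols = len(gated[0]), len(gated[0][0])
    spatial_gates = [
        [sigmoid(sum(ch[r][c] for ch in gated) / len(gated)) for c in range(cols)]
        for r in range(rows)
    ]
    return [
        [[spatial_gates[r][c] * ch[r][c] for c in range(cols)] for r in range(rows)]
        for ch in gated
    ]


features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0: weak response
    [[4.0, 4.0], [0.0, 4.0]],  # channel 1: strong response
]
attended = channel_then_spatial_attention(features)
```

In the light field setting, the channel gate plays the role of selecting useful views and the spatial gate the role of selecting important areas within a view; both gates lie strictly between 0 and 1, so they rescale features rather than discard them outright.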

Attention-based deep learning framework for skin lesion segmentation (피부 병변 분할을 위한 어텐션 기반 딥러닝 프레임워크)

  • Afnan Ghafoor;Bumshik Lee
    • Smart Media Journal / v.13 no.3 / pp.53-61 / 2024
  • This paper presents a novel M-shaped encoder-decoder architecture for skin lesion segmentation that achieves better performance than existing approaches. The proposed architecture uses its left and right legs to enable multi-scale feature extraction and is further enhanced by an attention module integrated within the skip connection. The image is partitioned into four distinct patches, which facilitates processing within the encoder-decoder framework. A pivotal aspect of the proposed method is that it focuses more on critical image features through an attention mechanism, leading to refined segmentation. Experimental results highlight the effectiveness of the proposed approach, demonstrating superior accuracy, precision, and Jaccard index compared to existing methods.

The Effect of Consistency between Represented Location of the Cue and the Target on Attention Mechanism (단서자극과 표적자극의 표상된 위치의 일치성이 주의기제의 작용에 미치는 영향)

  • Seo, Jun-Ho;Li, Hyung-Chul O.
    • Korean Journal of Cognitive Science / v.20 no.4 / pp.481-506 / 2009
  • The purpose of the present research was to examine whether the attention mechanism employs the physical or the represented location of the cue and target. To this end, we employed the paradigms of facilitation of response and inhibition of return. In the experiments, valid and invalid conditions were defined by the positional consistency of the cue and the target in terms of either physical or represented location. We used an auditory cue and a visual target in Experiment 1, and a visual cue and an auditory target in Experiment 2. In both Experiment 1 and Experiment 2, a facilitation-of-response effect in the valid condition was found when the valid/invalid conditions were defined in terms of represented location. In all the other conditions, no effect was found when the conditions were defined in terms of physical location. No inhibition-of-return effects were found in Experiment 2. These results suggest that the attention mechanism may operate on objects' represented locations rather than on their physical locations. More importantly, the present research suggests that future experiments on facilitation of response and inhibition of return need to separate the represented location from the physical location of the target and the cue.

An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

  • Choi, Yeunju;Jung, Youngmoon;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences / v.10 no.1 / pp.39-48 / 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they pass through the subsequent modules. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, an end-to-end TTS system based on a sequence-to-sequence model with an attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained a mean opinion score (MOS) of 2.98 and a degradation mean opinion score (DMOS) of 3.25. We discuss the factors that affected the training of the system. Experiments demonstrate that the post-processing network needs to be designed with the output language and input characters in mind, and that the maximum value of n for the n-grams modeled by the encoder should be kept small enough for the amount of training data.

Mechanism and Application Methodology of Mental Practice (정신 연습의 기전과 적용 방법)

  • Kim Jong-soon;Lee Keun-heui;Bae Sung-soo
    • The Journal of Korean Physical Therapy / v.15 no.2 / pp.75-84 / 2003
  • The purpose of this study was to review the mechanisms and application methodology of mental practice. Mental practice is the symbolic rehearsal of physical activity in the absence of any gross muscular movements. Humans have the ability to generate mental correlates of perceptual and motor events without any triggering external stimulus, a function known as imagery. Practice produces both internal and external sensory consequences, which are thought to be essential for learning to occur. It is for this reason that mental practice, the rehearsal of a skill in imagination rather than by overt physical activity, has intrigued theorists, especially those interested in cognitive processes. Several studies in sport psychology have shown that mental practice can be effective in optimizing the execution of movements in athletes and can help novice learners in the incremental acquisition of new skilled behaviors. There are many theories that explain the positive effect of mental practice on skill learning and performance. The most tenable theories are symbolic learning theory, psychoneuromuscular theory, Paivio's theory, regional cerebral blood flow theory, motivation theory, modeling theory, mental and muscle movement nodes theory, insight theory, selective attention theory, and attention-arousal set theory. The factors that influence the effects of mental practice are the form of application, the application period, the length of each mental practice session, the number of repetitions, and the presence of physical practice.

A Knowledge-Based Machine Vision System for Automated Industrial Web Inspection

  • Cho, Tai-Hoon;Jung, Young-Kee;Cho, Hyun-Chan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.1 no.1 / pp.13-23 / 2001
  • Most current machine vision systems for industrial inspection were developed with one specific task in mind. Hence, these systems are inflexible in the sense that they cannot easily be adapted to other applications. In this paper, a general vision system framework has been developed that can be easily adapted to a variety of industrial web inspection problems. The objective of this system is to automatically locate and identify "defects" on the surface of the material being inspected. This framework is designed to be robust, flexible, and as computationally simple as possible. To ensure robustness, the framework employs a combined strategy of top-down and bottom-up control, hierarchical defect models, and uncertain reasoning methods. To make the framework flexible, a modular Blackboard framework is employed. To minimize computational complexity, the system incorporates a simple multi-thresholding segmentation scheme, a fuzzy logic focus-of-attention mechanism for scene analysis operations, and a partitioning of knowledge that allows concurrent parallel processing during recognition.

A Study of Efficiency Information Filtering System using One-Hot Long Short-Term Memory

  • Kim, Hee sook;Lee, Min Hi
    • International Journal of Advanced Culture Technology / v.5 no.1 / pp.83-89 / 2017
  • In this paper, we propose an extended one-hot Long Short-Term Memory (LSTM) method and evaluate its performance on a spam filtering task. Most traditional methods proposed for spam filtering use word occurrences to represent spam and non-spam messages, and all syntactic and semantic information is ignored. A major issue arises when spam and non-spam messages share many common words and noise words, which makes it challenging for the system to assign the correct label. Unlike previous studies on information filtering, which use only word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on the words that are most likely to appear in the spam and non-spam collections. As a result, we obtained some improvement over the performance of previous methods. We find that using region embedding and pooling features on top of the LSTM, along with the attention mechanism, allows the system to learn a better document representation for filtering tasks in general.

Emotion Classification based on EEG signals with LSTM deep learning method (어텐션 메커니즘 기반 Long-Short Term Memory Network를 이용한 EEG 신호 기반의 감정 분류 기법)

  • Kim, Youmin;Choi, Ahyoung
    • Journal of Korea Society of Industrial Information Systems / v.26 no.1 / pp.1-10 / 2021
  • This study proposed a Long Short-Term Memory (LSTM) network to account for changes in emotion over time, and applied an attention mechanism to give weights to the emotion states that appear at specific moments. We used 32-channel EEG data from the DEAP database. A 2-level classification experiment (Low and High) and a 3-level classification experiment (Low, Middle, and High) were performed on the Valence and Arousal emotion model. The accuracy of the 2-level classification was 90.1% for Valence and 88.1% for Arousal. The accuracy of the 3-level classification was 83.5% for Valence and 82.5% for Arousal.
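
As a rough illustration of giving weights to states that appear at specific moments, the sketch below applies dot-product attention over a sequence of per-time-step vectors. The hidden states are hand-written stand-ins for LSTM outputs, and the fixed `query` vector is a hypothetical substitute for parameters that a real model would learn; the abstract does not describe the scoring function the authors used.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Score each time step by a dot product with `query`, then take the weighted sum."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Three time steps of a hypothetical 2-dimensional hidden state.
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
weights, context = attention_pool(hidden_states, query=[4.0, 0.0])
```

The weights sum to one and concentrate on the second time step, whose state aligns best with the query, so the pooled `context` vector that a classifier would consume is dominated by that moment.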