• Title/Summary/Keyword: Attention U-Net

Search Result 35, Processing Time 0.025 seconds

A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate (특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.6
    • /
    • pp.544-551
    • /
    • 2023
  • Speech enhancement used to improve the perceptual quality and intelligibility of noise speech has been studied as a method using a complex-valued spectrum that can improve both magnitude and phase in a method using a magnitude spectrum. In this paper, a study was conducted on how to apply attention mechanism to complex-valued spectrum-based speech enhancement systems to further improve the intelligibility and quality of noise speech. The attention is performed based on additive attention and allows the attention weight to be calculated in consideration of the complex-valued spectrum. In addition, the global average pooling was used to consider the importance of the feature map. Complex-valued spectrum-based speech enhancement was performed based on the Deep Complex U-Net (DCUNET) model, and additive attention was conducted based on the proposed method in the Attention U-Net model. The results of the experiments on noise speech in a living room environment showed that the proposed method is improved performance over the baseline model according to evaluation metrics such as Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short Time Object Intelligence (STOI), and consistently improved performance across various background noise environments and low Signal-to-Noise Ratio (SNR) conditions. Through this, the proposed speech enhancement system demonstrated its effectiveness in improving the intelligibility and quality of noisy speech.

Precise segmentation of fetal head in ultrasound images using improved U-Net model

  • Vimala Nagabotu;Anupama Namburu
    • ETRI Journal
    • /
    • v.46 no.3
    • /
    • pp.526-537
    • /
    • 2024
  • Monitoring fetal growth in utero is crucial to anomaly diagnosis. However, current computer-vision models struggle to accurately assess the key metrics (i.e., head circumference and occipitofrontal and biparietal diameters) from ultrasound images, largely owing to a lack of training data. Mitigation usually entails image augmentation (e.g., flipping, rotating, scaling, and translating). Nevertheless, the accuracy of our task remains insufficient. Hence, we offer a U-Net fetal head measurement tool that leverages a hybrid Dice and binary cross-entropy loss to compute the similarity between actual and predicted segmented regions. Ellipse-fitted two-dimensional ultrasound images acquired from the HC18 dataset are input, and their lower feature layers are reused for efficiency. During regression, a novel region of interest pooling layer extracts elliptical feature maps, and during segmentation, feature pyramids fuse field-layer data with a new scale attention method to reduce noise. Performance is measured by Dice similarity, mean pixel accuracy, and mean intersection-over-union, giving 97.90%, 99.18%, and 97.81% scores, respectively, which match or outperform the best U-Net models.

COVID-19 Lung CT Image Recognition (COVID-19 폐 CT 이미지 인식)

  • Su, Jingjie;Kim, Kang-Chul
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.3
    • /
    • pp.529-536
    • /
    • 2022
  • In the past two years, Severe Acute Respiratory Syndrome Coronavirus-2(SARS-CoV-2) has been hitting more and more to people. This paper proposes a novel U-Net Convolutional Neural Network to classify and segment COVID-19 lung CT images, which contains Sub Coding Block (SCB), Atrous Spatial Pyramid Pooling(ASPP) and Attention Gate(AG). Three different models such as FCN, U-Net and U-Net-SCB are designed to compare the proposed model and the best optimizer and atrous rate are chosen for the proposed model. The simulation results show that the proposed U-Net-MMFE has the best Dice segmentation coefficient of 94.79% for the COVID-19 CT scan digital image dataset compared with other segmentation models when atrous rate is 12 and the optimizer is Adam.

A three-stage deep-learning-based method for crack detection of high-resolution steel box girder image

  • Meng, Shiqiao;Gao, Zhiyuan;Zhou, Ying;He, Bin;Kong, Qingzhao
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.29-39
    • /
    • 2022
  • Crack detection plays an important role in the maintenance and protection of steel box girder of bridges. However, since the cracks only occupy an extremely small region of the high-resolution images captured from actual conditions, the existing methods cannot deal with this kind of image effectively. To solve this problem, this paper proposed a novel three-stage method based on deep learning technology and morphology operations. The training set and test set used in this paper are composed of 360 images (4928 × 3264 pixels) in steel girder box. The first stage of the proposed model converted high-resolution images into sub-images by using patch-based method and located the region of cracks by CBAM ResNet-50 model. The Recall reaches 0.95 on the test set. The second stage of our method uses the Attention U-Net model to get the accurate geometric edges of cracks based on results in the first stage. The IoU of the segmentation model implemented in this stage attains 0.48. In the third stage of the model, we remove the wrong-predicted isolated points in the predicted results through dilate operation and outlier elimination algorithm. The IoU of test set ascends to 0.70 after this stage. Ablation experiments are conducted to optimize the parameters and further promote the accuracy of the proposed method. The result shows that: (1) the best patch size of sub-images is 1024 × 1024. (2) the CBAM ResNet-50 and the Attention U-Net achieved the best results in the first and the second stage, respectively. (3) Pre-training the model of the first two stages can improve the IoU by 2.9%. In general, our method is of great significance for crack detection.

Automatic crack detection of dam concrete structures based on deep learning

  • Zongjie Lv;Jinzhang Tian;Yantao Zhu;Yangtao Li
    • Computers and Concrete
    • /
    • v.32 no.6
    • /
    • pp.615-623
    • /
    • 2023
  • Crack detection is an essential method to ensure the safety of dam concrete structures. Low-quality crack images of dam concrete structures limit the application of neural network methods in crack detection. This research proposes a modified attentional mechanism model to reduce the disturbance caused by uneven light, shadow, and water spots in crack images. Also, the focal loss function solves the small ratio of crack information. The dataset collects from the network, laboratory and actual inspection dataset of dam concrete structures. This research proposes a novel method for crack detection of dam concrete structures based on the U-Net neural network, namely AF-UNet. A mutual comparison of OTSU, Canny, region growing, DeepLab V3+, SegFormer, U-Net, and AF-UNet (proposed) verified the detection accuracy. A binocular camera detects cracks in the experimental scene. The smallest measurement width of the system is 0.27 mm. The potential goal is to achieve real-time detection and localization of cracks in dam concrete structures.

Deep learning-based post-disaster building inspection with channel-wise attention and semi-supervised learning

  • Wen Tang;Tarutal Ghosh Mondal;Rih-Teng Wu;Abhishek Subedi;Mohammad R. Jahanshahi
    • Smart Structures and Systems
    • /
    • v.31 no.4
    • /
    • pp.365-381
    • /
    • 2023
  • The existing vision-based techniques for inspection and condition assessment of civil infrastructure are mostly manual and consequently time-consuming, expensive, subjective, and risky. As a viable alternative, researchers in the past resorted to deep learning-based autonomous damage detection algorithms for expedited post-disaster reconnaissance of structures. Although a number of automatic damage detection algorithms have been proposed, the scarcity of labeled training data remains a major concern. To address this issue, this study proposed a semi-supervised learning (SSL) framework based on consistency regularization and cross-supervision. Image data from post-earthquake reconnaissance, that contains cracks, spalling, and exposed rebars are used to evaluate the proposed solution. Experiments are carried out under different data partition protocols, and it is shown that the proposed SSL method can make use of unlabeled images to enhance the segmentation performance when limited amount of ground truth labels are provided. This study also proposes DeepLab-AASPP and modified versions of U-Net++ based on channel-wise attention mechanism to better segment the components and damage areas from images of reinforced concrete buildings. The channel-wise attention mechanism can effectively improve the performance of the network by dynamically scaling the feature maps so that the networks can focus on more informative feature maps in the concatenation layer. The proposed DeepLab-AASPP achieves the best performance on component segmentation and damage state segmentation tasks with mIoU scores of 0.9850 and 0.7032, respectively. For crack, spalling, and rebar segmentation tasks, modified U-Net++ obtains the best performance with Igou scores (excluding the background pixels) of 0.5449, 0.9375, and 0.5018, respectively. The proposed architectures win the second place in IC-SHM2021 competition in all five tasks of Project 2.

SKU-Net: Improved U-Net using Selective Kernel Convolution for Retinal Vessel Segmentation

  • Hwang, Dong-Hwan;Moon, Gwi-Seong;Kim, Yoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.4
    • /
    • pp.29-37
    • /
    • 2021
  • In this paper, we propose a deep learning-based retinal vessel segmentation model for handling multi-scale information of fundus images. we integrate the selective kernel convolution into U-Net-based convolutional neural network. The proposed model extracts and segment features information with various shapes and sizes of retinal blood vessels, which is important information for diagnosing eye-related diseases from fundus images. The proposed model consists of standard convolutions and selective kernel convolutions. While the standard convolutional layer extracts information through the same size kernel size, The selective kernel convolution extracts information from branches with various kernel sizes and combines them by adaptively adjusting them through split-attention. To evaluate the performance of the proposed model, we used the DRIVE and CHASE DB1 datasets and the proposed model showed F1 score of 82.91% and 81.71% on both datasets respectively, confirming that the proposed model is effective in segmenting retinal blood vessels.

Lightweight high-precision pedestrian tracking algorithm in complex occlusion scenarios

  • Qiang Gao;Zhicheng He;Xu Jia;Yinghong Xie;Xiaowei Han
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.3
    • /
    • pp.840-860
    • /
    • 2023
  • Aiming at the serious occlusion and slow tracking speed in pedestrian target tracking and recognition in complex scenes, a target tracking method based on improved YOLO v5 combined with Deep SORT is proposed. By merging the attention mechanism ECA-Net with the Neck part of the YOLO v5 network, using the CIoU loss function and the method of CIoU non-maximum value suppression, connecting the Deep SORT model using Shuffle Net V2 as the appearance feature extraction network to achieve lightweight and fast speed tracking and the purpose of improving tracking under occlusion. A large number of experiments show that the improved YOLO v5 increases the average precision by 1.3% compared with other algorithms. The improved tracking model, MOTA reaches 54.3% on the MOT17 pedestrian tracking data, and the tracking accuracy is 3.7% higher than the related algorithms and The model presented in this paper improves the FPS by nearly 5 on the fps indicator.

Change Detection Using Deep Learning Based Semantic Segmentation for Nuclear Activity Detection and Monitoring (핵 활동 탐지 및 감시를 위한 딥러닝 기반 의미론적 분할을 활용한 변화 탐지)

  • Song, Ahram;Lee, Changhui;Lee, Jinmin;Han, Youkyung
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.991-1005
    • /
    • 2022
  • Satellite imaging is an effective supplementary data source for detecting and verifying nuclear activity. It is also highly beneficial in regions with limited access and information, such as nuclear installations. Time series analysis, in particular, can identify the process of preparing for the conduction of a nuclear experiment, such as relocating equipment or changing facilities. Differences in the semantic segmentation findings of time series photos were employed in this work to detect changes in meaningful items connected to nuclear activity. Building, road, and small object datasets made of KOMPSAT 3/3A photos given by AIHub were used to train deep learning models such as U-Net, PSPNet, and Attention U-Net. To pick relevant models for targets, many model parameters were adjusted. The final change detection was carried out by including object information into the first change detection, which was obtained as the difference in semantic segmentation findings. The experiment findings demonstrated that the suggested approach could effectively identify altered pixels. Although the suggested approach is dependent on the accuracy of semantic segmentation findings, it is envisaged that as the dataset for the region of interest grows in the future, so will the relevant scope of the proposed method.

U-Net Cloud Detection for the SPARCS Cloud Dataset from Landsat 8 Images (Landsat 8 기반 SPARCS 데이터셋을 이용한 U-Net 구름탐지)

  • Kang, Jonggu;Kim, Geunah;Jeong, Yemin;Kim, Seoyeon;Youn, Youjeong;Cho, Soobin;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.5_1
    • /
    • pp.1149-1161
    • /
    • 2021
  • With a trend of the utilization of computer vision for satellite images, cloud detection using deep learning also attracts attention recently. In this study, we conducted a U-Net cloud detection modeling using SPARCS (Spatial Procedures for Automated Removal of Cloud and Shadow) Cloud Dataset with the image data augmentation and carried out 10-fold cross-validation for an objective assessment of the model. Asthe result of the blind test for 1800 datasets with 512 by 512 pixels, relatively high performance with the accuracy of 0.821, the precision of 0.847, the recall of 0.821, the F1-score of 0.831, and the IoU (Intersection over Union) of 0.723. Although 14.5% of actual cloud shadows were misclassified as land, and 19.7% of actual clouds were misidentified as land, this can be overcome by increasing the quality and quantity of label datasets. Moreover, a state-of-the-art DeepLab V3+ model and the NAS (Neural Architecture Search) optimization technique can help the cloud detection for CAS500 (Compact Advanced Satellite 500) in South Korea.