Search | Korea Science

A study on skip-connection with time-frequency self-attention for improving speech enhancement based on complex-valued spectrum (복소 스펙트럼 기반 음성 향상의 성능 향상을 위한 time-frequency self-attention 기반 skip-connection 기법 연구)

Jaehee Jung;Wooil Kim
- The Journal of the Acoustical Society of Korea / v.42 no.2 / pp.94-101 / 2023
Encoder-decoder deep neural networks used for speech enhancement, such as U-Net, pass encoder features to the decoder through skip-connections. The skip-connection helps reconstruct the enhanced spectrum and restores information lost during encoding. However, the encoder and decoder features joined by a skip-connection are not compatible with each other. In this paper, for speech enhancement based on the complex-valued spectrum, a Self-Attention (SA) method is applied to the skip-connection to transform the encoder features so that they are compatible with the decoder features. SA is a technique that, when generating an output sequence in a sequence-to-sequence task, uses a weighted average of the input to attend to subsets of the input; applied to speech enhancement, it has been shown to remove noise effectively. Three models that use encoder and decoder features to apply SA to the skip-connection are studied. In experiments on the TIMIT database, the proposed methods improve on all evaluation metrics compared with the Deep Complex U-Net (DCUNET) that uses skip-connection only.
https://doi.org/10.7776/ASK.2023.42.2.094
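
The idea of attending over encoder features before the skip-connection can be illustrated with a minimal sketch. This is not the authors' model: it assumes real-valued features, no learned projections, and plain scaled dot-product attention in which decoder frames act as queries and encoder frames act as keys and values. The function names `softmax` and `attend_skip` and the toy inputs are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_skip(encoder_frames, decoder_frames):
    """Re-weight encoder frames for each decoder frame, then concatenate.

    Both inputs are lists of feature vectors with the same dimension.
    Queries come from the decoder; keys and values come from the encoder.
    """
    dim = len(encoder_frames[0])
    fused = []
    for query in decoder_frames:
        # Scaled dot-product score between this decoder frame and every encoder frame.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in encoder_frames]
        weights = softmax(scores)
        # Weighted average of the encoder frames, one value per feature dimension.
        context = [sum(w * frame[d] for w, frame in zip(weights, encoder_frames))
                   for d in range(dim)]
        # Skip-connection step: concatenate attended encoder features with the decoder frame.
        fused.append(context + list(query))
    return fused

# Toy data (assumed, not from the paper): 3 frames, 2 features each.
encoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
fused = attend_skip(encoder, decoder)
```

Each row of `fused` holds the attended encoder features followed by the unchanged decoder features, so its width is twice the input feature dimension.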

Multi-level Skip Connection for Nested U-Net-based Speech Enhancement (중첩 U-Net 기반 음성 향상을 위한 다중 레벨 Skip Connection)

Seorim, Hwang;Joon, Byun;Junyeong, Heo;Jaebin, Cha;Youngcheol, Park
- Journal of Broadcast Engineering / v.27 no.6 / pp.840-847 / 2022
In deep neural network (DNN)-based speech enhancement, the use of global and local information in the input speech is closely related to model performance. Recently, a nested U-Net structure that exploits global and local input information at multiple scales has been proposed. This nested U-Net was also applied to speech enhancement and showed outstanding performance. However, the single skip connection used in nested U-Nets must be modified for the nested structure. In this paper, we propose a multi-level skip connection (MLS) to optimize the performance of the nested U-Net-based speech enhancement algorithm. The proposed MLS showed excellent performance improvement on various objective evaluation metrics compared with the standard skip connection, which means that the MLS can optimize the performance of the nested U-Net-based speech enhancement algorithm. In addition, the final proposed model showed superior performance compared with other DNN-based speech enhancement models.
https://doi.org/10.5909/JBE.2022.27.6.840

Investigating the Feature Collection for Semantic Segmentation via Single Skip Connection (깊은 신경망에서 단일 중간층 연결을 통한 물체 분할 능력의 심층적 분석)

Yim, Jonghwa;Sohn, Kyung-Ah
- Journal of KIISE / v.44 no.12 / pp.1282-1289 / 2017
Since the study of deep convolutional neural networks became prevalent, one important discovery has been that a feature map can be extracted from a convolutional network before the fully connected layer and used as a saliency map for object detection. Furthermore, a model can use features from different layers for accurate object detection, because features from different layers can have different properties. As a model grows deeper, it has many latent skip connections and feature maps with which to refine object detection. Although many intermediate layers can be used for semantic segmentation through skip connections, the characteristics of each skip connection and the best skip connection for this task remain uncertain. Therefore, in this study, we exhaustively examine the skip connections of state-of-the-art deep convolutional networks and investigate the characteristics of the features from each intermediate layer. In addition, this study suggests how to use a recent deep neural network model for semantic segmentation, and it may therefore serve as a cornerstone for later studies with state-of-the-art network models.
https://doi.org/10.5626/JOK.2017.44.12.1282

Clustering Performance Analysis of Autoencoder with Skip Connection (스킵연결이 적용된 오토인코더 모델의 클러스터링 성능 분석)

Jo, In-su;Kang, Yunhee;Choi, Dong-bin;Park, Young B.
- KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.403-410 / 2020
In addition to research on noise removal and super-resolution using the data restoration (output result) function of the autoencoder, research on improving clustering performance using the dimension reduction function of the autoencoder is actively being conducted. The clustering function and the data restoration function of an autoencoder have in common that both improve through the same learning. Based on this, this study conducted an experiment to see whether an autoencoder model designed for excellent data recovery performance is also superior in clustering performance. The skip connection technique was used to design an autoencoder with excellent data recovery performance. The output result performance and clustering performance of the autoencoder model with skip connection and of the model without skip connection are presented as graphs and visual examples. The output result performance increased, but the clustering performance decreased. This result indicates that, for neural network models such as autoencoders, a good output result does not guarantee that each layer has learned the characteristics of the data well. Lastly, the degradation in clustering performance was compensated by using both the latent code and the skip connection. This study is a preliminary step toward solving the Hanja Unicode problem by clustering.
https://doi.org/10.3745/KTSDE.2020.9.12.403

2D and 3D Hand Pose Estimation Based on Skip Connection Form (스킵 연결 형태 기반의 손 관절 2D 및 3D 검출 기법)

Ku, Jong-Hoe;Kim, Mi-Kyung;Cha, Eui-Young
- Journal of the Korea Institute of Information and Communication Engineering / v.24 no.12 / pp.1574-1580 / 2020
Traditional pose estimation methods rely either on special devices or on images analyzed through image processing. Devices have the disadvantage that they can be used only in limited environments and are costly. Cameras with image processing reduce the environmental constraints and the cost, but performance is lower. Convolutional neural networks (CNNs) have been studied for pose estimation using only a camera, without these disadvantages, and various techniques have been proposed to increase recognition performance. In this paper, the effect of skip connections on the network is examined by applying various skip connections to hand joint recognition. The experiments confirm that adding skip connections beyond the basic ones improves performance, and that the network with downward skip connections performs best.
https://doi.org/10.6109/jkiice.2020.24.12.1574

Performance comparative evaluation of Two-level skip connection for nested U-Net-based noise cancellation (Nested U-Net 기반 잡음 제거를 위한 two-level skip connection 제안 및 성능 비교 평가)

Hwang, Seorim;Byun, Joon;Heo, Junyeong;Cha, Jaebin;Park, Youngcheol
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.228-230 / 2022
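This paper proposes a two-level skip connection (TLS), consisting of two stages, to optimize the performance of the Nested U-Net, which has recently shown excellent performance in noise removal. Various forms of TLS are proposed by varying the paths between the encoder and the decoder, and the performance of each form is compared and evaluated. In addition, the two best-performing paths are combined to propose the final Nested U-Net-based model. The proposed model shows very strong performance on objective evaluation metrics compared with other noise removal models.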

Single Image Super-resolution using Recursive Residual Architecture Via Dense Skip Connections (고밀도 스킵 연결을 통한 재귀 잔차 구조를 이용한 단일 이미지 초해상도 기법)

Chen, Jian;Jeong, Jechang
- Journal of Broadcast Engineering / v.24 no.4 / pp.633-642 / 2019
Recently, convolutional neural network (CNN) models for single image super-resolution (SISR) have been very successful. Residual learning can improve training stability and network performance in CNNs. In this paper, we propose a SISR method that uses a recursive residual network architecture with dense skip connections to learn the nonlinear mapping from a low-resolution input image to a high-resolution target image. The proposed method adopts recursive residual learning to mitigate the difficulty of training deep networks, and removes unnecessary modules so that the CNN layers are easier to optimize, yielding a concise and compact recursive network via dense skip connections. The proposed method not only alleviates the vanishing-gradient problem of very deep networks but also achieves outstanding performance with low network complexity, which makes the network easier to train and improves SISR performance.
https://doi.org/10.5909/JBE.2019.24.4.633
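
As a rough illustration of recursive residual learning, the sketch below applies one shared transform several times and adds the block input back after every step. It is only an assumption about the general pattern, not the architecture from the paper: `recursive_residual` is a hypothetical name, and `halve` is a hypothetical stand-in for a learned convolutional layer.

```python
def recursive_residual(x, transform, steps):
    """Apply one shared transform `steps` times, adding the input back each time."""
    hidden = list(x)
    for _ in range(steps):
        update = transform(hidden)
        # Identity skip connection from the block input to every recursion.
        hidden = [xi + ui for xi, ui in zip(x, update)]
    return hidden

def halve(values):
    # Hypothetical stand-in for a learned layer; a real model would use a convolution.
    return [0.5 * v for v in values]

out = recursive_residual([1.0, 2.0], halve, steps=3)
```

Because the same `transform` is reused at every step, the depth grows without adding parameters, and the identity path keeps the input directly reachable from the output.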

A Model Compression for Super Resolution Multi Scale Residual Networks based on a Layer-wise Quantization (계층별 양자화 기반 초해상화 다중 스케일 잔차 네트워크 압축)

Hwang, Jiwon;Bae, Sung-Ho
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.540-543 / 2020
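Existing deep learning methods for super-resolution achieve good performance as models grow deeper, but they are becoming increasingly complex and require a lot of time in practical use. To address this, we quantize the weights of the deep learning model to reduce inference time. A super-resolution model is divided into three parts (feature extraction, non-linear mapping, and reconstruction) and is characterized by many skip-connections between layers. Consequently, the effect of quantization on the final performance drop differs from layer to layer; taking this into account, we use reinforcement learning to find the optimal bit width for each layer and minimize the performance drop. In this paper we use MSRN, which contains many skip-connections, and the results show that the feature extraction and reconstruction parts, as well as layers at particular positions within a block, are always assigned high bit widths. This is the first paper to reduce the model size of a deep learning super-resolution method using mixed-bit quantization, which had previously been limited to image classification, and the proposed method is expected to be applicable to constrained environments such as mobile devices.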
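
Layer-wise mixed-bit quantization can be sketched as follows. This is a minimal illustration assuming symmetric uniform quantization; the reinforcement-learning search for bit widths is not shown, and the layer names, weights, and bit widths are hypothetical values chosen for the example, not results from the paper.

```python
def quantize_uniform(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits."""
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    peak = max(abs(w) for w in weights)
    if peak == 0.0:
        return [0.0 for _ in weights]
    scale = peak / levels                   # width of one quantization step
    return [round(w / scale) * scale for w in weights]

def quantize_model(layers, bits_per_layer):
    """Quantize each named layer with its own bit width."""
    return {name: quantize_uniform(w, bits_per_layer[name])
            for name, w in layers.items()}

def mean_abs_error(original, quantized):
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)

# Hypothetical weights and bit assignment for the three parts of an SR model.
layers = {
    "feature_extraction": [0.9, -0.3, 0.05, 0.6],
    "nonlinear_mapping": [0.9, -0.3, 0.05, 0.6],
    "reconstruction": [0.9, -0.3, 0.05, 0.6],
}
bits = {"feature_extraction": 8, "nonlinear_mapping": 2, "reconstruction": 8}
quantized = quantize_model(layers, bits)
errors = {name: mean_abs_error(layers[name], quantized[name]) for name in layers}
```

With identical weights in every layer, `errors` shows how a lower bit width raises the reconstruction error of that layer, which is the quantity a per-layer bit search would trade off against model size.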

SDCN: Synchronized Depthwise Separable Convolutional Neural Network for Single Image Super-Resolution

Muhammad, Wazir;Hussain, Ayaz;Shah, Syed Ali Raza;Shah, Jalal;Bhutto, Zuhaibuddin;Thaheem, Imdadullah;Ali, Shamshad;Masrour, Salman
- International Journal of Computer Science & Network Security / v.21 no.11 / pp.17-22 / 2021
Recently, image super-resolution techniques based on convolutional neural networks (CNNs) have achieved remarkable performance in digital image processing applications and computer vision tasks. Stacking convolutional layers on top of each other yields a more complex network architecture, but it also uses more memory because of the larger number of parameters and introduces the vanishing gradient problem during training. Furthermore, earlier approaches to single image super-resolution used interpolation as a pre-processing stage to upscale the low-resolution image into a high-resolution (HR) image. These approaches are simple in design but not effective, and they insert new unwanted pixels (noise) into the reconstructed HR image. In this paper, the authors propose a novel single image super-resolution architecture based on synchronized depthwise separable convolution with a Dense Skip Connection Block (DSCB). In addition, unlike existing SR methods that rely on a single path only, the proposed method uses synchronized paths to generate the SISR image. Extensive quantitative and qualitative experiments show that the proposed method (SDCN) achieves promising improvements over other state-of-the-art methods.
https://doi.org/10.22937/IJCSNS.2021.21.11.3
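
The parameter saving behind depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights only (biases omitted) and assumes a square kernel; the kernel size and channel counts are illustrative choices, not values from the paper, and the synchronized paths of SDCN are not modeled.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel filter per (input, output) channel pair."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in      # one spatial filter per input channel
    pointwise = c_in * c_out                # 1x1 convolution mixing the channels
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel, 64 input channels, 64 output channels.
standard = standard_conv_params(3, 64, 64)
separable = depthwise_separable_params(3, 64, 64)
ratio = separable / standard
```

For this layer the standard convolution has 36864 weights and the separable version has 4672, so `ratio` equals 1/64 + 1/9, or roughly 0.127.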

Face Emotion Recognition using ResNet with Identity-CBAM (Identity-CBAM ResNet 기반 얼굴 감정 식별 모듈)

Oh, Gyutea;Kim, Inki;Kim, Beomjun;Gwak, Jeonghwan
- Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.559-561 / 2022
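With the arrival of the age of artificial intelligence, technologies that recognize and respond to human emotions are advancing rapidly in order to provide personalized environments. Human emotion can be recognized from the face, voice, body movements, biosignals, and so on, but the most intuitive and accessible of these is facial expression. Therefore, for highly accurate facial emotion recognition, this paper proposes an Identity-CBAM Module that uses each gate of the Convolution Block Attention Module (CBAM), a Residual Block, and a Skip Connection. Each CBAM gate and the Residual Block emphasize the key feature information of each expression, making the model more context-aware, and the Skip-Connection makes the module robust to vanishing and exploding gradients. Using AI-HUB's composite video dataset for Korean emotion recognition, the data were divided into six classes; applying the Identity-CBAM module improved F1-Score by 0.4~2.7% and Accuracy by 0.18~2.03% over vanilla ResNet50 and ResNet101. In addition, visualization with Guided Backpropagation and Guided GradCam confirmed that important feature points are represented in finer detail. As a result, we demonstrate that, for facial expression classification in images, using the Identity-CBAM Module together with ResNet50 or ResNet101 is more suitable than using the vanilla networks alone.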
https://doi.org/10.3745/PKIPS.y2022m11a.559

Search Result 36, Processing Time 0.024 seconds
