Deep Learning based Raw Audio Signal Bandwidth Extension System

  • Kim, Yun-Su (Dept. of Information and Communication Engineering, Changwon National University)
  • Seok, Jong-Won (Dept. of Information and Communication Engineering, Changwon National University)
  • Received : 2020.11.30
  • Accepted : 2020.12.16
  • Published : 2020.12.31

Abstract

Bandwidth extension means restoring and extending a narrowband (NB) signal that has been band-limited or damaged during encoding and decoding, owing to insufficient channel capacity or the characteristics of the codec installed in a mobile communication device, and converting it into a wideband (WB) signal. Bandwidth extension research has mainly targeted speech signals. Methods such as Spectral Band Replication (SBR) and Intelligent Gap Filling (IGF) transform the high band into the frequency domain and restore the missing or damaged high band through complex feature-extraction processes. In this paper, we propose an autoencoder-based deep learning model that uses residual connections between one-dimensional convolutional neural networks (CNNs). The model takes a fixed-length time-domain signal as input, without complicated pre-processing, and outputs a bandwidth-extended audio signal. We also confirmed that the damaged high band can be restored even when the model is trained on a dataset containing various types of sound sources, including music, rather than speech alone.
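
The idea of an autoencoder built from one-dimensional convolutions with residual connections, operating directly on a fixed-length time-domain frame, can be illustrated with a small forward-pass sketch. This is not the authors' network: the abstract does not give the layer count, kernel sizes, channel widths, frame length, or where the residual connections are placed, so every such value below (`FRAME_LEN`, the 5-tap kernels, the two-level encoder and decoder, the additive skips, random untrained weights, a single channel) is an assumption chosen only to keep the example short and runnable with the Python standard library.

```python
import math
import random

FRAME_LEN = 64  # hypothetical frame length; must be divisible by 4 in this sketch


def conv1d(x, kernel, stride=1):
    """Zero-padded ('same') 1D convolution over a list of floats (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(0, len(x), stride)
    ]


def upsample(x, factor=2):
    """Nearest-neighbour upsampling: repeat every sample `factor` times."""
    return [v for v in x for _ in range(factor)]


def relu(x):
    return [max(0.0, v) for v in x]


def forward(frame, w):
    """Single-channel encoder/decoder pass with additive skip connections."""
    # Encoder: strided convolutions halve the time resolution at each level.
    e1 = relu(conv1d(frame, w["enc1"], stride=2))   # N   -> N/2
    e2 = relu(conv1d(e1, w["enc2"], stride=2))      # N/2 -> N/4
    # Decoder: upsample, convolve, and add the matching encoder output.
    d1 = relu(conv1d(upsample(e2), w["dec1"]))      # N/4 -> N/2
    d1 = [a + b for a, b in zip(d1, e1)]            # residual connection (assumed placement)
    d2 = conv1d(upsample(d1), w["dec2"])            # N/2 -> N
    # Assumed design: the network output is a correction added to the input frame.
    return [a + b for a, b in zip(d2, frame)]


# Random, untrained weights: this only demonstrates the data flow and shapes.
rng = random.Random(0)
weights = {
    name: [rng.uniform(-0.5, 0.5) for _ in range(5)]
    for name in ("enc1", "enc2", "dec1", "dec2")
}

# A low-frequency sine stands in for one band-limited time-domain frame.
frame = [math.sin(2 * math.pi * 3 * n / FRAME_LEN) for n in range(FRAME_LEN)]
output = forward(frame, weights)
print(len(frame), len(output))  # 64 64
```

The output frame has the same length as the input frame, so the model maps raw samples to raw samples with no spectral feature extraction in between. In a trained system the kernels would be learned from pairs of band-limited and wideband frames, and each layer would normally carry many channels rather than one.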
