Fig. 1. Basic structure of an autoencoder
Fig. 2. Structure of the proposed autoencoder
Fig. 3. GLU structure
Fig. 4. 2D scatter plot of the latent vector X values
Fig. 5. Spectrograms of the test data: (a) original signal, (b) signal reconstructed by the proposed method, and (c) signal reconstructed by SBR
Fig. 6. Results of the MUSHRA listening test
Table 1. Details of the network structure used in the proposed method