References
- X. Qin, Z. Zhang, C. Huang, M. Dehghan, O. R. Zaiane, and M. Jagersand, "U2-Net: Going deeper with nested u-structure for salient object detection," Pattern Recognition, vol. 106, p. 107404, 2020. doi: https://doi.org/10.1016/j.patcog.2020.107404
- S. Zhao, T. H. Nguyen, and B. Ma, "Monaural speech enhancement with complex convolutional block attention module and joint time frequency losses," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6648-6652, 2021.
- X. Hao, X. Su, R. Horaud, and X. Li, "FullSubNet: A full-band and sub-band fusion model for real-time single-channel speech enhancement," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6633-6637, 2021.
- X. Xiang, X. Zhang, and H. Chen, "A nested u-net with self-attention and dense connectivity for monaural speech enhancement," IEEE Signal Processing Letters, vol. 29, pp. 105-109, 2022. doi: https://doi.org/10.1109/LSP.2021.3128374
- S.-R. Hwang, S.-W. Park, and Y.-C. Park, "Performance comparison evaluation of real and complex networks for deep neural network-based speech enhancement in the frequency domain," The Journal of the Acoustical Society of Korea, vol. 41, no. 1, pp. 30-37, 2022. doi: https://doi.org/10.7776/ASK.2022.41.1.030
- Y. Xian, Y. Sun, W. Wang, and S. M. Naqvi, "A multi-scale feature recalibration network for end-to-end single channel speech enhancement," IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 1, pp. 143-155, 2021. doi: https://doi.org/10.1109/JSTSP.2020.3045846
- H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.-W. Chen, and J. Wu, "UNet 3+: A full-scale connected UNet for medical image segmentation," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055-1059, 2020.
- H.-S. Choi, J.-H. Kim, J. Huh, A. Kim, J.-W. Ha, and K. Lee, "Phase-aware speech enhancement with deep complex u-net," in Proc. ICLR, 2019. doi: https://doi.org/10.48550/arXiv.1903.03107
- G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700-4708, 2017.
- J. W. Lyons, DARPA TIMIT acoustic-phonetic continuous speech corpus, 1993.
- E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and M. Matassoni, "The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 126-130, 2013.
- J. Barker, R. Marxer, E. Vincent, and S. Watanabe, "The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 504-511, 2015.
- A. Varga and H. J. M. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Communication, vol. 12, no. 3, pp. 247-251, 1993. doi: https://doi.org/10.1016/0167-6393(93)90095-3
- ETSI, 202 396-1: Speech quality performance in the presence of background noise, 2009.
- Y. Hu and P. C. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238, 2008. doi: https://doi.org/10.1109/TASL.2007.911054