Funding
This work was supported by a 2-Year Research Grant of Pusan National University.