Acknowledgement
This research was supported by the 2024 AI Creative Autonomous Research Program grant of the Graduate School of Artificial Intelligence, Yonsei University.