과제정보
이 성과는 2023년도 정부(과학기술정보통신부)의 재원으로 한국연구재단의 지원을 받아 수행된 연구임 (No. RS-2023-00208284).
참고문헌
- Abdel-Hamid O, Mohamed AR, Jiang H, Deng L, Penn G, and Yu D (2014). Convolutional neural networks for speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 1533-1545. https://doi.org/10.1109/TASLP.2014.2339736
- Brown JC (1991). Calculation of a constant Q spectral transform, The Journal of the Acoustical Society of America, 89, 425-434. https://doi.org/10.1121/1.400476
- Chapelle O, Weston J, Bottou L, and Vapnik V (2000). Vicinal risk minimization, Advances in Neural Information Processing Systems, 13, Cambridge MA, USA.
- Cheng X, Xu M, and Zheng TF (2019). Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019. In Proceedings of 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China, 540-545.
- Choi HJ and Kwak IY (2021). Data augmentation in voice spoofing problem, The Korean Journal of Applied Statistics, 34, 449-460. https://doi.org/10.5351/KJAS.2021.34.3.449
- Delgado H, Todisco M, Sahidullah M, Evans N, Kinnunen T, Lee KA, and Yamagishi J (2017). ASVspoof 2017 Version 2.0: Meta-data analysis and baseline enhancement, Odyssey 2018-The Speaker and Language Recognition Workshop.
- DeVries T and Taylor GW (2017). Improved regularization of convolutional neural networks with Cutout, Available from: arXiv preprint arXiv
- Dua M, Jain C, and Kumar S (2021). LSTM and CNN based ensemble approach for spoof detection task in automatic speaker verification systems, Journal of Ambient Intelligence and Humanized Computing, 13, 1985-2000. https://doi.org/10.1007/s12652-021-02960-0
- Fong R and Vedaldi A (2019). Occlusions for effective data augmentation in image classification. In Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 4158-4166.
- Goodfellow I, Warde-Farley D, Mirza M, et al. (2013). Maxout networks, In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, Georgia, USA, 1319-1327.
- Haut JM, Paoletti ME, Plaza J, Plaza A, and Li J (2019). Hyperspectral image classification using random occlusion data augmentation, IEEE Geoscience and Remote Sensing Letters, 16, 1751-1755. https://doi.org/10.1109/LGRS.2019.2909495
- Hsu CY, Lin LE, and Lin CH (2021). Age and gender recognition with random occluded data augmentation on facial images, Multimedia Tools and Applications, 80, 11631-11653. https://doi.org/10.1007/s11042-020-10141-y
- Ioffe S and Szegedy C (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning, 37, 448-456.
- Yang J, Das RK, and Li H (2018). Extended constant-Q cepstral coefficients for detection of spoofing attacks. In Proceedings of 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Honolulu, HI, USA, 1024-1029.
- Ke Y, Hoiem D, and Sukthankar R (2005). Computer vision for music identification. In Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 597-604.
- Kim G, Han DK, and Ko H (2021). Specmix: A mixed sample data augmentation method for training with time-frequency domain features, Available from: arXiv preprint arXiv:2108.03020
- Kinnunen T, Delgado H, Evans N, et al. (2020). Tandem assessment of spoofing countermeasures and automatic speaker verification: Fundamentals, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2195-2210. https://doi.org/10.1109/TASLP.2020.3009494
- Krizhevsky A, Sutskever I, and Hinton GE (2012). Imagenet classification with deep convolutional neural networks, Communications of the ACM, 60, 84-90. https://doi.org/10.1145/3065386
- Lavrentyeva G, Novoselov S, Malykh E, Kozlov A, Kudashev O, and Shchemelinin V (2017). Audio replay attack detection with deep learning frameworks, In Interspeech 2017 (pp. 82-86).
- Lavrentyeva, G, Novoselov S, Tseren A, Volkova M, Gorlanov A, and Kozlov A (2019). STC antispoofing systems for the ASVspoof2019 challenge, Interspeech 2019, 1033-1037.
- Madhu A and Kumaraswamy S (2019). Data augmentation using generative adversarial network for environmental sound classification. In Proceedings of 27th IEEE European Signal Processing Conference (EUSIPCO), A Coruna, Spain, 1-5.
- Nam H, Kim SH, and Park YH (2022). FilterAugment: An acoustic environmental data augmentation method. In Proceedings of ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore,4308-4312.
- Nagarsheth P, Khoury E, Patil K, and Garland M (2017). Replay attack detection using DNN for channel discrimination, Interspeech 2017, 97-101.
- Park DS, Chan W, Zhang Y, Chiu C-C, Zoph B, Cubuk ED, and Le QV (2019). SpecAugment: A simple data augmentation method for automatic speech recognition, Available from: arXiv preprint arXiv:1904.08779
- Shim HJ, Jung JW, Kim JH, and Yu HJ (2022). Attentive max feature map and joint training for acoustic scene classification. In Proceedings of ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 1036-1040.
- Singh KK, Yu H, Sarmasi A, Pradeep G, Lee YJ (2018). Hide-and-Seek: A data augmentation technique for weakly-supervised localization and beyond, Available from: arXiv preprint arXiv:1811.02545
- Sukthankar R, Ke Y, and Hoiem D (2006). Semantic learning for audio applications: A computer vision approach. In Proceedings of 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), New York, NY, USA, 112-112.
- Tomilov A, Svishchev A, Volkova M, Chirkovskiy A, Kondratev A, and Lavrentyeva G (2021). STC Antispoofing Systems for the ASVspoof2021 Challenge. In Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, (pp. 61-67).
- Wei S, Zou S, and Liao F (2020). A comparison on data augmentation methods based on deep learning for audio classification, Journal of Physics: Conference Series, 1453, 012085.
- Witkowski M, Kacprzak S, Zelasko P, Kowalczyk K, and Galka J (2017). Audio replay attack detection using high-frequency features, Interspeech 2017, 27-31.
- Wu X, He R, Sun Z, and Tan T (2018). A light cnn for deep face representation with noisy labels, IEEE Transactions on Information Forensics and Security, 13, 2884-2896. https://doi.org/10.1109/TIFS.2018.2833032
- Wu Z, Kinnunen T, Evans N, Yamagishi J, Hanilci C, Sahidullah Md, and Sizov A (2015). ASVspoof 2015: The first automatic speaker verification spoofing and countermeasures challenge, Sixteenth Annual Conference of the International Speech Communication Association, 2037-2041.
- Yun S, Han D, Chun S, Oh SJ, Yoo Y, and Choe J (2019). Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision (ICCV), Seoul, Korea, 6023-6032.
- Zhang C, Yu C, and Hansen JH (2017). An investigation of deep-learning frameworks for speaker verification antispoofing, IEEE Journal of Selected Topics in Signal Processing, 11, 684-694. https://doi.org/10.1109/JSTSP.2016.2647199
- Zhang H, Cisse M, Dauphin YN, and Lopez-Paz D (2017). Mixup: Beyond empirical risk minimization, Available from: arXiv preprint arXiv
- Zhong Z, Zheng L, Kang G, Li S, and Yang Y (2020). Random erasing data augmentation, In Proceedings of the AAAI conference on artificial intelligence, Hilton New York Midtown, NY, USA, 13001-13008.