Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) Regional University Outstanding Scientist Support Program (NRF-2020R1I1A3052136).
References
- A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 1996, pp. 373-376, 1996, DOI: 10.1109/ICASSP.1996.541110
- C. H. Kwon, "Performance comparison of state-of-the-art vocoder technology based on deep learning in a Korean TTS system", The Journal of the Convergence on Culture Technology (JCCT), Vol. 6, No. 2, pp. 509-514, 2020, DOI: 10.17703/JCCT.2020.6.2.509
- M. S. Jo and C. H. Kwon, "A multi-speaker speech synthesis system using x-vector", The Journal of the Convergence on Culture Technology (JCCT), Vol. 7, No. 4, pp. 675-681, 2021, DOI: 10.17703/JCCT.2021.7.4.675
- Y. Jia, Y. Zhang, R. Weiss, et al., "Transfer learning from speaker verification to multispeaker text-to-speech synthesis", ArXiv: https://arxiv.org/pdf/1806.04558.pdf, Jan. 2019
- L. Wan, Q. Wang, A. Papir, et al., "Generalized end-to-end loss for speaker verification", ArXiv: https://arxiv.org/pdf/1710.10467.pdf, Nov. 2020
- J. Shen, R. Pang, R. J. Weiss, et al., "Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018, pp. 4779-4783, 2018, DOI: 10.1109/ICASSP.2018.8461368
- N. Kalchbrenner, E. Elsen, K. Simonyan, et al., "Efficient neural audio synthesis", ArXiv: https://arxiv.org/pdf/1802.08435.pdf, June 2018
- K. H. Kim, "A study on multi-speaker TTS using speaker recognition technology", Master's Thesis, Graduate School of Daejeon University, 2022
- C. Jemine, "Real-time voice cloning", Master's Thesis, University of Liège, 2019
- Real-Time Voice Cloning, GitHub repository, https://github.com/CorentinJ/Real-Time-Voice-Cloning
- A. Nagrani, J. S. Chung, and A. Zisserman, "VoxCeleb: A large-scale speaker identification dataset", Proceedings of the Interspeech 2017, pp. 2616-2620, 2017, DOI: 10.21437/Interspeech.2017-950
- H. Zen, V. Dang, R. Clark, et al., "LibriTTS: A corpus derived from LibriSpeech for text-to-speech", Proceedings of the Interspeech 2019, pp. 1526-1530, 2019, DOI: 10.21437/Interspeech.2019-2441
- J. W. Ha, K. H. Nam, J. Kang, et al., "ClovaCall: Korean goal-oriented dialog speech corpus for automatic speech recognition of contact centers", Proceedings of the Interspeech 2020, pp. 409-413, 2020, DOI: 10.21437/Interspeech.2020-1136
- Zeroth-Korean, Korean open-source speech corpus for speech recognition by the Zeroth project, https://www.openslr.org/40/
- Electronics and Telecommunications Research Institute (ETRI), speech training dataset, https://aiopen.etri.re.kr/service_dataset.php?category=voice
- Korean Forced Aligner, GitHub repository, https://github.com/hyung8758/Korean_FA
- D. Povey, A. Ghoshal, G. Boulianne, et al., "The Kaldi speech recognition toolkit", Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011
- National Information Society Agency (NIA), AI Hub, Korean spontaneous speech dataset, https://aihub.or.kr/aidata/105